Coding for Distributed Storage and Flash Memories
- Author(s): Huang, Pengfei
- Advisor(s): Siegel, Paul H
- et al.
A modern large-scale storage system usually consists of a number of distributed storage nodes, each of which is made up of many storage devices, like flash memory chips. To maintain the data integrity in the system, two independent layers of data protection mechanisms are deployed. At the system level, erasure codes, e.g., maximum distance separable (MDS) codes, are used across a set of storage nodes. At the device level, error-correcting codes (ECCs), e.g., Bose-Chaudhuri-Hocquenghem (BCH) codes, are employed in each flash memory chip. The main research goal of this dissertation is to design new erasure codes for distributed storage and new ECCs for flash memories. The first part of this dissertation is devoted to studying a new class of erasure codes called locally repairable codes (LRCs) for distributed storage. We focus on LRCs over small fields; in particular, the binary field. We investigate the locality of classical binary linear codes, e.g., BCH codes and Reed-Muller codes, and their modified versions. Then, we derive bounds for LRCs with availability and present several new code constructions for binary LRCs. In addition, we study erasure codes that can locally correct multiple erasures. Such codes are referred to as multi-erasure locally repairable codes (ME-LRCs). Our constructions based on generalized tensor product codes generate several families of optimal ME-LRCs over small fields. The second part of this dissertation aims to construct new ECCs and analyze the fundamental performance limits for flash memories. We propose a general framework for constructing rate-compatible ECCs which are capable of adapting different error-correcting capabilities to the corresponding bit error rates at different program/erase (P/E) cycles. Next, we present a new family of shared-redundancy ECCs called ladder codes. Using ladder codes, multiple codewords from good and bad pages in a flash memory block can share some common redundancy. Finally, based on the channel models obtained from empirical data, the performance of multilevel flash memories is studied by using multi-user information theory. The results provide qualitative insight into effective coding solutions.