As data volumes explode in modern applications such as the Internet of Things (IoT), the demand for storage devices increases drastically. Solid state drives (SSDs) and hard disk drives (HDDs) are the two main types of data storage devices. SSDs store data in flash memory, while HDDs store data on magnetic disks. To extend the lifetime and enhance the reliability of data storage devices, we use machine learning as the fundamental tool for three data storage modules: modeling in flash memory systems, error correction and constrained coding schemes, and detection methods in magnetic recording channels.
The modeling part of the dissertation proposes a novel data-driven approach, referred to as Flash-Gen, for generating NAND flash memory read voltages in both space and time using conditional generative networks. This generative modeling method reconstructs the read voltage of an individual memory cell based on the program levels of the cell and its surrounding cells, as well as the time stamp, in a time-efficient, resource-saving, and functionally comprehensive manner. To meet the need for data-dependent channel models, we further extend the generative modeling approach to the coded storage channel. We train the generative models by transferring knowledge from models pre-trained with pseudo-random data. This technique can accelerate the training process and improve model accuracy in reconstructing the read voltages induced by constrained input data throughout the flash memory lifetime.
The coding part of the dissertation designs a practical coding workflow and proposes new constrained and shaping coding schemes for flash memories. We propose a flash system optimization procedure, referred to as the Flash-Gen coding workflow, that leverages reconstructed read voltages from Flash-Gen for the development of error correction codes (ECCs) and constrained codes. The Flash-Gen coding workflow can effectively address a range of important tasks, including threshold determination, coding performance estimation, and pattern characterization. We then formulate inter-cell interference (ICI)-mitigation constrained codes and distribution-matching shaping codes. Both proposed coding schemes achieve remarkable lifetime improvements.
The detection part of the dissertation develops a recurrent neural network (RNN)-based detector for magnetic recording channels with partial-response equalization, referred to as the Partial-Response Neural Network (PR-NN). PR-NN can outperform classical detection methods, such as the Viterbi detector, in multiple ``realistic'' environments and maintains its detection performance across different channel conditions.