MDS Codes with Progressive Engagement Property for Cloud Storage Systems
- Author(s): Hajiaghayi, M; Jafarkhani, H; et al.
Fast and efficient failure recovery is a new challenge for cloud storage systems with a large number of storage nodes. A pivotal recovery metric upon the failure of a storage node is the repair bandwidth cost, which refers to the amount of data that must be downloaded to regenerate the lost data. Since not all surviving nodes are always accessible, we introduce a class of maximum distance separable (MDS) codes that can be reused when the number of selected nodes varies, yet still yield close to optimal repair bandwidth. Such codes provide flexibility in engaging more surviving nodes to reduce the repair bandwidth without redesigning the code structure or changing the content of the existing nodes. We call this property of MDS codes progressive engagement. The name comes from the fact that, when a failure occurs, the best strategy is shown to be to incrementally engage the surviving nodes according to their accessing cost (delay, number of hops, traffic load, or availability in general) until the repair-bandwidth or accessing-cost constraints are met. We argue that the existing MDS codes fail to satisfy the progressive engagement property. We subsequently present a search algorithm to find a new set of codes, named rotation codes, that have both the progressive engagement and MDS properties. Furthermore, we illustrate how the existing permutation codes can provide progressive engagement by modifying the original recovery scheme. Simulation results are presented to compare the repair bandwidth performance of such codes as the number of participating nodes varies, as well as their speed of single-failure recovery.
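The incremental engagement strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm and does not model rotation or permutation codes. As a stand-in for the bandwidth of a real code, it uses the well-known cut-set lower bound on repair bandwidth for an MDS code storing a file of size M with d helper nodes, M·d / (k·(d − k + 1)). The function names, the single scalar cost per node, the cheapest-first ordering, and the two stopping thresholds are all assumptions made for illustration.

```python
from fractions import Fraction


def msr_repair_bandwidth(file_size, k, d):
    """Cut-set lower bound on repair bandwidth with d helpers (requires d >= k).

    Assumption: this optimal bound stands in for the bandwidth of an actual
    code; file_size and k are integers so the result stays exact.
    """
    if d < k:
        raise ValueError("at least k helper nodes are required")
    return Fraction(file_size * d, k * (d - k + 1))


def progressive_engagement(access_costs, file_size, k, bandwidth_target, cost_budget):
    """Engage surviving nodes cheapest-first until a constraint is met.

    access_costs: dict mapping a (hypothetical) node name to its accessing cost.
    Starts from the k cheapest nodes, then adds one node at a time while the
    bandwidth target is not yet met and the cost budget allows it.
    Returns (helpers, repair_bandwidth, total_cost).
    """
    ranked = sorted(access_costs.items(), key=lambda item: item[1])
    if len(ranked) < k:
        raise ValueError("fewer than k surviving nodes are accessible")

    helpers = [node for node, _ in ranked[:k]]
    total_cost = sum(cost for _, cost in ranked[:k])

    for node, cost in ranked[k:]:
        if msr_repair_bandwidth(file_size, k, len(helpers)) <= bandwidth_target:
            break  # repair-bandwidth constraint already satisfied
        if total_cost + cost > cost_budget:
            break  # engaging this node would exceed the accessing-cost budget
        helpers.append(node)
        total_cost += cost

    return helpers, msr_repair_bandwidth(file_size, k, len(helpers)), total_cost


if __name__ == "__main__":
    costs = {"a": 1, "b": 2, "c": 3, "d": 5}  # hypothetical accessing costs
    print(progressive_engagement(costs, file_size=4, k=2,
                                 bandwidth_target=3, cost_budget=100))
```

In this sketch, engaging more helpers lowers the bound from 4 units (d = 2) to 3 units (d = 3), so the loop stops as soon as the target of 3 is reached. A tighter cost budget stops it earlier, at a higher repair bandwidth.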