Lawrence Berkeley National Laboratory
DataMover: robust terabyte-scale multi-file replication over wide-area networks
Author(s): Sim, Alex; Gu, Junmin; Shoshani, Arie; Natarajan, Vijaya; et al.
Typically, large scientific datasets (on the order of terabytes) are generated at large computational centers and stored on mass storage systems. However, large subsets of the data need to be moved to facilities available to application scientists for analysis. Replicating thousands of files is a tedious, error-prone, but extremely important task in scientific applications. Automating it requires automatic space acquisition and reuse; monitoring the progress of staging thousands of files from the source mass storage system, transferring them over the network, and archiving them at the target mass storage system or disk systems; and recovering from transient system failures. We have developed a robust replication system, called DataMover, which is now in regular use in high-energy physics and climate modeling experiments. Only a single command is necessary to request multi-file replication or the replication of an entire directory. A web-based tool was developed to dynamically monitor the progress of the multi-file replication process.
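The stage, transfer, archive sequence with recovery from transient failures can be illustrated with a minimal Python sketch. This is not DataMover's implementation or interface: the names `replicate`, `FileStatus`, and `TransientError`, the bounded-retry policy, and the per-file status record are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class TransientError(Exception):
    """Hypothetical marker for a failure worth retrying (e.g. a timeout)."""


@dataclass
class FileStatus:
    """Per-file progress record that a monitoring tool could display."""
    name: str
    completed: List[str] = field(default_factory=list)  # stages finished so far
    attempts: int = 0                                    # total tries, all stages
    failed: bool = False                                 # True once retries run out


def replicate(
    files: List[str],
    stages: List[Tuple[str, Callable[[str], None]]],
    max_attempts: int = 3,
) -> Dict[str, FileStatus]:
    """Run every file through each stage in order, retrying transient errors.

    Assumption: each stage is a callable that takes a file name and raises
    TransientError when the failure is temporary.
    """
    statuses = {name: FileStatus(name) for name in files}
    for status in statuses.values():
        for stage_name, action in stages:
            for _ in range(max_attempts):
                status.attempts += 1
                try:
                    action(status.name)
                    status.completed.append(stage_name)
                    break  # stage succeeded, move on to the next stage
                except TransientError:
                    continue  # try the same stage again
            else:
                # Retries exhausted: mark the file and skip its later stages.
                status.failed = True
                break
    return statuses
```

A caller would supply the stage callables, for example `replicate(["run1.dat"], [("stage", stage_fn), ("transfer", transfer_fn), ("archive", archive_fn)])`, where the three functions are placeholders for site-specific operations. The returned status records are the kind of information a progress monitor could poll.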