eScholarship
Open Access Publications from the University of California

Performance Prediction for Data Transfers in LCLS Workflow

Published Web Location

https://sdm.lbl.gov/oapapers/snta19-jin-final.pdf
No data is associated with this publication.
Abstract

In this work, we study the use of decision-tree-based models to predict transfer rates in different parts of the data pipeline that sends experiment data from the Linac Coherent Light Source (LCLS) at SLAC National Accelerator Laboratory (SLAC) to the National Energy Research Scientific Computing Center (NERSC). The system that monitors the pipeline collects several characteristics of each transfer, such as file size, source file system, and start time, all of which are known when the file transfer starts. However, these static variables do not capture dynamic information, such as the current state of the network. We explore several ways to capture the state of the network and other dynamic information, and find that using these dynamic features in addition to the static ones can improve transfer performance predictions by up to 10-15%. We also compare a couple of well-known decision-tree-based models and find that the Gradient Tree Boosting algorithm performs better overall.
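The abstract does not say which dynamic features were used. As a minimal sketch, assuming a log of past transfers that records start time, end time, and file size, the Python below shows one way to derive network-state features using only information available at the moment a new transfer starts. The `Transfer` record, the `dynamic_features` and `feature_row` helpers, and the two features themselves (transfers still in flight, and the mean rate of recently completed transfers) are hypothetical illustrations, not the paper's method. The resulting rows combine static and dynamic features and could then be fed to a tree-based regressor such as gradient tree boosting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transfer:
    """Hypothetical log record for one completed file transfer."""
    start: float      # start time in seconds (known when the transfer launches)
    end: float        # end time in seconds (known only after it completes)
    size_bytes: int   # static feature: file size

    @property
    def rate(self) -> float:
        """Observed transfer rate in bytes per second."""
        return self.size_bytes / (self.end - self.start)


def dynamic_features(history: List[Transfer], t0: float,
                     window: float = 60.0) -> dict:
    """Hypothetical network-state features for a transfer starting at t0.

    `history` holds other transfers (not the one being predicted). Only
    facts observable at t0 are used, so there is no leakage from the future:
      - in_flight: transfers that had started but not finished by t0
      - recent_rate: mean rate of transfers that finished in [t0 - window, t0],
        or None when no transfer finished in that window
    """
    in_flight = sum(1 for tr in history if tr.start <= t0 < tr.end)
    recent = [tr.rate for tr in history if t0 - window <= tr.end <= t0]
    recent_rate: Optional[float] = sum(recent) / len(recent) if recent else None
    return {"in_flight": in_flight, "recent_rate": recent_rate}


def feature_row(size_bytes: int, t0: float, history: List[Transfer]) -> dict:
    """Combine a static feature (file size) with the dynamic features."""
    row = {"size_bytes": size_bytes}
    row.update(dynamic_features(history, t0))
    return row


if __name__ == "__main__":
    log = [Transfer(0, 10, 1000), Transfer(5, 20, 3000), Transfer(30, 40, 500)]
    print(feature_row(2048, 35.0, log))
```

In this toy log, a transfer starting at t0 = 35 s sees one transfer still in flight, and the two transfers that finished within the previous 60 s averaged 150 bytes/s. Restricting features to values observable at t0 matters because the model is meant to predict a transfer's rate before it begins.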
