- Blaschke, Johannes P;
- Brewster, Aaron S;
- Paley, Daniel W;
- Mendez, Derek;
- Bhowmick, Asmit;
- Sauter, Nicholas K;
- Kröger, Wilko;
- Shankar, Murali;
- Enders, Bjoern;
- Bard, Deborah
Scattering experiments using X-ray Free Electron Lasers (XFELs) are a
powerful tool to determine the molecular structure and function of unknown
samples (such as COVID-19 viral proteins). XFEL experiments are a challenge to
computing in two ways: i) due to the high cost of running XFELs, a fast
turnaround time from data acquisition to data analysis is essential to make
informed decisions on experimental protocols; ii) data collection rates are
growing exponentially, requiring new scalable algorithms. Here we report our
experiences analyzing data from two experiments at the Linac Coherent Light
Source (LCLS) during September 2020. Raw data were analyzed on NERSC's Cori
XC40 system, using the Superfacility paradigm: our workflow automatically moves
raw data between LCLS and NERSC, where they are analyzed using the software
package CCTBX. We achieved real-time data analysis, with a turnaround time from
data acquisition to full molecular reconstruction of as little as 10 min,
fast enough for the experiment's operators to make informed decisions. By
hosting the data analysis on Cori, and by automating LCLS-NERSC
interoperability, we achieved a data analysis rate that matches the data
acquisition rate. Completing data analysis within 10 min is a first for XFEL
experiments and an important milestone if we are to keep up with data
collection trends.