Predicting Resource Requirement in Intermediate Palomar Transient Factory Workflow
Published Web Location: https://doi.org/10.1109/ccgrid49817.2020.00-31
Quickly identifying astronomical transients from synoptic surveys is critical to many recent astrophysical discoveries. However, each of the data processing pipelines in these surveys contains dozens of stages with highly varying time and space requirements. Properly predicting the resources required to run these pipelines is critical for the allocation of computing resources and reducing the discovery response time. We propose a machine learning strategy for this prediction task and demonstrate its effectiveness using a set of timing measurements from the intermediate Palomar Transient Factory (iPTF) workflow. The proposed model utilizes the spatiotemporal correlation of astronomical images, where nearby patches of the sky (space) are likely to have a similar number of objects of interest and workflows executed in the recent past (time) are likely to use a similar amount of time because the machines and data storage systems are likely to be in similar states. We capture the relationship among these spatial and temporal features in a Bayesian network and study how they impact the prediction accuracy. This Bayesian network helps us to identify the most influential features for predictions. With proper features, our models achieve errors close to the random variance boundary within batches of images taken at the same time, which can be regarded as the intrinsic limit of prediction accuracy.
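The spatiotemporal idea described above can be illustrated with a deliberately simple baseline. This is not the Bayesian network or the learned models from the paper. It is only a sketch that predicts a run's execution time by averaging the runtimes of past runs that were both nearby on the sky and recent. The `Run` record, its field names, the `predict_runtime` helper, and the default thresholds (`max_sep_deg`, `max_age_s`) are all assumptions made for illustration and do not come from the iPTF pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    """One completed workflow run (all field names are hypothetical)."""
    ra_deg: float     # right ascension of the image, in degrees
    dec_deg: float    # declination of the image, in degrees
    start_s: float    # start time of the run, in seconds
    runtime_s: float  # measured execution time, in seconds


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation between two sky positions, in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sep))))


def predict_runtime(history, ra_deg, dec_deg, now_s,
                    max_sep_deg=2.0, max_age_s=3600.0):
    """Average the runtimes of past runs close in both sky position and time.

    Falls back to the mean of the whole history when no run qualifies,
    and returns None when the history is empty.
    """
    if not history:
        return None
    neighbours = [
        r.runtime_s for r in history
        if 0.0 <= now_s - r.start_s <= max_age_s
        and angular_sep_deg(ra_deg, dec_deg, r.ra_deg, r.dec_deg) <= max_sep_deg
    ]
    pool = neighbours or [r.runtime_s for r in history]
    return sum(pool) / len(pool)


# Usage with made-up numbers: only the first two runs are near and recent.
history = [
    Run(10.0, 20.0, 0.0, 100.0),
    Run(10.5, 20.2, 600.0, 120.0),
    Run(200.0, -30.0, 900.0, 400.0),   # far away on the sky
    Run(10.1, 20.1, -10000.0, 50.0),   # too old
]
print(predict_runtime(history, 10.2, 20.1, now_s=1000.0))  # prints 110.0
```

A baseline of this kind gives a reference point for judging richer models. Its residual error can be compared against the variance of runtimes within a batch of images taken at the same time, which the abstract treats as the intrinsic limit of prediction accuracy.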