Flowzilla: A Methodology for Detecting Data Transfer Anomalies in Research Networks
Published Web Location: https://doi.org/10.1109/indis.2018.00004
Research networks are designed to support high-volume scientific data transfers that span multiple network links. Like any other network, research networks experience anomalies, that is, deviations from a profile of normal traffic levels in the network. Diagnosing these anomalies is critical for both network operators and users (e.g., scientists). In this paper we present Flowzilla, a general framework for detecting and quantifying anomalies in scientific data transfers of arbitrary size. Flowzilla uses Random Forest Regression (RFR) to predict the size of data transfers and an adaptive threshold mechanism to detect outliers. Our results show that the framework achieves up to 92.5% detection accuracy. Furthermore, it predicts data transfer sizes with accuracy above 90% up to 10 weeks after training.
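The abstract does not specify how Flowzilla's adaptive threshold is computed. The following is a minimal sketch that assumes the threshold is a rolling mean plus k standard deviations of recent absolute prediction errors. Under that assumption, a transfer is flagged when its observed size deviates from the predicted size by more than the current threshold. The names `detect_anomalies`, `window`, `k`, and `min_history` are hypothetical. The predictions could come from any regressor, for example an RFR model such as scikit-learn's `RandomForestRegressor`.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def detect_anomalies(
    actual: Iterable[float],
    predicted: Iterable[float],
    window: int = 20,
    k: float = 3.0,
    min_history: int = 5,
) -> List[Tuple[int, float, float]]:
    """Flag transfers whose absolute prediction error exceeds an adaptive threshold.

    ASSUMPTION (not from the paper): threshold = mean + k * population stdev of
    the last `window` non-anomalous errors. Returns (index, error, threshold)
    for each flagged transfer, so the error also quantifies the anomaly.
    """
    if min_history < 1 or window < min_history:
        raise ValueError("need 1 <= min_history <= window")
    history: deque = deque(maxlen=window)  # recent "normal" errors only
    flagged: List[Tuple[int, float, float]] = []
    for i, (a, p) in enumerate(zip(actual, predicted)):
        err = abs(a - p)
        if len(history) >= min_history:
            errs = list(history)
            threshold = mean(errs) + k * pstdev(errs)
            if err > threshold:
                flagged.append((i, err, threshold))
                continue  # keep the anomaly out of the baseline window
        history.append(err)
    return flagged


if __name__ == "__main__":
    # Hypothetical transfer sizes (GB) and a flat stand-in prediction.
    actual = [10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 42.0, 10.0, 10.4]
    predicted = [10.0] * len(actual)
    flags = detect_anomalies(actual, predicted)
    print([i for i, _, _ in flags])  # → [7]
```

Excluding flagged errors from the window keeps a single large anomaly from inflating the threshold for the transfers that follow. The abstract does not state whether Flowzilla makes the same choice.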