Unified Summaries for Internet Traffic
Traffic analysis is important to the operation of IP networks. The input to the analysis is raw data such as packet header traces or NetFlow records, and the output is often the size of traffic aggregates, such as the traffic generated by various applications or by individual customers. Storing the raw data preserves the flexibility to run arbitrary new analyses in the future, but the sheer volume of raw data is often a challenge. Sampling-based techniques such as smart sampling aim to reduce the amount of raw data while preserving the ability of future analyses to accurately estimate the traffic of any large aggregate. There are three important measures of the traffic of an aggregate: the number of bytes, the number of packets, and the number of flows. Current data reduction solutions allow estimating only one of these measures. In this paper we propose the idea of unified summaries, which allow analyses to obtain unbiased estimates for all three measures. Our unified summary for flow records is based on smart sampling, and the one for packet header traces is based on sample and hold. The most important contributions of this paper are the development of novel unbiased statistical estimators for the number of flows, the development of methods for combining summaries that measure bytes and packets using less memory than separate summaries, and an experimental evaluation of the proposed solutions based on traffic traces.
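To make the smart sampling idea mentioned above concrete, the following is a minimal sketch, not the estimators developed in the paper. It assumes the standard form of smart (threshold) sampling, in which a flow record of x bytes is kept with probability p = min(1, x/z) for a threshold z, and it uses generic inverse-probability (Horvitz-Thompson) weighting to estimate bytes, packets, and flows from the same sample. The names `Flow`, `smart_sample`, and `estimate`, and the threshold values, are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Flow:
    key: str      # e.g. an application or customer label (illustrative)
    nbytes: int   # bytes carried by the flow, assumed positive
    packets: int  # packets carried by the flow


def smart_sample(flows, z, rng):
    """Keep each flow with probability p = min(1, nbytes / z).

    Returns (flow, p) pairs so that estimates can be weighted by 1/p.
    """
    sample = []
    for f in flows:
        p = min(1.0, f.nbytes / z)
        if rng.random() < p:
            sample.append((f, p))
    return sample


def estimate(sample, predicate=lambda f: True):
    """Inverse-probability estimates for the aggregate selected by predicate.

    Each kept flow contributes its value divided by its sampling
    probability, which makes every one of the three totals unbiased.
    """
    est = {"bytes": 0.0, "packets": 0.0, "flows": 0.0}
    for f, p in sample:
        if predicate(f):
            est["bytes"] += f.nbytes / p    # equals max(nbytes, z)
            est["packets"] += f.packets / p
            est["flows"] += 1.0 / p
    return est


if __name__ == "__main__":
    rng = random.Random(0)
    flows = [Flow("web", 100, 2), Flow("video", 1000, 5), Flow("dns", 50, 1)]
    sample = smart_sample(flows, z=500, rng=rng)
    print(estimate(sample))
    print(estimate(sample, predicate=lambda f: f.key == "video"))
```

Flows of at least z bytes are always kept and counted exactly, while smaller flows are kept rarely but weighted up when they are. The weighting keeps all three estimates unbiased, although the variance of the flow count estimate can be high when an aggregate consists mostly of small flows.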
Pre-2018 CSE ID: CS2004-0793