Intelligent SDN based traffic (de)Aggregation and Measurement Paradigm (iSTAMP)

Fine-grained traffic flow measurement, which provides useful information for network management tasks and security analysis, can be challenging to obtain due to monitoring resource constraints. The alternate approach of inferring flow statistics from partial measurement data has to be robust against dynamic temporal/spatial fluctuations of network traffic. In this paper, we propose an intelligent Traffic (de)Aggregation and Measurement Paradigm (iSTAMP), which partitions TCAM entries of switches/routers into two parts to: 1) optimally aggregate part of incoming flows for aggregate measurements, and 2) de-aggregate and directly measure the most informative flows for per-flow measurements. iSTAMP then processes these aggregate and per-flow measurements to effectively estimate network flows using a variety of optimization techniques. With the advent of Software-Defined-Networking (SDN), such real-time rule (re)configuration can be achieved via OpenFlow or other similar SDN APIs. We first show how to design the optimal aggregation matrix for minimizing the flow-size estimation error. Moreover, we propose a method for designing an efficient-compressive flow aggregation matrix under hard resource constraints of limited TCAM sizes. In addition, we propose an intelligent Multi-Armed Bandit based algorithm to adaptively sample the most “rewarding” flows, whose accurate measurements have the highest impact on the overall flow measurement and estimation performance. We evaluate the performance of iSTAMP using real traffic traces from a variety of network environments and by considering two applications: traffic matrix estimation and heavy hitter detection. Also, we have implemented a prototype of iSTAMP and demonstrated its feasibility and effectiveness in Mininet environment.


I. INTRODUCTION
Fine-grained traffic flow measurements provide essential information that is central to network design, operation, management, accounting and security. Direct flow-based measurements such as NetFlow [1] and sFlow [2] offer fine-grained measurements that can support different measurement tasks. However, such an approach not only requires dedicated hardware and specialized algorithms, but it is also often challenging, inefficient or even infeasible to monitor each and every flow due to exploding traffic volume and limited monitoring resources (e.g., the number of Ternary Content Addressable Memory (TCAM) entries, storage capacity and processing power). Therefore, intelligent sampling and streaming algorithms have been proposed to estimate statistics or answer specific queries about, e.g., flow size distributions [3] and the approximate size of elephant flows [4]. These solutions are task-specific and lack the full flexibility to dynamically choose which traffic sub-population to measure, and how, depending on application requirements.
An alternate approach is to estimate the internal attributes of interest in a network (e.g., traffic intensity between nodes) from a limited set of measurements using network inference/tomography methods. However, most Network Inference (NI) problems are naturally formulated as ill-posed Under-Determined Linear Inverse (UDLI) problems, where the number of measurements is not sufficient to uniquely and accurately determine the solution. Hence, side information from different sources must be incorporated into the problem formulation to improve the estimation precision [5].
On the other hand, many recent network monitoring and security applications require timely estimates of both large and small traffic flows with high precision [6], [7]. The flows of interest can be sparse, or highly fluctuating over time/space. Hence, the network measurement infrastructure must be agile enough to cope with dynamic network and traffic conditions. Such a flexible architecture has become achievable with the recent advent of Software-Defined-Networking (SDN). In fact, SDN enablers such as OpenFlow cleanly separate the measurement data plane and control plane functions, and provide the capability to control/re-program the internal configurations of switches in dynamic environments. Consequently, SDN allows for more complex network monitoring and management applications [8], [9].
In this paper, we propose an intelligent SDN based Traffic (de)Aggregation and Measurement Paradigm (iSTAMP), in which the measurement modules can be configured on-the-fly to collect fine-grained measurements of specific traffic sub-populations of interest that directly reflect the monitoring application requirements. Based on the philosophy that not all attributes of interest are equally important, iSTAMP utilizes an intelligent sampling algorithm to select the most informative traffic flows, using information gathered throughout the measurement process. It also exploits Compressive Sensing (CS) inference methods that are effective for estimating highly fluctuating sparse unknown quantities from a set of well-defined compressed linear measurements [10], [11].
iSTAMP leverages OpenFlow to dynamically partition the TCAM entries of a switch/router into two parts. In the first part, a set of incoming flows is optimally aggregated to provide well-compressed aggregated flow measurements that can lead to the best estimation accuracy via the network inference process. The second portion of TCAM entries is dedicated to tracking/measuring the most rewarding flows (defined as flows with the highest impact on the ultimate monitoring application performance) to provide accurate per-flow measurements. These flows are selected and "stamped" as important (or rewarding from the monitoring perspective) using an intelligent Multi-Armed Bandit (MAB) based algorithm. These two sets of measurements (aggregated and sampled flows) are then jointly processed to estimate the size of all network flows using different optimization techniques.

A. Related Work
There is a rich literature on improving the accuracy of traffic flow measurements and estimation. We will briefly discuss the most relevant work here. In [12], ProgMe proposes a re-programmable architecture allowing statistics collection based on the notion of flowsets (arbitrary traffic subpopulations) using a flexible flowset composition language. iSTAMP, in contrast, leverages off-the-shelf components and existing OpenFlow API to configure the TCAM entries and associated counting rules. In particular, it leverages "statistics" field in OpenFlow to collect traffic measurements. Hence, it does not impact the packet forwarding behavior or performance. In addition, from SDN perspective, [8] describes a reconfigurable measurement architecture for hierarchical heavy hitter detection, and then, [9] proposes a re-programmable structure (called OpenSketch) where a variety of sketches for different measurement tasks can be defined and installed by the operator. Furthermore, Duffield [13] develops a theory of fairly distributing a sampling budget for estimating sub-populations flow records. However, SDN-based fine-grained traffic flow measurement under monitoring resource constraints has not been addressed previously. iSTAMP leverages the flexibility provided by SDN to push the boundary of obtaining accurate flow measurements/estimation by combining both coarse (aggregate) and fine-granularity (per-flow) measurements. We build upon the recent achievements in the theory of compressed sensing and on-line learning to develop a theoretical foundation for guiding the design of such an adaptive flow measurement framework under hard resource constraints.

B. Our contributions
iSTAMP is simple, generic, and efficient, with the ability to intelligently track and measure the most "rewarding" flows and optimally aggregate the others. iSTAMP adaptively allocates (and re-allocates) TCAM entries amongst the constantly evolving flows to achieve the highest flow estimation accuracy. Therefore, it is robust in dynamic environments (with highly fluctuating flows over time/space) and under hard resource constraints where TCAM entries, storage capacity, and network bandwidth are severely limited. In fact, iSTAMP can be easily deployed on commodity OpenFlow-enabled routers/switches to enhance the performance of various network monitoring applications, including Traffic Matrix (TM) estimation, Traffic Engineering (TE), Heavy Hitter Identification (HHI) and different security applications [7]. It also offers timely estimates of network flows with low computation and communication overhead between control and data planes. Furthermore, it significantly reduces the storage required for monitoring tasks.
Our main contributions are summarized as follows:
• In the context of compressed sensing, we, for the first time, formulate and solve the problem of designing an optimal binary aggregation (or observation) matrix to maximize the estimation accuracy while providing a compressed form of measurements. Moreover, we propose a method for designing an efficient-compressive flow aggregation matrix under hard resource constraints of limited TCAM sizes.
• We propose a simple and efficient MAB based algorithm to adaptively track and measure the most rewarding flows.
• We evaluate the performance of iSTAMP using real traffic traces from a variety of network environments and by considering two applications: Origin-Destination Flow (ODF) estimation and heavy hitter identification. Furthermore, we implement a prototype of iSTAMP and demonstrate its feasibility and effectiveness in Mininet environment.
The rest of this paper is organized as follows. Section II provides an overview of iSTAMP and different network inference techniques that we have used. Section III describes our optimal aggregation matrix design procedure along with our intelligent MAB based flow sampling algorithm. In Section IV, we evaluate the performance of iSTAMP considering two main applications: ODF estimation and heavy hitter detection. Section V summarizes our most important results.

II. SYSTEM DESCRIPTION AND PROBLEM STATEMENT
Figure 1 shows the general block diagram of the iSTAMP framework where the TCAM at the switch or router is partitioned into two parts for (i) aggregate measurements, and (ii) per-flow monitoring of selected flows, respectively. The optimal aggregate statistics from the first part provide a set of well-formed compressed linear measurements as input to a network inference process to perform flow size estimation. The second part of the TCAM is allocated for tracking individual flows that are deemed "important" (or rewarding) for the monitoring application in question. This process is called intelligent sampling. For illustrative purposes, we consider Origin-Destination Flow (ODF) size estimation, and more generally, traffic matrix (as a measure of origin-destination traffic intensity between nodes) estimation, as the driving application in this section. In this context, ODFs with the largest volume will be the "important" candidates for per-flow monitoring. Note that a single router may only observe a subset of ODFs (or a partial TM) in the network. We will first discuss how iSTAMP can be deployed on a single router/switch to estimate the ODFs it observes. The framework can then be extended to a distributed monitoring case, where multiple routers running iSTAMP can coordinate to estimate the complete TM for the network.
Assume there are $n$ ODFs in the network and $T$ TCAM entries at the switch that can be used for network measurement, where $n \gg T$. At each measurement interval $\tau_t$, $K$ of the $T$ entries are used to track and measure the most rewarding ODFs and $m$ entries are used to optimally aggregate the other ODFs (i.e., $T = m + K$). For this purpose, the flexibility of OpenFlow is used to install TCAM wildcard matching rules (prefix keys) $c_i$ and collect the associated statistical counts $y_i$ in each measurement interval $\tau_t$. The measurement statistics/counts are processed at the controller, which can be co-located on the switch or reside on a separate machine. In the next epoch $\tau_{t+1}$, the best ODFs (determined based on the application requirement) for direct measurement are selected and their corresponding prefix keys are installed in $K$ TCAM entries. This process is called de-aggregation, and the TCAM lookup mechanism, which is based on the highest wildcard matching rule, facilitates its implementation. In addition, $m$ TCAM entries are used for optimal aggregation of the remaining $n - K$ flows. The controller can poll the statistic counts periodically or at different measurement intervals, the frequency of which is limited by practical switch/network constraints.
At each epoch $t$, the set of measurement statistics $Y^t$ is represented by the under-determined linear system of equations shown in Eq.(1), where $Y_g^t$ denotes the aggregated measurements and $Y_k^t$ denotes the direct per-flow measurements. In this equation, $A^t$ is a $(T \times n)$ measurement matrix consisting of an $(m \times n)$ binary aggregation matrix $A_g^t$, where the non-zero entries in each row of $A_g^t$ represent the set of flows aggregated (or mapped) to that entry, and a $(K \times n)$ binary matrix $A_k^t$, where the single non-zero entry in each row indicates the flow selected for direct measurement in the $t$-th measurement interval $\tau_t$. Also, $X^t$ (composed of $X_g^t$ and $X_k^t$) denotes the $t$-th vector of flows, where $X_g^t$ is the set of ODFs that are mapped to $m$ groups for aggregate measurements and $X_k^t$ is the set of directly measured ODFs (i.e., $X_k = Y_k$). Given these measurements, the set of unknown flows $X^t$ can be estimated using the general optimization formulations Eq.(2) and Eq.(3), where the parameters $(p, q, \lambda)$ are defined by the chosen optimization technique. The particular choice $p = 2$, $q = 1$ and $\lambda = \lambda_0$ ($\in \mathbb{R}$) defines a compressed sensing inference technique [10] that is effective for estimating highly fluctuating network flows. Furthermore, Eq.(3) incorporates side information $Y_S$ into the optimization problem to improve the performance of the overall TM estimation process (in the extended multi-point/distributed monitoring case). Here, the side information $Y_S^t$ is the SNMP link load measurements $Y_S^t = H X_g^t$, where $H$ is the routing matrix.
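The structure of the measurement matrix described above can be illustrated with a small sketch. It assumes the stacked form $Y^t = A^t X^t$ with the aggregation block placed on top of the per-flow block; the function names, the list-of-groups input format and the toy flow sizes are hypothetical and chosen only for illustration.

```python
def build_measurement_matrix(groups, direct, n):
    """Stack an m x n aggregation block on top of a K x n per-flow block.

    groups : list of m lists; groups[i] holds the flow indices summed by
             aggregate TCAM entry i (hypothetical input format).
    direct : list of K flow indices that are measured individually.
    n      : total number of flows.
    """
    a_g = []
    for group in groups:
        members = set(group)
        a_g.append([1 if j in members else 0 for j in range(n)])
    # Each per-flow row has exactly one non-zero entry.
    a_k = [[1 if j == flow else 0 for j in range(n)] for flow in direct]
    return a_g + a_k


def measure(a, x):
    """Return Y = A X for a list-of-rows matrix A and a flow vector X."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


# Toy example: n = 6 flows, T = 3 entries (m = 2 aggregates, K = 1 direct).
x = [50, 3, 4, 1, 2, 5]
a = build_measurement_matrix(groups=[[1, 2], [3, 4, 5]], direct=[0], n=6)
y = measure(a, x)  # two aggregate counts followed by one per-flow count
```

In this toy case the controller receives only $T = 3$ counters for $n = 6$ flows, which is why the inference step has to resolve an under-determined system.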
Considering the above optimization formulations, iSTAMP can be used in different single-point and multi-point measurement scenarios. Accordingly, the vector of ODFs $X^t$ represents different notions of flows at different levels. For example, in the single-point scenario, an ODF (defined as a sequence of packets that share the same source/destination IP addresses and that are observed within a given time interval at an observation point in the network) can be traffic between servers in different racks in a data center. In the multi-point scenario, ODFs can be traffic between routers in an ISP network. Under single-point measurement, we consider two sub-cases: (a) the OpenFlow Switch (OFS) is used for both routing and measurement, and hence the aggregation matrix $A_g$ cannot be arbitrary in order to preserve routing of the aggregate flows, and (b) the OFS is used primarily for flow measurement, and routing does not have to be preserved, as in the case of random or hash-based routing. These two cases are compatible with both formulations Eq.(2) and Eq.(3). In the former case, $A_g^t$ is the local routing matrix and part of the TCAM entries is used to directly measure a set of the most informative flows. Note that reserved-empty TCAM entries are always available at switches [14]. In the latter case, all TCAM entries of the OFS are used for flow measurement; accordingly, along with directly measured flows, $A_g^t$ can be optimally designed to enhance the accuracy of estimation. Likewise, there are two possible cases in the multi-point measurement scenario, where installation rules are transmitted from the Central Controller (CC) to local OFSs and their measurements are transmitted back to the CC. If all TCAM entries of the OFSs are used for measurement, then both Eq.(2) and Eq.(3) are applicable and $A_g^t$ can be optimally designed. However, if local OFSs are used for routing and measurement, then only Eq.(2) can be applied because, in this case, $A_g^t \cong H$.
In iSTAMP, the communication overhead between the controller and the switch is low because only a few flows are directly measured and the other entries are used to aggregate a large number of flows (hence, $T$ instead of $n$ measurement records), which is important under hard resource constraints and in the multi-point measurement scenario. iSTAMP is also computationally efficient because it exploits existing low-complexity network inference techniques [10].

III. OPTIMAL FLOW AGGREGATION AND SAMPLING
In this section, we explain how to design the optimal aggregation matrix $A_g^t$ and how to sample the most rewarding flows via the direct (per-flow) sampling matrix $A_k^t$ to maximize the performance within the iSTAMP framework. We also discuss the various challenges involved and how we address them.

A. Optimal Compressive Aggregation
To cope with the under-determined nature of Eq.(1) and the fact that today's network traffic is highly fluctuating over time and/or space, compressive sensing inference methods can be effectively applied to estimate the sparse characteristics of network flows. For example, it has been shown that if the coherency of the observation matrix $A_g$, defined as $\mu$ in Eq.(4), is sufficiently small, then the convex optimization program Eq.(5) can exactly recover a sparse signal with $l$ non-zero quantities (in some basis) from $m = O(l \log(n/l))$ properly designed linear observations [10]. Interestingly, it has been shown that random observation matrices with i.i.d. Gaussian or random $\pm 1$ entries and a sufficient number of rows can achieve small coherence with overwhelmingly high probability. Also, there are efficient algorithms with low computational complexity, such as Orthogonal Matching Pursuit (OMP), for the recovery of sparse signals [10]. In the following, for simplicity, we drop the index $t$ and assume $A_g$ is written as [...].
Studies in [15], [16] demonstrated that a carefully designed observation matrix can achieve sufficiently small coherency to improve the performance and efficiency of CS inference methods. In accordance with this, continuous optimization techniques have been used to design the optimal observation matrix by minimizing all off-diagonal elements of the corresponding Gram matrix $A_g^T A_g$. The motivation is that minimizing the sum of the squared inner products of all columns (atoms) of $A_g$ results in an observation matrix with more orthogonal columns, which may improve the performance of the CS recovery process [16].
Here, we use the same formulation for designing the optimal flow aggregation (i.e., observation) matrix. However, network switches/routers typically classify incoming packets (i.e., forward, measure or drop) based on the longest prefix match, i.e., each flow is matched to at most one TCAM entry depending on how the wildcard fields are configured. As a result, our aggregation matrix is typically a binary matrix. Therefore, we introduce the generic integer optimization program Eq.(6) to design the optimal aggregation matrix, where the first constraint states that $A_g$ is a binary matrix. The second constraint controls the redundancy between aggregated measurements, where redundancy is achieved by observing a flow at multiple TCAM entries. The non-zero entries of the diagonal matrix $\phi$ are chosen to force the optimization engine to focus on minimizing the sum of the off-diagonal elements of the objective function $F$, and to control the number of ones in each column. Note that $C_l$ and $C_u$ are important parameters that are constrained by TCAM lookup limitations. Also, the third constraint represents the feasibility of the aggregation process, where $F_i$ denotes the set of candidate flows that can be aggregated at the $i$-th TCAM entry (e.g., to preserve routing); this set is known as the $i$-th set of feasible aggregated flows. Note that this constraint can be relaxed in the case of random or hash-based routing, where an arbitrary subset of flows can be aggregated into the same TCAM entry. The fourth constraint controls the number of aggregated flows per TCAM entry.
Assuming the $a_{ij}$'s are binary variables, we have shown that the complex, non-convex and non-linear objective function in Eq.(6) can be reformulated as Eq.(7), which is still a non-linear integer program. We then convert Eq.(7) into the linear integer optimization program in Eq.(8), which can be solved using standard integer optimization tools, such as CPLEX, to design the optimal aggregation matrix. In Eq.(8), $C_j$ and $R_i$ denote the sets of $z_j$ and $z_i$ whose corresponding $a_{ij}$ satisfy constraints 2 and 4 in Eq.(6), respectively. Note that, if $C_l = C_u = 1$, then the objective function in Eq.(8) is simplified by reducing $K_u$ to $mn(n - 1)$ and removing $\sum_{i=1}^{K_v} v_i$; in this case $\phi = I$. It should be noted that our optimization framework in Eq.(8) can also be used to design optimal binary observation matrices, which are of particular importance in other network monitoring and sensor network applications where the entries of $A$ indicate the ability ($a_{ij} = 1$) or inability ($a_{ij} = 0$) to measure the $j$-th attribute of interest ($x_j$) at the $i$-th observation and redundant measurements ($C_l > 1$) are permitted.
To evaluate the performance of this optimization technique, we describe two examples. First, in our formulation in Eq.(8), we set $m = 4$, $n = 6$ and $C_l = C_u = 1$ (i.e., each flow can only be mapped to one TCAM entry and redundant measurements are not permitted) to compute the optimal aggregation matrix $A^{Opt}_{g1}$ = [1, 1, 0, 0, 0, 0; 0, 0, 1, 1, 0, 0; 0, 0, 0, 0, 1, 0; 0, 0, 0, 0, 0, 1], whose coherency is $\mu = 1$. Then, using the same setup, we set $C_l = C_u = 2$ (i.e., each flow can be mapped to more than one TCAM entry and redundant measurements are permitted) and compute the optimal aggregation matrix $A^{Opt}_{g2}$ = [0, 1, 1, 1, 0, 0; 0, 0, 0, 1, 1, 1; 1, 0, 1, 0, 0, 1; 1, 1, 0, 0, 1, 0], whose coherency is $\mu = 0.5$.
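The two coherency values quoted for these example matrices can be checked numerically. The sketch below assumes that $\mu$ in Eq.(4) is the usual mutual coherence from the compressed sensing literature, i.e., the largest absolute normalized inner product between two distinct columns; the function name `coherence` is hypothetical.

```python
import math


def coherence(a):
    """Largest normalized inner product between two distinct columns of a.

    Assumption: this is the standard mutual coherence; all-zero columns
    (flows observed by no entry) are skipped.
    """
    cols = list(zip(*a))
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    best = 0.0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if norms[i] == 0 or norms[j] == 0:
                continue
            dot = sum(u * v for u, v in zip(cols[i], cols[j]))
            best = max(best, abs(dot) / (norms[i] * norms[j]))
    return best


# Example matrices from the text (rows are TCAM entries, columns are flows).
A_OPT_G1 = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
A_OPT_G2 = [
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0],
]

mu1 = coherence(A_OPT_G1)  # identical columns 1 and 2 give coherence 1
mu2 = coherence(A_OPT_G2)  # any two columns share at most one row
```

Under this definition the first matrix has two pairs of identical columns, so its flows cannot be told apart from the aggregates alone, whereas in the second matrix every pair of columns overlaps in at most one of its two non-zero rows.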
To measure the actual performance of these two aggregation matrices, we ran Monte-Carlo simulations on randomly selected real traffic traces $X_g$ from the Geant network (Table I) to compute the measurement vectors $Y_{g1} = A^{Opt}_{g1} X_g$ and $Y_{g2} = A^{Opt}_{g2} X_g$. Next, we use the compressive sensing inference technique (i.e., Eq.(2) with $(p, q, \lambda) = (2, 1, 0.01)$ and without considering direct measurements) to estimate $\hat{X}_g$ and compute the Normalized-Mean-Square-Error (NMSE, Eq.(10)). The resulting values are $NMSE_1 = 0.4547$ and $NMSE_2 = 0.1416$, which shows that our optimization formulation in Eq.(8) is successful in generating optimal binary aggregation matrices with low coherency, which has an important effect on the estimation accuracy.

C. Challenges in Optimal Aggregation Matrix Design and Solutions
The wildcard matching rule in current TCAM lookup technology does not allow a particular flow to be mapped into more than one TCAM entry, that is, $C_l = C_u = 1$. This imposes hard constraints on our optimization program Eq.(8), which otherwise could have benefited from redundant measurements: with multiple ones in each column, the coherency of $A_g$ can be reduced and the accuracy of the estimation improved.
To cope with this practical constraint, two solutions are proposed. First, we make use of auxiliary data, such as SNMP link counts, as redundant measurements (observations) of these flows. In fact, we observed that network routing matrices have small average coherency, for example, $\bar{\mu}_{Abilene} = 0.10$ and $\bar{\mu}_{Geant} = 0.04$ (see Table I). Therefore SNMP link counts, which are readily available, can be incorporated into our flow estimation formulation, where feasible optimal aggregated statistics can be used as side information to improve the accuracy of the inference algorithm. Accordingly, given the routing matrix $H$, the SNMP link counts are provided using $Y_S^t = H X^t$ in our estimation framework in Eq.(3). This approach is compatible with both our single-point and multi-point measurement scenarios. Second, we introduce an efficient-compressive aggregation method in Section III-E.

D. Optimal Flow Sampling under Hard Resource Constraints
The notion of sampling here denotes the process of sequentially allocating $K$ out of $T$ TCAM entries at epoch $\tau_t$ to target flows, which are selected based on the information collected up to that time. The main goal is to adaptively track and measure the most rewarding (informative) traffic flows that, if measured accurately, can yield the best improvement of the overall measurement utility (the exact performance metric depends on the monitoring application). For this purpose, multi-armed bandit sequential resource allocation algorithms are used. A classic MAB problem involves a number of independent arms, each of which, when played, offers a random reward drawn from a distribution with unknown mean. At each time, a player chooses a subset of arms to play, aiming to maximize reward or minimize regret over some time horizon $T_c$ [17]. The Restless Multi-Armed Bandit (RMAB) optimal policies [18], [19] can be applied effectively to our framework for intelligent flow sampling in dynamic networks where: a) flows are independent; b) the dynamics of flows are unknown; and c) flow sizes can vary stochastically with time. In general, the optimal policies consist of two phases: an exploration or learning phase, in which the system dynamics are learned, and an exploitation phase, where the most rewarding flow(s) (i.e., arm(s)) are measured (i.e., played).
In our problem, we would like to identify, track, and measure the $K$ largest flows amongst $n$ flows using $K$ out of $T$ TCAM entries for two purposes. First, accurate measurements of the largest flows can improve the ability of our optimization techniques in Eq.(2) and Eq.(3) to perform fine-grained flow estimation. Second, this can enhance the capability of the monitoring system in Heavy Hitter Identification (HHI), where Heavy Hitters (HH) are flows with a flow size larger than a threshold $\theta$. Note that HHI is of particular importance in both network management and security applications. For a flow sampling strategy to function effectively and efficiently in dynamic environments and under hard monitoring resource constraints (where $T \ll n$), the duration of the learning phase must be short. This is important because in many applications the measurement intervals $\tau_t$ are long, for example 5-15 minutes (see Table I), which leads to very long learning durations. Therefore, after a comprehensive survey, we adopted the upper confidence bound algorithm in [18], modified it based on our application requirements, and propose our Modified Upper Confidence Bound (MUCB) algorithm (Alg. 1).
In the MUCB algorithm, we modify the original UCB algorithm [18] to choose the $K$ most rewarding flows. For this purpose, first, the time horizon $T_c$ is determined. Then all $T$ entries of TCAM are used to measure all $n$ incoming flows over $n/T$ measurement intervals, and the indices of incoming flows, sorted in descending order, are reported, including the indices of the $K$ heaviest flows ($I_k^t$) and the indices of flows that must be aggregated ($I_g^t$). Accordingly, the direct flow measurement matrix $A_k^t$ is defined as $A_k^t(:, I_k^t) = 1$. In computing flow indices, the first term $\bar{x}_j$ steers measurement resources toward flows that are historically larger. In fact, larger flows that are measured more frequently are dominated by the first term, and smaller flows that are observed less frequently are dominated by the second term. To improve the agility of the algorithm in dynamic environments, so that a variety of flows is observed, we also modify the original UCB algorithm [18] by multiplying the first term $\bar{x}_j$ by a coefficient $\alpha$ where $\alpha \le 1$. In this algorithm, the parameter $T_c$ can be adjusted by the user. We also propose the following method (Eq.(9)) to use the temporal auto-correlation of flows to compute $T_c$ where [...]. Typically, $\beta$ is set to 0.5, but a smaller $\beta$ leads to a larger $T_c$.
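The index-based selection step described above can be sketched as follows. This is only an illustration: the exploration bonus $\sqrt{2 \ln t / n_j}$ is the standard UCB1 term and is an assumption here, as are the `scale` parameter and the function name `mucb_select`; the expression used in Alg. 1 may differ.

```python
import math


def mucb_select(mean_size, count, t, k, alpha=1.0, scale=1.0):
    """Pick the k flows with the largest UCB-style index.

    mean_size[j] : running average of the measured size of flow j.
    count[j]     : number of epochs in which flow j was measured directly.
    t            : current epoch (1-based).
    alpha        : weight (<= 1) applied to the historical-size term.
    scale        : hypothetical knob that matches the exploration bonus
                   to the units of the flow sizes.
    Returns (I_k, I_g): flows to measure directly and flows to aggregate,
    both sorted by decreasing index.
    """
    def index(j):
        if count[j] == 0:
            return math.inf  # never observed yet: explore it first
        # Assumed UCB1-style bonus; shrinks as flow j is measured more often.
        bonus = scale * math.sqrt(2.0 * math.log(t) / count[j])
        return alpha * mean_size[j] + bonus

    ranked = sorted(range(len(mean_size)), key=index, reverse=True)
    return ranked[:k], ranked[k:]


# Toy example: four flows, all measured five times so far, K = 2.
i_k, i_g = mucb_select([10.0, 1.0, 7.0, 0.5], [5, 5, 5, 5], t=20, k=2)
```

Lowering `alpha` reduces the weight of the historical mean relative to the bonus, so rarely observed flows are revisited more often, which is the agility effect the text attributes to $\alpha \le 1$.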
This algorithm is simple to implement and its learning phase is very short, which is an important factor in network monitoring applications under hard resource constraints. It is also effective because the set $I_k^t$ indicates the flows that are most likely to be large; accordingly, it is very efficient for heavy hitter detection in dynamic environments. Our initial exploration using real traffic traces shows that the average probability of error in selecting

E. Efficient-Compressive Aggregation in Practice
Now suppose the set $I_g^t$ is given, whose elements are the indices of the network flows sorted by size in descending order. Using the fact that grouping attributes with similar quantities can improve estimation accuracy in UDLI problems, an efficient-compressive aggregation matrix can be designed to aggregate $n - K$ flows using $m$ TCAM entries under hard resource constraints. For this purpose, the following exponential aggregation algorithm is proposed (Alg. 2), where more measurement resources (TCAM entries) are allocated to the earlier indices in the set $I_g^t$, indicating larger flows. Then, a large number of smaller flows with more stable behavior are aggregated using a smaller number of TCAM entries. Through more detailed exploration using real traffic traces, we have found that the flow indices reported by Alg. 1 ($\{I_g^t\}_{t=1}^{T_c}$) exhibit good spatio-temporal stability. Accordingly, applying our optimization formulation in Eq.(2) provides very precise estimates for both large and small flows. Hence, our framework is effective not only for heavy hitter detection but also for estimating sub-population flow sizes, which is of particular importance to network security applications such as DDoS attack detection. The parameters $\rho$ and $\delta$, which control the number of flows aggregated per entry, are defined by the user.
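A minimal sketch of the exponential aggregation idea behind Alg. 2 is given below. It follows the group-size rule $r = \rho (n - K)(1/\delta)^i + 1$ for $i = m, \dots, 1$; flooring $r$ to an integer and letting the last entry ($i = 1$) absorb all remaining flows are assumptions added to make the sketch well defined, and the function name is hypothetical.

```python
def exponential_aggregation(i_g, m, n, rho, delta):
    """Build an m x n binary aggregation matrix in the spirit of Alg. 2.

    i_g        : indices of the n - K flows to aggregate, largest first.
    m          : number of aggregate TCAM entries.
    n          : total number of flows (matrix width).
    rho, delta : user-defined aggregation parameters (delta > 1 makes the
                 groups of large flows small and the groups of small flows
                 large).
    """
    a_g = [[0] * n for _ in range(m)]
    ic = 0  # number of flows already assigned
    for i in range(m, 0, -1):  # i = m, m-1, ..., 1
        # Assumption: the group size is floored to an integer.
        r = int(rho * len(i_g) / delta ** i) + 1
        if i == 1:
            # Assumption: the last entry takes whatever is left.
            r = max(0, len(i_g) - ic)
        for flow in i_g[ic:ic + r]:
            a_g[i - 1][flow] = 1
        ic += r
    return a_g


# Toy example: n = 12 flows, flows 0 and 1 are measured directly (K = 2),
# the remaining ten are aggregated into m = 3 entries.
a_g = exponential_aggregation(list(range(2, 12)), m=3, n=12, rho=0.5, delta=2)
```

With these toy parameters the largest aggregated flow gets an entry of its own, the next two share an entry, and the seven smallest flows share the last entry, so each flow is still mapped to exactly one TCAM entry.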

IV. iSTAMP PERFORMANCE EVALUATION
In this section, the effectiveness of our network measurement and inference framework in Figure 1 is justified. For this purpose, three real networks (Table I) [...] varying flows, we processed the publicly available data center packet traces in [22] and randomly chose a subset of rapidly fluctuating flows over time/space. We also assumed a Fat-Tree topology where the number of servers communicating among different racks ($n_{Ext}$) varies and ECMP routing is used between the aggregation and core levels. In this case, flows are defined as traffic between servers in different racks.

Algorithm 2 (exponential aggregation)
Input: Aggregation parameters $\rho$ and $\delta$.
Output: Aggregation matrix $A_g^t$.
Initialization: Set $ic = 0$ and $A_g^t = 0_{m \times n}$.
for $i = m$ to 1 do
    $r = \rho (n - K) (1/\delta)^i + 1$
    for $j = 1$ to $r$ do
        $A_g^t(i, I_g^t(ic + j)) = 1$
    end for
    $ic = ic + r$
end for
Each configuration is defined by selecting different values for $T$, $K$ and $\alpha$, and by choosing an appropriate aggregation matrix. Also, Alg. 1 is used to directly measure the flows that are most likely to be large. In addition, we evaluate the performance of three different aggregation techniques: 1) Block Aggregation Technique (BAT); 2) Random Aggregation Technique (RAT); and 3) Exponential Aggregation Technique (EAT). In BAT, $m$ entries of TCAM are equally shared among the $(n - K)$ flows, where each block contains a set of $(n - K)/m$ consecutive aggregable flows, which models the mapping of flows into TCAM entries in many practical cases. RAT is an abstract model that we use, through a Monte-Carlo process, to show the performance of our measurement framework when the routing/aggregation structure is not under our control. In EAT, our aggregation method in Alg. 2 is used to efficiently allocate TCAM entries to flows and to produce a well-compressed form of linear aggregated measurements. We also use the different network inference methods in Eq.(2) and Eq.(3) to verify the compatibility of our framework with a variety of optimization techniques, which can be efficiently solved by existing low-complexity algorithms [10]. In the following, using Eq.(2) for flow estimation based on aggregated statistics and without Side Information is denoted by "w/o SI". We use "w/ SI" to denote the case where SNMP link loads are incorporated into our formulation using Eq.(3).
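For comparison with EAT, the two baseline aggregation schemes can be sketched as follows. The text only states that BAT uses blocks of $(n - K)/m$ consecutive flows and that RAT models an aggregation structure outside our control; spreading the remainder over the first blocks and drawing one entry uniformly at random per flow are assumptions made for this illustration, as are the function names.

```python
import random


def block_aggregation(i_g, m, n):
    """BAT sketch: split i_g into m consecutive blocks of near-equal size."""
    a_g = [[0] * n for _ in range(m)]
    size, extra = divmod(len(i_g), m)
    start = 0
    for i in range(m):
        # Assumption: the first `extra` blocks take one additional flow.
        stop = start + size + (1 if i < extra else 0)
        for flow in i_g[start:stop]:
            a_g[i][flow] = 1
        start = stop
    return a_g


def random_aggregation(i_g, m, n, rng):
    """RAT sketch: map every flow in i_g to one uniformly chosen entry."""
    a_g = [[0] * n for _ in range(m)]
    for flow in i_g:
        a_g[rng.randrange(m)][flow] = 1
    return a_g


# Toy example: seven flows aggregated into m = 3 entries.
flows = list(range(7))
bat = block_aggregation(flows, m=3, n=7)
rat = random_aggregation(flows, m=3, n=7, rng=random.Random(0))
```

Both sketches keep one non-zero entry per column, matching the $C_l = C_u = 1$ restriction imposed by TCAM lookup.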
Eq.(10) defines the metrics used in our performance evaluation. NMSE measures the accuracy of ODF estimation for each configuration. To evaluate the effectiveness of our framework for HH detection, we first set the threshold θ as a fraction of the link capacity C_L, and then compute the average probability of detection (P_{HH}^{d}) and the average probability of false alarm (P_{HH}^{fa}) as in Eq.(10). Given n, these metrics are measured as functions of two quantities obtained by varying T and K. First, T/n is the ratio of the number of TCAM entries, used for both aggregation and direct measurement in our framework, to the number of flows. A low T/n corresponds to the hard resource constraint regime, where a large number of flows (n) is represented by only T per-flow and aggregated measurements. Thus, T/n also indicates the Flow Compression Ratio (FCR). Second, K/T is the ratio of the number of TCAM entries used for direct measurement to the total number of available TCAM entries.
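As a rough illustration of how such metrics can be computed for a single measurement epoch, the sketch below assumes that NMSE is the ratio of the l2 norm of the estimation error to the l2 norm of the true flow sizes, and that detection and false-alarm probabilities are empirical rates over flows. These are assumptions made for the example; Eq.(10) may weight or average the quantities differently, and the function names and sample values are hypothetical.

```python
import math


def nmse(true_sizes, est_sizes):
    """Assumed definition: ||x - x_hat||_2 / ||x||_2."""
    err = math.sqrt(sum((x - e) ** 2 for x, e in zip(true_sizes, est_sizes)))
    ref = math.sqrt(sum(x ** 2 for x in true_sizes))
    return err / ref


def hh_detection_rates(true_sizes, est_sizes, theta):
    """Empirical detection and false-alarm rates for heavy hitters.

    A flow is a true heavy hitter if its true size exceeds theta, and it is
    declared a heavy hitter if its estimated size exceeds theta.
    """
    true_hh = [x > theta for x in true_sizes]
    est_hh = [e > theta for e in est_sizes]
    n_hh = sum(true_hh)
    n_non = len(true_hh) - n_hh
    hits = sum(1 for t, e in zip(true_hh, est_hh) if t and e)
    false_alarms = sum(1 for t, e in zip(true_hh, est_hh) if (not t) and e)
    p_d = hits / n_hh if n_hh else float("nan")
    p_fa = false_alarms / n_non if n_non else float("nan")
    return p_d, p_fa


# Hypothetical example: link capacity 10 (Mbps), threshold at 10% of capacity.
link_capacity = 10.0
theta = 0.1 * link_capacity
x_true = [4.0, 0.5, 2.0, 0.2]
x_est = [3.0, 1.5, 0.5, 0.1]
accuracy = nmse(x_true, x_est)
p_d, p_fa = hh_detection_rates(x_true, x_est, theta)
```

In an evaluation, these per-epoch values would then be averaged over time and over the different choices of T and K.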
Furthermore, we measure the effectiveness of our measurement framework in Sub-Population (SP) flow size estimation, which is of particular importance in some security applications such as DDoS detection. In Eq.(10), SP^t denotes the subset of I_g^t where X(I_g^t) < θ_l, and SP_l^t denotes the l-th disjoint subset of SP^t where the sum of flows in SP_l^t can be large (note that θ_l < θ). Also, π_0^t and π_1^t are prior probabilities which are computed experimentally. In our Monte-Carlo evaluation process, |SP_l^t| is randomly chosen from the interval [α_l(n − K), α_u(n − K)], where the parameters α_l and α_u are selected to cover a wide range of SP sizes. Due to space limitations, we also summarize our results by averaging the performance criteria in Eq.(10) over different choices of T and K (e.g. NMSE_{Avg}).

Figure 2 shows the NMSE for different configurations on the Geant network, where Eq.(2) and Eq.(3) with (p, q, λ) = (2, 1, 0.01), as CS inference techniques, are used for flow size estimation. In the absence of side information, better accuracy is achieved at higher K/T. However, in general, the optimal number of direct flow measurements K, which yields the minimum NMSE (indicated by solid black points), is obtained at K/T < 1. In other words, the best accuracy is often achieved by allocating part of the TCAM entries to direct measurement and the remainder to flow aggregation. Among the aggregation techniques, EAT can enhance the estimation accuracy, for example, NMSE_{Avg}^{BAT} = 0.38 and NMSE_{Avg}^{EAT} = 0.29; this improvement is significantly higher at low T/n. Incorporating SNMP link counts as side information and using Eq.(3) can remarkably improve the precision of the estimation for the BAT, RAT and EAT methods. In this case, EAT can still improve the accuracy (NMSE_{Avg}^{BAT} = 0.19 and NMSE_{Avg}^{EAT} = 0.16); however, since aggregated measurements are interpreted as side information in Eq.(3), the difference between BAT and EAT is small.
This is an important fact that is used to address the aggregation feasibility in iSTAMP (see Section IV-B). In addition, although increasing T/n improves the accuracy, the rate of improvement slows beyond some near-optimal FCR (e.g. T/n ≈ 0.1).
Therefore, iSTAMP achieves high estimation accuracy using a small fraction of available TCAM entries at low FCRs and using different aggregation techniques. Hence, it can produce well-compressed aggregated flow measurements, remarkably decrease the required storage capacity, and reduce the communication overhead between the control and data planes. Furthermore, Figure 3 illustrates the effectiveness of iSTAMP for reliably detecting both heavy hitters and heavy sub-populations. A high probability of detection and a low probability of false alarm are achieved for HH detection, where the threshold θ is set to 10% of the link capacity C_L = 10 Mbps.
This figure also shows that the detection capability for heavy sub-populations (with θ_l = 0.01 C_L) is still high for many choices of T and K. Under very hard resource constraints, the sub-population detection performance is lower, but it is still acceptable and it improves rapidly as the constraints are relaxed.
In addition, Table II summarizes our results for the Abilene network. These results re-emphasize the flexibility and capability of iSTAMP for different network monitoring applications under hard resource constraints. Note that, for both the Geant and Abilene networks, T_c is set to the duration of the data in Table I (i.e. small β in Eq.(9)), which is the worst-case scenario. Better accuracy can be achieved with smaller T_c.

Figure 4 and Figure 5 show the performance of our measurement framework under the very hard resource constraint T/n = 0.0625 and with SNMP side information using Eq.(3) in a data center environment with highly fluctuating flows. Here, the aggregation matrix is a given block aggregation matrix, and the routing matrix H is computed for the fat-tree topology assuming ECMP routing between the aggregation and core levels [22]. As explained in Section III, in dynamic environments we need to explore different flows more frequently, which is achieved by choosing a smaller α in Alg. 1. The direct consequence of this effect is seen in Figure 4, where our performance metrics improve as α decreases. To enhance the accuracy of estimation and improve the capability of the system for HHI in dynamic environments, iSTAMP needs to repeatedly learn the behavior of flows. This is achieved by shortening the time horizon T_c. Figure 5 shows the effect of different choices of T_c, using different values of β in Eq.(9), as n_{Ext} varies. Here, (β = 0.5, n_{Ext} = 2), (β = 0.5, n_{Ext} = 4), (β = 0.2, n_{Ext} = 2) and (β = 0.2, n_{Ext} = 4) correspond to T_c = 186 s, T_c = 260 s, T_c = 1051 s and T_c = 1085 s, respectively. Also, n_{Ext} = 2 and n_{Ext} = 4 yield n = 48 and n = 192 flows, respectively. It is clear that, even under a very hard resource constraint (T/n ≪ 1), our measurement framework can obtain good estimation accuracy and detection capability by choosing the optimum K/T.
In practice, the designer of the system must appropriately set the controlling parameters α and β, based on the history of the system and the required performance, through a trial-and-error process.
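To make the resource constraint in the data center experiments concrete, the short sketch below converts a flow compression ratio T/n and a direct-measurement ratio K/T into TCAM entry counts. The function name is hypothetical, rounding to the nearest integer is an assumption, and the K/T value of 1/3 is chosen only for illustration; the flow counts n = 48 and n = 192 and the ratio T/n = 0.0625 are those quoted above.

```python
def tcam_budget(num_flows, fcr, direct_fraction):
    """Split a TCAM budget into direct and aggregation entries.

    num_flows       -- n, number of flows
    fcr             -- T/n, flow compression ratio
    direct_fraction -- K/T, share of entries used for direct measurement
    Returns (T, K, m) with m = T - K entries left for aggregation.
    """
    total = int(round(fcr * num_flows))           # T
    direct = int(round(direct_fraction * total))  # K
    return total, direct, total - direct


# T/n = 0.0625 for the two data center settings; K/T = 1/3 is illustrative.
small = tcam_budget(num_flows=48, fcr=0.0625, direct_fraction=1 / 3)
large = tcam_budget(num_flows=192, fcr=0.0625, direct_fraction=1 / 3)
```

Under these assumptions, T/n = 0.0625 leaves only T = 3 entries for n = 48 flows and T = 12 entries for n = 192 flows, which is why the split between direct and aggregated measurements matters so much in this regime.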

A. Feasibility Study using Mininet
Mininet is a network testbed for developing OpenFlow and SDN experiments [23]. We create two networks that emulate the Geant and Data Center environments and feed these two networks with the real traffic traces listed in Table I. Table III summarizes the results of our implementation in Mininet, demonstrating the effectiveness and feasibility of iSTAMP in production environments. Geant(1) and Data Center denote the single-point measurement scenario, and Geant(2) denotes the multi-point measurement scheme involving multiple capable routers running the iSTAMP framework.

B. Discussion on Aggregation Feasibility
In iSTAMP, the feasibility of the flow aggregation process is important. In our optimal aggregation matrix design (Eq.(8)), feasibility can be modeled simply as another linear constraint, and our CPLEX engine can solve the resulting problem efficiently. However, in our abstract flow estimation models (Eq.(2)-Eq.(3)), we assume flows can be aggregated without any constraints. In the absence of SNMP link load measurements, BAT is the model used to address this constraint: feasible aggregable flows are grouped together and Eq.(2) is solved for flow size estimation. In the presence of SNMP side information, the feasibility constraint of the aggregation has less impact on the overall estimation performance. In this case, Eq.(3) is used to provide fine-grained flow estimates, and aggregated flow measurements act as side information for this optimization problem to improve the estimation accuracy, as shown in Figure 2. In essence, iSTAMP is flexible enough to cope with different aggregation constraints.

V. CONCLUSION
In this paper, we introduced iSTAMP, an intelligent network measurement framework in which the flexibility provided by SDN is used to simultaneously aggregate flows optimally and sample the most informative ones. We also developed an integer optimization formulation to design the optimal aggregation matrix and introduced an efficient-compressive flow aggregation technique. In addition, we proposed an efficient MAB-based flow sampling algorithm to select the most rewarding flows, which have the highest influence on the estimation accuracy. Our results showed that iSTAMP is a simple, generic and efficient framework providing accurate fine-grained flow measurements in dynamic environments and in different applications.

VI. ACKNOWLEDGEMENTS
This work is supported by NSF grant CNS-1321115 and an HP Labs Innovation Research Award.