LEISURE: Load-Balanced Network-Wide Traffic Measurement and Monitor Placement

Network-wide traffic measurement is of interest to network operators to uncover global network behavior for the management tasks of traffic accounting, debugging or troubleshooting, security, and traffic engineering. Increasingly, sophisticated network measurement tasks such as anomaly detection and security forensic analysis require in-depth, fine-grained flow-level measurements. However, performing in-depth per-flow measurements (e.g., detailed payload analysis) is often an expensive process. Given the fast-changing Internet traffic landscape and large traffic volume, a single monitor is not capable of accomplishing the measurement tasks for all applications of interest due to its resource constraints. Moreover, uncovering global network behavior requires network-wide traffic measurements at multiple monitors across the network, since traffic measured at any single monitor only provides a partial view and may not be sufficient or accurate. These factors call for coordinated measurements among multiple distributed monitors. In this paper, we present a centralized optimization framework, LEISURE (Load-EqualIzed meaSUREment), for load-balancing network measurement workloads across distributed monitors. Specifically, we consider various load-balancing problems under different objectives and study their extensions to support both fixed and flexible monitor deployment scenarios. We formulate the latter flexible monitor deployment case as an MILP (Mixed Integer Linear Programming) problem and propose several heuristic algorithms to approximate the optimal solution and reduce the computational complexity.
We evaluate LEISURE via detailed simulations on Abilene and GEANT network traces to show that LEISURE can achieve much better load-balanced performance (e.g., 4.75× smaller peak workload and 70× smaller variance in workloads) across all coordinated monitors in comparison to a naive solution (uniform assignment) to accomplish network-wide traffic measurement tasks under the fixed monitor deployment scenario. We also show that under the flexible monitor deployment setting, our heuristic solutions can achieve almost the same load-balancing performance as the optimal solution while reducing the computation times by a factor up to 22.5× in Abilene and 800× in GEANT.


INTRODUCTION
Accurate traffic measurement is essential to a variety of network management tasks, including traffic engineering (TE), capacity planning, accounting, anomaly detection, and security forensics. For example, deep packet inspection (DPI) allows for post-mortem analysis of network events and helps operators understand the payload properties of the transiting Internet traffic. Another solution, Network DVR [2], performs selective flow-based trace collection by matching packets against application-specific signatures. However, fine-grained flow-level measurement is often an expensive process that requires dedicated hardware (e.g., TCAMs [3]), specialized algorithms (e.g., Bloom filters [4]), or vast storage capacity. Given the fast-changing Internet traffic landscape and large traffic volume, a single monitor is not capable of accomplishing the measurement tasks from all applications of interest due to its resource constraints. This calls for coordinated measurements among multiple distributed monitors. Also, network-wide traffic measurement at multiple monitors is key to uncovering global network behavior, since traffic measured at any single monitor only provides a partial view and may not be sufficient or accurate. For example, a global iceberg [5] may have high aggregate volume across many monitors, but may not be detectable at any single monitor.
To perform effective network-wide traffic measurement across multiple distributed monitors, a centralized framework that coordinates measurement responsibilities across different monitors is needed. Sekar et al. [6] proposed CSAMP (Coordinated Sampling), a centralized hash-based packet selection system as a router-level primitive, to allow distributed monitors to measure disjoint sets of traffic without requiring explicit communications. CSAMP uses an optimization framework to specify the set of flows that each monitor is required to record by considering a hybrid measurement objective that maximizes the total flow-coverage, subject to ensuring that the optimal minimum fractional coverage of the task can be achieved. However, CSAMP does not consider the load-balancing of workloads for multiple measurement tasks across the available monitors, which can lead to situations in which some monitors carry substantially higher workloads than other monitors. Coupled with ever-increasing link rates and traffic volume, the more heavily loaded monitors can easily become overwhelmed. In addition, existing frameworks (e.g., CSAMP) do not consider flexible monitor deployment scenarios and are agnostic to differentiation in the importance of traffic subpopulations or the cost of individual measurement tasks.
This paper is an extended journal version of a conference paper that was presented at ANCS 2011 [1].
In this paper, we present LEISURE (Load-EqualIzed meaSUREment), a new centralized optimization framework to address the network measurement load-balancing problem in various realistic scenarios. Our solution framework takes as inputs a) a routing matrix, b) the network topology and the monitor infrastructure deployment, and c) the measurement requirements of the different measurement tasks, and it decides which available monitors should participate in each specific measurement task and how much they need to measure to optimize the load-balancing objectives. Ideally, the load-balancing objective is to have an identical workload for all monitors, where workload denotes the normalized traffic amount that each monitor measures. In this work, the load-balancing objective is mainly defined in terms of the following: 1) minimizing the variance of workloads across all monitors or 2) minimizing the maximum workload among them. Optimal solutions are translated into disjoint sets of required-measurement flows that each monitor is assigned to measure. In contrast to CSAMP [6], which aims at maximizing the total flow-coverage without considering load-balancing problems, LEISURE distributes traffic measurement tasks evenly across coordinated monitors such that the required fractional coverage of those tasks can be achieved. We summarize our contributions as follows:
• We propose several heuristic algorithms that can approach the performance of the optimal MILP solution, yet require dramatically shorter computation times.
• We provide detailed evaluations based on the Abilene [7] and GEANT [8] network topologies and traces. Our results show that a significant load-balancing improvement (e.g., 4.75X smaller maximum workload and 70X smaller variance in workloads) is achieved by using LEISURE to optimally distribute the measurement tasks across all coordinated monitors when compared with the naive uniform assignments. We also show that our proposed heuristic algorithms can reduce the computation times by a factor of up to 22.5X in Abilene and 800X in GEANT while achieving load-balancing performance that is close to the optimal.
The paper is structured as follows: Section 2 motivates our network load-balancing problem via illustrative examples. Section 3 presents our problem formulations and solutions, and Section 4 considers extensions for limited monitor deployment scenarios. Section 5 describes our simulation setup and evaluation results, and Section 6 concludes the paper. Additional supplemental sections discuss related work, implementation issues, and further extensions for considering multi-path routing and multi-measurement task scenarios.

MOTIVATING EXAMPLE
Consider the toy example shown in Fig. 1 with traffic demands from three OD-pairs: SF→NY, LA→Seattle, and Chicago→Atlanta, each with 120 units of traffic (IP flows). Suppose the measurement task imposed by the network operator is to measure all traffic from these three OD-pairs. One naive approach is to always measure the traffic for each OD-pair at the ingress router, as shown in Fig. 1(b). In this case, monitors would only need to be deployed in SF, LA, and Chicago, each with a worst-case (peak) measurement workload of 120 units of traffic. Similarly, the traffic for each OD-pair can be measured at the egress router, as shown in Fig. 1(c). In this case, monitors would instead need to be deployed in NY, Seattle, and Atlanta, each with the same peak measurement workload of 120 units. Both of these approaches need only 3 monitors to accomplish the assigned measurement tasks in this example, but each monitor needs to be able to handle 120 units of traffic.
On the other hand, assume all routers are equipped with monitors that are capable of performing the measurement tasks. In this setting, rather than concentrating the measurement workloads on a small set of monitors, we can instead load-balance the measurement workloads across the monitors so that the peak measurement workload on these monitors is minimized, thereby minimizing the processing requirements of these monitors. This can be achieved by assigning each monitor responsibility for a fraction of the required measurement traffic. One simple strategy is to uniformly distribute the required measurement traffic of each OD-pair to the monitors along its routing path, as depicted in Fig. 1(d). For example, the 120 units of traffic for SF→NY are measured uniformly across the monitors placed in SF, Denver, Kansas City, Indianapolis, and NY, each of which takes the measurement responsibility for 24 units of traffic. Similarly, the monitors in LA, Denver, and Seattle each take the measurement responsibility for 40 units of the LA→Seattle traffic, and the monitors in Chicago, Indianapolis, and Atlanta each take 40 units of the Chicago→Atlanta traffic. The monitor with the highest measurement workload is therefore most likely to be the router with the largest number of OD-pairs passing through it (e.g., 24 + 40 = 64 units of measurement traffic in Denver and in Indianapolis).
Alternatively, another load-balancing method is to distribute the required measurement traffic of each OD-pair to the monitors along the path in inverse proportion to the traffic passing through them, as shown in Fig. 1(e). For example, the traffic passing through SF, Denver, Kansas City, Indianapolis, and NY is 120, 240, 120, 240, and 120, respectively. Based on an inverse proportion calculation, SF, Kansas City, and NY should each measure 30 units of the SF→NY traffic ($120 \times \frac{1/120}{1/120 + 1/240 + 1/120 + 1/240 + 1/120} = 30$), while Denver and Indianapolis should each measure 15 units for SF→NY. Similarly, the monitors in LA and Seattle should each measure 48 units of the LA→Seattle traffic, while Denver should measure 24 units for LA→Seattle, and the monitors in Chicago and Atlanta should each measure 48 units of the Chicago→Atlanta traffic, while Indianapolis should measure 24 units for Chicago→Atlanta. Although both methods achieve a significant reduction in the maximum measurement workload compared to the naive approaches (e.g., 120→64, 120→48), the maximum measurement workload can be further reduced to 40 units by using our proposed LEISURE approach to solve the global load-balancing optimization problem, as shown in Fig. 1(f). In this optimal solution, the SF→NY traffic is measured uniformly by only three monitors (SF, Kansas City, and NY) instead of five, each with 40 units of traffic, while Denver and Indianapolis are not involved in the measurement of the SF→NY traffic. This in turn allows the equal splitting of the LA→Seattle traffic and the Chicago→Atlanta traffic across all three routers on each of their respective paths, which results in all monitors having the same perfectly load-balanced measurement workload of 40 units.
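The arithmetic of the uniform and inverse-proportion strategies can be checked with a short Python sketch. The dictionary layout and the function names (`transit_traffic`, `workloads`, `uniform_share`, `weighted_share`) are illustrative assumptions, not part of LEISURE; only the paths and the 120-unit demands come from the toy example.

```python
from fractions import Fraction

# Toy example of Fig. 1: three OD-pairs, 120 flows each, full coverage required.
PATHS = {
    "SF->NY": ["SF", "Denver", "Kansas City", "Indianapolis", "NY"],
    "LA->Seattle": ["LA", "Denver", "Seattle"],
    "Chicago->Atlanta": ["Chicago", "Indianapolis", "Atlanta"],
}
DEMAND = {od: 120 for od in PATHS}


def transit_traffic(paths, demand):
    """Total traffic passing through each router."""
    load = {}
    for od, path in paths.items():
        for router in path:
            load[router] = load.get(router, 0) + demand[od]
    return load


def workloads(paths, demand, share):
    """Flows measured per router; share(od, router) is the measured fraction."""
    work = {}
    for od, path in paths.items():
        for router in path:
            work[router] = work.get(router, 0) + demand[od] * share(od, router)
    return work


TRANSIT = transit_traffic(PATHS, DEMAND)


def uniform_share(od, router):
    """Equal split over all routers on the OD-pair's path."""
    return Fraction(1, len(PATHS[od]))


def weighted_share(od, router):
    """Split in inverse proportion to the traffic passing through each router."""
    inverse = {r: Fraction(1, TRANSIT[r]) for r in PATHS[od]}
    return inverse[router] / sum(inverse.values())


uniform = workloads(PATHS, DEMAND, uniform_share)
weighted = workloads(PATHS, DEMAND, weighted_share)
print("peak (uniform):", max(uniform.values()))
print("peak (weighted):", max(weighted.values()))
```

Exact fractions are used so that the per-router totals can be compared against the integer values quoted in the text without rounding noise.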
The above shows that by deciding which monitors should participate in which measurement task, and how much they should measure, LEISURE can load-balance the measurement tasks optimally while the required fractional coverage of those tasks can be fulfilled. Next, we seek to find globally optimal load-balancing solutions by deploying LEISURE under different network conditions (e.g., topology, traffic demand, routing matrix, etc.), measurement objectives (e.g., minimize maximum workload, maximize measurement utility, etc.), and resource constraints (e.g., only a subset of routers are capable of monitoring, some monitors have lower capacities, etc.).
Fig. 1. Different load-balancing approaches for our toy example, which includes three OD-pair traffic demands as our measurement task (i.e., SF→NY, LA→Seattle, and Chicago→Atlanta, each with 120 units of traffic). Panel (f): optimal load-balancing.

LEISURE FRAMEWORK
We now present a load-balancing optimization framework to cover network-wide monitoring objectives while respecting router resource constraints. ISPs typically specify their network-wide measurement tasks in terms of OD-pairs. To cover these measurement assignments, LEISURE needs both the traffic demand and routing information, which are readily available to network operators, as described in [9]. In general, LEISURE is a centralized architecture that allocates disjoint sets of required-measurement flows in OD-pairs to each router given global network-wide information. The global information includes: a) the network topology and monitoring infrastructure deployment, b) the traffic demand and routing matrix, and c) the measurement requirements and the associated cost for each measurement task. The problem formulation builds up from the simplest case, in which we assume the following: 1) all routers are deployed with monitors and capable of measurement; 2) each OD-pair follows a single router-level path computed by Open Shortest Path First (OSPF); and 3) there is only one measurement task for every monitor. These constraints are gradually relaxed in Section 4.

Basic Model
Let $G(V, E)$ represent our network topology, where $V$ is the set of routers (monitors) and $E$ is the set of directed links. Each router $V_i$ ($i = 1 \ldots M$) has two factors that limit its measurement ability: memory and bandwidth. We abstract them into a single resource constraint $C_{v_i}$ ($i = 1 \ldots M$), the number of flows that router $V_i$ can measure in a given measurement interval.
An OD-pair $OD_x$, $x \in \Theta$, represents the set of flows between the same pair of ingress/egress routers for which an aggregated routing placement is given. $\Theta$ is the set of all OD-pairs, with $|\Theta| = |V| \times (|V| - 1)$. $\Phi_x$ characterizes the traffic demand (IP flows) of the OD-pair $OD_x$, $x \in \Theta$, in a given measurement interval (e.g., 5 minutes). $P_x$ represents the given routing strategy (router-level path) for every OD-pair $OD_x$, $x \in \Theta$. $a_x$ denotes the coverage fraction of the IP flows of $OD_x$ that must be measured, as imposed by the network operator. Therefore, the total required measurement traffic (number of flows), $\beta$, introduced to all routers is simply the sum over all OD-pairs of the traffic demand times $a_x$:

$$\beta = \sum_{x \in \Theta} \Phi_x \times a_x \tag{1}$$

Let $d_i^x$ denote the fraction of the traffic demand (IP flows) of $OD_x$ that router $V_i$ samples/measures (i.e., $d_i^x = \frac{\text{measured flows in } \Phi_x}{\Phi_x}$), while $L_i$ denotes the total traffic (number of IP flows) that router $V_i$ measures for all OD-pairs $OD_x$, $x \in \Theta$, normalized by $\beta$. The summation of $L_i$ over all routers $V_i$ ($i = 1 \ldots M$) then equals 1. We have:

$$L_i = \frac{1}{\beta} \sum_{x: V_i \in P_x} \Phi_x \times d_i^x \tag{2}$$

$$\sum_{i=1}^{M} L_i = 1 \tag{3}$$

Our decision variable is $d_i^x$, which is subject to three constraints. The first constraint is that $d_i^x$ is bounded between 0 and 1, as in Eq. (4). The second constraint is that the summation of $d_i^x$ along the path $P_x$ for each OD-pair $OD_x$, $x \in \Theta$, is $a_x$, as in Eq. (5). If router $V_i$ is not in the routing path $P_x$, then $d_i^x = 0$. The third constraint is that the traffic measured by each monitor $V_i$ should not exceed its measurement ability (resource constraint) $C_{v_i}$, as in Eq. (6). Notations are also summarized in Table I.

$$0 \le d_i^x \le 1, \quad \forall x, i \tag{4}$$

$$\sum_{i: V_i \in P_x} d_i^x = a_x, \quad \forall x \tag{5}$$

$$\sum_{x: V_i \in P_x} \Phi_x \times d_i^x \le C_{v_i}, \quad \forall i \tag{6}$$

Table 2. $d_i^x$ for each approach with the toy example shown in Fig. 1.
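To make the notation concrete, the following Python sketch evaluates β and the normalized workloads L_i for the optimal assignment of Fig. 1(f) and checks the three constraints on d. The per-router capacity of 120 flows and all identifiers (`PHI`, `A`, `D`, `is_feasible`, and so on) are assumptions made for illustration only.

```python
from fractions import Fraction

PATHS = {
    "SF->NY": ["SF", "Denver", "Kansas City", "Indianapolis", "NY"],
    "LA->Seattle": ["LA", "Denver", "Seattle"],
    "Chicago->Atlanta": ["Chicago", "Indianapolis", "Atlanta"],
}
PHI = {od: 120 for od in PATHS}   # traffic demand per OD-pair (flows)
A = {od: 1 for od in PATHS}       # required coverage fraction a_x
CAPACITY = 120                    # hypothetical capacity per router (flows)

# d[od][router]: the assignment of Fig. 1(f); routers not listed measure nothing.
THIRD = Fraction(1, 3)
D = {
    "SF->NY": {"SF": THIRD, "Kansas City": THIRD, "NY": THIRD},
    "LA->Seattle": {"LA": THIRD, "Denver": THIRD, "Seattle": THIRD},
    "Chicago->Atlanta": {"Chicago": THIRD, "Indianapolis": THIRD, "Atlanta": THIRD},
}

# Total required measurement traffic: sum of demand times coverage.
beta = sum(PHI[od] * A[od] for od in PATHS)


def normalized_workloads(paths, phi, d, total):
    """L_i: flows measured by each router, normalized by the total."""
    routers = {r for path in paths.values() for r in path}
    return {
        r: sum(phi[od] * d[od].get(r, 0) for od in paths) / Fraction(total)
        for r in routers
    }


def is_feasible(paths, phi, a, d, capacity, total):
    """Check bounds, per-OD coverage, and per-router capacity."""
    for od, path in paths.items():
        shares = d[od]
        if any(r not in path for r in shares):
            return False                      # off-path routers must have d = 0
        if any(not 0 <= s <= 1 for s in shares.values()):
            return False                      # bounds on d
        if sum(shares.values()) != a[od]:
            return False                      # coverage along the path
    loads = normalized_workloads(paths, phi, d, total)
    return all(load * total <= capacity for load in loads.values())


L = normalized_workloads(PATHS, PHI, D, beta)
print(beta, max(L.values()), is_feasible(PATHS, PHI, A, D, CAPACITY, beta))
```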

Problem Formulation
We define our load-balancing objective in the abstract form α, which can be any term as long as it captures load-balancing performance (i.e., how close the monitors are to having identical workloads). The overall optimization objective of LEISURE is to minimize α while each router operates within its resource constraint, given the parameter $a_x$, the required fractional coverage per OD-pair imposed by the network operator. In this section, we formulate and study three optimization problems that correspond to three different load-balancing objectives α: min-VAR, min-MAX, and min-VAR-given-MAX.

Minimize Variance Problem (min-VAR)
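In this problem, we denote α as the variance of $L_i$ across all participating routers¹. The intuition is that the more even the workloads $L_i$ are across routers, the smaller the variance (e.g., variance = 0 corresponds to the ideal load-balancing objective, where $L_i = \frac{1}{M}$ for all $M$ routers). We have:

$$VAR(L_i) = \frac{1}{M} \sum_{i=1}^{M} (L_i - \bar{L})^2 \tag{7}$$

$$\bar{L} = \frac{1}{M} \sum_{i=1}^{M} L_i = \frac{1}{M} \tag{8}$$

This optimization problem is formulated as follows, with the constraints forming Eqs. (9)-(12):

$$\begin{aligned}
\text{minimize} \quad & \alpha = VAR(L_i) \\
\text{subject to} \quad & L_i = \frac{1}{\beta} \sum_{x: V_i \in P_x} \Phi_x \times d_i^x, && \forall i \\
& 0 \le d_i^x \le 1, && \forall x, i \\
& \sum_{i: V_i \in P_x} d_i^x = a_x, && \forall x \\
& \sum_{x: V_i \in P_x} \Phi_x \times d_i^x \le C_{v_i}, && \forall i
\end{aligned}$$

¹ We use "population variance" instead of "sample variance" as our objective function since we already know the number of monitors $M$.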
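As a worked instance of the variance objective, the sketch below computes the population variance of the normalized workloads for two assignments of the toy example: the uniform split of Fig. 1(d) and the optimal split of Fig. 1(f). The helper name `population_variance` and the hard-coded workload lists are illustrative assumptions.

```python
from fractions import Fraction

BETA = 360  # total required measurement traffic in the toy example


def population_variance(loads):
    """Population variance: mean squared deviation from the mean workload."""
    m = len(loads)
    mean = sum(loads) / m
    return sum((load - mean) ** 2 for load in loads) / m


# Flows measured per router (SF, Denver, Kansas City, Indianapolis, NY,
# LA, Seattle, Chicago, Atlanta), normalized by BETA to obtain L_i.
uniform_flows = [24, 64, 24, 64, 24, 40, 40, 40, 40]
optimal_flows = [40] * 9

uniform_L = [Fraction(f, BETA) for f in uniform_flows]
optimal_L = [Fraction(f, BETA) for f in optimal_flows]

var_uniform = population_variance(uniform_L)
var_optimal = population_variance(optimal_L)
print(float(var_uniform), float(var_optimal))
```

Both assignments have the same mean workload of 1/9, so the variance alone separates the perfectly balanced solution from the uniform split.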

Minimize Maximum Problem (min-MAX)
In this problem, we denote α as the maximum value of $L_i$ across all routers:

$$\alpha = MAX(L_i), \quad i = 1 \ldots M$$

The intuition is that as LEISURE keeps minimizing the maximum value of $L_i$ over all monitors by adjusting the decision variables $d_i^x$, other, smaller $L_i$ increase, until an equilibrium state is reached in which no further adjustment can lower $MAX(L_i)$ without raising some other $L_i$ above $MAX(L_i)$. The problem formulation shares the same constraints as the min-VAR problem, Eqs. (9)-(12), except that the objective function is different:

$$\text{minimize} \quad \alpha = MAX(L_i)$$
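For a single measurement task, the min-MAX objective can be illustrated without an LP solver. The sketch below is not the LP formulation used by LEISURE; it swaps in a different but related technique: a binary search on the peak workload, where each candidate peak is tested with a max-flow computation (source → OD-pairs → on-path monitors → sink). It assumes integer flow counts, ignores the per-router capacities, and returns the smallest integer peak; all function names are hypothetical.

```python
from collections import defaultdict, deque

PATHS = {
    "SF->NY": ["SF", "Denver", "Kansas City", "Indianapolis", "NY"],
    "LA->Seattle": ["LA", "Denver", "Seattle"],
    "Chicago->Atlanta": ["Chicago", "Indianapolis", "Atlanta"],
}
REQUIRED = {od: 120 for od in PATHS}  # demand times coverage, in flows


def max_flow(cap, source, sink):
    """Edmonds-Karp on a dict-of-dicts residual graph (mutated in place)."""
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:          # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        push, v = float("inf"), sink                 # bottleneck along the path
        while parent[v] is not None:
            push = min(push, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:                 # update residual capacities
            u = parent[v]
            cap[u][v] -= push
            cap[v][u] = cap[v].get(u, 0) + push
            v = u
        total += push


def feasible(paths, required, peak):
    """True if all required flows can be measured with no monitor above peak."""
    cap = defaultdict(dict)
    for od, path in paths.items():
        cap["src"][("od", od)] = required[od]
        for router in path:
            cap[("od", od)][("mon", router)] = required[od]
            cap[("mon", router)]["sink"] = peak
    return max_flow(cap, "src", "sink") == sum(required.values())


def min_max_workload(paths, required):
    """Smallest integer peak workload (in flows) that still covers every OD-pair."""
    lo, hi = 0, sum(required.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(paths, required, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


peak = min_max_workload(PATHS, REQUIRED)
print(peak, peak / sum(REQUIRED.values()))
```

The flow test works because, for a fixed peak, deciding whether the coverage constraints can be met is a transportation problem; the binary search then plays the role of the objective.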

Minimize Variance with Max-Constraint Problem (min-VAR-given-MAX)
This problem involves two phases. In the first step, we solve the min-MAX problem given in Section 3.2.2 to find the minimum achievable maximum value $L_{max}$ ($L_{max}$ = minimized $MAX(L_i)$, $i = 1 \ldots M$) over all routers to cover the total required-measurement IP flows, β. Then we look for any opportunity to further redistribute the measurement task (workload) evenly within this constraint. Therefore, in the second step, we introduce an additional constraint to the min-VAR problem given in Section 3.2.1 to limit $L_i$ for each router $V_i$ to be at most $L_{max}$. We then minimize the variance of $L_i$ across all routers. Specifically, we only need to introduce the following constraint to the min-VAR problem:

$$L_i \le L_{max}, \quad \forall i$$

Therefore, the min-VAR-given-MAX problem combines the min-VAR and min-MAX problems.

Optimal/Heuristic Solutions
We seek the optimal $d_i^x$ assignments for the above three problems. There is a variety of optimization tools that we can leverage. Specifically, the optimal solutions can be found by using a Quadratic Programming (QP) formulation for the min-VAR problem and a Linear Programming (LP) formulation for the min-MAX problem. The combined problem, min-VAR-given-MAX, can be solved in a two-phase manner by using LP first, followed by QP. We refer to these three optimal solutions of LEISURE as LB(min-VAR), LB(min-MAX), and LB(min-VAR-given-MAX), respectively.
Besides the optimal solutions, we introduce one simple heuristic method called LB(weighted) under the assumption that routers can always fulfill their assigned measurement tasks (e.g., no resource constraints for all routers in Eq. (6)). LB(weighted) calculates $d_i^x$ in inverse proportion to the total required-measurement traffic (IP flows) passing through router $V_i$. The rationale is that routers with more required-measurement IP flows passing through them should be assigned fewer IP flows to measure in order to achieve load-balancing. Let $\beta_i$ denote the total required measurement traffic passing through router $V_i$, which can be calculated using Eq. (15):

$$\beta_i = \sum_{x: V_i \in P_x} \Phi_x \times a_x \tag{15}$$

The $d_i^x$ assignment for LB(weighted) is formulated as:

$$d_i^x = a_x \times \frac{1/\beta_i}{\sum_{j: V_j \in P_x} 1/\beta_j}$$

Although LB(weighted) does not necessarily lead to the optimal solution, it is very fast to compute compared to the time required to solve the QP or LP optimization problems for LB(min-VAR), LB(min-MAX), and LB(min-VAR-given-MAX). In Section 5, we also compare their performance with the following three simple naive strategies:
• LB(ingress): the required measurement traffic $\Phi_x \times a_x$ for each OD-pair $OD_x$, $x \in \Theta$, is measured only at ingress routers.
• LB(egress): the required measurement traffic $\Phi_x \times a_x$ for each OD-pair $OD_x$, $x \in \Theta$, is measured only at egress routers.
• LB(uniform): the required measurement traffic $\Phi_x \times a_x$ for each OD-pair $OD_x$, $x \in \Theta$, is measured evenly across the routers on its routing path $P_x$.
Table 2 summarizes the corresponding $d_i^x$ for each approach with the toy example presented in Fig. 1. In this example, LB(min-VAR), LB(min-MAX), and LB(min-VAR-given-MAX) all have the same optimal load-balancing performance (i.e., $MAX(L_i) = \frac{40}{360}$ and $VAR(L_i) = 0$), which we denote as LB(optimal). In comparison, LB(ingress) and LB(egress) have the poorest load-balancing performance but need the fewest deployed monitors. LB(uniform) outperforms them but needs more monitors (e.g., 9 instead of 3 monitors in our toy example). LB(weighted) and LB(optimal), which consider the global required measurement traffic, can achieve better load-balancing performance than the local approaches (e.g., LB(ingress), LB(egress), and LB(uniform)), where LB(optimal) has the optimal load-balancing performance but needs much more computation time.

MEASUREMENT WITH LIMITED MONITORS
In this section, we extend the previous formulations to consider limited monitor deployment scenarios. In practice, not every router is equipped with a monitor and capable of measurement. Suppose only $K$ out of the $M$ routers are deployed with monitors and thus have measurement capability. We assume each OD-pair $OD_x$, $x \in \Theta$, has at least one router on its routing path $P_x$ that is capable of measurement, so that the measurement tasks imposed by the network operator can be fulfilled. Our formulation includes two problems: 1) the measurement with fixed monitor deployment problem, and 2) the measurement with flexible monitor deployment problem.

Fixed Monitor Deployment Problem
In the first case, we assume that these $K$ monitors have already been deployed in routers and are fixed. Our goal is to distribute the required measurement tasks to these $K$ routers. This can be solved simply by changing the routing path $P_x$ as follows: we exclude router $V_i$ from $P_x$ if $V_i$ is not equipped with a monitor and is unable to measure (e.g., $P_x^* = P_x - \{V_i\}$ for all OD-pairs $OD_x$, $x \in \Theta$). The variance calculation should also be modified accordingly, since we now have only $K$ monitors instead of $M$. All constraints remain the same except that $P_x$ is replaced by $P_x^*$ in Eqs. (9)-(12).
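The path restriction for the fixed deployment case amounts to a simple filter. The Python sketch below is one way to express it; the function name `restrict_paths` and the chosen set of three deployed monitors are assumptions made for illustration, and the explicit error mirrors the assumption that every OD-pair keeps at least one monitor on its path.

```python
PATHS = {
    "SF->NY": ["SF", "Denver", "Kansas City", "Indianapolis", "NY"],
    "LA->Seattle": ["LA", "Denver", "Seattle"],
    "Chicago->Atlanta": ["Chicago", "Indianapolis", "Atlanta"],
}


def restrict_paths(paths, monitors):
    """Keep only monitor-equipped routers on each path (P*_x in the text)."""
    restricted = {}
    for od, path in paths.items():
        kept = [router for router in path if router in monitors]
        if not kept:
            # The formulation assumes at least one monitor per OD-pair path.
            raise ValueError(f"OD-pair {od} has no monitor on its path")
        restricted[od] = kept
    return restricted


# Hypothetical fixed deployment with K = 3 monitors.
deployed = {"Denver", "Kansas City", "Indianapolis"}
restricted = restrict_paths(PATHS, deployed)
print(restricted)
```

The restricted paths can then be fed to any of the earlier formulations in place of the original paths, with the variance taken over the K deployed monitors only.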

Flexible Monitor Deployment Problem
In the second case, the locations of the $K$ monitors have not been decided, and the monitors can be deployed at any router. This problem includes not only the distribution of measurement tasks, but also the placement of monitors. To formulate this problem, we introduce additional decision variables $u_i$, where $u_i = 1$ if router $V_i$ is selected to deploy a monitor, and $u_i = 0$ otherwise. The summation of $u_i$ must therefore equal $K$. We assume every monitor has the same limited measurement capability (resource constraint) $C_m$. The problem is formulated below with the load-balancing objective $\alpha = MAX(L_i)$. Note that it is no longer an LP/QP problem since $u_i$, $i \in V$, are Boolean variables.

$$\begin{aligned}
\text{minimize} \quad & \alpha \\
\text{subject to} \quad & L_i = \frac{1}{\beta} \sum_{x: V_i \in P_x} \Phi_x \times d_i^x \times u_i, && \forall i \\
& \sum_{i: V_i \in P_x} d_i^x \times u_i = a_x, && \forall x \\
& \sum_{x: V_i \in P_x} \Phi_x \times d_i^x \times u_i \le C_m, && \forall i \\
& \sum_{i=1}^{M} u_i = K \\
& 0 \le d_i^x \le 1, && \forall x, i && (24) \\
& u_i \in \{0, 1\}, && \forall i && (25)
\end{aligned}$$

In this model, $L_i$ is the summation of the product of $\Phi_x$, $d_i^x$, and $u_i$. Therefore, the objective function α is related to the product of two decision variables, $u_i$ and $d_i^x$, and the optimization problem falls into the MIQP (Mixed Integer Quadratic Programming) category. In order to avoid quadratic programming, we could introduce $z_i^x$ to decouple $d_i^x \times u_i$ by using Equations (26) to (28). It is easy to see their equivalence: when $u_i = 0$, $z_i^x = 0$ from (27); and when $u_i = 1$, $z_i^x = d_i^x$ from (28).
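For reference, a standard way to linearize the product of a binary variable and a continuous variable bounded in [0, 1] is sketched below. This is a generic textbook construction offered as an assumption about what such constraints can look like; it is not claimed to reproduce the exact form or numbering of Equations (26) to (28).

```latex
\begin{align*}
z_i^x &\le d_i^x, && \forall x, i \\
0 \le z_i^x &\le u_i, && \forall x, i \\
z_i^x &\ge d_i^x - (1 - u_i), && \forall x, i
\end{align*}
```

With $u_i = 0$, the second line forces $z_i^x = 0$; with $u_i = 1$, the first and third lines together force $z_i^x = d_i^x$, so $z_i^x$ behaves exactly like the product $d_i^x \times u_i$.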
Although we could reduce the MIQP problem to an MILP (Mixed Integer Linear Programming) problem by introducing $z_i^x$, the new MILP problem has double the number of decision variables. This is because the cardinality of $u_i$ is much smaller than the cardinality of $d_i^x$ in practice, so adding one $z_i^x$ for every $d_i^x$ roughly doubles the variable count. Fortunately, the decision variable $d_i^x$ (for distributing measurement tasks) is highly dependent on the decision variable $u_i$ (for monitor placement). If $u_i = 0$ (i.e., router $V_i$ is not selected to deploy a monitor), router $V_i$ cannot participate in any measurement task. This means that $d_i^x$, the fraction of $\Phi_x$ (IP flows) for each OD-pair $OD_x$, $x \in \Theta$, that router $V_i$ measures, should be zero (i.e., $d_i^x = 0$). On the other hand, when $u_i = 1$ (i.e., router $V_i$ is selected to deploy a monitor and is capable of measurement), $d_i^x$ can be any value between 0 and 1. Therefore, we can directly use $d_i^x$ in place of $d_i^x \times u_i$ in Eqs. (20)-(22) to avoid quadratic programming, with a new constraint, $0 \le d_i^x \le u_i$, $\forall x, i$, replacing Eq. (24). It is easy to see their equivalence. The formulation now becomes an MILP and keeps the original number of decision variables.

Optimal MILP/Heuristic Solutions
The optimal solution searches for the best $d_i^x$ and $u_i$ assignments for the hybrid load-balancing and placement problem under the assumption of using $K$ flexibly placed monitors instead of $M$, so as to minimize the maximum measurement workload across them (e.g., minimize $MAX(L_i)$, $i = 1 \ldots K$). The simplified formulation is an MILP problem since $u_i$ is a binary decision variable and $d_i^x$ is a continuous decision variable. There is a variety of optimization tools that we can leverage. In particular, we use an MILP solver (e.g., CPLEX [10]) to find the optimal solution. We refer to this solution as "Optimal". For small to medium size networks, the optimal load-balancing with placement solution can be readily found. However, given that MILP problems are in general NP-hard, the solvers are not fast enough for large networks.
It is easy to see that the hybrid load-balancing and placement problem becomes an LP (Linear Programming) problem if the monitor placement strategy is given (i.e., with fixed $u_i$). Therefore, all of our proposed heuristic solutions decide the monitor locations first. In this section, we propose two heuristic solutions to approximate the optimal performance: "LB-Successive Selection" and "LB-Greedy". Both of them iteratively select monitors to disable, based on the planned monitor placement strategy decided in the previous iteration. They both start from an initial configuration under the assumption that all $M$ routers are fully deployed with monitors. We refer to this initial configuration as the "All-On" stage. The monitor-disable process is repeated until only $K$ out of $M$ monitors are left.
LB-Successive Selection: it starts from the initial All-On configuration, where all $M$ routers are assumed to be fully deployed with monitors, and iteratively chooses one monitor to disable after each optimization process (i.e., minimize $MAX(L_i)$) until only $K$ out of $M$ monitors are left. The selection of which monitor to disable is based on the ranking of their measurement workloads (e.g., $L_i$). We choose the one with the least measurement workload across all remaining monitors (e.g., $V_i = \min(L_i)$, $i \in \mathcal{M}$), where $\mathcal{M}$ stands for the set of remaining routers deployed with monitors. The intuition is that the monitors with higher measurement workloads after the optimization process (i.e., minimize $MAX(L_i)$, $\forall i$, the maximum workload across all monitors) take more measurement responsibility for the traffic from some OD-pairs that have few monitors deployed on their routing paths. Therefore, those monitors cannot be disabled; otherwise, their assigned measurement tasks could not be redistributed. If two or more routers have the same minimum measurement workload in an iteration, LB-Successive Selection calculates one of the following three metrics, which serves as a tie-breaker, and disables the router with the least value:
• Least-traffic ($\sum_{x: V_i \in P_x} \Phi_x$). The intuition is that the monitors with the least amount of traffic passing through them have less freedom to load-balance the measurement tasks for each OD-pair's traffic.
• Least-LB(uniform). We use the LB(uniform) heuristic mentioned in Section 3.3 to find the corresponding measurement workload across all remaining monitors to serve as our second-stage tie-breaker.
• Least-LB(weighted). We use the LB(weighted) heuristic to find the corresponding measurement workload across all remaining monitors to serve as our second-stage tie-breaker.
In particular, it disables a monitor based on the ranking calculated from the previous iteration (Line 12). This means that we use the information from the previous iteration (i.e., the planned measurement fraction $d_i^x$) to calculate the metric for each monitor in the current iteration (Lines 5-6).
LB-Successive Selection algorithm listing (lines 12-14):
12: Disable the monitor with minimum $L_i$ and least performance-metric
13: end if
14: end while
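The elimination loop of LB-Successive Selection can be sketched in Python as follows. Several pieces are assumptions made so that the sketch runs on its own: the optimizer is abstracted as a callable `solve`, and a uniform split (`uniform_loads`) stands in for the min-MAX LP; only the Least-traffic tie-breaker is implemented (with the router name as a final deterministic tie-break); and a monitor is never disabled if that would leave an OD-pair without any monitor.

```python
from fractions import Fraction

PATHS = {
    "SF->NY": ["SF", "Denver", "Kansas City", "Indianapolis", "NY"],
    "LA->Seattle": ["LA", "Denver", "Seattle"],
    "Chicago->Atlanta": ["Chicago", "Indianapolis", "Atlanta"],
}
DEMAND = {od: 120 for od in PATHS}


def uniform_loads(paths, demand, monitors):
    """Stand-in optimizer: split each OD-pair evenly over its enabled monitors."""
    loads = {m: Fraction(0) for m in monitors}
    for od, path in paths.items():
        active = [r for r in path if r in monitors]
        for r in active:
            loads[r] += Fraction(demand[od], len(active))
    return loads


def covers_all(paths, monitors):
    """Every OD-pair must keep at least one enabled monitor on its path."""
    return all(any(r in monitors for r in path) for path in paths.values())


def successive_selection(paths, demand, k, solve=uniform_loads):
    """Disable the least-loaded monitor after each optimization until k remain."""
    monitors = {r for path in paths.values() for r in path}   # All-On stage
    transit = {
        m: sum(demand[od] for od, path in paths.items() if m in path)
        for m in monitors
    }
    while len(monitors) > k:
        loads = solve(paths, demand, monitors)
        candidates = [m for m in monitors if covers_all(paths, monitors - {m})]
        if not candidates:
            raise ValueError("cannot reach k monitors and still cover all OD-pairs")
        # Rank by workload, then Least-traffic, then name for determinism.
        victim = min(candidates, key=lambda m: (loads[m], transit[m], m))
        monitors.remove(victim)
    return monitors


selected = successive_selection(PATHS, DEMAND, k=3)
print(sorted(selected))
```

Passing a real min-MAX solver as `solve` would change which monitors are ranked lowest, but not the structure of the loop: one optimization per iteration, followed by a single disable decision.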
LB-Greedy Algorithm: similar to LB-Successive Selection, the LB-Greedy algorithm also disables one monitor in each iteration, until $K$ monitors are left. However, it is more time-consuming since it tests all remaining monitors one by one in each iteration. To test the importance of each monitor, LB-Greedy re-computes the minimized α after turning off each monitor in turn (Lines 2-7), which involves repeated runs of the optimization procedure (Line 4) mentioned in Section 3.3. Based on the testing of every remaining monitor, it disables the one that has the least impact on α (Line 8).
Since the LB-Greedy algorithm exhaustively tests each individual monitor in each iteration, its performance is expected to be close to the optimal MILP solution.
LB-Greedy algorithm listing (lines 6-10):
6: Enable the monitor, $V_i$
7: end for
8: Find monitor, $V_i$, with smallest α $\in \mathcal{M}$ when they are disabled
9: $\mathcal{M} \leftarrow \mathcal{M} \setminus \{V_i\}$
10: end while
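A minimal Python sketch of the LB-Greedy loop is given below. As assumptions for illustration, the optimizer is passed in as a callable `peak_of` that returns the minimized α for a given monitor set, and a uniform split (`uniform_peak`) stands in for the min-MAX LP so that the sketch runs without a solver; candidates that would leave an OD-pair without any monitor are skipped.

```python
from fractions import Fraction

PATHS = {
    "SF->NY": ["SF", "Denver", "Kansas City", "Indianapolis", "NY"],
    "LA->Seattle": ["LA", "Denver", "Seattle"],
    "Chicago->Atlanta": ["Chicago", "Indianapolis", "Atlanta"],
}
DEMAND = {od: 120 for od in PATHS}


def uniform_peak(paths, demand, monitors):
    """Stand-in for the optimizer: peak workload under an even split."""
    loads = {m: Fraction(0) for m in monitors}
    for od, path in paths.items():
        active = [r for r in path if r in monitors]
        for r in active:
            loads[r] += Fraction(demand[od], len(active))
    return max(loads.values())


def covers_all(paths, monitors):
    """Every OD-pair must keep at least one enabled monitor on its path."""
    return all(any(r in monitors for r in path) for path in paths.values())


def lb_greedy(paths, demand, k, peak_of=uniform_peak):
    """Each round, test every monitor and disable the one whose removal hurts least."""
    monitors = {r for path in paths.values() for r in path}   # All-On stage
    while len(monitors) > k:
        best, best_alpha = None, None
        for m in sorted(monitors):
            trial = monitors - {m}                   # temporarily disable m
            if not covers_all(paths, trial):
                continue                             # m is indispensable
            alpha = peak_of(paths, demand, trial)    # re-optimize without m
            if best_alpha is None or alpha < best_alpha:
                best, best_alpha = m, alpha
        if best is None:
            raise ValueError("cannot reach k monitors and still cover all OD-pairs")
        monitors.remove(best)                        # permanently disable it
    return monitors


remaining = lb_greedy(PATHS, DEMAND, k=3)
print(sorted(remaining))
```

Each round costs one optimizer call per remaining monitor, which is where the extra computation time relative to LB-Successive Selection comes from.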

PERFORMANCE EVALUATION
We evaluated the performance of LEISURE with three optimal solutions, LB(min-VAR), LB(min-MAX), and LB(min-VAR-given-MAX), for different load-balancing objectives in various realistic scenarios on two separate real, large point-of-presence (PoP)-level backbone networks: Abilene [7] and GEANT [8]. We also compare them with several simple naive approaches, namely LB(ingress), LB(egress), LB(uniform), and LB(weighted). Our starting point is a preliminary evaluation of the basic model in Section 5.2 based on three assumptions: (1) all routers are equipped with monitors that are capable of performing the measurement task, (2) traffic from each OD-pair has a single router-level path computed by OSPF, and (3) there is only one measurement task. We relax the first assumption in Section 5.3 to show our load-balancing ability and the computation time complexity in this more general case. We further relax the second and third assumptions and discuss their evaluations in Supplemental Section 10.

Experimental Setup and Performance Metrics
We use two real datasets from the Abilene [7] and GEANT [8] networks, both of which have been studied and discussed in the research literature. Their data sets are publicly available, including the network topology and routing information. Based on these available data sets, we implemented a flow-based trace-driven simulation to conduct our evaluations. For both networks, we use the real traffic matrices provided by a third party [11]. The traffic matrix data sets for the Abilene network are available at [12], and the traffic matrix data sets of the GEANT network are available at [13].
GEANT: It connects a variety of European research and education networks. Our experiments were based on the December 2004 snapshot available at [14], which consists of 23 nodes and 74 links varying from 155 Mbits/s to 10 Gbits/s. The traces we use were collected from April 11-15, 2004. The traffic matrix we use consists of demands for every OD-pair within a certain time interval (5 mins for Abilene and 15 mins for GEANT). We construct OD-pairs by considering all possible pairs of PoPs and calculate their shortest-path routes. In brief, these traffic matrices are derived from flow information collected at key locations of the network and are transformed into the demand rate for each OD-pair based on the control plane information.
In the following sections, we assume our target is to measure all traffic (i.e., a_x = 1, ∀x ∈ Θ). Therefore, the workload L_i for router R_i (i = 1 ... M) is defined as the amount of traffic that router R_i measures, normalized by the total traffic demand. Theoretically, the ideal load-balanced workload L_i for M monitors is 1/M. However, it might be unachievable due to routing limitations from TE or resource constraints on monitors. In our experiments, we are interested in the following three performance metrics:
• Maximum Workload: Our main load-balancing performance metric is the maximum measurement workload over all monitors in the network (i.e., MAX(L_i), i = 1 ... M).
• Variance of Workload: The other load-balancing performance metric used in this paper is the variance of workloads across all monitors (i.e., VAR(L_i)).
• Computation Time: In our experiments, we only record the computation time of the LP or MILP solver, since it usually takes much longer than ordinary numerical computation and therefore dominates the overall computation time of LEISURE. Because the computation time for LP or MILP may also vary across solvers, we do not mix it with other numerical computations.
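The three load-related quantities above can be computed directly from per-monitor measured volumes. The following is a minimal sketch for illustration only: the function name `workload_metrics` and the toy volumes are assumptions, and the use of the population variance for VAR(L_i) is also an assumption, since the text does not state which variance estimator is used.

```python
from statistics import pvariance


def workload_metrics(measured, total_demand):
    """Return MAX(L_i), VAR(L_i) and the ideal load 1/M.

    measured[i] is the traffic volume measured by monitor i (hypothetical
    input); total_demand is the total traffic demand used to normalize.
    """
    loads = [volume / total_demand for volume in measured]  # L_i
    return {
        "max": max(loads),
        "var": pvariance(loads),    # assumption: population variance
        "ideal": 1.0 / len(loads),  # ideal load-balanced workload 1/M
    }


# Toy volumes, not taken from the Abilene or GEANT traces.
unbalanced = workload_metrics([20, 35, 50, 15], total_demand=120)
balanced = workload_metrics([30, 30, 30, 30], total_demand=120)

print(f"unbalanced: max={unbalanced['max']:.4f} var={unbalanced['var']:.6f}")
print(f"balanced:   max={balanced['max']:.4f} var={balanced['var']:.6f}")
print(f"ideal 1/M:  {balanced['ideal']:.4f}")
```

In the balanced toy case, MAX(L_i) equals the ideal 1/M and the variance is zero, which is the reference point the two load-balancing metrics are compared against.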

Basic Load-Balancing Comparison
In this section, we compare the load-balancing performance of all approaches under two assumptions (ubiquitous monitors and single-path routing). Table 3 compares MAX(L_i) over all monitors for the different approaches. For GEANT, our optimal load-balancing solutions reduce MAX(L_i) by a factor of 4.75X (= 28.79%/6.06%) compared to the naive approach of LB(ingress) and by 2.27X (= 13.73%/6.06%) compared to LB(uniform). Similar gains can be seen in the results for Abilene as well. Fig. 2 plots in more detail the L_i values of the 11 monitors in Abilene and the 23 monitors in GEANT for the different load-balancing approaches (see footnote 2).
2. Ideally, if the network topology were fully meshed, LEISURE could achieve the ideal load-balancing performance. However, due to the limitations of network topology and routing strategy (e.g., OSPF), the peripheral routers are not able to share the majority of the measurement task concentrated in the core network (e.g., R9 (Seattle) in Abilene [7]).

Another relative performance measure is how close the maximum workloads are to the ideal load-balancing case of L = 1/M, as given by Eq. (8). For Abilene and GEANT, the ideal L is 9.09% (= 1/11) and 4.35% (= 1/23), respectively. However, the MAX(L_i) values of LB(ingress) for Abilene and GEANT are 19.16% and 28.79%, respectively, which are 2.11X and 6.62X worse than the ideal case. The simple heuristic approaches also yield large MAX(L_i) values compared to the ideal case: e.g., 21.67% (2.3X worse) for LB(uniform) in Abilene and 10.67% (2.4X worse) for LB(weighted) in GEANT. On the other hand, our three optimal load-balancing solutions presented in Fig. 3 and Table 3 perform very close to the theoretical ideal case: 10.11%, 9.45%, and 9.45% for LB(min-VAR), LB(min-MAX), and LB(min-VAR given MAX), respectively, as compared to the ideal case of 9.09% for Abilene. Similarly, our three optimal solutions achieve 6.15%, 6.06%, and 6.06%, respectively, as compared to the ideal case of 4.35% for GEANT.

Table 4 compares VAR(L_i) across all monitors for the different approaches. For Abilene, our optimal load-balancing solutions reduce VAR(L_i) by a factor of 70X (= 0.007366/0.000105) compared to the naive approach of LB(egress), and by over 30X (= 0.003158/0.000105) compared to LB(uniform). Similar improvements in variance can be seen for GEANT as well.
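The improvement factors quoted in this section are plain ratios of the reported table values. The short sketch below recomputes them from the numbers stated in the text; the variable and function names are illustrative assumptions, and no values beyond those quoted above are used.

```python
# Ideal load 1/M, expressed in percent, for M = 11 (Abilene) and M = 23 (GEANT).
ideal_pct = {"Abilene": 100 / 11, "GEANT": 100 / 23}


def factor(baseline, reference):
    """How many times larger the baseline value is than the reference value."""
    return baseline / reference


# MAX(L_i) in percent, as quoted in the text.
geant_ingress_vs_optimal = factor(28.79, 6.06)
geant_uniform_vs_optimal = factor(13.73, 6.06)
abilene_ingress_vs_ideal = factor(19.16, ideal_pct["Abilene"])
geant_ingress_vs_ideal = factor(28.79, ideal_pct["GEANT"])

# VAR(L_i) for Abilene, as quoted in the text.
abilene_var_egress_vs_optimal = factor(0.007366, 0.000105)
abilene_var_uniform_vs_optimal = factor(0.003158, 0.000105)

print(f"ideal loads: {ideal_pct['Abilene']:.2f}% / {ideal_pct['GEANT']:.2f}%")
print(f"GEANT MAX, ingress vs optimal: {geant_ingress_vs_optimal:.2f}X")
print(f"GEANT MAX, uniform vs optimal: {geant_uniform_vs_optimal:.2f}X")
print(f"Abilene MAX, ingress vs ideal: {abilene_ingress_vs_ideal:.2f}X")
print(f"GEANT MAX, ingress vs ideal:   {geant_ingress_vs_ideal:.2f}X")
print(f"Abilene VAR, egress vs optimal:  {abilene_var_egress_vs_optimal:.1f}X")
print(f"Abilene VAR, uniform vs optimal: {abilene_var_uniform_vs_optimal:.1f}X")
```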
To better understand why our optimal solutions achieve a more evenly distributed measurement load, we use traffic from only five OD-pairs in Abilene (see footnote 3) to show the detailed load assignment in Fig. 3 (WAS-DNV, NYC-HST, DNV-IPL, CHI-LOS, and ATL-STT with 66.5 MB, 44.9 MB, 44.6 MB, 19.8 MB, and 11.7 MB, respectively). In Fig. 3(a), although LB(uniform) distributes the traffic of each OD-pair uniformly over all monitors on its path (e.g., WAS-DNV with 6 monitors), the aggregated workload of the overall measurement task at each monitor is still unbalanced (e.g., L_i for all routers R_i (i = 1 ... 10) ranges from 1% to 17%). LB(weighted) in Fig. 3(b) improves the load-balancing performance thanks to its global view, but the load is still poorly balanced (e.g., L_i ranges from 4% to 14%). In contrast, the optimal solutions achieve much better load-balancing performance (e.g., L_i ranges from 5.5% to 10.5%) by excluding some monitors from measuring the traffic of certain OD-pairs (e.g., R4 and R5 do not measure traffic for the WAS-DNV OD-pair in Fig. 3(d)).
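The effect described above can be reproduced on a toy example. The sketch below is not the Abilene scenario of Fig. 3: the paths, demands, and router names are hypothetical, and the balanced assignment is hand-picked rather than produced by an LP solver. It only illustrates how a uniform per-path split can leave a shared router overloaded, while excluding that router from some OD-pairs evens out the load.

```python
def uniform_fractions(paths):
    """LB(uniform): every monitor on an OD-pair's path measures an equal share."""
    return {od: {r: 1.0 / len(path) for r in path} for od, path in paths.items()}


def workloads(demands, fractions):
    """L_i: sum over OD-pairs of (share of router i) * demand, normalized."""
    total = sum(demands.values())
    loads = {}
    for od, shares in fractions.items():
        # a_x = 1: the shares of each OD-pair must cover all of its traffic.
        assert abs(sum(shares.values()) - 1.0) < 1e-9
        for router, share in shares.items():
            loads[router] = loads.get(router, 0.0) + share * demands[od] / total
    return loads


# Hypothetical toy topology: three OD-pairs whose paths overlap at R2 and R3.
paths = {"A": ["R1", "R2", "R3"], "B": ["R2", "R3"], "C": ["R3", "R4"]}
demands = {"A": 60.0, "B": 30.0, "C": 30.0}

uniform_loads = workloads(demands, uniform_fractions(paths))

# Hand-picked assignment: R3 is excluded from OD-pairs A and C, R2 from B.
balanced_fractions = {
    "A": {"R1": 0.5, "R2": 0.5},
    "B": {"R3": 1.0},
    "C": {"R4": 1.0},
}
balanced_loads = workloads(demands, balanced_fractions)

print("uniform :", {r: round(v, 4) for r, v in sorted(uniform_loads.items())})
print("balanced:", {r: round(v, 4) for r, v in sorted(balanced_loads.items())})
```

Under the uniform split, the router shared by all three paths carries the largest load; in the hand-picked assignment every router carries exactly 1/M of the traffic.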

Limited Number of Monitors
In this section, we relax our first assumption and consider the case in which only a subset of routers are equipped with monitors and capable of measurement. We evaluate LEISURE under both fixed and flexible monitor deployment.

Fixed Monitor Deployment Scenario
In the first case, we assume that K = 7 out of M = 11 routers in Abilene are equipped with fixed monitors. The routers excluded from monitor deployment are R0, R5, R7, and R8 (see footnote 4). Therefore, LEISURE can only distribute the measurement task among the remaining 7 monitors. We omit naive approaches and focus on heuristic (i.e., LB(weighted)) and optimal approaches in Fig. 4. Compared with the ubiquitous case in Fig. 2(a), the ideal load-balanced workload increases from 9.09% to 14.29%. For LB(min-VAR), LB(min-MAX), and LB(min-VAR given MAX), MAX(L_i) increases only from 9.67% to 17.61%. For the heuristic approaches, however, MAX(L_i) increases from 12.12% to 23.33% for LB(weighted) and from 21.67% to 35.86% for LB(uniform). We observe that LEISURE with these three optimal solutions for different load-balancing objectives increases MAX(L_i) by only 7.94 percentage points, which is close to the 5.2-point increase of the theoretical ideal case and much better than the 11.21 points for LB(weighted) and the 14.19 points for LB(uniform).

3. The notations of these OD-pairs and their routing information can be found in [7], [12].
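Two properties of the fixed deployment scenario are easy to check programmatically: the ideal load rises from 1/M to 1/K when only K monitors remain, and every OD-pair must keep at least one deployed monitor on its route (the condition stated in footnote 4). The sketch below assumes hypothetical helper names and a toy topology; it does not use the actual Abilene routes.

```python
def ideal_load_pct(num_monitors):
    """Ideal load-balanced workload 1/K, in percent."""
    return 100.0 / num_monitors


def uncovered_od_pairs(paths, deployed):
    """OD-pairs whose route contains no deployed monitor (hypothetical helper)."""
    deployed = set(deployed)
    return sorted(od for od, path in paths.items() if not deployed & set(path))


# Ideal loads for the ubiquitous case (M = 11) and the fixed case (K = 7).
ubiquitous = ideal_load_pct(11)
fixed = ideal_load_pct(7)
print(f"ideal load: {ubiquitous:.2f}% -> {fixed:.2f}% (+{fixed - ubiquitous:.2f} points)")

# Toy routes, not the Abilene topology.
paths = {"A": ["R1", "R2", "R3"], "B": ["R2", "R3"], "C": ["R3", "R4"]}
feasible = uncovered_od_pairs(paths, deployed={"R2", "R3"})
infeasible = uncovered_od_pairs(paths, deployed={"R1", "R2"})
print("uncovered with {R2, R3}:", feasible)
print("uncovered with {R1, R2}:", infeasible)
```

A deployment that leaves any OD-pair uncovered cannot satisfy a requirement to measure all traffic, so such a check would typically run before any load-balancing optimization.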

CONCLUSION
In this paper, we proposed an optimization framework, LEISURE, for load-balancing network-wide traffic measurements across coordinated monitors in the network. This is an important problem because, without judiciously load-balancing measurement tasks across monitors, the existing monitors can be overwhelmed due to competing measurement tasks or

4. These four excluded routers are chosen so that at least one capable monitor remains on each OD-pair's route to fulfill the measurement tasks imposed by the network operator.
5. We omit the results of LB-Successive Selection(traffic) and LB-Successive Selection(uniform) since they perform similarly to LB-Successive Selection(weighted).

Fig. 4. Measurement load distribution with only 7 out of 11 monitors deployed in Abilene.

TABLE 1 Notations
Notation    Description
OD_x        represents a set of flows between the same pair of ingress/egress routers
Θ           the set of all |V| × (|V| − 1) OD-pairs: OD_x, x ∈ Θ
Φ_x         characterizes the traffic demand (IP flows) of OD-pair OD_x, x ∈ Θ
P_x         represents the given routing strategy for OD-pair OD_x, x ∈ Θ
a_x         the fraction of Φ_x (IP flows) of OD_x that is required to be measured
d_i^x       the fraction of Φ_x (IP flows) of OD_x that router V_i measures
β           the total required measurement traffic (number of IP flows)
L_i         the total traffic (number of IP flows) that V_i measures, normalized by β
α           load-balancing objective

TABLE 2 d x
Algorithm 1 LB-Successive Selection Algorithm
1: while more than K monitors are left do

However, it is still suboptimal since LB-Greedy tests only individual monitors instead of every possible combination. Besides, the algorithm remains computationally costly, since it tests O(M) monitors with O(M) LP problems in each iteration. For a moderate-sized topology, an MILP solver can sometimes work faster than this LB-Greedy approach. Details are shown in Section 5.3.2.
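To make the per-iteration cost concrete, the sketch below shows a generic greedy loop that evaluates one candidate per remaining monitor in each iteration. It is not the paper's Algorithm 1 or its LB-Greedy procedure, whose details are not given here: removing monitors one at a time is an assumption, the function names are hypothetical, and the objective is a simple stand-in (the maximum load under a uniform split) used in place of an LP solve so that the example runs with the standard library only.

```python
def uniform_max_load(paths, demands, monitors):
    """Stand-in objective: MAX(L_i) when each OD-pair is split evenly over
    the deployed monitors on its path. Returns inf if a path is uncovered."""
    total = sum(demands.values())
    load = {m: 0.0 for m in monitors}
    for od, path in paths.items():
        usable = [r for r in path if r in monitors]
        if not usable:
            return float("inf")  # this OD-pair could not be measured
        for r in usable:
            load[r] += demands[od] / len(usable) / total
    return max(load.values())


def greedy_eliminate(all_monitors, k, objective):
    """Remove one monitor per iteration until k remain; count evaluations."""
    remaining = set(all_monitors)
    evaluations = 0
    while len(remaining) > k:
        best_router, best_value = None, float("inf")
        for router in sorted(remaining):  # one evaluation per candidate
            value = objective(frozenset(remaining - {router}))
            evaluations += 1
            if value < best_value:
                best_router, best_value = router, value
        if best_router is None:
            raise ValueError("no monitor can be removed without losing coverage")
        remaining.remove(best_router)
    return remaining, evaluations


# Hypothetical toy topology and demands.
paths = {"A": ["R1", "R2", "R3"], "B": ["R2", "R3"], "C": ["R3", "R4"]}
demands = {"A": 60.0, "B": 30.0, "C": 30.0}

selected, lp_calls = greedy_eliminate(
    ["R1", "R2", "R3", "R4"],
    k=2,
    objective=lambda monitors: uniform_max_load(paths, demands, monitors),
)
print(sorted(selected), lp_calls)
```

In this sketch, going from M = 4 to K = 2 monitors costs 4 + 3 objective evaluations. If each evaluation were an LP solve, the cost per iteration would grow linearly with the number of remaining monitors, which is consistent with the O(M) LP problems per iteration noted above.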

TABLE 4
Comparisons on Variance of L_i

(d) LB(min-VAR given MAX)
Fig. 3. Detailed Abilene results for five OD-pairs. Optimal solutions allow nodes to be excluded from measurement if they are already overloaded.