Routing-as-a-Service (RaaS): A framework for tenant-directed route control in data center

In a multi-tenant data center environment, the current paradigm for route control customization involves a labor-intensive ticketing process, in which tenants submit route control requests to the landlord. This results in a tight coupling between tenants and landlord, extensive human resource deployment, and long ticket resolution times. We propose Routing-as-a-Service (RaaS), a framework for tenant-directed route control in data centers. We show that a RaaS-based implementation provides a route control platform for multiple tenants to perform route control independently with little administrative involvement, and for the landlord to set the overall network policies. RaaS-based solutions can run on commercial off-the-shelf (COTS) hardware and leverage existing technologies, so they can be implemented in existing networks without a major infrastructural overhaul. We present the design of RaaS, introduce its components, and evaluate a prototype based on RaaS.


I. INTRODUCTION
Data centers are a key infrastructure for online service providers (OSPs) to provide always-on and responsive services to end users. Typically consisting of thousands to hundreds of thousands of servers, data centers are designed to handle tremendous computation, large storage, and quick service delivery. However, the computational resources in a data center are not used monolithically. Often the resources are multiplexed between different tenants (clients of the data center resources) so that they can simultaneously perform computations, store data, and provide services to end users.
In this paper we focus on routing as a service to tenants. Recent cloud computing architectures such as Amazon's EC2 [1] show a promising direction in tenant-directed control, allowing control of IP-to-virtual-machine binding without administrative involvement. Extending this notion, routing-as-a-service promotes the idea that tenants can programmatically determine where requests for their services go. For example, instead of a single server serving user traffic, a tenant might want the traffic to her service load-balanced across 10 machines. Or the tenant might want to reserve a subset of servers as standbys in case the primary servers fail. The traditional paradigm for achieving such per-tenant routing customization involves a ticketing process, which we outline below.
Figure 1 shows a typical ticketing process for tenants to request routing customization. A tenant first submits a request for routing customization (a "ticket") to a ticket distribution system, upon which a landlord (i.e., the data center resource owner and/or manager, in this case a network administrator) is assigned to the ticket. After rounds of clarification between the tenant and the landlord, the landlord sets up routing policies based on his understanding. Further clarifications might be required if the installed routing policy is unsatisfactory to the tenant. Finally, when both sides are content with the routing policy, the ticket is considered resolved and the routing customization request is considered fulfilled.
The following problems are common with this paradigm.
Labor-intensive process: Many of the steps in Figure 1 involve manual intervention, which burdens both the tenants and the landlord, but more so the landlord because it takes away time the landlord could spend improving and maintaining the network. While tolerable when the request volume is small, such a system is unsustainable as the volume and variety of customizations increase.
Tenants lacking automated control: The traditional paradigm takes away tenants' ability to automatically control routing to their services. Therefore, tenants often have to submit routing policies that satisfy a certain class of scenarios (e.g., the average traffic scenario or the worst-case scenario). In addition, reacting quickly to changes in this paradigm means more tickets flooding the ticket distribution system, causing overwhelming work for the landlord.
Long ticket resolution time: As a byproduct of the labor-intensive process, the landlord might not resolve tickets quickly. In the simple case where tenants and the landlord communicate via e-mail, resolution can take days. In the complicated case where an in-person meeting is required for each round of discussion, it can take weeks. Such a delay might not be acceptable if tenants desire a quick response to changes in the network environment.
This paper proposes the Routing-as-a-Service (RaaS) framework. RaaS promotes automated route control to tenants while retaining the landlord's authority in setting the network policy. The RaaS architecture consists of MultiSpeakers, Controllers, and Tenant Applications, with the former two under the landlord's control and the latter maintained by the tenants. The MultiSpeakers and Controllers together expose application programming interfaces (APIs) to tenants to perform automated route control, while the landlord can set the policy at the Controllers. Tenants' applications execute customized route control algorithms, allowing for customized and quick routing changes.
Our contributions are threefold:
• We propose a framework that provides a programmatic environment for tenants to use routing as a service, while reducing the landlord's management effort, resulting in reduced personnel cost (Section II).
• We build a prototype of RaaS (described in Section III) based on commercial off-the-shelf (COTS) components and existing protocols, demonstrating that RaaS is immediately applicable to data center networks.
• We conduct detailed performance evaluations of RaaS in terms of its processing delay, memory consumption, network overhead, and success rate in serving requests, showing that it does not place an overwhelming burden on the network (Section IV).
The paper proceeds with a system overview in Section II and the implementation in Section III. An implementation based on RaaS, along with a theoretical model for the service availability, is evaluated in Section IV. Related work is discussed in Section V, and we conclude the paper in Section VI.

II. RAAS OVERVIEW
This section gives a high-level overview of RaaS and introduces the components that enable tenant-directed route control. An overview of the system is shown in Figure 2.

A. Design Considerations
In designing the RaaS framework, we set out to build a framework that not only allows tenants to customize their routing, but also lets them do so safely. This means tenants can control their routing without unintentionally changing the routing policies of other tenants or, even worse, the overall network policies. In addition, components in the RaaS framework should not require a major infrastructural overhaul, and should be flexible enough to be assembled in various configurations (e.g., 1+1 redundancy). We tackle these issues by first leveraging current knowledge about the capabilities of existing routing protocols [2]-[4]. Then, we design critical components to be lightweight and stateless when possible, so they can be deployed in various configurations. In the end, RaaS is designed to be a modular framework that is capable of giving multiple tenants routing customizations without burdening the existing network infrastructure.

B. MultiSpeaker
MultiSpeakers actively maintain sessions with the routers so that they can relay the requests approved by the Controllers. To ensure no fundamental changes are made to routers, MultiSpeakers communicate with routers over well-known protocols. In RaaS, MultiSpeakers use the Border Gateway Protocol (BGP) [5] to install tenants' routing policies. MultiSpeakers provide an API for the Controllers to relay approved tenant routing requests to the router.
Deployment of redundant MultiSpeakers is easy in RaaS, since no communication occurs between MultiSpeakers; all coordination is orchestrated by the Controllers. Also, MultiSpeakers do not maintain state that would otherwise require a coherence protocol (e.g., BGP messages sent by the MultiSpeakers). This enables MultiSpeakers to be lightweight and stateless agents that simply act as relays for tenants to install their routing policy.
It may seem counterintuitive to use BGP, an inter-domain solution, for routing control within a single administrative domain. Indeed, Interior Gateway Protocols (IGPs) such as the Routing Information Protocol (RIP) [6], Open Shortest Path First (OSPF) [7], and Intermediate System to Intermediate System (IS-IS) [8], [9] are well established and widely adopted. However, there are several good reasons for using BGP, which are outlined below.
Simple State Machine: Compared to protocols such as OSPF, the state machine necessary to establish a functional session is simpler in BGP. A simpler state machine not only eases code verification to minimize bugs, it also makes additional augmentation easier, as explored in Section III-D.
Flexible placement of MultiSpeakers: While a protocol with a simple state machine such as RIP is desirable, flexible placement of MultiSpeakers is a trait that RIP cannot satisfy. In RIP, routers exchanging RIP messages must be directly connected. This limits the placement of MultiSpeakers to machines that are one hop away from routers. BGP has a mode ("Multihop eBGP") that enables two BGP speakers to exchange routing messages even if they are not directly connected. This makes it possible for MultiSpeakers to exchange messages with core routers without having to connect to them directly.
Easy Resource Management: In RaaS, resource management equates to manipulating routing to specific physical resources (to be discussed in more detail in Section II-D). If the routing is manipulated by IGPs such as OSPF, it could affect the data plane and cause route instability for protocols above the IGP (e.g., BGP). For example, consider other BGP-learned routes that are also in the routers. If a tenant distributes the traffic over machines in several subnets, the IGP would need to change link metrics to ensure that the path metrics to all servers are equal. Changing the link metrics, however, can affect the egress point of BGP-learned routes, causing a ripple effect on other ASes. Thus, using BGP avoids unintentional changes to the data plane while keeping resource mapping manipulation possible.

C. Controller
Before routing policies are received by MultiSpeakers, they must first pass through the Controller, as shown in Figure 2. The Controller provides an API for tenants to submit routing requests per their routing policy. By providing an API to tenants, RaaS lessens the need to dedicate a large amount of the landlord's time when tenants need to change routing to their services, since such a task can now be assigned to the Controller.
To prevent tenants from making erroneous routing requests, however, the landlord and tenants need to agree on the set of resources ℜ (i.e., servers) on which the tenants can host the service. Upon agreeing on ℜ, the landlord can implement policies that reject routing requests for resources not in ℜ. The admission policy can be much more complicated, involving dynamic conditions of the network, and it is up to the landlord to set up the admission policy. Because of the Controller, the landlord only needs to understand the constraints on tenants' routing policies, and can leave the actual routing policy implementation to the tenants. This reduces the amount of manual labor the landlord has to invest in allowing tenants to customize their routing.
In addition to providing an API and policy enforcement, the Controller also coordinates MultiSpeakers. When the Controller accepts a tenant's routing request, it records the request and the MultiSpeaker for which it is destined before forwarding the request. This helps the Controller check whether duplicate routing requests have been received, a likely indication of a tenant application error, and inform the tenant application of such a duplication. Storing the requests also allows MultiSpeakers to be bootstrapped upon restart; the Controller thus serves as the state memory for MultiSpeakers, keeping the MultiSpeakers lightweight and stateless.
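The record-keeping role described above can be illustrated with a minimal Python sketch. It is not the prototype's code: the class and method names (`ControllerState`, `record`, `bootstrap`) and the request fields are hypothetical, chosen only to show how stored requests support both duplicate detection and replay to a restarted MultiSpeaker.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RouteRequest:
    tia: str        # tenant IP address (the destination prefix)
    next_hop: str   # IP address of the resource serving the TIA


@dataclass
class ControllerState:
    # Hypothetical state memory: MultiSpeaker id -> requests forwarded to it.
    forwarded: dict = field(default_factory=dict)

    def record(self, speaker_id: str, request: RouteRequest) -> bool:
        """Store a request; return False if it duplicates a stored one."""
        stored = self.forwarded.setdefault(speaker_id, [])
        if request in stored:
            return False  # likely a tenant application error
        stored.append(request)
        return True

    def bootstrap(self, speaker_id: str) -> list:
        """Requests to replay, in order, to a restarted MultiSpeaker."""
        return list(self.forwarded.get(speaker_id, []))
```

Because all state lives in the Controller in this sketch, a MultiSpeaker that restarts needs nothing beyond the list returned by `bootstrap`.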

D. Tenant Application
The tenant application is the component that allows tenants to implement their routing policies. Through the APIs provided by the Controller, tenants can choose how to control traffic to their services. In order for tenants to control routing to their services, RaaS requires each tenant to be assigned unique tenant IP addresses (TIAs). These addresses are then bound to the services tenants develop, and used for subsequent routing requests.
To control routing to their services, tenants issue API calls to the Controller to change the binding between TIAs and the set of resources available to the tenant. Instead of network administrators manually configuring routing policies, tenants can develop programs to automatically change routing to their resources (i.e., changing the TIA-to-resource binding). Tenants can now develop complex programs to install policies without landlord intervention.
With the use of TIAs and tenant applications, independent and safe route control is possible. Since TIAs are unique to each tenant, tenant applications cannot change routing to services that are not their own. For each routing request, the Controller checks the owner of the tenant application issuing the call through a security token. If the TIA is not listed under the requesting tenant's control, the request is rejected. Also, since tenant applications are separated, each tenant can control routing to their resources independently of other tenants.
However, the actual resources being routed to may be shared amongst tenants. For example, if ℜ_Alice is the set of resources for Alice and ℜ_Bob is the set of resources for Bob, |ℜ_Alice ∩ ℜ_Bob| could be greater than 0. This separation of virtual resources (i.e., the TIAs) and physical resources (i.e., servers) enables resource multiplexing amongst different tenants while providing safe route control amongst tenants.
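As an illustration of the kind of program a tenant might write, the following Python sketch automates the standby scenario from the introduction. Everything here is an assumption made for illustration: `StubController` is an in-memory stand-in rather than the real Controller API, and the method names, the health-check callback, and the token handling are hypothetical.

```python
class StubController:
    """In-memory stand-in for the Controller API (hypothetical)."""

    def __init__(self):
        self.bindings = {}  # TIA -> set of next-hop resources

    def add_route(self, tia, next_hop, token):
        self.bindings.setdefault(tia, set()).add(next_hop)

    def withdraw_route(self, tia, next_hop, token):
        self.bindings.get(tia, set()).discard(next_hop)


def fail_over(controller, tia, primary, standby, token, is_healthy):
    """Rebind the TIA to the standby when the primary is unhealthy."""
    if is_healthy(primary):
        return primary
    # Withdraw first, then announce, so the two bindings never coexist.
    controller.withdraw_route(tia, primary, token)
    controller.add_route(tia, standby, token)
    return standby
```

A tenant application could call `fail_over` from a periodic health-check loop; no ticket or landlord action is involved in the rebinding itself.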

E. TIA-Resource Mapping and BGP
So far the discussion has presented tenant routing in the context of changing the TIA-resource mapping, but how is the mapping installed and changed using BGP? In BGP, routing changes are announced via the BGP Update message type, in which a prefix originator (i.e., the entity that owns the IP prefix) announces or withdraws a route to the prefix. In a route announcement, the BGP Update message contains the destination IP prefix and the next hop address, where the next hop address indicates the next node the packets should use to reach the IP prefix. In a route withdrawal, the BGP Update message simply contains the IP prefix, so that the routing entry corresponding to the prefix is removed from routers.
In the context of TIA-resource mapping, the TIA is represented by the IP prefix, and the resource is represented by the next hop address. Thus, to install a TIA-resource mapping, the BGP Update message sent to the router should be an announcement, with the TIA as the IP prefix and the IP address of the resource as the next hop address. To change the TIA-resource mapping, one BGP Update message withdraws the route to delete the existing mapping, followed by a second BGP Update message announcing the new TIA-resource mapping. Alternatively, sending just a BGP Update message with the new next hop address achieves the same effect, as the router treats it as an implicit withdraw.
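The equivalence of the two remapping procedures can be seen with a toy model of a router's table for a single BGP session. This Python sketch is only a simplification under the assumption of one session and one next hop per prefix; `ToyRib` and the two helper functions are hypothetical names, not part of RaaS.

```python
class ToyRib:
    """Toy routing table for one BGP session: one next hop per prefix."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        # A new announcement for a known prefix replaces the old next hop,
        # which is the implicit withdraw described in the text.
        self.routes[prefix] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(prefix, None)


def remap_explicit(rib, tia, new_resource):
    """Two Update messages: a withdrawal, then a new announcement."""
    rib.withdraw(tia)
    rib.announce(tia, new_resource)


def remap_implicit(rib, tia, new_resource):
    """One Update message: the announcement overwrites the old mapping."""
    rib.announce(tia, new_resource)
```

Both helpers leave the table in the same final state; the explicit variant briefly leaves the TIA without a route between its two messages.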

III. SYSTEM DESIGN
This section presents the implementation of the MultiSpeaker and Controller. The tenant application is only briefly mentioned, since the actual implementation is tenant-dependent. In addition, a possible enhancement to the MultiSpeaker is presented here. The MultiSpeaker and Controller components and their overall interactions are shown in Figure 3.

A. Tenant Application
When tenants want to customize their routing policy to the resources (ℜ) they have, their applications can issue calls to the Controller's API, which is shown in Table I. In particular, portions of the policy that involve changing the TIA-to-resource mapping (i.e., changing which resources service user requests) are carried out through these calls. As mentioned in Section II-E, changing the TIA-to-resource mapping equates to changing the next hop of the destination. So, if a tenant Alice was given ℜ = {server_1, server_2, server_4}, to initialize her service to server_1, she sets FirstServiceRoute = {destination: TIA_Alice, next hop: IP_machine1}, and calls AddRoute(FirstServiceRoute, Token_Alice). To ...

B. Controller
The tenant API enables on-demand remote procedure calls and reliable message exchange via TCP. Setting up the API this way ensures each request can be reliably sent to the Controller without having to implement a reliable service at the application layer. The major API methods exposed by the Controller are shown in Table I. Although the methods provided are few, they are sufficient for producing complicated resource remapping logic.
The validation module takes tenants' routing requests as input and outputs a binary answer. The output is fed to both the MultiSpeaker management module, which uses it to determine whether to forward the request to the MultiSpeaker, and the tenant API, which uses it to indicate the success of the operation to tenants. The validation module takes the route and token and checks whether the TIA and destination address belong to the tenant who owns the token; if not, the validation module marks the request as invalid.
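A minimal Python sketch of such a check follows. It assumes the route format of the AddRoute example (a destination TIA and a next hop), reads "destination address" as the address of the resource being routed to, and uses hypothetical lookup tables (`tokens`, `tenants`) that are not described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantRecord:
    tias: frozenset       # TIAs assigned to the tenant
    resources: frozenset  # agreed resource set (next-hop addresses)


def validate(route, token, tokens, tenants):
    """Binary answer: True only if the token's owner controls both ends."""
    tenant = tokens.get(token)
    if tenant is None:
        return False  # unknown token
    record = tenants[tenant]
    return (route["destination"] in record.tias
            and route["next_hop"] in record.resources)
```

The same boolean can then be handed to both consumers named above: the MultiSpeaker management module and the tenant API.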
The MultiSpeaker management module manages the communication between the Controller and the MultiSpeaker. In addition to passing routing requests and route inquiries, it also ensures that MultiSpeaker state reflects the state memory stored at the Controller. To achieve this, both the MultiSpeaker and the Controller maintain an acknowledgement table. Each entry of the table contains a (TIA, destination IP, action type) tuple, denoting an entry that the Controller and MultiSpeakers have to acknowledge as having been sent to the router. The MultiSpeaker management module also detects MultiSpeaker restarts, by periodically polling the MultiSpeaker, so that the Controller can bootstrap MultiSpeakers when they restart.

C. MultiSpeaker
The MultiSpeaker consists of three components: the protected API, the BGP module, and the translation module.
The protected API implements methods for the MultiSpeaker to exchange messages with the Controller's MultiSpeaker management module. The methods are similar to those exposed by the Controller in Table I, so we omit them here.
The translation module takes tenant requests as input and outputs well-formed BGP Update messages. The translation module includes different fields in the BGP Update message depending on whether the call is a WithdrawRoute or an AddRoute. For WithdrawRoute() calls, the translation module generates a BGP Update message with the WITHDRAWN ROUTES field filled. For AddRoute() calls, the module generates a BGP Update message that includes the NEXT HOP and NLRI fields. In addition to the destination IP prefix and next hop IP address, Update messages for AddRoute() calls also include the AS path. The AS path is a mandatory attribute that encodes the autonomous system (AS) numbers that the BGP Update message has traversed from the prefix origin. Even though tenants are the origins in supplying the destination IP prefix, having tenants supply the AS number would imply that tenants have knowledge of the inner workings between routers and MultiSpeakers. To avoid placing such a burden on tenants, MultiSpeakers act as the origin of tenants' prefixes. Thus, the translation module uses the AS number of the MultiSpeaker as the first AS in the AS path.
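To make the translation step concrete, the following Python sketch encodes the two kinds of Update message following the BGP-4 message layout (16-octet marker, length, type, then withdrawn routes, path attributes, and NLRI). It is a simplified illustration rather than the prototype's translation module: it assumes IPv4 prefixes, 2-octet AS numbers, and an ORIGIN value of IGP, and the function names are hypothetical.

```python
import ipaddress
import struct

BGP_MARKER = b"\xff" * 16
BGP_TYPE_UPDATE = 2


def encode_prefix(prefix: str) -> bytes:
    """Encode an IPv4 prefix as its bit length plus the minimal octets."""
    net = ipaddress.ip_network(prefix)
    n_octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:n_octets]


def path_attribute(type_code: int, value: bytes) -> bytes:
    """Well-known transitive attribute (flags 0x40), 1-octet length."""
    return struct.pack("!BBB", 0x40, type_code, len(value)) + value


def build_update(withdrawn: bytes, attributes: bytes, nlri: bytes) -> bytes:
    body = (struct.pack("!H", len(withdrawn)) + withdrawn
            + struct.pack("!H", len(attributes)) + attributes + nlri)
    # The header is 19 octets: marker (16), length (2), type (1).
    header = BGP_MARKER + struct.pack("!HB", 19 + len(body), BGP_TYPE_UPDATE)
    return header + body


def add_route(tia: str, next_hop: str, speaker_as: int) -> bytes:
    origin = path_attribute(1, b"\x00")  # ORIGIN = IGP (assumed)
    # AS_PATH: one AS_SEQUENCE segment (type 2) holding the speaker's AS.
    as_path = path_attribute(2, struct.pack("!BBH", 2, 1, speaker_as))
    nh = path_attribute(3, ipaddress.IPv4Address(next_hop).packed)
    return build_update(b"", origin + as_path + nh, encode_prefix(tia))


def withdraw_route(tia: str) -> bytes:
    return build_update(encode_prefix(tia), b"", b"")
```

Note that the tenant supplies only the TIA and the next hop; the AS number is filled in by the speaker side, mirroring the design choice described above.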
The features and attributes implemented by the BGP module are limited to the set necessary to establish BGP sessions, add and withdraw routes, and react to router notifications. Advanced attributes such as AS confederation and advanced features such as BGP route reflection are not implemented. Using a BGP module, the MultiSpeaker provides information isolation between the tenants and routers, much like BGP MUX [10]. Tenants are isolated from the interactions between MultiSpeakers and routers, but are still able to perform route control. Routers, on the other hand, are not exposed to the set-up within the RaaS architecture, and interact with the MultiSpeaker as if it were just another BGP-capable speaker. This separation provides flexibility for the implementation of RaaS to vary with minimal impact on routers and tenants.

D. Equal-Cost Multi-Path Enhancement (ECMP)
Discussions of the BGP module thus far assume that each BGP module can establish only one BGP session with each router (as depicted in Figure 3). Such a configuration would be fine if tenants announced only a single TIA-resource mapping at a time. However, in cases where tenants announce one-to-many TIA-resource mappings (e.g., for load balancing), multiple MultiSpeakers would be required. This method would require the number of MultiSpeakers, $N$, to be
$$N = k \times \max_{t \in \text{tenants}} \text{mappingSize}_t,$$
where $k$ is the number of routers the MultiSpeaker connects to, and $\text{mappingSize}_t$ is the cardinality of tenant $t$'s one-to-many TIA-resource mapping. Intuitively, the equation says that the number of MultiSpeakers needed is the number of routers needing a BGP session, multiplied by the largest one-to-many TIA-resource mapping needed by any tenant. If redundancy is required, an unmanageable number of MultiSpeakers would need to be deployed.
A simple extension to the BGP module can be implemented, in which each BGP module instantiates multiple BGP sessions to the router, with each session capable of announcing one TIA-resource mapping per tenant. Implementing this extension simply requires the BGP module to keep separate state machines and data structures for each session. Since there is no need for the instantiated sessions to cross-communicate, MultiSpeaker complexity does not change much. We note that the implicit withdraw (Section II-E) will not work in this setting, as the router will treat the new announcement as another equal-cost multi-path (ECMP) route.
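The sizing formula can be evaluated with a few lines of Python. The sketch below simply restates the expression for $N$ given in the text; the function names and the example tenant mapping sizes are hypothetical.

```python
def speakers_without_extension(num_routers, mapping_sizes):
    """N = k * max over tenants of mappingSize_t, as stated in the text."""
    return num_routers * max(mapping_sizes.values())


def sessions_per_router_with_extension(mapping_sizes):
    """With the extension, one BGP module opens this many sessions per router."""
    return max(mapping_sizes.values())


# Hypothetical example: 10 routers; one tenant balances over 10 servers.
sizes = {"alice": 10, "bob": 3}
n_speakers = speakers_without_extension(10, sizes)
n_sessions = sessions_per_router_with_extension(sizes)
```

With these made-up inputs, `n_speakers` evaluates to 100, whereas the extension replaces that with 10 sessions per router held by a single BGP module.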

IV. EVALUATION
In this section we evaluate the performance of our RaaS-based implementation. We present the methodology in Section IV-A and the evaluation results in Section IV-B.

A. Methodology
The main metrics of interest are i) the time for the Controller and MultiSpeaker to process each request, ii) the memory consumption of various data structures, iii) the network overhead incurred by the APIs, and iv) the availability of the Controller to serve tenant requests. To demonstrate the utility of RaaS, we developed a prototype based on RaaS using C# and Windows Communication Foundation (WCF) [11] for the remote procedure calls. We chose C# for its ease of development, and WCF for its seamless integration with C#. The experiments were carried out on COTS hardware that included a dual-core 2.80GHz machine with 4GB of RAM and two single-core 1.7GHz machines with less than 1GB of RAM. The timing experiments were carried out on the dual-core machine, and the network overhead experiments were carried out across the three machines.
To collect detailed memory usage of the various data structures, a custom program loads each data structure, one at a time, and drives realistic loads on it. For example, to collect the memory usage of the acknowledgement table, the program loads an acknowledgement table and inserts varying numbers of entries into it. The processing time is collected by implementing a tenant application that sends route requests and records the time taken for the Controller and MultiSpeaker to respond to the requests. Both the memory usage and processing time experiments described above were carried out on a single machine, since they are not affected by the network. A second set of experiments was carried out between two machines to measure the network utilization. Since the MultiSpeaker's configuration affects the processing time and the memory consumption of both the Controller and the MultiSpeaker (e.g., time: route assignment; memory: MultiSpeaker state, threads), we vary the parameters of the MultiSpeaker's configuration and collect the time and memory metrics. Specifically, for each experiment, we vary the number of routers (denoted as RS) and the number of ECMP sessions to each router (denoted as E). In addition, for the Controller we also vary the number of MultiSpeakers (denoted as S) being managed by the Controller.
Because we do not have many routers with which the MultiSpeaker can establish BGP sessions, we implemented a simple router emulator that waits for the MultiSpeaker to initiate BGP sessions and maintains each session by periodically sending KEEPALIVE messages.
We also demonstrate feasibility by showing that the rate at which a tenant successfully submits a request to the Controller on the first try is high with a small number of redundant deployments, given pessimistic settings for equipment uptime and replacement time. To do so, we formulate a theoretical model for the availability of the Controller and of all equipment on the path from the tenant to the Controller, and evaluate the success rate for a given tenant request. We use an alternating renewal process (ARP) [12] to model the uptime and downtime distributions of components in the network, and derive the steady-state probability that all equipment along the path from the tenant to the Controller will be up during a time interval (e.g., the maximum TCP retransmission timeout value). We only model the success rate from the tenant to the Controller because tenants only interact with the Controller. Additional details can be found in the appendix.
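The flavor of this model can be conveyed with a short Python sketch. It is not the derivation from the appendix: it assumes exponentially distributed uptimes, independent and identical components, and fully disjoint redundant paths, and the function names are hypothetical. For a stationary alternating renewal process, a component is up with probability E[up] / (E[up] + E[down]), and with exponential uptimes it then stays up for a further interval of length delta_t with probability exp(-delta_t / E[up]).

```python
import math


def path_success(path_len, mean_up, mean_down, delta_t):
    """Probability that every component on one path is up for delta_t."""
    availability = mean_up / (mean_up + mean_down)  # steady-state "up" share
    survive = math.exp(-delta_t / mean_up)          # exponential uptime only
    return (availability * survive) ** path_len


def service_success(n_paths, path_len, mean_up, mean_down, delta_t):
    """At least one of n fully disjoint paths stays up (assumed independent)."""
    p = path_success(path_len, mean_up, mean_down, delta_t)
    return 1.0 - (1.0 - p) ** n_paths


# Illustrative inputs in days: path length 6, uptime ~6 months (180 days,
# an approximation), downtime 3 days, interval of 4 minutes.
one_path = service_success(1, 6, 180.0, 3.0, 4.0 / 1440.0)
two_paths = service_success(2, 6, 180.0, 3.0, 4.0 / 1440.0)
```

Because the interval is tiny compared with the expected uptime, the result in this sketch is dominated by the ratio of expected uptime to downtime.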

B. Evaluation
1) Controller-side evaluation: The Controller processing times for route operations are shown in Table II. We show the configuration that amounts to a little over 1,000 total BGP sessions at the MultiSpeaker, corresponding to the maximum memory usage shown in Figure 4b. Assuming the maximum ECMP possible (i.e., 16), the MultiSpeaker is connected to 63 routers.
Table II shows the average and standard deviation of the Controller's processing time for the AddRoute and RemoveRoute operations. It shows that both operations can respond to the tenant request within milliseconds of receiving the request, and thus the Controller can handle close to 1,000 requests per second on average. This processing speed is fast considering that, for each tenant request, the Controller has to inspect up to 1,000 sessions to find route assignments for all the routers. Route addition is slightly slower than route removal because it performs one additional check for the case when the route was withdrawn over a session but the withdrawal is still outstanding (i.e., the route removal has not been sent to the router). In this case the AddRoute operation uses the same session in order to avoid a temporary and unintended ECMP.
Figure 4a shows the memory used to store the MultiSpeaker state. We vary the number of MultiSpeakers managed by the Controller (1, 2, 4). For each MultiSpeaker we vary the number of routers it connects to (1, 10, 1,000) and the number of ECMP sessions per router (1, 4, 16). The plot shows that the memory consumption increases noticeably only when the number of routers per MultiSpeaker is 1,000. This is intuitive because when the number of routers is 1,000, each additional ECMP session per router adds 1,000 more total nodes to the MultiSpeaker state table. We note that in the worst case (4 MultiSpeakers, 1,000 routers per MultiSpeaker, 16 ECMP sessions per router, 64,000 total sessions), the per-session state consumes about 300 bytes of memory.
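The session counts quoted in this subsection follow from simple multiplication, sketched below in Python. The function names are hypothetical, and the memory estimate merely multiplies the quoted per-session figure by the session count; it is not a value read from Figure 4a.

```python
def total_sessions(speakers, routers_per_speaker, ecmp_per_router):
    """Total BGP sessions tracked in the MultiSpeaker state table."""
    return speakers * routers_per_speaker * ecmp_per_router


def state_bytes(sessions, bytes_per_session=300):
    """Rough state size, using the approximate per-session figure quoted."""
    return sessions * bytes_per_session


worst_case = total_sessions(4, 1000, 16)   # 64,000 sessions
timing_setup = total_sessions(1, 63, 16)   # 1,008 sessions
worst_case_bytes = state_bytes(worst_case) # 19,200,000 bytes
```

Under these assumptions the worst-case state is on the order of 19 MB, and the timing configuration of 63 routers with 16 ECMP sessions each gives the "little over 1,000" sessions mentioned above.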
2) Speaker-only evaluation: Table II shows the MultiSpeaker's processing time for route announcement and withdrawal operations. The processing time measures, per BGP session, the time between receiving the route operation request from the Controller and sending out the well-formed BGP message. Since the time to send the BGP message to the router is partly influenced by the network delay, which we cannot control, we eliminate the network delay by co-locating the router emulator and the MultiSpeaker. The result shows that the MultiSpeaker can handle the Controller's requests quickly, often in under 1 ms. Factoring in the network delay, the true response time might be over 1 ms, as the MultiSpeaker can establish BGP sessions with routers via multihop BGP for better placement flexibility. Barring network anomalies, given the current network bandwidth in data centers and the small size of BGP messages, the network delay should be small. Moreover, because MultiSpeakers and the Controller communicate with each other independently of tenant requests, this processing delay is only noticeable by tenants if they query for the route operation status immediately after submitting the route request.
Figure 4b shows the MultiSpeaker's memory usage with respect to the number of BGP sessions established. Here we do not distinguish whether the sessions are connected to the same or different routers, because the amount of state kept for each BGP session is the same regardless of the router. This figure shows that the MultiSpeaker can maintain 1,000 sessions with a moderate amount of memory, making a single MultiSpeaker process scalable up to thousands of sessions.
Figure 4c shows the MultiSpeaker's memory consumption to store outstanding entries under various configurations of routers per speaker and sessions per router. While most configurations consume less than 20MB of memory, memory usage rises sharply when the number of routers per speaker becomes high (i.e., 1,000). The reason is that when the number of routers per speaker is 1,000, each additional outstanding entry per session results in 1,000×E outstanding entries maintained by the MultiSpeaker. For example, in the configuration where RS is 1,000 and E is 4, having 10 outstanding entries per session results in 40,000 total outstanding entries, and having 1,000 outstanding entries per session results in 4,000,000 total outstanding entries. In reality we do not expect the total number of outstanding entries to be as high as 4,000,000, unless the outstanding entries are not periodically cleared by the Controller.
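The growth of outstanding entries described above can be checked with a one-line calculation, shown here as a small Python sketch; the function name is hypothetical, and RS and E follow the notation of Section IV-A.

```python
def total_outstanding(rs, e, per_session):
    """Outstanding entries held by one MultiSpeaker: RS x E x per-session."""
    return rs * e * per_session


low = total_outstanding(1000, 4, 10)     # 40,000 entries
high = total_outstanding(1000, 4, 1000)  # 4,000,000 entries
```

The count is linear in each factor, which is why only the 1,000-router configurations stand out.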
3) Network Overhead: In this experiment we are interested in observing the network overhead of the communication between client and Controller and between Controller and MultiSpeaker. We capture the network traffic at the Controller to record both types of traffic, and later filter the traces to separate them. With ECMP configured, we found it difficult to separate traffic from different ECMP sessions. Therefore, we enabled only one session and sent a single request to capture a serialized conversation between the client and Controller and between the Controller and MultiSpeaker. The result can be easily scaled to multiple ECMP sessions, as each distinct session will have roughly the same amount of network overhead.
Figure 5 implies that, given a typical 1Gbps edge bandwidth, our prototype will saturate the link at around 12,500 requests/second (assuming around 4KB per request: a 4KB incoming request and a 4KB outgoing reply). Since our prototype serves around 1,000 requests/second, we will only be using up to 10% of the link capacity. We also see room for improvement, as the majority of the overhead comes from the use of the WCF framework. Additional bandwidth can be conserved by avoiding the serialization engines in WCF [13], which convert all data into XML format.
4) Service Availability: Based on the derivation in the appendix, we use R [14] to evaluate the number of distinct network subnets in which the Controller/MultiSpeaker need to be deployed. We perform the evaluation by setting various values for the number of distinct subnets and the expected downtime, and recording the success rate. We use the Weibull distribution for the uptime distribution in Equation 4, due to its ability to model different hazard rate characteristics with age. The Weibull distribution has a shape parameter (k) and a scale parameter (λ), with the former affecting the hazard rate over time and the latter the expected uptime. To understand the effect of the hazard rate parameter, we plot the success rate against varying k, setting path length = 6, expected uptime = 6 months, expected downtime = 3 days, and required equipment uptime during request submission (i.e., ∆T) = 4 minutes. We choose these parameters based on the reference data center topology shown in the Cisco reference [15], a pessimistic estimate of a typical piece of equipment's uptime and the time required to replace it, and the maximum possible TCP retransmission timeout (RTO) as defined by RFC 1122 [16]. We use the maximum RTO because it is the longest time the client will wait before considering the Controller/MultiSpeaker dead. We found that the request success rate is insensitive to k, with the difference between the maximum and the minimum success rate less than 0.1% across all k. This is because the steady-state success rate is dominated by the ratio of the expected uptime to the expected downtime, so the temporal variation of the uptime distribution becomes insignificant. Following this observation, we set k=1 for subsequent evaluations. Setting k=1 results in the exponential distribution, a common distribution used in reliability engineering.
Next, we evaluate the effect of having distinct subnets for the request to be served. We assume that each distinct subnet has a completely disjoint path from other subnets. Then, given the number of disjoint paths, we calculate the success rate using the same parameters as the previous experiment. The result is shown in Figure 6a. It shows that, given the same pessimistic setting, the availability of the overall service is above 90% even when the Controller/MultiSpeaker is hosted in only one location, and the overall service availability quickly converges with more than 2 distinct paths. To gain more insight into this benefit, we increase the expected downtime and obtain the new success rates. We find that a similar conclusion holds: the success rate converges to a stable value quickly and the overall service remains highly available. We also see that the gain from having multiple deployments can be significant when services are expected to be down for a prolonged period of time. In the case of expected downtime = 2 weeks, adding the service to another subnet increases the overall service availability by 30%.
The above generalization makes the somewhat unrealistic assumption that paths to different subnets are completely disjoint, whereas in reality many components are shared amongst different paths. To investigate the differences, we modified our formulation to take into account the shared path length and reran the experiments. Figure 6b shows the result for the case when expected downtime = 3 days. We find that the relative availability can be affected by up to 10% when the shared path is taken into account, and the maximum availability is visibly lower when paths are shared. The upside is that the overall service availability is above 90% in all cases. This is an indication that network administrators should be careful in deploying the Controller/MultiSpeakers, and should strive to have as much path diversity as possible.
In summary, the processing speed evaluations show that RaaS obviates the need for the landlord to deal with individual requests, resulting in fewer personnel needed to process tenants' requests. In addition, the memory evaluations show that RaaS components can be implemented on COTS hardware, making 
it easily deployable into data center servers.The network overhead and availability analysis together demonstrate that placing several RaaS components for redundancy will not cause overwhelming burden to the network, and only a few redundant components are needed for the tenant to submit their requests successfully.This makes RaaS a flexible framework that can be used to reduce personnel cost and increase network programmability to multiple tenants.
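The availability reasoning in the Service Availability evaluation can be illustrated with a minimal steady-state sketch. This is not the formulation of Equation 4: it assumes that every hop on a path fails independently, that per-hop availability is simply expected uptime divided by expected uptime plus expected downtime, and that the ∆T term is negligible. The function names, the use of days as the time unit (6 months taken as 180 days), and the treatment of shared hops as a single series segment are all assumptions made for illustration.

```python
import math


def weibull_scale_for_mean(k: float, mean_uptime: float) -> float:
    """Scale λ of a Weibull(k, λ) whose mean equals mean_uptime.

    Uses the standard identity mean = λ * Γ(1 + 1/k). For k = 1 the
    distribution is exponential and λ equals the mean.
    """
    return mean_uptime / math.gamma(1.0 + 1.0 / k)


def hop_availability(mean_uptime: float, mean_downtime: float) -> float:
    """Steady-state fraction of time a single hop is up (assumed model)."""
    return mean_uptime / (mean_uptime + mean_downtime)


def service_availability(n_subnets: int, path_length: int,
                         mean_uptime: float, mean_downtime: float,
                         shared_hops: int = 0) -> float:
    """Probability that at least one path to a Controller/MultiSpeaker is up.

    shared_hops hops are common to every path (series segment); the remaining
    hops of each path are assumed disjoint and independent across subnets.
    """
    a = hop_availability(mean_uptime, mean_downtime)
    shared_up = a ** shared_hops
    disjoint_up = a ** (path_length - shared_hops)
    return shared_up * (1.0 - (1.0 - disjoint_up) ** n_subnets)


if __name__ == "__main__":
    # Parameters taken from the text: path length 6, uptime 6 months, downtime 3 days.
    for n in range(1, 5):
        disjoint = service_availability(n, 6, 180.0, 3.0)
        shared = service_availability(n, 6, 180.0, 3.0, shared_hops=2)
        print(f"subnets={n} disjoint={disjoint:.4f} shared(2 hops)={shared:.4f}")
```

Under these assumptions the sketch shows the same qualitative behavior described above: availability rises quickly with the first additional subnet and then flattens, and shared hops cap the achievable maximum.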
V. RELATED WORKS
Dynamic and programmable routing platforms are not unique to RaaS, as there are prior works in both academia and industry that have proposed and implemented such systems.
However, RaaS design differs from previous works in its ease of implementation, deployment of common technologies, and flexibility of tenant-directed route control.
Previous academic works such as NIRA [17], Tesseract [18], RAS [19], Morpheus [2], Transit Portal [4], and RCP [3] proposed customizable routing. These works had a similar goal of providing end-users or clients with the ability to choose how their packets would be routed. However, some of these works ([17], [18]) require such fundamental changes to the transport hardware that they cannot be easily implemented. In contrast, RaaS leverages widely available technologies, so it can be implemented without an infrastructural overhaul. Other works leverage existing routing technologies, such as BGP, to control routing either within a single AS ([2], [3]) or to various upstream ISPs ([4]). RaaS leverages the same set of technologies to make route control possible, but it also provides a programmatic interface to clients directly, while providing performance isolation and independent route control. These aspects were discussed only briefly, if at all, in previous works. Finally, there are proposals that attempt to extract routing purely as a service [19], which is similar to what RaaS achieves. However, RaaS provides this control directly to clients instead of going through a third party, providing routing as a first-class service.
On the industry side, services such as Amazon's EC2 [1], Internap's Performance IP [20], and RouteScience's PathControl [21] (RouteScience has been acquired by Avaya) offer route control services to end-users. EC2 is an infrastructure-as-a-service (IaaS) system that gives its tenants control over virtual machine (VM) placement, load balancing, and IP-to-VM mapping. RaaS differs from EC2 in that RaaS offers the underlying routing plane as a service. Rather than providing only IP-to-VM mapping, for example, RaaS can support mapping of an IP to any IP-addressable entity in the network. Internap's Performance IP service offers automatic route control based on network conditions, and automatically changes routing so that customers' packets traverse the optimal ISP links. RouteScience's PathControl solution is similar to Internap's Performance IP, and it is sold as a hardware solution [22]. Since both of these solutions are intended for Internet-side route customization, they provide no programmable API for tenants to implement their own route-control logic.

VI. CONCLUSION
The traditional paradigm for route customization involves a laborious and lengthy process, in which the landlord and tenants are tightly coupled. In this paper we introduced the Routing-as-a-Service (RaaS) framework, in which the coupling between the landlord and tenants is loosened. In the RaaS framework, the landlord only needs to understand the resource set ℜ of the tenants, and tenants can perform route customization independently of other tenants. This results in fewer dedicated personnel being needed to process tenants' requests and in more independent route control for the tenants. We showed that our prototype based on the RaaS framework can process requests quickly, often within a second of the request being submitted. In addition, we showed that it is possible to offer more aspects of the data center as a service without a major infrastructure overhaul. With data centers becoming more popular and widespread, we believe RaaS is an important addition to the set of services that can be offered to tenants.

TABLE I: Controller interface to tenants. Route = resource routing info, token = tenant identity.
switch the service-to-resource mapping to server 4, Alice would create a new route ReplaceServiceRoute = {destination: TIA_Alice, next hop: IP_machine4}, and call WithdrawRoute(FirstServiceRoute, Token_Alice) followed by AddRoute(ReplaceServiceRoute, Token_Alice). Additional capabilities such as service fault recovery can also be implemented using these primitives.
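The withdraw-then-add sequence above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the prototype's Controller: the `Controller` class, the `Route` fields, the `routes_for` helper, the check of the next hop against a per-tenant resource set, and the placeholder values (including `IP_machine1` as the next hop of the first route) are all hypothetical. The methods `add_route` and `withdraw_route` stand in for the AddRoute and WithdrawRoute primitives named in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    destination: str  # address at which the tenant's service is reached
    next_hop: str     # resource that should receive the service's traffic


class Controller:
    """Toy stand-in for the tenant-facing interface (assumed behavior)."""

    def __init__(self, resource_sets: dict[str, set[str]]):
        # token -> next hops the landlord allows that tenant to use (assumption)
        self._resource_sets = resource_sets
        self._routes: dict[str, set[Route]] = {}

    def _check(self, route: Route, token: str) -> None:
        allowed = self._resource_sets.get(token)
        if allowed is None:
            raise PermissionError("unknown tenant token")
        if route.next_hop not in allowed:
            raise PermissionError("next hop is outside the tenant's resource set")

    def add_route(self, route: Route, token: str) -> None:
        self._check(route, token)
        self._routes.setdefault(token, set()).add(route)

    def withdraw_route(self, route: Route, token: str) -> None:
        self._check(route, token)
        self._routes.get(token, set()).discard(route)

    def routes_for(self, token: str) -> frozenset[Route]:
        return frozenset(self._routes.get(token, set()))


# Alice moves her service from a first machine (hypothetical) to machine 4.
alice = "Token_Alice"
controller = Controller({alice: {"IP_machine1", "IP_machine4"}})
first_service_route = Route("TIA_Alice", "IP_machine1")
replace_service_route = Route("TIA_Alice", "IP_machine4")

controller.add_route(first_service_route, alice)
controller.withdraw_route(first_service_route, alice)
controller.add_route(replace_service_route, alice)
print(controller.routes_for(alice))
```

Keeping routes keyed by token is one simple way to model independent route control per tenant: a request carrying one tenant's token never touches another tenant's routes.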
table) and MultiSpeaker(E.g., memory: BGP sessions, time is largely unaffected because each incoming request is served in separate

TABLE II: Route Operation Processing Time.