The performance and security of modern data centers and 5G networks are critical for meeting applications’ service-level objectives and quality-of-experience requirements. Careful scheduling and resource allocation are needed to minimize power consumption, enable efficient cellular handovers, and avoid performance degradation. Network telemetry can help identify potential security breaches and protect against cyber attacks by detecting anomalies in traffic patterns, suspicious activities, or unauthorized access. Monitoring the network thus supports both performance optimization and security. However, the resources that monitoring consumes (e.g., the CPU and memory of a monitoring host) pose a challenge. This dissertation presents designs that use heterogeneous computing infrastructure, including SmartNICs and network switches, to optimize the performance and security of data centers and 5G cellular networks. First, we design SmartWatch, a monitoring platform deployed on SmartNICs to scalably and accurately detect network traffic anomalies with little processing overhead. We then build on the SmartWatch traffic monitoring design in two systems: pMACH, which optimizes performance and manages power consumption in data centers, and Synergy, which captures traffic characteristics in 5G cellular networks and predicts mobility patterns. Finally, we develop 5GDMon, a distributed monitoring system that analyzes traffic from multiple vantage points in the open radio access network (O-RAN) infrastructure of the evolving software-based 5G network.
The first contribution addresses the challenge of detecting low-and-slow attacks in networks. Traditional traffic queries deployed on network switches are limited by hardware constraints, so attacks can go undetected during high traffic volumes. SmartWatch is a flow-state tracking and flow logging system that leverages SmartNICs to detect stealthy attacks in real time. SmartWatch yields a 2.39× better detection rate than existing platforms deployed on programmable switches. It can detect covert timing channels and perform website fingerprinting more efficiently than standalone programmable switch solutions, relieving switch memory and control-plane processor resources. Compared to host-based approaches, SmartWatch can reduce packet processing latency by 72.32%. Our subsequent work, namely pMACH, Synergy, and 5GDMon, builds upon the SmartNIC packet processing pipeline proposed in SmartWatch.
The second contribution proposes pMACH, a distributed container scheduling system that optimizes power consumption and task completion time in data centers. pMACH leverages affinity between application components in its placement decisions to minimize communication overhead and latency. pMACH extends SmartWatch’s monitoring capabilities in the data center to capture cloud-application communication patterns and uses them to make better task placement decisions. It proposes in-network monitoring using SmartNICs to measure communications and performs scheduling in a hierarchical, parallelized framework. Both testbed measurements and large-scale trace-driven simulations show that pMACH saves at least 13.44% more power than previous scheduling systems. It also speeds up task completion, reducing the 95th-percentile task completion time by a factor of 1.76 to 2.11 compared to existing container scheduling schemes. Compared to static graph-based approaches, our incremental partitioning technique reduces migrations per epoch by 82%.
The third contribution focuses on the 5G user plane function (UPF), a critical interconnection point between the data network and the cellular network infrastructure. UPFs typically run on general-purpose CPUs, where host-based forwarding overheads limit their performance. We design Synergy, a novel 5G UPF running on SmartNICs that provides high throughput and low latency while supporting monitoring functionality for handover prediction and optimization during user mobility. Synergy extends SmartWatch’s monitoring capabilities to capture and predict vehicular mobility patterns. These predictions are then used to prepopulate state before the vehicle moves to the next base station, reducing handover latency. Buffering in the SmartNIC, rather than in the host, during paging and handover events reduces the packet loss rate by at least 2.04×. Compared to previous programmable-switch-based UPFs, Synergy speeds up control plane operations such as handovers, owing to the low P4-programming latency enabled by the tight coupling between the SmartNIC and the host. The subsequent work, 5GDMon, proposes a distributed monitoring system that analyzes traffic at multiple vantage points in the 5G O-RAN infrastructure. In other words, 5GDMon is the distributed implementation of Synergy, designed to detect network-wide anomalies.
The fourth contribution proposes 5GDMon, a distributed cellular monitoring solution that summarizes traffic characteristics monitored in the distributed radio access network (RAN). The summaries are communicated to data analysis engines running in the core of the network and support zooming into traffic subsets. SmartNICs are used for fast and efficient monitoring in the RAN, with query computation distributed across multiple UPFs using graph partitioning to balance the load. Here we leverage SmartWatch to collect the packet matrix, while pMACH load-balances the query processing tasks. Deploying 5GDMon results in 37.99% fewer infected devices during controlled Mirai botnet attack experiments and 1.35× higher resource fairness against adversarial heavy-hitter attacks. 5GDMon achieves 3.92× lower error in detecting mobile proxies and up to 36% higher accuracy in detecting Tunnel Endpoint Identifier brute-forcing attempts.
In conclusion, our four contributions use SmartNICs to optimize performance and security in data center and cellular networks. By using SmartNICs, we allow CPU cores to be dedicated to mission-critical tasks while ensuring that performance is not compromised by adversaries or poor scheduling decisions.