How can we make emerging network systems more efficient? We use the term “efficient” to emphasize three characteristics: (a) resource efficiency, (b) optimization for large scale, and (c) low latency and high throughput. We target two key problems of great importance in today's large-scale networks. First, the proliferation of botnets has made networks unsafe, and it is hard to build intrusion detection systems that operate at scale and effectively detect the presence and activities of such botnets. To mitigate this threat, we identify the command-and-control (C2) servers that botnets rely on, which is essential to neutralizing the botnets. Second, while wired networks deliver satisfactory user experiences, cellular networks lag behind. Specifically, the tight interdependence between control-plane procedures and data-plane operation can significantly increase user-experienced delays. To address these challenges, we focus on optimizing the 5G cellular core to reduce latency in control plane operations and their impact on the data plane. We design and develop the following three network and security systems, which form the basis of this thesis:
First, we propose C2Store, a definitive capability that provides the most comprehensive information on C2 server profiles. We identify untapped sources (information shared by experts on social media) and employ innovative techniques for mining C2 addresses, resulting in the largest archive of C2 server profiles. The scale of this capability is summarized by the following numbers: (a) 335,967 C2 servers, (b) 135 distinct sources of five types, (c) 133 malware families, and (d) a span of 7 years. This archive can significantly help threat analysts understand the spatial, temporal, and behavioral properties of C2 servers.
Second, we develop C2Scanner, an intelligent scanning system that optimizes resource consumption while proactively searching for unknown live C2 servers at scale. Our approach relies on (a) determining whether a malware sample's C2 communication is “replayable”, and (b) scanning the IP space efficiently to maximize the “return on investment”, i.e., finding as many live C2 servers as possible with a constrained number of probes and compute resources. We show that, contrary to popular belief, 90% of even recently collected binaries are replayable. Furthermore, we conduct an extensive profiling study of the spatiotemporal properties of C2 servers, which we then use to optimize our scanning strategy. For the same number of probes, our search strategy finds 6 times as many C2 servers as the locality-aware baseline.
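To make the “return on investment” framing concrete, the sketch below allocates a fixed probe budget across IP prefixes in order of expected live C2 servers per probe. This is a hypothetical illustration, not C2Scanner's actual strategy: it assumes each candidate prefix already carries a per-probe hit-rate estimate (for example, one derived from spatiotemporal profiling), and the names `Prefix`, `hit_rate`, and `plan_scan` are invented for this example. The addresses come from the RFC 5737 documentation ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prefix:
    cidr: str        # candidate prefix, e.g. "203.0.113.0/24"
    hosts: int       # number of addresses we could probe in this prefix
    hit_rate: float  # hypothetical prior: expected live C2 servers per probe


def plan_scan(prefixes, budget):
    """Greedily spend a probe budget on the prefixes with the highest hit rate.

    Returns (plan, expected_hits), where plan is a list of
    (cidr, probes_allocated). If partial prefix scans are allowed, this
    greedy order maximizes expected hits for a fixed budget
    (the fractional-knapsack solution).
    """
    plan = []
    expected_hits = 0.0
    remaining = budget
    # Visit prefixes from most to least promising per probe.
    for p in sorted(prefixes, key=lambda p: p.hit_rate, reverse=True):
        if remaining <= 0:
            break
        probes = min(p.hosts, remaining)  # scan the whole prefix or what budget allows
        plan.append((p.cidr, probes))
        expected_hits += probes * p.hit_rate
        remaining -= probes
    return plan, expected_hits
```

For example, with a budget of 300 probes and three /24 prefixes whose assumed hit rates are 0.10, 0.02, and 0.001, the sketch scans all 256 addresses of the most promising prefix, spends the remaining 44 probes on the second, and skips the third. A real scanner would also need to update its estimates as probes return, which this static plan deliberately omits.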
Third, we shift our focus to optimizing the performance of the cellular core network. We propose L25GC, which re-architects the 5G Core (5GC) network to reduce latency in control plane operations and their impact on the data plane. L25GC reduces event completion time by 50% for several control plane events and, owing to improved control plane communication, improves data packet latency by 2× compared to free5GC. In addition, L25GC’s integrated failure resiliency transparently recovers from failures of 5GC software network functions and hardware much faster than 3GPP’s reattach recovery procedure, providing users with more robust control and data plane performance.