The fundamental problem that motivates this dissertation is the need for better methods and tools to manage and protect large IP networks. In such networks, it is essential for administrators to profile the traffic generated by different applications (e.g., Web, BitTorrent, FTP) and to identify the packets of an application in the wild. This enables administrators to accomplish two key tasks: (a) Manage the network: Profiling allows different policies to be applied to different applications, e.g., rate limiting peer-to-peer (P2P) traffic during busy hours. (b) Protect the network: Profiling malicious traffic requires a strong separation from benign traffic; therefore, knowing the behavior of "good" applications makes it easier to separate them from malicious activity. Despite significant efforts to solve the traffic profiling problem, none of the existing methods addresses all of the relevant challenges. The difficulty of the problem comes from three factors: (a) The efforts of application writers and users to hide their traffic through obfuscation (e.g., payload encryption); (b) The limited information available about flows and IP-hosts when traffic is monitored at the Internet backbone; and (c) The continuous appearance of new applications as well as undocumented changes to existing network protocols.
In this dissertation, we propose a different way of looking at network traffic that focuses on the network-wide interactions of IP-hosts (as seen at a router). To facilitate the analysis of network-wide interactions, we represent traffic as a graph, where each node is an IP address, and each edge represents a type of interaction between two nodes. We use the term Traffic Dispersion Graph or TDG to refer to such a graph. Intuitively, TDGs capture the "social behavior" of network hosts, which, as we show here, is hard to obfuscate. For example, a P2P protocol cannot function while hiding its overlay network, because maintaining that overlay is a fundamental behavior of a P2P protocol. This dissertation focuses on three key aspects of network-wide interactions: (a) The graph shapes and structures formed by different applications; (b) The distinctive dynamic network-wide behavior of network applications (i.e., how the graphs change over time); and (c) The identification of communities formed by IP-hosts over the Internet. Using the traffic analysis techniques we propose here, we develop novel traffic profiling solutions that are robust to obfuscation and can operate at the backbone, two requirements that are very challenging to meet with the current state of the art. To evaluate the effectiveness of our methods, we use real-world traffic traces collected from six different networks. This dissertation presents the first work to explore the full capabilities of TDGs for profiling and analyzing traffic. Based on our results, we believe that TDGs can provide the basis for the next generation of traffic monitoring tools.
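As a rough illustration of the graph representation described above, the following minimal sketch builds a TDG from a list of flow records and computes a few simple graph statistics. It is not the implementation used in this dissertation: the `Flow` record, its field names, the port-based edge filter, and the particular statistics are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Flow(NamedTuple):
    """Hypothetical flow record; the field names are assumptions."""
    src_ip: str
    dst_ip: str
    dst_port: int


def build_tdg(flows: Iterable[Flow], dst_port: int) -> Dict[str, Set[str]]:
    """Build a directed graph as an adjacency map: source IP -> destination IPs.

    An edge is added whenever a flow to `dst_port` is seen between two hosts.
    Filtering by destination port is only one assumed notion of "interaction".
    """
    graph: Dict[str, Set[str]] = {}
    for flow in flows:
        if flow.dst_port != dst_port:
            continue
        graph.setdefault(flow.src_ip, set()).add(flow.dst_ip)
        graph.setdefault(flow.dst_ip, set())  # register the destination as a node
    return graph


def summarize(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compute illustrative statistics describing the shape of the graph."""
    nodes = len(graph)
    edges = sum(len(dsts) for dsts in graph.values())
    has_out = {src for src, dsts in graph.items() if dsts}
    has_in = {dst for dsts in graph.values() for dst in dsts}
    both = len(has_in & has_out)  # hosts that both initiate and receive flows
    return {
        "nodes": nodes,
        "edges": edges,
        "avg_degree": 2 * edges / nodes if nodes else 0.0,
        "in_and_out_fraction": both / nodes if nodes else 0.0,
    }


if __name__ == "__main__":
    # Made-up flows: three hosts contacting each other on one port,
    # plus one unrelated client-server flow on another port.
    example_flows = [
        Flow("10.0.0.1", "10.0.0.2", 6881),
        Flow("10.0.0.2", "10.0.0.3", 6881),
        Flow("10.0.0.3", "10.0.0.1", 6881),
        Flow("10.0.0.4", "10.0.0.9", 80),
    ]
    print(summarize(build_tdg(example_flows, dst_port=6881)))
    print(summarize(build_tdg(example_flows, dst_port=80)))
```

In this toy input, every host in the port-6881 graph both initiates and receives flows, whereas the port-80 graph has a single client and a single server. Differences of this kind in graph structure are what the "social behavior" intuition refers to, although the statistics shown here are only examples.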