People increasingly manage many aspects of their lives over the Internet. As a result, the study of Internet traffic, which evolves constantly as new technologies emerge, has attracted considerable attention from the research community. In this dissertation, we present a three-pronged approach that helps ISPs and network administrators: a) gain insight into the applications that generate traffic in their networks, b) understand the Web browsing behavior of their users, and c) promptly detect external malicious entities that attempt to compromise their websites.
The first component of our approach is SubFlow, a Machine Learning-based tool that classifies traffic flows according to the class of application that generates them, such as P2P or Web. The key novelty of SubFlow is its ability to learn the traffic characteristics of each application class in isolation, whereas traditional approaches simply try to assign every flow to one of a set of predefined categories. This allows SubFlow to maintain very high classification accuracy even when new applications emerge, rather than forcing their traffic into an existing category.
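The contrast between learning each class in isolation and forcing flows into predefined categories can be illustrated with a minimal sketch. It assumes each application class is modeled independently by the centroid of its own training flows plus an acceptance radius; this is a generic one-model-per-class illustration, not SubFlow's actual algorithm, and the two-dimensional feature vectors and the names `ClassModel`, `fit_class`, and `classify` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClassModel:
    """Hypothetical per-class model: a centroid plus an acceptance radius."""
    label: str
    centroid: list[float]
    radius: float


def distance(a: list[float], b: list[float]) -> float:
    # Euclidean distance between two flow feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_class(label: str, flows: list[list[float]]) -> ClassModel:
    # Learn a class from its own training flows only; no other class is consulted.
    dims = len(flows[0])
    centroid = [sum(f[i] for f in flows) / len(flows) for i in range(dims)]
    # Accept flows no farther from the centroid than the farthest training flow.
    radius = max(distance(f, centroid) for f in flows)
    return ClassModel(label, centroid, radius)


def classify(flow: list[float], models: list[ClassModel]) -> str:
    # Keep only the classes whose model accepts this flow.
    accepting = [
        (distance(flow, m.centroid), m.label)
        for m in models
        if distance(flow, m.centroid) <= m.radius
    ]
    if not accepting:
        # No class recognizes the flow: report it instead of forcing a label.
        return "unknown"
    # Among accepting classes, pick the one with the nearest centroid.
    return min(accepting)[1]


# Illustrative, made-up feature vectors (hypothetical features).
models = [
    fit_class("web", [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]]),
    fit_class("p2p", [[5.0, 5.0], [5.5, 4.5], [4.5, 5.5]]),
]
print(classify([1.1, 1.0], models))   # web
print(classify([10.0, 0.5], models))  # unknown
```

Because each model in this sketch is fit only from its own class's flows, adding a class does not require retraining the others, and a flow that no model accepts is reported as unknown rather than assigned to the nearest predefined category.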
The second component is ReSurf, a tool that reconstructs users' Web-surfing activities from Web traffic. ReSurf separates users' intentional browsing actions, such as clicking a link, from the traffic that is generated automatically when a web page is rendered. ReSurf can therefore serve as an effective tool for studying users' browsing behavior and for gaining insight into the evolution of modern Web traffic, which accounts for about 80% of Internet traffic.
The last component of our approach is Scanner Hunter, an algorithm that detects HTTP scanners: external entities that selectively probe websites for vulnerabilities that may be exploited in subsequent intrusion attempts. We developed this algorithm because HTTP scanners have received little attention despite the significant risk they pose. Scanner Hunter uses a novel combination of graph-mining approaches to expose the community structure of scanners. Using Scanner Hunter, we conduct the first extensive study of HTTP scanners in the wild over a six-month period, which provides novel insights into this emerging and little-studied phenomenon.