Spamscatter: Characterizing Internet Scam Hosting Infrastructure
Open Access Publications from the University of California

Few Internet security issues have attained the universal public recognition or contempt of unsolicited bulk email (spam). The engine that drives this enormous activity is not spam itself, which is simply a means to an end, but the various money-making "scams" (legal or illegal) that extract value from Internet users. In this paper, we focus on the Internet infrastructure used to host and support such scams. Unlike mail relays or bots, scam infrastructure is directly implicated in the spam profit cycle and is thus considerably rarer and more valuable. Our goal is to measure and analyze this scam infrastructure to better understand the dynamics and business pressures exerted on spammers. To identify scam infrastructure, we employ an opportunistic technique called spamscatter. The underlying principle is that each scam is, by necessity, identified in the link structure of the associated spam messages. To this end, we have built a system that mines email, identifies URLs in real time, and follows such links to their eventual destination server. We further identify individual scams by clustering scam servers whose rendered Web pages are graphically similar, using a technique called image shingling. Using the spamscatter technique on a large real-time spam feed (roughly 150,000 messages per day), we identify and analyze over 2,000 distinct scams hosted across more than 7,000 distinct servers.
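
The two mechanical steps named in the abstract, pulling URLs out of messages and grouping pages that look alike, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`extract_urls`, `shingles`, `similarity`, `cluster`), the URL regular expression, the tile size, the use of MD5 and Jaccard overlap, the greedy grouping, and the 0.7 default threshold are all assumptions chosen for illustration. A rendered page is stood in for by a small 2-D grid of pixel values, and the step of following links to their destination server is omitted because it requires network access.

```python
import hashlib
import re

# Assumed pattern: a deliberately simple URL matcher, not a full RFC 3986 parser.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(message_text):
    """Return the URLs in a message body, in first-seen order, without duplicates."""
    found = []
    for url in URL_RE.findall(message_text):
        url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
        if url not in found:
            found.append(url)
    return found


def shingles(pixels, block=2):
    """Hash every block x block tile of a 2-D pixel grid into a set of digests."""
    digests = set()
    for top in range(0, len(pixels), block):
        for left in range(0, len(pixels[0]), block):
            tile = tuple(
                tuple(row[left:left + block]) for row in pixels[top:top + block]
            )
            digests.add(hashlib.md5(repr(tile).encode()).hexdigest())
    return digests


def similarity(a, b):
    """Jaccard overlap of two shingle sets (an assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(pages, threshold=0.7):
    """Greedily group pages whose shingle sets overlap a cluster's first member.

    `pages` maps a hypothetical server name to its shingle set.
    """
    clusters = []
    for name, page_shingles in pages.items():
        for members in clusters:
            if similarity(page_shingles, pages[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Comparing sets of tile hashes rather than whole-image hashes lets two pages that differ in only a small region still match, which is the property that makes a shingling approach tolerant of minor page variations.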

Pre-2018 CSE ID: CS2007-0887
