Skip to main content
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Recent Work bannerUC Berkeley

Data Mining and Internet Profiling: Emerging Regulatory and Technological Approaches


The 9/11 terrorists, before their deadly attacks, sought invisibility through integration into the society they hoped to destroy. In a similar fashion, the terrorists who carried out subsequent attacks in Madrid and London attempted to blend into their host lands. This strategy has forced governments, including the United States, to rethink counter-terrorism strategies and tools.

One of the current favored strategies involves data mining. In its pattern-based variant, data mining searches select individuals for scrutiny by analyzing large data sets for suspicious data linkages and pat-terns. Because terrorists do not “stand out,” intelligence and law enforcement agents want to do more than rely exclusively on investigations of known suspects. The new goal is to search “based on the premise that the planning of terrorist activity creates a pattern or ‘sig-nature’ that can be found in the ocean of transaction data created in the course of everyday life.” Accordingly, to identify and preempt terrorist activity, intelligence agencies have begun collecting, retaining, and analyzing voluminous and largely banal transactional information about the daily activities of hundreds of millions of people.

Private organizations have their own reasons for gathering wide-spread information about individuals. With the expansion of internet-based services, companies can track and document a broad range of people’s online activities and can develop comprehensive profiles of these people. Advertisers and marketing firms likewise have strong incentives to identify and reach internet users whose profiles have certain demographic, purchasing behavior, or other characteristics. The construction, storage, and mining of these digital dossiers by inter-net companies pose privacy risks. Additional privacy issues arise when the government obtains this information, which it currently can with-out much legal process.

This essay begins by examining governmental data mining; its particular focus is on pattern-based searches of databases according to a model of linkages and data patterns that are thought to indicate suspicious behavior. In Part I, this essay reviews widely held views about the necessary safeguards for the use of data mining. In Part II, this essay considers “dataveillance” by private corporations and how they have compiled rich collections of information gathered online in the absence of a robust legal framework that might help preserve online privacy.

This essay then discusses some of the techniques that individuals can employ to mask their online activity as well as existing and emerging technological approaches to preventing the private sector or government from linking their personal information and tracing their activities. These technologies permit users to move about the world wide web pseudonymously and to adopt privacy-enhancing identity management systems. This essay concludes by briefly considering three topics: (1) whether and how to regulate the potential impact of identity management systems on counterterrorism efforts; (2) the requirements of transparency and understanding of the underlying models used in either data mining or identity management systems as a necessary prelude to the creation of rules on appropriate access and use; and (3) the need for research in several further areas.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View