Open Access Publications from the University of California

California Policy Lab

White Papers, UC Berkeley

A Roadmap for Linking Administrative Data in California


California needs a centralized authority for linking the state’s administrative data. Legislators are focusing on new datasets and data systems, which is a step in the right direction. But what the state truly needs is a new office with a clear mandate to link the state’s core data assets, a clear set of tools for doing so, and governance that ensures data are used to inform program improvement. Think of it as the state’s Census Bureau – or “Statistics California.”

We propose here a roadmap toward that goal: (1) create a new, independent office with the mandate and expertise to link data across silos, (2) sequence the linkage process by starting with education and expanding outward, and (3) establish streamlined governance that makes data available to improve state policies and programs.

This work has been supported, in part, by the University of California Multicampus Research Programs and Initiatives grant MRP-19-600774.


Linking Administrative Data: Strategies and Methods


We review the linking of datasets that contain identifying information (e.g., names, birthdates) but not unique common identifiers for each individual. We discuss strategies for identifying matches in three families: rules-based matching, supervised machine learning, and unsupervised machine learning. These vary in the ways that they combine human knowledge with computing power. We define different measures of accuracy and explore the performance of common algorithms in test data.
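
As a rough illustration of the simplest of these families, the sketch below applies one rules-based match to two toy files that share names and birthdates but no common identifier, then scores the result against known true matches. The records, the field names, the specific rule (same birthdate, same normalized last name, same first initial), and the use of precision and recall as accuracy measures are all assumptions made for this example; they are not taken from the paper.

```python
from itertools import product

# Hypothetical toy records: no shared unique ID, only names and birthdates.
file_a = [
    {"id": "a1", "first": "Maria", "last": "Lopez", "dob": "1990-04-12"},
    {"id": "a2", "first": "John", "last": "Smith", "dob": "1985-11-02"},
    {"id": "a3", "first": "Ann", "last": "Lee", "dob": "1978-06-30"},
]
file_b = [
    {"id": "b1", "first": "maria", "last": "LOPEZ ", "dob": "1990-04-12"},
    {"id": "b2", "first": "Jon", "last": "Smith", "dob": "1985-11-02"},
    {"id": "b3", "first": "Ann", "last": "Lee", "dob": "1987-06-30"},  # digits transposed in the year
]

# Hypothetical ground truth: which pairs really are the same person.
true_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}


def normalize(value):
    """Strip surrounding whitespace and ignore case."""
    return value.strip().lower()


def is_match(rec_a, rec_b):
    """Illustrative rule: same birthdate, same last name, same first initial."""
    return (
        rec_a["dob"] == rec_b["dob"]
        and normalize(rec_a["last"]) == normalize(rec_b["last"])
        and normalize(rec_a["first"])[:1] == normalize(rec_b["first"])[:1]
    )


def link(records_a, records_b):
    """Compare every pair across the two files and keep pairs that satisfy the rule."""
    return {
        (a["id"], b["id"])
        for a, b in product(records_a, records_b)
        if is_match(a, b)
    }


def precision_recall(predicted, truth):
    """Precision: share of predicted links that are true. Recall: share of true links found."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


predicted = link(file_a, file_b)
precision, recall = precision_recall(predicted, true_matches)
print(sorted(predicted), precision, recall)
```

In this toy case the rule tolerates differences in case, whitespace, and a first-name spelling variant, but it misses the pair whose birthdate contains a transposition. That is the kind of trade-off between false matches and missed matches that a choice of linking method has to weigh.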

Our goal is to de-mystify data linking for non-technical readers. We attempt to explain the criteria that should inform the choice of linking methods, and the decisions that need to be made to implement them.

Additional resources, including code and public data referenced on pp. 26-34, are available at: