eScholarship
Open Access Publications from the University of California

California Policy Lab

White Papers, UC Berkeley

Linking Administrative Data: Strategies and Methods

Published Web Location

https://www.capolicylab.org/linking-administrative-data/
The data associated with this publication are available at:
https://github.com/californiapolicylab/data-linking
Abstract

We review the linking of datasets that contain identifying information (e.g., names, birthdates) but not unique common identifiers for each individual. We discuss strategies for identifying matches in three families: rules-based matching, supervised machine learning, and unsupervised machine learning. These vary in the ways that they combine human knowledge with computing power. We define different measures of accuracy and explore the performance of common algorithms in test data.

Our goal is to de-mystify data linking for non-technical readers. We attempt to explain the criteria that should inform the choice of linking methods, and the decisions that need to be made to implement them.

Additional resources, including the code and public data referenced on pp. 26-34, are available at https://github.com/californiapolicylab/data-linking.
