Using AI/Machine Learning to Extract Data from Japanese American Confinement Records
Published Web Location
https://doi.org/10.48448/c1rq-qf28

Abstract
As part of a Japanese American Confinement Sites (JACS) grant-funded project supported by the National Park Service, UC Berkeley’s Bancroft Library is leading an effort to digitize nearly 210,000 pages of War Relocation Authority (WRA) Form 26 individual records of Japanese Americans incarcerated during World War II. During the war, the WRA used this two-page, census-type form to collect a wide array of sociological, demographic, and biographical data about the incarcerated population. The WRA coded these data onto punch cards, which were deposited at the National Archives and Records Administration (NARA) and the Bancroft Library at the end of the war. In the 1960s, the Bancroft Library was involved in transferring the punch card data onto magnetic tape. The resulting data file was used in 1988 to award reparations to Japanese Americans before being transferred to NARA, where it is now available online as the Japanese American Internee Data File. However, this existing data file contains gaps, errors, and inaccuracies. It also omits many of the data fields found on the original forms and lacks the level of detail and granularity of the paper records.
The Bancroft Library is believed to hold the only complete set of Form 26 records still in existence: more than 110,000 records, organized by camp. By leveraging partnerships within and beyond the UC Berkeley community, we are using machine learning to extract data from the digitized forms. The goal is a new, more complete and accurate dataset that will serve as a significant resource for survivors and their families. We are also working with community members to determine what constitutes ethical access to these records. In keeping with collections as data principles, which encourage computational use of special collections, this project is a crucial opportunity to explore new methods for enhancing computational access, at scale, to our vast and growing digital special collections.