Skip to main content
eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Previously Published Works bannerUC San Diego

Insights from Adopting a Data Commons Approach for Large-scale Observational Cohort Studies: The California Teachers Study

Abstract

Background

Large-scale cancer epidemiology cohorts (CEC) have successfully collected, analyzed, and shared patient-reported data for years. CECs increasingly need to make their data more findable, accessible, interoperable, and reusable, or FAIR. How CECs should approach this transformation is unclear.

Methods

The California Teachers Study (CTS) is an observational CEC of 133,477 participants followed since 1995-1996. In 2014, we began updating our data storage, management, analysis, and sharing strategy. With the San Diego Supercomputer Center, we deployed a new infrastructure based on a data warehouse to integrate and manage data and a secure and shared workspace with documentation, software, and analytic tools that facilitate collaboration and accelerate analyses.

Results

Our new CTS infrastructure includes a data warehouse and data marts, which are focused subsets from the data warehouse designed for efficiency. The secure CTS workspace utilizes a remote desktop service that operates within a Health Insurance Portability and Accountability Act (HIPAA)- and Federal Information Security Management Act (FISMA)-compliant platform. Our infrastructure offers broad access to CTS data, includes statistical analysis and data visualization software and tools, flexibly manages other key data activities (e.g., cleaning, updates, and data sharing), and will continue to evolve to advance FAIR principles.

Conclusions

Our scalable infrastructure provides the security, authorization, data model, metadata, and analytic tools needed to manage, share, and analyze CTS data in ways that are consistent with the NCI's Cancer Research Data Commons Framework.

Impact

The CTS's implementation of new infrastructure in an ongoing CEC demonstrates how population sciences can explore and embrace new cloud-based and analytics infrastructure to accelerate cancer research and translation.See all articles in this CEBP Focus section, "Modernizing Population Science."

Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View