Leveraging AI to Improve Institutional Clinical Trial Data Integrity and Clinicaltrials.Gov Submission
eScholarship
Open Access Publications from the University of California

UC Davis

UC Davis Electronic Theses and Dissertations

No data is associated with this publication.
Abstract

Background: Data reported from clinical trials play a significant role in developing evidence-based medicine. Robust and accurate study data must be accessible, reproducible, and transparently available to maintain accountability with the public. Discrepancies in how clinical trial data are reflected in national resources such as ClinicalTrials.gov compromise the credibility and reliability of studies conducted under the auspices of a clinical research institution. To improve the process that supports scientific integrity, public trust, and adherence to NIH requirements for systematic and timely reporting of federally funded clinical trials, we sought to use large language models (LLMs) to complement the manual processes of extracting and validating data from IRB-approved study protocols. We aimed to analyze the data work involved in protocol registration with ClinicalTrials.gov using a new system, to assess the accuracy of the system's responses against approved data formats, and to suggest ways to enhance the quality and consistency of clinical trial registration processes.

Methods: Matched data from IRB-approved protocols on irbnet.org and the corresponding registrations on ClinicalTrials.gov were collected. Following a scrum-inspired methodology, we designed, developed, and tested a pilot analytic architecture that extracts data summary elements from raw IRB protocol documents and generates responses approximating the Protocol Registration and Results System (PRS) format of ClinicalTrials.gov.

Results: Our architecture successfully extracted the data elements from IRB-approved protocols and formatted them to meet the specific requirements of the ClinicalTrials.gov registration system.

Conclusion: Our LLM-based architecture is capable of streamlining data extraction and formatting, and has the potential to enhance operational efficiency, reduce documentation time, and support regulatory compliance. It can also address persistent issues of data quality, completeness, and consistency in registering results on ClinicalTrials.gov. Accurate study results will serve as a valuable resource for patient care delivery.
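The abstract does not describe how the system's responses were checked against registered data, so the following is only a minimal sketch of one way a field-by-field comparison between elements extracted from a protocol and an existing registration record might look. The field names (`brief_title`, `enrollment`, `phase`), the `Discrepancy` type, and the normalization rule are all hypothetical and are not taken from the thesis or from the PRS data element definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


@dataclass
class Discrepancy:
    field: str
    protocol_value: Optional[str]   # value extracted from the IRB protocol (None if absent)
    registry_value: Optional[str]   # value found in the registration record (None if absent)


def compare_records(protocol: Dict[str, str], registry: Dict[str, str]) -> List[Discrepancy]:
    """Return every field that is missing on one side or differs after normalization."""
    issues: List[Discrepancy] = []
    for field in sorted(set(protocol) | set(registry)):
        p = protocol.get(field)
        r = registry.get(field)
        if p is None or r is None or normalize(p) != normalize(r):
            issues.append(Discrepancy(field, p, r))
    return issues


# Illustrative, made-up records (hypothetical field names and values).
protocol = {"brief_title": "Aspirin  Trial", "enrollment": "120", "phase": "Phase 2"}
registry = {"brief_title": "aspirin trial", "enrollment": "100"}

issues = compare_records(protocol, registry)
for issue in issues:
    print(issue.field, issue.protocol_value, issue.registry_value)
```

In this sketch a reviewer would see the mismatched enrollment figure and the phase missing from the registration, while the title passes because it differs only in case and spacing.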

Main Content

This item is under embargo until April 14, 2025.