Open Access Publications from the University of California


UC San Francisco Previously Published Works

Precision annotation of digital samples in NCBI's gene expression omnibus.

  • Author(s): Hadley, Dexter
  • Pan, James
  • El-Sayed, Osama
  • Aljabban, Jihad
  • Aljabban, Imad
  • Azad, Tej D
  • Hadied, Mohamad O
  • Raza, Shuaib
  • Rayikanti, Benjamin Abhishek
  • Chen, Bin
  • Paik, Hyojung
  • Aran, Dvir
  • Spatz, Jordan
  • Himmelstein, Daniel
  • Panahiazar, Maryam
  • Bhattacharya, Sanchita
  • Sirota, Marina
  • Musen, Mark A
  • Butte, Atul J
  • et al.

The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remains poorly described by unstructured free-text attributes, which prevents its large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. The resource makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.

