- Hadley, Dexter;
- Pan, James;
- El-Sayed, Osama;
- Aljabban, Jihad;
- Aljabban, Imad;
- Azad, Tej D;
- Hadied, Mohamad O;
- Raza, Shuaib;
- Rayikanti, Benjamin Abhishek;
- Chen, Bin;
- Paik, Hyojung;
- Aran, Dvir;
- Spatz, Jordan;
- Himmelstein, Daniel;
- Panahiazar, Maryam;
- Bhattacharya, Sanchita;
- Sirota, Marina;
- Musen, Mark A;
- Butte, Atul J
The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remain poorly described by unstructured free-text attributes, which prevents their large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we engage a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.