- Chitalia, Rhea;
- Pati, Sarthak;
- Bhalerao, Megh;
- Thakur, Siddhesh Pravin;
- Jahani, Nariman;
- Belenky, Vivian;
- McDonald, Elizabeth S;
- Gibbs, Jessica;
- Newitt, David C;
- Hylton, Nola M;
- Kontos, Despina;
- Bakas, Spyridon
Breast cancer is one of the most pervasive forms of cancer, and its inherent intra- and inter-tumor heterogeneity contributes to its poor prognosis. Multiple studies have reported results from either private institutional data or publicly available datasets. However, current public datasets lack consistency in: a) data quality, b) the quality of expert annotation of pathology, and c) the availability of baseline results from computational algorithms. To address these limitations, we propose an enhancement of the I-SPY1 data collection with uniformly curated data, tumor annotations, and quantitative imaging features. Specifically, the proposed dataset includes a) uniformly processed scans that are harmonized to match intensity and spatial characteristics, facilitating immediate use in computational studies, b) computationally generated and manually revised expert annotations of tumor regions, and c) a comprehensive set of quantitative imaging (also known as radiomic) features corresponding to the tumor regions. This collection is our contribution towards repeatable, reproducible, and comparative quantitative studies leading to new predictive, prognostic, and diagnostic assessments.