eScholarship
Open Access Publications from the University of California

A sample selection strategy to boost the statistical power of signature detection in cancer expression profile studies.

  Author(s):
  • Jia, Zhenyu
  • Wang, Yipeng
  • Hu, Yuanjie
  • McLaren, Christine
  • Yu, Yingyan
  • Ye, Kai
  • Xia, Xiao-Qin
  • Koziol, James A
  • Lernhardt, Waldemar
  • McClelland, Michael
  • Mercola, Dan
  • et al.
Creative Commons Attribution 4.0 International Public License
Abstract

In case-control profiling studies, increasing the sample size does not always improve statistical power, because the variance may also increase when samples are highly heterogeneous. For instance, tumor samples used for gene expression assays are often heterogeneous in tissue composition, mechanism of progression, or both; however, such variation is rarely taken into account in expression profile analysis. We use a prostate cancer prognosis study as an example to demonstrate that simply recruiting more patient samples may not increase power for biomarker detection at all. To address the heterogeneity due to mixed tissue, we developed a sample selection strategy termed Stepwise Enrichment, in which samples are systematically culled based on tumor content and analyzed with a t-test to determine an optimal threshold for tissue percentage. The selected tissue-percentage threshold identified the most significant data by balancing sample size against sample homogeneity; as a result, power is substantially increased for identifying prognostic biomarkers in prostate tumor epithelial cells as well as in prostate stromal cells. This strategy can be applied generally to profiling studies in which the level of sample heterogeneity can be measured or estimated.
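
The threshold search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how significance is summarized across genes or which t-test variant is used, so this sketch assumes a single gene, Welch's t statistic, and the absolute t value as the score to maximize. Comparing t values across thresholds ignores the change in degrees of freedom, so a fuller version would compare p-values instead. The names `welch_t`, `stepwise_enrichment`, and `min_per_group`, and the sample tuple layout, are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        # Degenerate case: no within-group variation at all.
        return 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    return mean_diff / se


def stepwise_enrichment(samples, thresholds, min_per_group=3):
    """Scan tumor-content thresholds and score each retained subset.

    samples: iterable of (tumor_percent, is_case, expression) tuples
             (a hypothetical layout chosen for this sketch).
    thresholds: candidate minimum tumor percentages to try.
    min_per_group: smallest group size accepted for a t-test (must be >= 2).

    Returns (best_threshold, results), where results maps each usable
    threshold to (number_of_samples_kept, absolute_t_statistic).
    """
    samples = list(samples)
    results = {}
    for thr in thresholds:
        # Cull samples whose tumor content falls below the threshold.
        kept = [s for s in samples if s[0] >= thr]
        cases = [expr for _, is_case, expr in kept if is_case]
        controls = [expr for _, is_case, expr in kept if not is_case]
        if len(cases) < min_per_group or len(controls) < min_per_group:
            continue  # too few samples left for a meaningful test
        results[thr] = (len(kept), abs(welch_t(cases, controls)))
    if not results:
        raise ValueError("no threshold leaves enough samples in both groups")
    # Assumption: the 'optimal' threshold is the one with the largest |t|.
    best = max(results, key=lambda t: results[t][1])
    return best, results
```

Raising the threshold removes samples, which costs power, but also removes low-purity samples that add variance. The scan makes that trade-off explicit by recording both the retained sample count and the test statistic at each step.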
