A sample selection strategy to boost the statistical power of signature detection in cancer expression profile studies.
In case-control profiling studies, increasing the sample size does not always improve statistical power, because the variance may also increase if the samples are highly heterogeneous. For instance, tumor samples used for gene expression assays are often heterogeneous in tissue composition, mechanism of progression, or both; however, such variation is rarely taken into account in expression profile analysis. We use a prostate cancer prognosis study as an example to demonstrate that recruiting more patient samples alone may not increase the power of biomarker detection at all. To address the heterogeneity due to mixed tissue, we developed a sample selection strategy termed Stepwise Enrichment, in which samples are systematically culled based on tumor content and analyzed with a t-test to determine an optimal threshold for tissue percentage. The selected tissue-percentage threshold identifies the most significant data by balancing sample size against sample homogeneity; as a result, the power to identify prognostic biomarkers is substantially increased in prostate tumor epithelial cells as well as in prostate stromal cells. This strategy can be applied generally to profiling studies in which the level of sample heterogeneity can be measured or estimated.
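The thresholding idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`welch_t`, `stepwise_enrichment`), the data layout, the use of Welch's form of the t statistic, and the scoring rule (counting genes whose |t| exceeds a fixed cutoff) are all assumptions made for illustration; the abstract states only that samples are culled by tissue content and compared with a t-test. A fixed |t| cutoff also ignores the change in degrees of freedom as samples are removed, so a real analysis would more likely compare p-values.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def stepwise_enrichment(expr, is_case, tissue_pct, thresholds,
                        t_cutoff=3.0, min_per_group=3):
    """Scan tissue-percentage thresholds and score each retained subset.

    expr       : dict mapping gene name -> list of expression values,
                 one per sample (hypothetical layout)
    is_case    : list of booleans, True for case samples
    tissue_pct : list of tissue percentages (0-100), one per sample
    thresholds : candidate minimum tissue percentages to try

    Returns (results, best), where results is a list of
    (threshold, n_samples_kept, n_genes_above_cutoff) tuples and best
    is the tuple with the most genes above the cutoff (assumed score).
    """
    results = []
    for thr in thresholds:
        # Cull samples whose tissue content falls below the threshold.
        keep = [i for i, pct in enumerate(tissue_pct) if pct >= thr]
        cases = [i for i in keep if is_case[i]]
        ctrls = [i for i in keep if not is_case[i]]
        # Skip thresholds that leave too few samples in either group.
        if len(cases) < min_per_group or len(ctrls) < min_per_group:
            continue
        hits = 0
        for values in expr.values():
            t = welch_t([values[i] for i in cases],
                        [values[i] for i in ctrls])
            if abs(t) >= t_cutoff:
                hits += 1
        results.append((thr, len(keep), hits))
    best = max(results, key=lambda r: r[2]) if results else None
    return results, best
```

The scan makes the trade-off explicit: a higher threshold leaves fewer but more homogeneous samples, and the chosen threshold is the one at which the assumed significance score peaks.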