Open Access Publications from the University of California


UC San Francisco Previously Published Works

Use of Retrospective Data for Comparative Effectiveness Research Yields Mixed Outcomes and Should be Avoided



In oncology, retrospective cohort studies are often used for comparative effectiveness research, that is, studies that compare the efficacy of treatment A versus treatment B. We examine the stability of the effect estimates from such studies using biostatistical methods for bias correction with varying sets of covariates. We hypothesize that retrospective comparative effectiveness research studies are sensitive to biostatistical analytic choices: varying the methods will produce significant instability and a lack of consistency in conclusions.


Using multivariable Cox regression analyses, we evaluated three disease sites in oncology where the addition of local therapy to systemic therapy has been hypothesized to improve survival in the metastatic setting compared with systemic therapy alone: lung, prostate, and female breast. Patient data were extracted from the National Cancer Database, 2004-2014. We employed various statistical techniques to adjust for selection bias and immortal time bias, including propensity score matching, left truncation adjustment, and landmark analysis. Further, we used combinations of covariates in regression models to generate hazard ratios (HRs) with 95% confidence intervals. We constructed plots of -log10(P-value) vs HR to quantify the variability of estimates.
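
As an illustration of how a landmark analysis addresses immortal time bias, the following is a minimal sketch and not the authors' code. The record fields, the example patients, and the 6-month landmark are all hypothetical; the abstract does not state which landmark times or variables were used. Patients who died or were lost to follow-up before the landmark are excluded, treatment status is fixed at whatever had been received by the landmark, and survival time is counted from the landmark onward.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical field names, not National Cancer Database variables.
    follow_up_months: float                    # diagnosis to death or last contact
    died: bool                                 # True if death was observed
    months_to_local_therapy: Optional[float]   # None if local therapy was never given


def landmark_cohort(
    patients: List[Patient], landmark_months: float
) -> List[Tuple[bool, float, bool]]:
    """Return (treated, time_from_landmark, died) for each patient
    whose follow-up extends beyond the landmark."""
    rows = []
    for p in patients:
        # Exclude anyone who died or was censored at or before the landmark.
        if p.follow_up_months <= landmark_months:
            continue
        # Treatment counts only if it was given by the landmark; later
        # treatment is ignored so that waiting time cannot inflate survival.
        treated = (
            p.months_to_local_therapy is not None
            and p.months_to_local_therapy <= landmark_months
        )
        rows.append((treated, p.follow_up_months - landmark_months, p.died))
    return rows


# Made-up example records and an assumed 6-month landmark.
patients = [
    Patient(3.0, True, None),    # died before the landmark: excluded
    Patient(24.0, False, 2.0),   # treated before the landmark
    Patient(18.0, True, 9.0),    # treated after the landmark: counted as untreated
    Patient(12.0, True, None),   # never treated
]
cohort = landmark_cohort(patients, landmark_months=6.0)
```

The resulting rows could then be passed to any survival model. Moving the landmark changes both who is included and who counts as treated, which is one way the analytic choice can move the estimate.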


There were 72,549 lung, 14,904 prostate, and 13,857 female breast cancer patients included. We ran > 300,000 regression models, where each model represents a publishable study. Without propensity score matching or immortal time bias adjustment, all multivariable models provided HRs that favored the addition of local therapy for all cancers, with HRs < 1, and all P-values < 0.001. Once propensity score matching was added to our analysis, higher HRs were observed, but most were still < 1. When landmark analysis and covariate combinations were used, we generated HRs that were < 1, equal to 1, and > 1, with 100-fold differences in -log10(P-values).
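
To make the plotted quantity concrete, the sketch below converts a hazard ratio and its 95% confidence interval into a point on a -log10(P-value) vs HR plot. It assumes the interval is a symmetric Wald interval on the log scale, so that the standard error can be recovered from the interval width; the authors presumably took P-values directly from their Cox models, and the function names and input values here are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_p_from_hr(hr: float, ci_lower: float, ci_upper: float) -> float:
    """Two-sided Wald P-value for the null hypothesis HR = 1, assuming
    a symmetric 95% confidence interval on the log-hazard scale."""
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_975)
    z = math.log(hr) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def plot_point(hr: float, ci_lower: float, ci_upper: float):
    """Return the (HR, -log10 P) coordinates for one fitted model."""
    p = wald_p_from_hr(hr, ci_lower, ci_upper)
    return hr, -math.log10(p)


# Made-up inputs: a null result and a strongly protective result.
null_x, null_y = plot_point(1.0, 0.8, 1.25)
strong_x, strong_y = plot_point(0.5, 0.4, 0.625)
```

Because the vertical axis is logarithmic, two models whose points differ by a few units on that axis differ by orders of magnitude in their P-values.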


By altering the biostatistical approach with varying combinations of covariates, we were able to generate contrary outcomes and statistical significance. Our results suggest that one retrospective observational study may find that a treatment helps and another may find that it does not, simply based on analytic choices. This paradox highlights the importance of randomized controlled trials, and may explain the discordance noted in prior studies comparing observational studies and randomized trials.
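
The size of the space of analytic choices can be sketched by enumerating every non-empty covariate subset under each adjustment method. The covariate names below are hypothetical placeholders, and treating the adjustment methods as mutually exclusive options is a simplifying assumption; the abstract does not list the covariates or say how the methods were combined.

```python
from itertools import combinations

# Hypothetical covariates; the abstract does not name the ones used.
covariates = ["age", "sex", "comorbidity_score", "insurance"]

# Adjustment methods named in the abstract, plus an unadjusted baseline.
adjustments = [
    "none",
    "propensity_score_matching",
    "left_truncation",
    "landmark",
]


def specifications(covariates, adjustments):
    """Yield every (adjustment, covariate subset) pair, one per model."""
    for k in range(1, len(covariates) + 1):
        for subset in combinations(covariates, k):
            for adjustment in adjustments:
                yield adjustment, subset


specs = list(specifications(covariates, adjustments))
# Four covariates give 2**4 - 1 = 15 non-empty subsets, so 15 * 4 = 60 models.
```

The count grows as 2^n in the number of covariates, so a modest covariate list is enough to reach hundreds of thousands of candidate models.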

