In medicine, retrospective cohort studies are often used to compare treatments with one another. We hypothesized that the outcomes of retrospective comparative effectiveness research studies can be heavily influenced by biostatistical analytic choices, leading to inconsistent conclusions. We selected a clinical scenario currently under investigation: survival in metastatic prostate, breast or lung cancer after systemic therapy vs systemic therapy plus definitive local therapy. We ran >300 000 regression models (each representing a publishable study), with each model using a different combination of analytic choices intended to account for bias: propensity score matching, left truncation adjustment, landmark analysis and covariate combinations. We included 72 549 lung, 14 904 prostate and 13 857 breast cancer patients. In the most basic analysis, which omitted propensity score matching, left truncation adjustment and landmark analysis, all hazard ratios (HRs) were <1 (generally 0.60-0.95, favoring the addition of local therapy), with all P-values <.001. Left truncation-adjusted landmark analysis produced results with nonsignificant P-values. The combination of propensity score matching, left truncation adjustment, landmark analysis and covariate combinations generally produced P-values >.05 and/or HRs >1 (favoring systemic therapy alone). Applying more statistical methods to reduce selection bias moved the range of reported HRs toward 1.0. By varying analytic choices in comparative effectiveness research, we generated contradictory outcomes. Our results suggest that one retrospective observational study may find that a treatment improves outcomes while another, similar study may find that it does not, based solely on analytic choices.
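To make the study design concrete, the sketch below shows one way such a grid of analytic choices could be enumerated over Cox proportional hazards models. It is not the authors' code: the use of Python with lifelines and scikit-learn, the synthetic data, the column names, the 6-month landmark time and the 1:1 propensity score matching with replacement are all illustrative assumptions.

```python
# Minimal sketch (not the authors' pipeline): fit one Cox model per
# combination of analytic choices and record the HR and P-value for the
# treatment indicator. All data and column names below are synthetic.
from itertools import combinations, product

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.normal(65, 8, n),
    "stage": rng.integers(1, 4, n),
    "local_tx": rng.integers(0, 2, n),   # 1 = systemic + definitive local therapy
    "entry": rng.uniform(0, 3, n),       # delayed entry time (months)
})
df["time"] = df["entry"] + rng.exponential(24.0, n)  # end of follow-up (months)
df["event"] = rng.integers(0, 2, n)                  # 1 = death observed

COVARIATES = ["age", "stage"]
LANDMARK = 6.0  # illustrative landmark time (months)

def ps_match(d):
    """1:1 nearest-neighbor propensity score matching (greedy, with replacement)."""
    lr = LogisticRegression(max_iter=1000).fit(d[COVARIATES], d["local_tx"])
    d = d.assign(ps=lr.predict_proba(d[COVARIATES])[:, 1])
    treated, control = d[d.local_tx == 1], d[d.local_tx == 0]
    nn = NearestNeighbors(n_neighbors=1).fit(control[["ps"]])
    idx = nn.kneighbors(treated[["ps"]], return_distance=False).ravel()
    return pd.concat([treated, control.iloc[idx]]).reset_index(drop=True)

results = []
for use_psm, use_lt, use_lm in product([False, True], repeat=3):
    for k in range(len(COVARIATES) + 1):
        for covs in combinations(COVARIATES, k):
            d = df.copy()
            if use_psm:
                d = ps_match(d)
            if use_lm:  # landmark analysis: condition on surviving to the landmark
                d = d[d.time > LANDMARK].copy()
                d["time"] = d["time"] - LANDMARK          # restart the clock
                d["entry"] = (d["entry"] - LANDMARK).clip(lower=0)
            cols = ["time", "event", "local_tx", *covs]
            if use_lt:
                cols.append("entry")
            cph = CoxPHFitter()
            cph.fit(
                d[cols],
                duration_col="time",
                event_col="event",
                entry_col="entry" if use_lt else None,    # left truncation adjustment
            )
            results.append({
                "psm": use_psm, "left_trunc": use_lt, "landmark": use_lm,
                "covariates": covs,
                "HR": cph.hazard_ratios_["local_tx"],
                "p": cph.summary.loc["local_tx", "p"],
            })

print(pd.DataFrame(results))
```

Scaled up to many covariate subsets and cohorts, a results table like this one is what lets the spread of HRs and P-values across specifications be examined directly, which is the pattern the abstract describes.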