BACKGROUND: Given the large number of genes purported to be prognostic for breast cancer, it would be optimal if the genes identified were not confounded by continuously changing systemic therapies. The aim of this study was to discover and validate a breast cancer prognostic expression signature for distant metastasis in untreated, early stage, lymph node-negative (N-), estrogen receptor-positive (ER+) patients with extensive follow-up times.
METHODS: A total of 197 genes previously associated with metastasis and ER status were profiled from 142 untreated breast cancer subjects. A "metastasis score" (MS) representing fourteen differentially expressed genes was developed and evaluated for its association with distant-metastasis-free survival (DMFS). Categorical risk classification was established from the continuous MS and further evaluated on an independent set of 279 untreated subjects. A third set of 45 subjects was tested to determine the prognostic performance of the MS in tamoxifen-treated women.
RESULTS: A 14-gene signature was found to be significantly associated (p < 0.05) with distant metastasis in a training set and subsequently in an independent validation set. In the validation set, the hazard ratios (HR) of the high risk compared to low risk groups were 4.02 (95% CI 1.91-8.44) for the endpoint of DMFS and 1.97 (95% CI 1.28-3.04) for overall survival after adjustment for age, tumor size and grade. The low and high MS risk groups had 10-year estimates (95% CI) of 96% (90-99%) and 72% (64-78%), respectively, for DMFS and 91% (84-95%) and 68% (61-75%), respectively, for overall survival. Performance characteristics of the signature in the two sets were similar. Ki-67 labeling index (LI) was predictive for recurrent disease in the training set, but lost significance after adjustment for the expression signature. In a study of tamoxifen-treated patients, the HR for DMFS in high compared to low risk groups was 3.61 (95% CI 0.86-15.14).
CONCLUSION: The 14-gene signature is significantly associated with risk of distant metastasis. The signature has a predominance of proliferation genes, which have prognostic significance beyond that of Ki-67 LI and may aid in prioritizing future mechanistic studies and therapeutic interventions.
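The reported hazard ratios and confidence intervals can be read more closely with a short calculation. The sketch below is illustrative only and is not the authors' analysis: it assumes each 95% CI is a symmetric Wald interval on the log-hazard scale, which the abstract does not state. Under that assumption it recovers an approximate standard error, z statistic and two-sided p-value from each reported HR and CI. The function name `wald_summary` and the dictionary `REPORTED` are hypothetical names introduced for this example.

```python
import math

# HRs and 95% CIs exactly as reported in the abstract (high vs. low risk group).
REPORTED = {
    "validation, DMFS": (4.02, 1.91, 8.44),
    "validation, overall survival": (1.97, 1.28, 3.04),
    "tamoxifen-treated, DMFS": (3.61, 0.86, 15.14),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_summary(hr, lower, upper):
    """Approximate SE, z and two-sided p from an HR and its 95% CI.

    Assumption (not stated in the source): the CI is a Wald interval,
    i.e. exp(log(HR) +/- Z_95 * SE), so it is symmetric on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(hr) / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    # For a Wald interval, the geometric mean of the bounds equals the HR.
    geometric_midpoint = math.sqrt(lower * upper)
    return {"se": se, "z": z, "p": p, "geometric_midpoint": geometric_midpoint}


for label, (hr, lower, upper) in REPORTED.items():
    s = wald_summary(hr, lower, upper)
    print(f"{label}: HR={hr}, SE(log HR)={s['se']:.3f}, "
          f"z={s['z']:.2f}, p={s['p']:.4f}, "
          f"geometric midpoint of CI={s['geometric_midpoint']:.2f}")
```

If the assumption holds, the geometric midpoint of each interval should sit close to the reported HR, which serves as a quick consistency check. The same calculation shows why the tamoxifen-treated result reads differently from the validation results: its interval spans 1, so the corresponding two-sided p-value exceeds 0.05, in keeping with the small size of that set (45 subjects).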