BACKGROUND: We characterized the whole transcriptome of circulating tumor cells (CTCs) in stage II-III breast cancer to evaluate correlations with primary tumor biology. METHODS: CTCs were isolated from peripheral blood (PB) via immunomagnetic enrichment followed by fluorescence-activated cell sorting (IE/FACS). CTCs, PB, and fresh tumors were profiled using RNA-seq. Formalin-fixed, paraffin-embedded (FFPE) tumors were subjected to RNA-seq and NanoString PAM50 assays with risk of recurrence (ROR) scores. RESULTS: CTCs were detected in 29/33 (88%) patients. We selected 21 cases to attempt RNA-seq (median number of CTCs = 9). Sixteen CTC samples yielded results that passed quality-control metrics, and these samples had a median of 4,311,255 uniquely mapped reads (fewer than PB or tumor samples). The intrinsic subtype predicted from estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status was 85% concordant with the PAM50 subtype of FFPE tumors. However, the CTC RNA-seq subtype assessed by the PAM50 classification genes was highly discordant with both the subtype predicted by ER/PR/HER2 and the PAM50 subtype of the tumors. Two patients died of metastatic disease, both of whom had high ROR scores and high CTC counts. We identified significant genes, canonical pathways, upstream regulators, and molecular interaction networks when comparing CTCs by various clinical factors. We also identified a 75-gene signature with highest expression in CTCs and tumors taken together that was prognostic in The Cancer Genome Atlas and Molecular Taxonomy of Breast Cancer International Consortium datasets. CONCLUSION: It is feasible to use RNA-seq of CTCs in non-metastatic patients to discover novel tumor biology characteristics.