Diagnostic disagreements among pathologists occur throughout the spectrum of benign to malignant lesions. A computer-aided diagnostic system capable of reducing uncertainties would have important clinical impact. To develop a computer-aided diagnosis method for classifying breast biopsy images into a range of diagnostic categories (benign, atypia, ductal carcinoma in situ, and invasive breast cancer), we introduce a transformer-based holistic attention network called HATNet. State-of-the-art histopathological image classification systems use a two-pronged approach: they first learn local representations using a multi-instance learning framework and then combine these local representations to produce image-level decisions. In contrast, HATNet streamlines the histopathological image classification pipeline and shows how to learn representations from gigapixel-size images end-to-end. HATNet extends the bag-of-words approach and uses self-attention to encode global information, allowing it to learn representations from clinically relevant tissue structures without any explicit supervision. It outperforms the previous best network, Y-Net, which uses supervision in the form of tissue-level segmentation masks, by 8%. Importantly, our analysis reveals that HATNet learns representations from clinically relevant structures, and it matches the classification accuracy of 87 U.S. pathologists on this challenging test set.
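
The idea of combining a bag of patch-level "words" with self-attention to reach an image-level decision can be illustrated with a minimal sketch. This is not the HATNet implementation: the patch embeddings are random stand-ins for learned features, a single attention head with mean pooling is assumed, and all names (`self_attention`, `classify_image`, `params`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Single-head scaled dot-product self-attention over a bag of tokens.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

def classify_image(patch_embeddings, params):
    # Every patch ("word") attends to every other patch, so each output
    # token carries global context from the whole image.
    attended, weights = self_attention(
        patch_embeddings, params["w_q"], params["w_k"], params["w_v"]
    )
    # Assumption: mean pooling turns the bag into one image-level vector.
    image_vector = attended.mean(axis=0)
    logits = image_vector @ params["w_out"]
    return softmax(logits), weights

# Hypothetical sizes: 16 patches, 32-dim embeddings, 4 diagnostic categories
# (benign, atypia, ductal carcinoma in situ, invasive breast cancer).
rng = np.random.default_rng(0)
num_patches, dim, num_classes = 16, 32, 4
params = {
    "w_q": rng.normal(scale=0.1, size=(dim, dim)),
    "w_k": rng.normal(scale=0.1, size=(dim, dim)),
    "w_v": rng.normal(scale=0.1, size=(dim, dim)),
    "w_out": rng.normal(scale=0.1, size=(dim, num_classes)),
}
patch_embeddings = rng.normal(size=(num_patches, dim))  # stand-in features
probs, weights = classify_image(patch_embeddings, params)
```

In this sketch, `weights` is a patch-by-patch matrix; inspecting which patches receive high attention is one plausible way to check whether a model of this kind focuses on clinically relevant structures.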