Inertial attitude estimation is a crucial component of many modern systems and applications. Attitude estimation from commercial-grade inertial sensors has been the subject of extensive research in recent years, driven by the proliferation of Inertial Measurement Units (IMUs) in mobile devices such as smartphones. Traditional methodologies rely on probabilistic, iterative state estimation; however, these approaches do not generalise well over changing motion dynamics and environmental conditions, as they require context-specific parameter tuning. In this work, we explore novel methods for attitude estimation from low-cost inertial sensors using a self-attention-based neural network, the Attformer. This paper proposes to depart from the traditional cycle of continuous integration algorithms and instead formulate attitude estimation as an optimisation problem. The approach leverages attention operations to learn the complex patterns and dynamics associated with inertial data, while keeping complexity linear in the dimension of the feature vector. Additionally, we investigate combining traditional state-of-the-art approaches with our self-attention method. These models were evaluated on entirely unseen sequences, over a range of different activities, users and devices, and compared with a recent alternative deep learning approach (the GRU), the unscented Kalman filter (UKF) and the iOS CoreMotion API. The mean angular distance from the true attitude was 117.31° for the inbuilt iOS CoreMotion API, 21.90° for the GRU, 16.38° for the UKF, 16.28° for the Attformer and, finally, 10.86° for the UKF-Attformer. We show that this plug-and-play solution outperforms previous approaches and generalises well across different users, devices and activities.
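The abstract reports errors as a mean angular distance from the true attitude but does not define the metric. As an illustrative sketch only, the following assumes a common definition: attitudes are unit quaternions in (w, x, y, z) order, and the distance is the angle of the single rotation that takes the estimate to the ground truth. The function names `angular_distance_deg` and `mean_angular_distance_deg` are hypothetical and not taken from the paper.

```python
import math


def angular_distance_deg(q_est, q_true):
    """Angle in degrees between two attitudes given as quaternions (w, x, y, z).

    Assumed definition (not confirmed by the source): 2 * acos(|<q_est, q_true>|).
    """
    # Normalise defensively so slightly non-unit inputs do not bias the angle.
    norm_est = math.sqrt(sum(c * c for c in q_est))
    norm_true = math.sqrt(sum(c * c for c in q_true))
    dot = sum(a * b for a, b in zip(q_est, q_true)) / (norm_est * norm_true)
    # q and -q encode the same rotation, so take the absolute value;
    # clamp to 1.0 to guard acos against floating-point overshoot.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))


def mean_angular_distance_deg(estimates, truths):
    """Average the per-sample angular distance over a sequence of attitude pairs."""
    distances = [angular_distance_deg(e, t) for e, t in zip(estimates, truths)]
    return sum(distances) / len(distances)
```

Under this assumed definition the distance is bounded by 180°, and a sequence would be scored by passing its estimated and reference quaternions to `mean_angular_distance_deg`.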