High energy physics, like many other scientific disciplines, has entered a new era of big data: particle accelerators at the \emph{energy frontier} and astrophysical surveys at the \emph{cosmic frontier} are both producing enormous amounts of data that may hold the key to the most fundamental questions about nature. Extracting that information calls for ever more powerful and efficient statistical analysis frameworks, while scientific rigor additionally requires that any novel model be interpretable. Among the plethora of modern machine learning techniques available, the theory of \emph{optimal transport} stands out as an approach that is both high-performing and mathematically well grounded. By equipping the space of data, represented as distributions, with a suitable metric, optimal transport replaces \emph{ad hoc} notions of similarity with a well-defined distance, opening up a range of new applications with profound theoretical implications.
This thesis introduces the theory of optimal transport with an eye towards its use in physics. Special emphasis is placed on two particular optimal transport distances that enjoy unique geometric properties. Exploiting their geometric structure, we develop a computationally efficient linearization framework for the two distances and highlight their approximations for the discrete distributions encountered in practice. We then showcase the power of this linearized optimal transport framework by applying it to two use cases: one in collider physics at the energy frontier and the other in dark matter astrophysics at the cosmic frontier. As the adoption of optimal transport in high energy physics is still in its early stages, this thesis invites readers to consider other potential applications in their own research.