Computational methods that enable calculations of thermodynamic properties emergent from the specific arrangements of distinct atomic species in materials have become indispensable with the advent of flourishing research into materials with increasingly large numbers of components. Calculations involving lattice models fitted to the energies of a set of representative atomic configurations of a material---predominantly by way of the cluster expansion method---are now standard practice. As researchers explore materials with growing numbers of components, development of cluster expansion-based methodology has continued. Although substantial progress has been made, the vast majority of developments have focused solely on the statistical regression methodology used to fit expansions, leaving the original underlying mathematical formalism largely unchanged.
In this thesis, we revisit the mathematical framework underlying the cluster expansion method and re-establish it in a more general form as a representation for generalized lattice Hamiltonians of atomic configuration. In doing so, we present two categories of representation that are found to be direct generalizations of the Ising and Potts models, respectively. We rigorously define Fourier cluster expansions---those used in the original formalism of the cluster expansion method---and present some of their useful mathematical properties. We then show how, regardless of the particular choice of basis, Fourier cluster expansions are essentially expressions of a unique cluster decomposition. The close relation between the cluster decomposition and well-established function decompositions used in statistics opens an avenue to a formal interpretation of expansion terms as the mean of statistically independent atomic interactions. The second representation, which we have named the generalized Potts frame, is a redundant representation constructed by way of a mathematical frame. Constructing a representation that is over-complete (more functions than dimensions) provides additional robustness and expressiveness when estimating coefficients. We illustrate the capability of the Potts frame representation to fit the most accurate Hamiltonian for a system which, to the best of our knowledge, represents the largest configuration space attempted to date. We also describe general and practical ways to implement the aforementioned representations of lattice Hamiltonians, as well as methods to carry out calculations with improved time complexity relative to other available cluster expansion implementations.
The formal structures of Fourier cluster expansions and Potts frame expansions are then used to motivate and develop novel structured-sparsity-based linear regression methods that allow robust parametrization of generalized lattice models from first-principles electronic structure calculations. The methods developed rely on establishing structural priors on the expansion coefficients---some of which have previously been based on heuristics---which we motivate and justify with more rigorous mathematical and statistical arguments. The regression methods were developed with the goal of enabling accurate estimation of expansion coefficients in high-dimensional configuration spaces using relatively small samples of training structures. We describe a series of practical implementations and auxiliary methods necessary for learning applied lattice models of complex multi-component materials. Finally, we demonstrate the successful application of the methodology developed to learn lattice models of several Li transition metal oxides and medium-entropy alloys that have garnered considerable attention from researchers due to their remarkable and technologically relevant properties.
The thesis concludes by suggesting avenues for continued development of lattice-based methods geared toward studying order and partial disorder in inorganic multi-component materials. We also give a general commentary on the suite of lattice-based methodology in the context of the rapidly growing development of machine learning potentials.