Understanding the structure of chemical compounds and nanoscale materials is critical for materials chemistry discovery. In the context of high-throughput technology, automated chemical synthesis and advanced robotic characterization techniques have been applied to many systems. In contrast, very few efforts have sought to improve and accelerate the interpretation of characterization data, which has become the bottleneck of next-generation materials chemistry discovery. In this dissertation, I combine experimental and data-driven approaches to characterization in order to address chemistry-structure relationship understanding, data analysis and management, structure identification, computational prediction, and facility optimization. These developments aim to accelerate the characterization of materials chemistry systems.
The first part of this dissertation describes the motivation and necessary background. In the second part, I demonstrate a framework for characterizing materials chemistry systems that combines experimental methods with data-driven approaches, using X-ray scattering as the characterization technique. The framework has three aspects: the experimental methods, the automatic data categorization workflow, and the application of machine learning models. Experimental characterization is an essential part of this framework. In chapter 3, I study the chemistry-structure-property relationship of a supramolecular system using X-ray scattering. This chapter also describes the experimental data collection and the conventional data interpretation process. The conventional method, however, is not compatible with high-throughput materials chemistry discovery. To address this bottleneck, in chapter 4, I build a large-scale database and propose a machine learning-based hierarchical method for categorizing X-ray scattering data, as a step toward high-throughput data analysis. Using the data from chapter 3 as an example, I demonstrate that this method can potentially be used in materials chemistry discovery. In many cases, labeling experimental X-ray scattering data requires extensive human input. To reduce this dependence on manual labeling, in chapter 5, I simulate millions of X-ray scattering data samples to train machine learning models. With this high-quality, large-scale dataset, I analyze the performance of the machine learning models under different physical parameters and provide interpretations of the prediction results.
The third part of this dissertation extends the data-driven approaches to other characterization problems in materials chemistry. While X-ray scattering is a very powerful technique, it may not be sufficient to fully characterize all materials chemistry systems because of challenges such as low sensitivity to hydrogen and beam source instability. Nuclear Magnetic Resonance (NMR) is a complementary technique in that its elemental sensitivities are very different, with better resolution for hydrogen in particular. In chapter 6, I use a deep learning method to predict chemical shifts in NMR crystallography. Compared with the state-of-the-art DFT method, the deep learning method is significantly faster for large systems. Moreover, its prediction errors are lower than those of a previously reported kernel ridge regression method. To improve source stability and characterization data quality, in chapter 7, I demonstrate a model-independent method for optimizing a characterization facility using machine learning. The beam size variance is reduced using a neural-network-based feed-forward method.