In chemistry, procuring data to train machine learning models is often an intractable task. Reliance on experimentally generated data is problematic in two ways. First, it demands a vast amount of menial labor and produces large amounts of chemical waste, consuming both resources and human capital. Second, the construction of a truly exhaustive dataset is limited in practice by the accessibility of affordable, diverse experimental systems. Even when data is gathered from the literature, the lack of reported negative results can introduce a concerning level of bias into experimentally driven studies. Experimental data remains highly valuable because it captures real-world behaviors that are difficult to reproduce in idealized simulations; nevertheless, these challenges persist, and simulation and computational approaches to data generation become necessary for the practical evaluation of many important chemical systems.
Given these considerations, I established a workflow during my graduate career for the study of electrochemical systems that are difficult to investigate via standard experimental or computational approaches. By combining fundamental electrochemistry, automation software, COMSOL finite element simulations, and deep learning, I have investigated data-intensive questions related to wire electrodes and cyclic voltammetry.
In my first research project (Chapter 2), I investigated wire electrodes for nitrogen fixation, using models to examine the influence of wire arrays on the competition between the nitrogen reduction reaction (NRR) and the hydrogen evolution reaction (HER). The challenge in evaluating a morphology's influence on an electrode's outputs of interest (current density, A/m², and Faradaic efficiency, %) is that bulk analysis via fabrication is extremely time- and resource-intensive, while also requiring the investigation of large swaths of morphological parameter space (periodicity, P; length, L; and diameter, D) that produce only minor differences in electrode output. This concern is exacerbated by issues of reproducibility and by the introduction of multiple materials, where morphological parameters may have a unique influence on each material. While these issues are inherent to laboratory approaches, they can be mitigated by establishing a computationally generated dataset that can be used to cheaply explore large regions of a parameter space comprising both the morphological and kinetic parameters of materials. My approach enabled a general investigation of nitrogen fixation at wire electrodes under acidic, ambient conditions, establishing a putative limit for wire electrode performance with Faradaic efficiencies of approximately 90% and current densities of approximately –2 mA/cm² (under the assumptions of the model). Further, this data-intensive approach provided insights into electrode development, demonstrating that morphology is a weak predictor of output in some cases and a strong predictor in others. Finally, an importance analysis of the trained deep learning model indicated that the second step of the associative NRR pathway is likely limiting in many cases, providing fundamental insight toward the rational development of transport-restrictive NRR electrodes going forward.
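As a point of reference, the Faradaic efficiency toward NRR is the share of the total current that goes to the NRR rather than to the competing HER. The following is a minimal Python sketch of that arithmetic, assuming only two competing reactions and partial current densities expressed with the same sign and units. The function name and the input values are hypothetical illustrations, not results or code from this work.

```python
def faradaic_efficiency(i_nrr: float, i_her: float) -> float:
    """Return the percentage of total current attributed to NRR.

    Assumes NRR and HER are the only reactions carrying current and that
    both partial current densities share the same sign and units.
    """
    total = i_nrr + i_her
    if total == 0:
        raise ValueError("total current is zero; efficiency is undefined")
    return 100.0 * i_nrr / total


# Hypothetical partial current densities in mA/cm^2 (cathodic, so negative),
# chosen only to illustrate the calculation.
fe = faradaic_efficiency(i_nrr=-1.8, i_her=-0.2)
print(f"Faradaic efficiency toward NRR: {fe:.1f}%")
```

Because the efficiency is a ratio of partial currents, it is independent of the unit chosen for current density, provided both inputs use the same one.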
Following this approach to the study of wire electrodes, I investigated the applicability of deep learning to the custom design of reactant gradients in oxygen-reducing electrodes (Chapter 3). In nature, reactant concentration heterogeneity is ubiquitous (e.g., oxygen availability in biofilms). However, the study of systems with a desired diffusion gradient is challenging because reproducible, direct control over reactant availability is non-trivial. One solution to this challenge is to generate concentration gradients electrochemically using wire electrodes, where applied potentials deplete redox-active species along the z-axis (perpendicular to the base) of the electrode. While the technology exists to establish these gradients, the selection of a specific gradient remained an open problem. To address this need, I developed models to predict oxygen concentration gradients (assuming a platinum electrode) and hydrogen peroxide gradients (assuming a gold electrode), permitting the inverse design of biologically relevant reactant gradients for subsequent study of microbial systems.
Pivoting to a fundamental problem in electrochemistry, I then studied deep learning approaches for mechanism assignment in cyclic voltammetry (Chapter 4). Mechanism assignment is fundamental to the understanding of an electrochemical system, since further analysis of the chemistry depends on correct deduction of the number, order, and type of electrochemical (E) and chemical (C) steps. Despite the foundational importance of this assignment, experimentalists still perform mechanism deduction using qualitative and semi-quantitative means. This is problematic due to diversity in experience level, biased approaches to assignment, and the inherent sluggishness of manual inspection. Because of these concerns, I applied computer vision techniques to the elucidation of electrochemical mechanisms from cyclic voltammetry data. Beginning with the E, EC, CE, DISP1, and ECE mechanisms, I developed a framework to simulate these mechanisms while ensuring a diverse representation of the possible CV traces for each mechanism, with care taken to ensure their separability (for example, a poorly constrained EC framework could produce traces indistinguishable from E). Upon implementation of COMSOL models and MATLAB/Python scripts for each, a training set was generated and used to train a ResNet-18 neural network, which achieved an accuracy of 98.5%. Further, the classification probabilities inherent to deep learning classification tasks have proven valuable for the nascent development of automated, deep-learning-driven artificial intelligence platforms that seek to reduce ambiguity in classification probabilities by tuning experimental parameters.
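To illustrate how classification probabilities can expose ambiguity between mechanisms, the sketch below converts raw classifier scores into probabilities with a softmax and summarizes their spread with a normalized entropy. This is only one plausible way to quantify ambiguity and is an assumption made for illustration; the score values and helper names are hypothetical and do not come from the trained network described above.

```python
import math

# The five mechanisms named in the text.
MECHANISMS = ["E", "EC", "CE", "DISP1", "ECE"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Return 0 for a one-hot prediction and 1 for a uniform one."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


# Hypothetical raw scores in which E and EC are nearly tied.
scores = [2.0, 1.9, -1.0, -2.0, 0.1]
probs = softmax(scores)
best = MECHANISMS[probs.index(max(probs))]
ambiguity = normalized_entropy(probs)

for name, p in zip(MECHANISMS, probs):
    print(f"{name:>5}: {p:.3f}")
print(f"top class: {best}, normalized entropy: {ambiguity:.3f}")
```

A high normalized entropy would flag a voltammogram whose assignment is uncertain, which is the kind of case where changing an experimental parameter and re-measuring could help separate the candidate mechanisms.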
In a follow-up to this research, I investigated an object detection approach to enable the analysis of far more complex, multi-event CVs (Chapter 5). ResNet-18 is a powerful feature extractor, hence its success as a classifier of image and matrix data, which makes it amenable to object detection techniques that rely on extracting regions of the data whose features correspond to distinct objects. In my case, those objects are the constituent electrochemical mechanisms that combine to give an overall complex CV trace. A custom Faster R-CNN framework with one-dimensional region proposals (rather than the typical two-dimensional ones) has achieved F1 scores of up to 0.932 when classifying and regressing CVs with contributions from up to eight possible mechanisms: E, ECa, ECb, ECE, DISP1, T, SR, and EC’.
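For readers unfamiliar with one-dimensional detection, the sketch below shows how interval overlap (intersection over union, IoU) and an F1 score can be computed when each detected mechanism is a labeled interval along a single axis. It assumes that a prediction counts as correct when its label matches a ground-truth interval and their IoU meets a threshold of 0.5, with greedy one-to-one matching. The matching rule, the threshold, the choice of axis, and all values are assumptions made for illustration, not the evaluation protocol used in this work.

```python
def interval_iou(a, b):
    """Intersection over union of two 1D intervals given as (start, end)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def f1_score(predictions, truths, iou_threshold=0.5):
    """F1 over labeled intervals using greedy one-to-one matching.

    Each item is (label, (start, end)). A prediction is a true positive
    if it matches an unused ground-truth item with the same label and
    an IoU at or above the threshold.
    """
    used = set()
    true_positives = 0
    for label, span in predictions:
        for j, (t_label, t_span) in enumerate(truths):
            if j in used or label != t_label:
                continue
            if interval_iou(span, t_span) >= iou_threshold:
                used.add(j)
                true_positives += 1
                break
    false_positives = len(predictions) - true_positives
    false_negatives = len(truths) - true_positives
    denominator = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denominator if denominator else 0.0


# Hypothetical intervals along the potential axis (V).
truths = [("E", (-0.2, 0.1)), ("ECE", (0.3, 0.8))]
predictions = [("E", (-0.25, 0.1)), ("ECE", (0.35, 0.8)), ("T", (1.0, 1.2))]
print(f"F1 = {f1_score(predictions, truths):.3f}")
```

In this toy case two of the three predictions match a ground-truth interval and the third is a false positive, so precision is 2/3, recall is 1, and F1 is 0.8.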
In the final chapter (Chapter 6), I discuss my work developing software to improve course evaluation practices in the UCLA physical sciences. The software digests student course evaluations and recapitulates them as quantitative, summative reports that allow quicker analysis of course strengths and weaknesses. This tool provided evidence of the power of sentiment analysis, a form of deep learning that correlates text data with implied positivity (e.g., how good or effective course practices were).
During my graduate career, I have shown applications of deep learning to problems in electrochemistry that were otherwise intractable to explore. This combination of chemistry with deep learning is valuable not only for generating useful tools and designing systems, but also for providing fundamental insights and enabling technologies that will accelerate the advancement of chemistry in a world where automation and data analysis are driving advancements in all fields.