Life Cycle Assessment (LCA) is a tool for assessing the impacts of chemicals over their entire life cycle. Because a large number of new chemicals are invented every day, the cost and time needed to collect the data required for LCA studies pose a challenge to LCA practitioners: LCA studies cannot keep up with the pace of new chemical development. In practice, therefore, LCAs are conducted in the presence of data gaps and proxy values, which limits the relevance and quality of the results. As machine learning techniques evolve, a new opportunity has emerged to fill these data gaps and improve the quality of LCA. This dissertation attempts to harness machine learning techniques to address the data deficiencies in LCA. It consists of four chapters: (1) Introduction; (2) Rapid life-cycle impact screening for decision-support using artificial neural networks; (3) Species Sensitivity Distributions Derived for a Large Number of Chemicals Using Artificial Neural Networks; and (4) Reducing the Uncertainty of the Characterization Factors in USEtox by Machine Learning – A Case Study for Aquatic Ecotoxicity. Each chapter is described briefly below.
The first chapter is a general introduction. The second chapter demonstrates a method for estimating characterized results using Artificial Neural Networks (ANNs). Because the necessary data are lacking, characterized results exist for only a very limited number of organic chemicals. In this chapter, I developed ANNs to estimate the characterized results of chemicals. Using molecular structure information as input, I trained multilayer ANNs to predict the characterized results of chemicals in six impact categories: (1) global warming, (2) acidification, (3) cumulative energy demand, (4) human health, (5) ecosystem quality, and (6) eco-indicator 99. For each impact category, I estimated the application domain (AD) of the model, within which the model is more reliable. The ANN models for acidification, human health, and eco-indicator 99 performed relatively well, with R² values of 0.73, 0.71, and 0.87, respectively. This chapter indicates that, for certain impact categories, ANN models can serve as an initial screening tool for estimating the life-cycle impacts of chemicals in the absence of more reliable information.
The third chapter aims to estimate the ecotoxicological impact of chemicals using machine learning models. In chemical impact assessment, the overall ecotoxicological impact of a chemical on an ecosystem, known as the Effect Factor (EF), is derived from its toxicity to multiple species through a Species Sensitivity Distribution (SSD). In this chapter, I used machine learning models to estimate chemical toxicity to several aquatic species, then used those estimates to build SSDs and to estimate the EFs of organic chemicals. More than 2,000 experimental toxicity data points were collected for 8 aquatic species from 20 sources, and an ANN model was trained for each species to estimate the Lethal Concentration (LC50) from molecular structure. The 8 ANN models achieved R² scores of 0.54 to 0.75 (average 0.67, median 0.69) on the testing data. The toxicity values predicted by the ANN models were then used to fit SSDs with a bootstrapping method. Finally, the models were applied to generate SSDs for 8,424 chemicals in the Tox21 database.
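As an illustration of the SSD-fitting step, the following is a minimal sketch, not the dissertation's actual implementation. It assumes a log-normal SSD (a normal distribution over log10 of the LC50 values) and a percentile bootstrap over species; the abstract does not state the distribution family or the bootstrap variant. The function names and the example LC50 values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def fit_lognormal_ssd(lc50_values):
    """Fit a log-normal SSD: a normal distribution over log10(LC50).

    Assumption: the log-normal form is chosen for illustration only.
    """
    logs = [math.log10(v) for v in lc50_values]
    return NormalDist(mu=mean(logs), sigma=stdev(logs))


def hazardous_concentration(ssd, fraction):
    """Concentration at which `fraction` of species is expected to be affected."""
    return 10 ** ssd.inv_cdf(fraction)


def bootstrap_hc(lc50_values, fraction=0.5, n_boot=1000, seed=0):
    """Percentile-bootstrap 95% interval for a hazardous concentration."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample species with replacement.
        sample = rng.choices(lc50_values, k=len(lc50_values))
        if len(set(sample)) < 2:
            # All values identical: sigma would be zero, so skip this resample.
            continue
        ssd = fit_lognormal_ssd(sample)
        estimates.append(hazardous_concentration(ssd, fraction))
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return lower, upper


# Hypothetical predicted LC50 values (mg/L) for 8 species of one chemical.
example_lc50 = [0.5, 1.2, 3.4, 0.8, 10.0, 2.5, 6.1, 0.3]
ssd = fit_lognormal_ssd(example_lc50)
hc50 = hazardous_concentration(ssd, 0.5)
lower, upper = bootstrap_hc(example_lc50)
print(f"HC50 = {hc50:.3g} mg/L, bootstrap 95% interval = ({lower:.3g}, {upper:.3g})")
```

Under the log-normal assumption, the HC50 equals the geometric mean of the species-level LC50 values. The bootstrap interval reflects only the uncertainty from having few species, not the prediction error of the ANN models themselves.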
The last chapter of this dissertation aims to reduce the uncertainty of an existing chemical fate model using machine learning techniques. The Fate Factor (FF), which accounts for the persistence of chemicals in environmental compartments, is an intermediate input for calculating the characterized results of life cycle impact assessment. USEtox, the most widely used tool for calculating chemical FFs, requires several chemical properties as inputs, including the octanol-water partitioning coefficient (Kow) and the vapor pressure at 25 °C (Pvap25). When those chemical properties are missing, USEtox provides proxy methods to estimate them. In the fourth chapter, I ask whether replacing the current proxy methods with machine learning models always improves the accuracy of the FFs. I evaluated the contribution of each chemical property to the FFs and developed ANN-based models to predict these properties. I then compared the uncertainty of the current proxy methods in the USEtox FF model with that of the newly developed ANN models. New FFs for the chemicals in the Tox21 database were calculated using the best predictive model wherever experimental properties were unknown. EFs were estimated using the models from the third chapter. Lastly, more than 300 new characterization factors (CFs) with good prediction confidence were calculated for organic chemicals in the Tox21 database. These CFs are new to the field of LCA and can be used to reduce the uncertainty of LCA studies when measured data are not available.
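The comparison between proxy methods and ANN models, and the fallback to a predicted value only when no measurement exists, can be sketched as follows. This is a hedged illustration rather than the procedure used in the dissertation: root-mean-square error is assumed as the comparison metric, which the abstract does not specify, and the function names and log Kow values are hypothetical.

```python
import math


def rmse(predicted, measured):
    """Root-mean-square error between predictions and measurements."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def pick_estimator(measured, candidates):
    """Return the candidate with the lowest RMSE, plus all errors.

    `candidates` maps an estimator name to its predictions for the
    chemicals whose measured values are given in `measured`.
    """
    errors = {name: rmse(preds, measured) for name, preds in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors


def resolve_property(experimental, estimates, best):
    """Prefer the measured value; otherwise use the best estimator's prediction."""
    if experimental is not None:
        return experimental
    return estimates[best]


# Hypothetical log Kow values for three chemicals with measurements.
measured_log_kow = [2.0, 3.5, 1.0]
candidates = {
    "proxy": [2.5, 3.0, 1.8],  # hypothetical proxy-method estimates
    "ann": [2.1, 3.4, 1.2],    # hypothetical ANN estimates
}
best, errors = pick_estimator(measured_log_kow, candidates)

# A chemical with no measurement falls back to the selected estimator.
value = resolve_property(None, {"proxy": 1.7, "ann": 2.3}, best)
print(best, value)
```

Selecting the estimator separately for each property leaves room for the proxy method to win for some properties, which is consistent with the chapter's question of whether machine learning models always improve accuracy.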