Machine Learning-Assisted Identification and Quantification of Hydroxylated Metabolites of Polychlorinated Biphenyls in Animal Samples

Laboratory studies of the disposition and toxicity of hydroxylated polychlorinated biphenyl (OH-PCB) metabolites are challenging because authentic analytical standards for most unknown OH-PCBs are not available. To assist with the characterization of these OH-PCBs (as methylated derivatives), we developed machine learning-based models with multiple linear regression (MLR) or random forest regression (RFR) to predict the relative retention times (RRT) and MS/MS responses of methoxylated (MeO-)PCBs on a gas chromatograph-tandem mass spectrometry system. The final MLR model estimated the retention times of MeO-PCBs with a mean absolute error of 0.55 min (n = 121). The similarity coefficients cos θ between the predicted (by RFR model) and experimental MS/MS data of MeO-PCBs were >0.95 for 92% of observations (n = 96). The levels of MeO-PCBs quantified with the predicted MS/MS response factors approximated the experimental values within a 2-fold difference for 85% of observations and within a 3-fold difference for all observations (n = 89). Subsequently, these model predictions were used to assist with the identification of OH-PCB 95 or OH-PCB 28 metabolites in mouse feces or liver by suggesting candidate ranking information for identifying the metabolite isomers. Thus, predicted retention and MS/MS response data can assist in identifying unknown OH-PCBs.


Molecular descriptors
Candidate ranking algorithm with the predicted and measured RRTs and MS/MS data of MeO-PCBs
Animal experiments
Extraction of the hydroxylated PCBs from samples collected in animal studies
Table S1. List of methoxylated PCBs (MeO-PCBs) used for model training and external testing and their abbreviations and SMILES structures
Table S2. List of the optimal predictors and their linear coefficients and p-values that were obtained in the multiple linear regression (MLR) model development to predict the relative retention time (RRT) of MeO-PCBs
Table S3. List of the optimal predictors and parameters obtained in the random forest regression (RFR) model development to predict MS/MS data (expressed as relative levels of five MS transitions) of MeO-PCBs

Initially, RRT scores ($S_{\mathrm{RRT}}$) were calculated for all possible MeO-PCB candidates of a specific molecular weight. We used the reciprocal of the absolute difference between the measured ($RRT_{\mathrm{meas}}$) and predicted RRT ($RRT_{\mathrm{pred}}$), i.e., $1 / |RRT_{\mathrm{meas}} - RRT_{\mathrm{pred}}|$, to score potential candidates (i.e., the smaller the difference, the higher the score). The initial score was then normalized with the maximal ($S_{\mathrm{RRT,max}}$) and the median ($S_{\mathrm{RRT,med}}$) score across all candidates.
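The initial RRT scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the candidate names, the RRT values, and the small `eps` guard against division by zero are all assumptions introduced here, and the subsequent normalization with the maximal and median scores is deliberately left out.

```python
def rrt_initial_scores(measured_rrt, predicted_rrts, eps=1e-9):
    """Score each candidate by the reciprocal of |measured - predicted| RRT.

    `eps` is a hypothetical guard (not from the source) that avoids a
    division by zero when a prediction matches the measurement exactly.
    """
    return {
        name: 1.0 / max(abs(measured_rrt - predicted), eps)
        for name, predicted in predicted_rrts.items()
    }


# Hypothetical predicted RRTs for three candidate isomers of one molecular weight.
predicted_rrts = {"cand_A": 1.10, "cand_B": 1.25, "cand_C": 1.40}

# Hypothetical measured RRT of the unknown peak.
scores = rrt_initial_scores(1.22, predicted_rrts)

# Candidates ordered from the best to the worst RRT match.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because the score is a reciprocal, it grows sharply as the predicted RRT approaches the measured value, which may be one reason a normalization across all candidates is applied afterwards.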
In addition, we calculated the MS/MS scores ($S_{\mathrm{MS}}$) for all possible MeO-PCB candidates. We used the similarity coefficient (cos θ) between the measured and the predicted MS/MS data as an initial score to rank the MeO-PCB candidates. The cos θ measures the similarity of two multivariable vectors, for example, the MS/MS profiles of MeO-PCBs, with a value of zero for completely different vectors and a value of 1 for identical vectors. 15 The initial score was then normalized with the maximal ($S_{\mathrm{MS,max}}$) and median ($S_{\mathrm{MS,med}}$) score across all candidates.
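To make the similarity coefficient concrete, the following sketch computes cos θ between a measured and a predicted MS/MS profile, each expressed as relative levels of five MS transitions. The profiles, candidate names, and function name are hypothetical values chosen for illustration, and the later normalization with the maximal and median scores is not shown.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero profile matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical measured profile: relative levels of five MS transitions.
measured = [1.00, 0.45, 0.30, 0.10, 0.05]

# Hypothetical predicted profiles for two candidate structures.
predicted = {
    "cand_A": [1.00, 0.40, 0.35, 0.10, 0.05],
    "cand_B": [0.20, 1.00, 0.10, 0.60, 0.30],
}

ms_scores = {name: cosine_similarity(measured, p) for name, p in predicted.items()}
ms_ranked = sorted(ms_scores, key=ms_scores.get, reverse=True)
print(ms_ranked)
```

For non-negative intensity vectors such as these, cos θ stays between 0 and 1, which matches the range described in the text.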
Subsequently, a weighted rank score ($S$) of a candidate was calculated. We initially assessed the true rates (TR) with which $S_{\mathrm{RRT}}$ or $S_{\mathrm{MS}}$ ranked the true positive as the top 1 candidate. Briefly, a whole set of MeO-PCBs with one to three chlorines (n = 295) was sampled, and the RRTs and MS/MS profiles were predicted with the models developed (i.e., the MLR model for RRT prediction and the RFR model for MS/MS prediction). For di-MeO-PCBs, only compounds with methoxy groups ortho or para to each other were sampled because only PCB catechol and hydroquinone metabolites are formed in metabolism studies. 9,16,17 The $S_{\mathrm{RRT}}$ and $S_{\mathrm{MS}}$ values of representative, available mono- to tri-chlorinated MeO-PCBs were ranked, and the true rates were studied. The true rates of the $S_{\mathrm{RRT}}$ rankings ($TR_{\mathrm{RRT}}$) were 67% (n = 52) and 19% (n = 51) for MeO-PCB congeners and homologs, respectively, while the true rates of the $S_{\mathrm{MS}}$ rankings ($TR_{\mathrm{MS}}$) were 73% (n = 42, coeluting compounds were removed) and 38% (n = 41, coeluting compounds were removed) for MeO-PCB congeners and homologs, respectively. The weights of $S_{\mathrm{RRT}}$ ($w_{\mathrm{RRT}}$) and $S_{\mathrm{MS}}$ ($w_{\mathrm{MS}}$) were calculated as $w_{\mathrm{RRT}} = TR_{\mathrm{RRT}} / (TR_{\mathrm{RRT}} + TR_{\mathrm{MS}})$ and $w_{\mathrm{MS}} = 1 - w_{\mathrm{RRT}}$, respectively. These weights were used to calculate the weighted rank score of a candidate structure as $S = w_{\mathrm{RRT}} S_{\mathrm{RRT}} + w_{\mathrm{MS}} S_{\mathrm{MS}}$. The weighted rank scores of all candidates were divided by the maximal score to obtain scores scaled from 0 to 1.

Sample collection from mice exposed to PCB 95. Adult male or female C57BL/6 mice were exposed to a single oral dose of racemic PCB 95 (1.0 mg/kg) in stripped corn oil (10 ml/kg; lot# A0395699; cat# 801-03-7; Fisher Scientific, Waltham, MA, USA) via oral gavage. PCB 95 was synthesized and authenticated as described previously. 18 Control animals received corn oil alone.
Animals were euthanized 24 h after the PCB 95 administration; various tissues were dissected and stored for another study. Feces from the dissected distal colon and rectum were collected, stored at -80 °C, and shipped on dry ice to the University of Iowa for the analysis of hydroxylated PCB 95 metabolites.
Sample collection from mice exposed to MARBLES PCB mixture. The liver sample from a male mouse exposed via the maternal diet to the MARBLES PCB mixture was generated at the University of California, Davis, as part of an overall study designed to assess the effects of developmental exposure to the MARBLES PCB mixture on multiple developmental outcomes. 19,20 Briefly, C57Bl/6J and SVJ129 WT mice were purchased from Jackson Labs (Sacramento, CA) and crossed to generate 75% C57Bl/6J / 25% SVJ129 mice. These mice were used as congenic wild-type mice in the overall study. All animals were housed in clear plastic shoebox cages containing corn cob bedding and maintained on a 12 h light and dark cycle at 22 ± 2 °C with 40-50% humidity. Feed (Diet 5058, LabDiet, Saint Louis, MO) and water were available ad libitum.
Two weeks prior to mating, nulliparous and previously unmated dams (>6 weeks of age) were singly housed, and PCB dosing was initiated. Dams were placed with a male overnight for mating. Males and females were separated the next day, and females were checked for the presence of a copulatory plug, which was considered gestational day 0. After mating, dams were housed singly prior to parturition and with their pups after parturition. On postnatal day 2, pups were culled or cross-fostered to ensure all litters consisted of 4-8 pups.
The MARBLES PCB mixture was prepared to mimic the PCB congener profile of the twelve most prevalent PCB congeners detected in the serum of pregnant women enrolled in the MARBLES human epidemiological cohort. 18,21 These women are at increased risk for having a child with a neurodevelopmental disorder. 22

Livers from several other PND 21 pups from a dam exposed to peanut butter without PCBs were used as controls.
Extraction of the hydroxylated PCBs from samples collected in animal studies. Analyzing hydroxylated PCB 95 metabolites in the feces of mice exposed to PCB 95. A feces sample was collected from a male mouse orally exposed to PCB 95 for 24 h and extracted following a published procedure. 23-25 Feces samples from a PCB 95-exposed mouse and a control mouse were cleaned up using an acidified silica gel column, followed by sulfuric acid treatment, before GC-MS/MS analysis using the MRM method described above. The sample extraction and analysis procedures were the same as described above, except that no sulfuric acid treatment step was employed.