Non-methane organic compounds (NMOCs) play an important role as air pollutant precursors. For example, NMOCs can form secondary organic aerosol (SOA), a major mass fraction of fine particulate matter ($\rm PM_{2.5}$) that affects air quality, visibility, radiative forcing, and cloud droplet formation. NMOCs can be categorized by source as biogenic (e.g., forest emissions), anthropogenic (e.g., transportation emissions), or pyrogenic (e.g., wildfire emissions). Given the importance of NMOCs for atmospheric chemistry and air pollutant formation, there is an ongoing need to track NMOCs and to understand how variables such as source type, meteorology, and human behavior affect NMOC mixing ratios and SOA formation. Advances in analytical and instrumental techniques, including those used to quantify NMOCs, have greatly increased the volume of NMOC data generated. This growth makes efficient and accurate data analysis challenging, and thus presents an opportunity for statistical data analysis methods that complement more conventional approaches. This thesis presents two tools developed for faster NMOC data post-processing and chemometric analysis: 1) a chromatographic alignment algorithm, and 2) a supervised pattern recognition algorithm for classification and source apportionment of emissions from emerging anthropogenic (e.g., vehicles, personal care products) and pyrogenic (e.g., wildfires) sources relevant to air pollutant formation.
The chromatographic alignment algorithm corrects retention time deviations caused by matrix effects during instrumental analysis and is a necessary pre-processing step for the pattern recognition analysis. Alignment is based on a simple similarity measure, the cosine similarity. The supervised pattern recognition algorithm can reveal relationships among samples with respect to parameters of interest such as source type and, for biomass burning, fuel type. It consists of three main parts: 1) feature selection with an analysis of variance (ANOVA)-based method, 2) dimensionality reduction via principal component analysis (PCA), and 3) clustering with \textit{k}-means. The chromatographic alignment algorithm was applied to engine tailpipe emission samples, where it successfully aligned approximately 110 detected NMOCs in 32 samples and reduced post-processing time from weeks to minutes. The pattern recognition algorithm was applied in three case studies involving tailpipe emission samples (anthropogenic) and biomass burning samples (pyrogenic).
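The abstract does not state which vectors the cosine similarity compares. A minimal sketch, assuming each detected peak carries a mass spectrum (intensity per m/z channel) and that a sample peak is assigned to the reference peak whose spectrum has the highest cosine similarity, $\cos\theta = \mathbf{a}\cdot\mathbf{b} / (\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert)$, among reference peaks inside a retention-time window. The function names, the 0.2 min window, the 0.8 similarity threshold, and the example data are all hypothetical illustrations, not the thesis implementation.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two intensity vectors (1.0 = same shape)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def align_peaks(ref_rt, ref_spectra, sample_rt, sample_spectra,
                rt_window=0.2, min_similarity=0.8):
    """Return, for each sample peak, the index of the matching reference peak.

    Hypothetical inputs: retention times in minutes (1-D arrays) and mass
    spectra as 2-D arrays with one row per peak and one column per m/z channel.
    Only reference peaks within +/- rt_window of a sample peak are considered;
    among those, the most similar spectrum wins. Peaks with no candidate, or
    whose best similarity is below min_similarity, are marked -1 (unassigned).
    """
    ref_rt = np.asarray(ref_rt, dtype=float)
    ref_spectra = np.asarray(ref_spectra, dtype=float)
    assignments = np.full(len(sample_rt), -1, dtype=int)
    for i, (rt, spectrum) in enumerate(zip(sample_rt, sample_spectra)):
        candidates = np.flatnonzero(np.abs(ref_rt - rt) <= rt_window)
        if candidates.size == 0:
            continue
        spectrum = np.asarray(spectrum, dtype=float)
        scores = [cosine_similarity(spectrum, ref_spectra[j]) for j in candidates]
        best = int(np.argmax(scores))
        if scores[best] >= min_similarity:
            assignments[i] = candidates[best]
    return assignments


if __name__ == "__main__":
    ref_rt = np.array([5.0, 5.15, 10.0])
    ref_spectra = np.array([[1, 0, 0, 2], [0, 3, 1, 0], [2, 2, 0, 0]], dtype=float)
    # Synthetic sample: reference compound 1 elutes 0.1 min early (closer in time
    # to reference peak 0), compound 2 is present, and a peak at 20 min is unknown.
    sample_rt = np.array([5.05, 9.9, 20.0])
    sample_spectra = np.array([[0, 6, 2, 0], [2, 2, 0, 0], [1, 1, 1, 1]], dtype=float)
    matched = align_peaks(ref_rt, ref_spectra, sample_rt, sample_spectra)
    print(matched.tolist())  # [1, 2, -1]
    # Aligned retention times: reference value where matched, original otherwise
    print(np.where(matched >= 0, ref_rt[matched], sample_rt).tolist())  # [5.15, 10.0, 20.0]
```

The first sample peak lies closer in time to reference peak 0, but its spectrum matches reference peak 1, so a similarity measure rather than nearest retention time decides the assignment. Because cosine similarity ignores overall scale, differences in concentration between samples do not affect matching.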
Applied to the tailpipe emission samples, the algorithm identified 15 of the 110 NMOCs whose emission patterns effectively separated the samples according to eight different gasoline fuel blends. In the first biomass burning case study, the algorithm successfully differentiated laboratory smoke samples from the combustion of three fuel families (firs, pines, and spruces) using an automatically selected subset of only five compounds (out of 93). Furthermore, using the distinct NMOC profiles of the fuel families, a classification model was built that successfully classified smoke samples from prescribed burns by their dominant fuel source. In a second biomass burning case study, the algorithm was applied to data collected from smoke plume transects during two recent large-scale field campaigns (WE-CAN and FIREX-AQ) to identify NMOCs linked with observed enhancements in organic aerosol mass. Samples were grouped by whether positive or negative/neutral organic aerosol enhancement was observed; however, no consistent set of NMOCs differentiated the two groups. Additional metrics were evaluated, but they did not improve the differentiation of the samples. Plume-to-plume variability and competing chemical and physical processes were found to obscure clear patterns in NMOC mixing ratios, resulting in poor statistical power for differentiating the samples. Collectively, the tools developed and presented here provide a streamlined approach to analyzing samples from diverse sources that reduces post-processing time and adds information that is difficult to obtain with traditional analytical approaches.
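The three-stage pattern recognition workflow used in these case studies (ANOVA-based feature selection, PCA, and \textit{k}-means) can be sketched in NumPy. This is an illustrative sketch under stated assumptions, not the thesis code: the "ANOVA-based method" is assumed here to rank compounds by their one-way ANOVA F-ratio across known sample classes, the selected compounds are assumed to be autoscaled before PCA, and \textit{k}-means uses a deterministic farthest-point initialization. All function names (`anova_f`, `pca_scores`, `kmeans`, `pattern_recognition`) are hypothetical, and the usage data are synthetic.

```python
import numpy as np


def anova_f(X, labels):
    """One-way ANOVA F-ratio of every column (compound) of X across label groups."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = X.shape[0], groups.size
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for g in groups:
        Xg = X[labels == g]
        ss_between += Xg.shape[0] * (Xg.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xg - Xg.mean(axis=0)) ** 2).sum(axis=0)
    # A tiny constant avoids division by zero for compounds with no within-group spread
    return (ss_between / (k - 1)) / (ss_within / (n - k) + 1e-12)


def pca_scores(Z, n_components):
    """Project mean-centred data onto its leading principal components (via SVD)."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Distance from each point to its nearest already-chosen centroid
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    assign = np.full(len(points), -1)
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # converged: no sample changed cluster
        assign = new_assign
        for j in range(k):
            members = points[assign == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return assign


def pattern_recognition(X, labels, n_features, n_components=2, n_clusters=3):
    """ANOVA feature selection -> autoscaling -> PCA -> k-means clustering."""
    X = np.asarray(X, dtype=float)
    selected = np.argsort(anova_f(X, labels))[::-1][:n_features]
    Xs = X[:, selected]
    std = Xs.std(axis=0)
    Z = (Xs - Xs.mean(axis=0)) / np.where(std > 0, std, 1.0)
    scores = pca_scores(Z, n_components)
    return selected, scores, kmeans(scores, n_clusters)


if __name__ == "__main__":
    # Synthetic data: 12 samples x 20 compounds in three classes; only
    # compounds 0-2 differ between classes, the rest are pure noise.
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 0.5, size=(12, 20))
    labels = np.repeat(np.array(["A", "B", "C"]), 4)
    X[:, :3] += np.repeat([0.0, 10.0, 20.0], 4)[:, None]
    selected, scores, clusters = pattern_recognition(X, labels, n_features=3)
    print(sorted(selected.tolist()))  # [0, 1, 2]
    print(list(zip(labels.tolist(), clusters.tolist())))
```

The class labels are used only to rank compounds; the clustering itself is unsupervised, so agreement between clusters and known classes indicates whether the selected compounds separate the samples. scikit-learn offers comparable building blocks (`sklearn.feature_selection.f_classif`, `sklearn.decomposition.PCA`, `sklearn.cluster.KMeans`) if a library implementation is preferred.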
The alignment algorithm can be applied to other types of samples, and the pattern recognition algorithm to other observational data, thus linking observations with predictive variables such as source type, meteorology, and human behavior.