eScholarship
Open Access Publications from the University of California

UC Davis Previously Published Works

Machine Vision Methods, Natural Language Processing, and Machine Learning Algorithms for Automated Dispersion Plot Analysis and Chemical Identification from Complex Mixtures

Abstract

Gas-phase trace chemical detection techniques such as ion mobility spectrometry (IMS) and differential mobility spectrometry (DMS) can be used in many settings, such as evaluating the health condition of patients or detecting explosives at airports. These devices separate the chemical compounds in a mixture and provide information for identifying specific chemical species of interest. Further, these devices operate well both in controlled laboratory environments and in field applications. Commercial versions of these devices are frequently tailored to niche applications (e.g., explosives detection) because of the difficulty of reconfiguring instrumentation hardware and data analysis software. For researchers to quickly adapt these tools to new purposes and broader panels of chemical targets, it is critical to develop new algorithms and methods for generating libraries of these sensor responses. Microelectromechanical system (MEMS) technology has been used to fabricate miniaturized DMS devices that are easier to deploy; however, corresponding advances in data analytics have lagged. DMS generates complex three-dimensional dispersion plots for both positive and negative ions in a mixture. Although simple spectra of single chemicals are straightforward to interpret (both visually and algorithmically), dispersion plots from complex mixtures with many chemical constituents are exceedingly challenging to interpret. This study uses image processing and computer vision steps to automatically identify features in DMS dispersion plots. We used the bag-of-words approach, adapted from natural language processing and information retrieval, to cluster and organize these features. Finally, a support vector machine (SVM) learning algorithm was trained on these features to detect and classify specific compounds in these conceptualized representations of the data.
Using this approach, we maintain a high level of correct chemical identification, even as a gas mixture increases in complexity with interfering chemicals present.
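
The bag-of-words step described in the abstract can be illustrated with a small sketch. The abstract does not say how the features are extracted or clustered, so everything here is an assumption made for illustration: local features are toy 2-D tuples, the codebook is built with plain k-means (a common choice for bag-of-visual-words, not necessarily the one used in the study), and the names `nearest`, `kmeans`, and `bow_histogram` are hypothetical. The resulting fixed-length histograms are the kind of vector that could then be passed to an SVM classifier.

```python
from math import dist


def nearest(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means.

    Assumption: the first k points are used as initial centers so the
    result is deterministic; a real pipeline would likely use a better
    initialization.
    """
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point under its closest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Update step: move each center to the mean of its group;
        # an empty group keeps its previous center.
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers


def bow_histogram(descriptors, codebook):
    """Encode a set of local descriptors as a normalized codeword histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


if __name__ == "__main__":
    # Toy descriptors standing in for features found in two dispersion plots.
    plot_a = [(0.0, 0.0), (5.0, 5.0), (0.0, 0.2)]
    plot_b = [(5.0, 5.2), (4.8, 5.0), (5.1, 4.9)]
    codebook = kmeans(plot_a + plot_b, k=2)
    print(bow_histogram(plot_a, codebook))
    print(bow_histogram(plot_b, codebook))
```

Each plot, however many local features it contains, is reduced to a histogram whose length equals the codebook size, which is what makes the representation usable as input to a standard classifier.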
