Chronic diseases such as cardiovascular disease, diabetes, and cancer are the leading causes of death in both developed and developing countries, and account for approximately 75 percent of deaths worldwide. With the sequencing of the human genome and subsequent genomic studies, we now know that genetic factors alone are responsible for a relatively small portion of these diseases. Specifically, the cancer risk attributed to genetic factors is typically about eight percent. Thus, the vast majority of cancer risk likely lies within the realm of exposures (non-genetic factors) or a combination of genetic factors and exposures. The collection of exposures over an individual’s lifetime constitutes the exposome, an epidemiological complement to the genome. The exposome is defined by measurement of both endogenous (inflammation, lipid peroxidation, microbiota) and exogenous (air pollutants, pesticides, drugs, diet, etc.) exposures within an individual.
Much exposure data comes from non-individualized sources, such as air quality monitors or other spatiotemporal data, which are of limited use in epidemiology. Individual exposure assessment consists largely of self-reported dietary and lifestyle data from interviews or questionnaires. In recent years, advances in analytical chemistry have permitted the simultaneous detection of thousands of molecules in biological fluids including urine, whole blood, plasma, and serum.
High-resolution liquid chromatography-mass spectrometry (LCMS) is a powerful technique for measuring the accurate masses of molecules in biological fluids in high-throughput epidemiological studies. Chapter 1 of this dissertation details a method for the analysis of lipophilic molecules in plasma using specimens from 158 healthy volunteer subjects. The resulting data revealed levels of lipids and other molecules that differed between smoking and nonsmoking, white and black, and male and female subjects. A modified version of this LCMS method was used in the analysis of serum from subjects in a nested case-control study, described in Chapters 2 and 3.
Colorectal cancer (CRC) accounts for one fourth of all cancer deaths worldwide, and roughly 15 percent or less of CRC risk is attributable to genetic factors alone. To investigate possible influences of exposures on CRC risk, serum samples from 190 subjects in the European Prospective Investigation into Cancer and Nutrition (EPIC) were extracted for lipophilic molecules and analyzed with high-resolution LCMS. These prospective samples, collected up to 22 years prior to diagnosis, offered a unique opportunity to differentiate between CRC biomarkers related to disease causes and those that result from disease progression. Chapter 2 describes the testing of one class of lipids, ultra-long chain fatty acids (ULCFAs), that had been reported as a probable protective factor against CRC. Paired case-control differences were assessed with respect to the time from serum collection (study enrollment) to case diagnosis. Because case-control differences decreased with increasing time prior to case diagnosis, ULCFA depletion in cases was likely a consequence of cancer progression rather than a reflection of protective exposures.
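The time-dependence argument for ULCFAs can be illustrated with a minimal sketch. It assumes each matched pair is summarized by a log-scale abundance for the case and the control, and fits an ordinary least-squares line to the case-minus-control difference against years from enrollment to diagnosis. The function name, the data layout, and the numbers are hypothetical and purely illustrative; they are not values from the EPIC samples, and the analysis in Chapter 2 may differ in its details.

```python
def paired_difference_trend(pairs):
    """Fit difference = intercept + slope * years by ordinary least squares.

    pairs: list of (years_to_diagnosis, case_level, control_level) tuples,
    one per matched case-control pair (hypothetical layout).
    """
    years = [t for t, _, _ in pairs]
    diffs = [case - control for _, case, control in pairs]
    n = len(pairs)
    mean_t = sum(years) / n
    mean_d = sum(diffs) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxd = sum((t - mean_t) * (d - mean_d) for t, d in zip(years, diffs))
    slope = sxd / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


# Synthetic illustration only: the case deficit shrinks as the interval
# between blood collection and diagnosis grows.
toy_pairs = [
    (2.0, 4.2, 5.0),
    (6.0, 4.4, 5.0),
    (10.0, 4.6, 5.0),
    (18.0, 5.0, 5.0),
]
slope, intercept = paired_difference_trend(toy_pairs)
```

In this toy example a negative intercept combined with a positive slope corresponds to a case deficit that is largest near diagnosis and vanishes for samples collected long before it, which is the pattern expected when a marker reflects disease progression rather than a causal exposure.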
Many of the features in LCMS profiling are unannotated chemicals (identity unknown). Rather than relying on hypothesis-driven analyses of only known compounds, data-driven analyses of reliably detectable features can generate new hypotheses about possible disease-causing exposures. This untargeted methodology, used in Chapters 1 and 3, makes lipidomic and other exposure-related profiling a powerful tool in exposure assessment. In Chapter 3, the untargeted analysis of features in EPIC CRC serum samples revealed potentially relevant molecules associated with CRC causes and disease progression. In contrast to the traditional p-value-centric analyses used in Chapters 1 and 2, a combination of regularized regression, random forest, and t-tests was employed in the feature selection for this untargeted analysis.
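One simple way to combine several selection methods is sketched below. It computes a paired t-statistic per feature from matched case and control values, and then keeps the features chosen by a minimum number of methods. The voting rule, the function names, and the feature labels are assumptions made for illustration; the abstract does not state how the three methods were combined in Chapter 3, and the regularized regression and random forest selections are taken here as given inputs.

```python
import math


def paired_t_statistic(case_values, control_values):
    """Paired t-statistic for one feature across matched case-control pairs."""
    diffs = [c - k for c, k in zip(case_values, control_values)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(variance / n)


def consensus_features(selections, min_votes=2):
    """Return features selected by at least `min_votes` methods.

    selections: dict mapping a method name to the set of features it chose.
    The vote threshold is a hypothetical rule, not the dissertation's.
    """
    votes = {}
    for selected in selections.values():
        for feature in selected:
            votes[feature] = votes.get(feature, 0) + 1
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical selections from the three methods named in the text.
toy_selections = {
    "regularized_regression": {"f1", "f2"},
    "random_forest": {"f2", "f3"},
    "t_test": {"f2", "f3", "f4"},
}
shortlist = consensus_features(toy_selections, min_votes=2)
```

Requiring agreement between methods trades sensitivity for robustness: a feature must stand out under more than one modeling assumption before it is carried forward as a candidate.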
In Chapter 4, the lipophilic data from the healthy volunteer samples of Chapter 1 are studied once again. Using a method similar to the regularized regression technique described in Chapter 3, we determined which lipids were associated with levels of adductomic biomarkers (measured with another methodology developed in our laboratory), which had also been quantified in plasma from the same healthy volunteers. Analysis of the combined data from these two omics datasets revealed correlations between particular lipids and adducts in these samples.
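As an illustration of how a regularized regression can single out the lipids associated with one adduct, the following minimal sketch implements the lasso by coordinate descent. It assumes the lasso as the form of regularization, centered predictors and response (so no intercept is fitted), and the objective (1/2n)·||y − Xb||² + penalty·||b||₁. The function names and the tiny synthetic dataset are hypothetical; the method actually used in Chapters 3 and 4 may use a different penalty and software.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; exact zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Lasso coefficients for centered data via cyclic coordinate descent.

    X: list of rows (samples) of lipid levels; y: adduct level per sample.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X @ coef, with coef starting at zero
    for _ in range(n_iter):
        for j in range(p):
            column = [row[j] for row in X]
            scale = sum(v * v for v in column) / n
            if scale == 0.0:
                continue  # constant (all-zero after centering) column
            # Covariance of column j with the residual that excludes j's own fit.
            rho = sum(v * (r + v * coef[j]) for v, r in zip(column, residual)) / n
            new_coef = soft_threshold(rho, penalty) / scale
            delta = new_coef - coef[j]
            if delta != 0.0:
                residual = [r - v * delta for v, r in zip(column, residual)]
                coef[j] = new_coef
    return coef


# Synthetic, centered example: the adduct tracks lipid 0 and ignores lipid 1.
lipids = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
adduct = [-3.0, -1.0, 1.0, 3.0]
coef = lasso_coordinate_descent(lipids, adduct, penalty=0.1)
```

The penalty drives the coefficient of the unrelated lipid to exactly zero while slightly shrinking the coefficient of the related one, which is what makes this family of methods useful for selecting a small set of lipids from many candidates.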