Foodborne outbreaks have been reported continuously in the United States over the past decade. Overall, 129 outbreaks of pathogenic Escherichia coli O157:H7, Listeria monocytogenes, and Salmonella enterica infections linked to fresh produce occurred in the U.S. from 2010 to 2021, causing a total of 8,889 illnesses, 2,412 hospitalizations, and 90 deaths. Among fresh produce, romaine lettuce (RL) is one of the varieties most frequently associated with these outbreaks. Pathogen contamination can occur at any point before harvest (e.g., from soil, irrigation water, and insects) and after harvest (e.g., during processing, distribution, and preparation). Extensive studies have shown that foodborne pathogens can persist in RL for a period of time. Unfortunately, current washing strategies cannot eliminate these pathogens. Moreover, foodborne pathogens are typically present in fresh produce at low levels, which makes their detection even more challenging. The development of next-generation sequencing techniques provides a better understanding of the native microbiota present in RL. RL microbiota data not only help us understand the connection between the native microbiota and pathogen contamination but can also potentially be used as a tool to predict produce safety and shelf life.
In this dissertation, I first investigated the impact of storage time, season, and brand on bacterial populations in commercially washed, chopped, and bagged RL by using culture-dependent and culture-independent methods. The results showed that RL had almost no differences in aerobic plate count (APC) and anaerobic plate count (AnPC) between the “Use By” date (UBD) and 5 days after the UBD (UBD5) at 4 °C, but differences were seen between RL from two seasons as well as among three brands of RL. Similarly, 16S rRNA gene amplicon sequencing showed no difference in bacterial diversity and composition between UBD and UBD5 at 4 °C, whereas the season in which the RL was harvested had a significant impact on the diversity and composition of bacterial communities. Brand also significantly shaped the bacterial population, indicating that harvest locations, processing strategies, and distribution conditions may impact bacterial populations (Chapter 2).
Through my artificial inoculation study (Chapter 3), I found that the native microbiota in RL affected the behavior of Listeria monocytogenes (LM) at 4 °C. LM decreased in abundance from UBD to UBD5 in the late-season RL but remained at the same levels in the early-season RL. Differential abundance analysis of 16S rRNA gene amplicon sequencing data suggested that Leuconostoc, Lactococcus, and Erwinia identified in late-season RL may account for the LM decrease. In addition, Pseudomonas, Chryseobacterium, Duganella, and Listeria were identified as indicators for RL inoculated with a high level of LM, and Serratia, Pedobacter, Janthinobacterium, Flavobacterium, and Weissella were identified as indicators for RL inoculated with a low level of LM. Using culture-based methods, I isolated 32 antagonistic bacteria and confirmed them with the double-layer and spot-on-lawn methods. Sanger sequencing and whole-genome sequencing identified three species: Carnobacterium maltaromaticum, Lactococcus lactis, and Bacillus filamentosus. Quantitative PCR was used to further confirm and quantify the three antagonistic species in RL. The results showed no difference in the abundances of C. maltaromaticum and Lc. lactis between RL from the two seasons, whereas B. filamentosus was more abundant in late-season RL than in early-season RL. Additional research using co-culture methods is needed to confirm the antagonistic function of B. filamentosus and to identify its anti-LM mechanism.
During sequencing data processing, sequence loss during alignment and denoising exceeded 50% in some cases, which may cause important information about bacterial species to be lost. To test the hypothesis that this loss discards information relevant to classification, I explored an alignment-free strategy, k-mer hashing, by building classifiers to predict fresh produce contamination and quality reduction. The k-mer hash approach was compared against the amplicon sequence variant (ASV) approach, which includes a typical denoising step. The results showed that random forest (RF)-based classifiers for fresh produce safety and quality trained on publicly available datasets preprocessed with 7-mer hashes had significantly higher classification accuracy than those trained on the ASV datasets. This supports the hypothesis that a preprocessing strategy without sequence loss (k-mer hash) retains more important information about bacterial species for produce safety and quality classification than an approach with sequence loss (ASV). In addition, integrating multiple produce microbiota datasets gave RF-based classifiers a higher classification accuracy than classifiers trained on individual datasets. The RF-based classifiers trained on integrated datasets also identified more consistent and generalizable indicators associated with fresh produce safety and quality (Chapter 4).
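To illustrate what an alignment-free k-mer hash representation can look like, the following is a minimal sketch, not the implementation used in this dissertation. It counts 7-mers in raw reads and hashes each one into a fixed-size feature space. The function names (`kmers`, `kmer_hash`, `sample_profile`), the SHA-256 hash, and the bucket count `N_BUCKETS` are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

K = 7             # k-mer length named in the text
N_BUCKETS = 4096  # hypothetical feature-space size (an assumption)


def kmers(seq, k=K):
    """Yield every k-mer in a read, skipping windows with ambiguous bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def kmer_hash(kmer, n_buckets=N_BUCKETS):
    """Map a k-mer to a stable bucket index (hash choice is an assumption)."""
    digest = hashlib.sha256(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def sample_profile(reads, k=K, n_buckets=N_BUCKETS):
    """Return relative abundances of hashed k-mers for one sample.

    No alignment or denoising is applied, so every read with at least
    one unambiguous k-mer contributes to the profile.
    """
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            counts[kmer_hash(kmer, n_buckets)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: n / total for bucket, n in counts.items()}


if __name__ == "__main__":
    # Toy reads, not real data.
    profile = sample_profile(["ACGTACGTAC", "ACGTNCGTACGT"])
    print(len(profile), "hashed k-mer features")
```

Each sample's profile could then be placed as a row in a feature matrix and passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`; that step is not shown here.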
In summary, this dissertation systematically profiled the structures of bacterial populations in RL and investigated how to better utilize culture-dependent methods, culture-independent methods, and publicly available datasets to identify bacteria associated with produce safety and quality and to isolate competitive exclusion bacteria that can potentially be used as biocontrol agents to reduce foodborne pathogen contamination of produce.