eScholarship
Open Access Publications from the University of California

kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species.

(2024)

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
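The distinction between quasi-primes (kmers found in exactly one species) and primes (kmers found in no species) can be illustrated with a minimal Python sketch. It is not the kmerDB pipeline: the function names, the two toy sequences, and the choice of k = 2 are assumptions made purely for illustration, and details such as reverse-complement handling are ignored.

```python
from collections import Counter
from itertools import product


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """For each species, return the kmers that occur in that species only."""
    per_species = {name: kmers(seq, k) for name, seq in genomes.items()}
    # Count in how many species each kmer appears (not how often per genome).
    species_count = Counter(km for s in per_species.values() for km in s)
    return {
        name: {km for km in s if species_count[km] == 1}
        for name, s in per_species.items()
    }


def primes(genomes: dict[str, str], k: int, alphabet: str = "ACGT") -> set[str]:
    """Return every possible kmer over the alphabet that no species contains."""
    seen = set().union(*(kmers(seq, k) for seq in genomes.values()))
    all_kmers = {"".join(p) for p in product(alphabet, repeat=k)}
    return all_kmers - seen


# Hypothetical toy "genomes", invented for this example.
toy = {"species_a": "ACGTAC", "species_b": "CGTTGA"}

print(quasi_primes(toy, 2))
print(sorted(primes(toy, 2)))
```

Enumerating all possible kmers, as `primes` does here, is only practical for small k; at realistic kmer lengths and genome counts a different strategy would be needed.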

Visualizing metagenomic and metatranscriptomic data: A comprehensive review

(2024)

The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.

Stress responses in an Arctic microalga (Pelagophyceae) following sudden salinity change revealed by gene expression analysis.

(2024)

Marine microbes that have for eons been adapted to stable salinity regimes are confronted with sudden decreases in salinity in the Arctic Ocean. The episodic freshening is increasing due to climate change with melting multi-year sea-ice and glaciers, greater inflows from rivers, and increased precipitation. To investigate algal responses to lowered salinity, we analyzed the responses and acclimation over 24 h in a non-model Arctic marine alga (pelagophyte CCMP2097) following transfer to realistic lower salinities. Using RNA-seq transcriptomics, we show rapid differential expression of genes related to oxidative stress responses, proteins involved in the photosystem and circadian clock, and those affecting lipids and inorganic ions. After 24 h the pelagophyte had adjusted to the lower salinity, as seen in the overexpression of genes associated with freezing resistance, cold adaptation, and salt tolerance. Overall, a suite of ancient widespread pathways is recruited, enabling the species to adjust to the stress of rapid salinity change.

Coassembly and binning of a twenty-year metagenomic time-series from Lake Mendota.

(2024)

The North Temperate Lakes Long-Term Ecological Research (NTL-LTER) program has been extensively used to improve understanding of how aquatic ecosystems respond to environmental stressors, climate fluctuations, and human activities. Here, we report on the metagenomes of samples collected between 2000 and 2019 from Lake Mendota, a freshwater eutrophic lake within the NTL-LTER site. We utilized the distributed metagenome assembler MetaHipMer to coassemble over 10 terabases (Tbp) of data from 471 individual Illumina-sequenced metagenomes. A total of 95,523,664 contigs were assembled and binned to generate 1,894 non-redundant metagenome-assembled genomes (MAGs) with ≥50% completeness and ≤10% contamination. Phylogenomic analysis revealed that the MAGs were nearly exclusively bacterial, dominated by Pseudomonadota (Proteobacteria, N = 623) and Bacteroidota (N = 321). Nine eukaryotic MAGs were identified by eukCC with six assigned to the phylum Chlorophyta. Additionally, 6,350 high-quality viral sequences were identified by geNomad with the majority classified in the phylum Uroviricota. This expansive coassembled metagenomic dataset provides an unprecedented foundation to advance understanding of microbial communities in freshwater ecosystems and explore temporal ecosystem dynamics.
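The MAG quality cutoffs quoted above (≥50% completeness and ≤10% contamination) amount to a simple filter over bin statistics. The following Python sketch shows that filter only; the `Bin` class, the function name, and all bin values are hypothetical and do not come from the Lake Mendota dataset, and the dereplication step that yields non-redundant MAGs is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bin:
    """Quality estimates for one genome bin (percentages, 0 to 100)."""
    name: str
    completeness: float
    contamination: float


def passes_quality(b: Bin,
                   min_completeness: float = 50.0,
                   max_contamination: float = 10.0) -> bool:
    """Apply the thresholds from the abstract; both bounds are inclusive."""
    return (b.completeness >= min_completeness
            and b.contamination <= max_contamination)


# Hypothetical bins, invented for this example.
bins = [
    Bin("bin_001", completeness=92.3, contamination=1.8),
    Bin("bin_002", completeness=50.0, contamination=10.0),  # exactly on both cutoffs
    Bin("bin_003", completeness=48.9, contamination=2.0),   # too incomplete
    Bin("bin_004", completeness=75.0, contamination=12.5),  # too contaminated
]

kept = [b.name for b in bins if passes_quality(b)]
print(kept)
```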

Comparative transcriptomics provides insights into molecular mechanisms of zinc tolerance in the ectomycorrhizal fungus Suillus luteus

(2024)

Zinc (Zn) is a major soil contaminant, and high Zn levels can disrupt growth, survival, and reproduction of fungi. Some fungal species have evolved Zn tolerance through cell processes that mitigate Zn toxicity, although the genes and detailed mechanisms underlying mycorrhizal fungal Zn tolerance remain unexplored. To fill this gap in knowledge, we investigated gene expression associated with Zn tolerance in the ectomycorrhizal fungus Suillus luteus. We found that Zn tolerance in this species is mainly a constitutive trait that can also be environmentally dependent. Zinc tolerance in S. luteus is associated with differences in the expression of genes involved in metal exclusion and immobilization, as well as recognition and mitigation of metal-induced oxidative stress. Differentially expressed genes were predicted to be involved in transmembrane transport, metal chelation, oxidoreductase activity, and signal transduction. Some of these genes were previously reported as candidates for S. luteus Zn tolerance, while others are reported here for the first time. Our results contribute to understanding the mechanisms of fungal metal tolerance and pave the way for further research on the role of fungal metal tolerance in mycorrhizal associations.

Functional genomic screening in Komagataella phaffii enabled by high-activity CRISPR-Cas9 library

(2024)

CRISPR-based high-throughput genome-wide loss-of-function screens are a valuable approach to functional genetics and strain engineering. The yeast Komagataella phaffii is a host of particular interest in the biopharmaceutical industry and as a metabolic engineering host for proteins and metabolites. Here, we design and validate a highly active 6-fold coverage genome-wide sgRNA library for this biotechnologically important yeast containing 30,848 active sgRNAs targeting over 99% of its coding sequences. Conducting fitness screens in the absence of functional non-homologous end joining (NHEJ), the dominant DNA repair mechanism in K. phaffii, provides a quantitative means to assess the activity of each sgRNA in the library. This approach allows for the experimental validation of each guide's targeting activity, leading to more precise screening outcomes. We used this approach to conduct growth screens with glucose as the sole carbon source and identify essential genes. Comparative analysis of the called gene sets identified a core set of K. phaffii essential genes, many of which relate to metabolic engineering targets, including protein production, secretion, and glycosylation. The high activity, genome-wide CRISPR library developed here enables functional genomic screening in K. phaffii, applied here to gene essentiality classification, and promises to enable other genetic screens.

Expression of dehydroshikimate dehydratase in poplar induces transcriptional and metabolic changes in the phenylpropanoid pathway

(2024)

Modification of lignin in feedstocks via genetic engineering aims to reduce biomass recalcitrance to facilitate efficient conversion processes. These improvements can be achieved by expressing exogenous enzymes that interfere with native biosynthetic pathways responsible for the production of the lignin precursors. In planta expression of a bacterial 3-dehydroshikimate dehydratase in poplar trees reduced lignin content and altered the monomer composition, which enabled higher yields of sugars after cell wall polysaccharide hydrolysis. Understanding how plants respond to such genetic modifications at the transcriptional and metabolic levels is needed to facilitate further improvement and field deployment. In this work, we acquired fundamental knowledge on lignin-modified poplar expressing 3-dehydroshikimate dehydratase using RNA-seq and metabolomics. The data clearly demonstrate that changes in gene expression and metabolite abundance can occur in a strict spatiotemporal fashion, revealing tissue-specific responses in the xylem, phloem, or periderm. In the poplar line that exhibited the strongest reduction in lignin, we found that 3% of the transcripts had altered expression levels and ~19% of the detected metabolites had differential abundance in the xylem from older stems. The changes affected predominantly the shikimate and phenylpropanoid pathways as well as secondary cell wall metabolism, and resulted in significant accumulation of hydroxybenzoates derived from protocatechuate and salicylate.

Evolutionary genomic analyses of canine E. coli infections identify a relic capsular locus associated with resistance to multiple classes of antimicrobials.

(2024)

Infections caused by antimicrobial-resistant Escherichia coli are the leading cause of death attributed to antimicrobial resistance (AMR) worldwide, and the known AMR mechanisms involve a range of functional proteins. Here, we employed a pan-genome wide association study (GWAS) approach on over 1,000 E. coli isolates from sick dogs collected across the US and Canada and identified a strong statistical association (empirical P < 0.01) between AMR involving a range of antibiotics and a group 1 capsular (CPS) gene cluster. This cluster included genes under relaxed selection pressure, had several loci missing, and had pseudogenes for other key loci. Furthermore, this cluster is widespread in E. coli and Klebsiella clinical isolates across multiple host species. Earlier studies demonstrated that the octameric CPS polysaccharide export protein Wza can transmit macrolide antibiotics into the E. coli periplasm. We suggest that the CPS in question, and its highly divergent Wza, functions as an antibiotic trap, preventing antimicrobial penetration. We also highlight the high diversity of lineages circulating in dogs across all regions studied, the overlap with human lineages, and regional prevalence of resistance to multiple antimicrobial classes.

IMPORTANCE: Much of the human genomic epidemiology data available for E. coli mechanism discovery studies has been heavily biased toward shiga-toxin producing strains from humans and livestock. E. coli occupies many niches and produces a wide variety of other significant pathotypes, including some implicated in chronic disease. We hypothesized that since dogs tend to share similar strains with their owners and are treated with similar antibiotics, their pathogenic isolates will harbor unexplored AMR mechanisms of importance to humans as well as animals. By comparing over 1,000 genomes with in vitro antimicrobial susceptibility data from sick dogs across the US and Canada, we identified a strong multidrug resistance association with an operon that appears to have once conferred a type 1 capsule production system.