eScholarship
Open Access Publications from the University of California

kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species.

(2024)

The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
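The abstract defines two kmer classes: quasi-primes, which occur in exactly one species, and primes, which are absent from every genome or proteome. The following is a minimal sketch of those definitions on toy data, not the kmerDB pipeline, whose implementation the abstract does not describe. The function names, the species labels, and the plain `ACGT` alphabet are all assumptions made for illustration; the sketch ignores reverse complements and ambiguous bases, and it enumerates every possible kmer, which is only feasible for small k.

```python
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def primes_and_quasi_primes(sequences, k, alphabet="ACGT"):
    """Classify kmers across a collection of sequences.

    sequences: dict mapping a (hypothetical) species name to one sequence.
    Returns (primes, quasi_primes):
      primes       - kmers over `alphabet` absent from every sequence
      quasi_primes - dict mapping each species to the kmers found only in it
    """
    per_species = {name: kmers(seq, k) for name, seq in sequences.items()}

    # Every kmer observed in at least one species.
    observed = set().union(*per_species.values())

    # Naive enumeration of all len(alphabet)**k possible kmers.
    all_possible = {"".join(p) for p in product(alphabet, repeat=k)}
    primes = all_possible - observed

    quasi_primes = {}
    for name, own in per_species.items():
        # Union of the kmers seen in every other species.
        others = set().union(
            *(s for other, s in per_species.items() if other != name)
        )
        quasi_primes[name] = own - others

    return primes, quasi_primes
```

For example, calling `primes_and_quasi_primes({"species_a": "ACGTAC", "species_b": "ACGGTT"}, 2)` reports `TA` as specific to `species_a`, `GG` and `TT` as specific to `species_b`, and the ten unobserved 2-mers as primes. Passing the 20-letter amino acid alphabet instead of `ACGT` applies the same logic to peptide sequences.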

Tapping the treasure trove of atypical phages

(2024)

With advancements in genomics technologies, a vast diversity of 'atypical' phages, that is, those with single-stranded DNA or RNA genomes, is being uncovered from different ecosystems. Though these efforts have revealed the existence and prevalence of these nonmodel phages, computational approaches often fail to associate these phages with their specific bacterial host(s), while the lack of methods to isolate these phages has limited our ability to characterize infectivity pathways and new gene function. In this review, we call for the development of generalizable experimental methods to better capture this understudied viral diversity via isolation and to study it through gene-level characterization and engineering. Establishing a diverse set of new 'atypical' phage model systems has the potential to provide many new biotechnologies, including potential uses of these atypical phages in halting the spread of antibiotic resistance and engineering of microbial communities for beneficial outcomes.

BioTextQuest v2.0: An evolved tool for biomedical literature mining and concept discovery.

(2024)

The process of navigating through the landscape of biomedical literature and performing searches or combining them with bioinformatics analyses can be daunting, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related repositories. Herein, we present BioTextQuest v2.0, a tool for biomedical literature mining. BioTextQuest v2.0 is an open-source online web portal for document clustering based on sets of selected biomedical terms, offering efficient management of information derived from PubMed abstracts. Employing established machine learning algorithms, the tool facilitates document clustering while allowing users to customize the analysis by selecting terms of interest. BioTextQuest v2.0 streamlines the process of uncovering valuable insights from biomedical research articles, serving as an agent that connects the identification of key terms like genes/proteins, diseases, chemicals, Gene Ontology (GO) terms, functions, and others through named entity recognition, and their application in biological research. Instead of manually sifting through articles, researchers can enter their PubMed-like query and receive extracted information in two user-friendly formats, tables and word clouds, simplifying the comprehension of key findings. The latest update of BioTextQuest leverages the EXTRACT named entity recognition tagger, enhancing its ability to pinpoint various biological entities within text. BioTextQuest v2.0 acts as a research assistant, significantly reducing the time and effort required for researchers to identify and present relevant information from the biomedical literature.

Visualizing metagenomic and metatranscriptomic data: A comprehensive review

(2024)

The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.

Standardized and accessible multi-omics bioinformatics workflows through the NMDC EDGE resource

(2024)

Accessible and easy-to-use standardized bioinformatics workflows are necessary to advance microbiome research from observational studies to large-scale, data-driven approaches. Standardized multi-omics data enables comparative studies, data reuse, and applications of machine learning to model biological processes. To advance broad accessibility of standardized multi-omics bioinformatics workflows, the National Microbiome Data Collaborative (NMDC) has developed the Empowering the Development of Genomics Expertise (NMDC EDGE) resource, a user-friendly, open-source web application (https://nmdc-edge.org). Here, we describe the design and main functionality of the NMDC EDGE resource for processing metagenome, metatranscriptome, natural organic matter, and metaproteome data. The architecture relies on three main layers (web application, orchestration, and execution) to ensure flexibility and expansion to future workflows. The orchestration and execution layers leverage best practices in software containers and accommodate high-performance computing and cloud computing services. Further, we have adopted a robust user research process to collect feedback for continuous improvement of the resource. NMDC EDGE provides an accessible interface for researchers to process multi-omics microbiome data using production-quality workflows to facilitate improved data standardization and interoperability.

Genome sequence of Nitrosopumilus adriaticus CCS1 assembled from an ammonia-oxidizing enrichment culture.

(2024)

We report the metagenome-assembled genome of an ammonia-oxidizing archaeon that is closely related to Nitrosopumilus adriaticus NF5 but shows distinct genomic features compared to strain NF5.

Changes to virus taxonomy and the ICTV Statutes ratified by the International Committee on Taxonomy of Viruses (2024).

(2024)

This article reports changes to virus taxonomy and taxon nomenclature that were approved and ratified by the International Committee on Taxonomy of Viruses (ICTV) in April 2024. The entire ICTV membership was invited to vote on 203 taxonomic proposals that had been approved by the ICTV Executive Committee (EC) in July 2023 at the 55th EC meeting in Jena, Germany, or in the second EC vote in November 2023. All proposals were ratified by online vote. Taxonomic additions include one new phylum (Ambiviricota), one new class, nine new orders, three new suborders, 51 new families, 18 new subfamilies, 820 new genera, and 3547 new species (excluding taxa that have been abolished). Proposals to complete the process of species name replacement to the binomial (genus + species epithet) format were ratified. Currently, a total of 14,690 virus species have been established.

Transcriptomics reveal a mechanism of niche defense: two beneficial root endophytes deploy an antimicrobial GH18‐CBM5 chitinase to protect their hosts

(2024)

Effector secretion is crucial for root endophytes to establish and protect their ecological niche. We used time-resolved transcriptomics to monitor effector gene expression dynamics in two closely related Sebacinales, Serendipita indica and Serendipita vermifera, during symbiosis with three plant species, competition with the phytopathogenic fungus Bipolaris sorokiniana, and cooperation with root-associated bacteria. We observed increased effector gene expression in response to biotic interactions, particularly with plants, indicating their importance in host colonization. Some effectors responded to both plants and microbes, suggesting dual roles in intermicrobial competition and plant-microbe interactions. A subset of putative antimicrobial effectors, including a GH18-CBM5 chitinase, was induced exclusively by microbes. Functional analyses of this chitinase revealed its antimicrobial and plant-protective properties. We conclude that dynamic effector gene expression underpins the ability of Sebacinales to thrive in diverse ecological niches with a single fungal chitinase contributing substantially to niche defense.