eScholarship
Open Access Publications from the University of California

Metagenome-assembled genomes of freshwater Hyphomicrobium sp. G-191 and Methylophilus sp. enriched from Cedar Swamp, Woods Hole, MA.

(2024)

Hyphomicrobium are facultative denitrifying anaerobes capable of using one-carbon compounds as a sole carbon source. Hyphomicrobium sp. G-191 was enriched from Cedar Swamp, Woods Hole, Massachusetts, using a selective medium for methanol-utilizing bacteria. We present two draft metagenome-assembled genomes (MAGs) of a Hyphomicrobium and a Methylophilus species.

Tapping the treasure trove of atypical phages

(2024)

With advancements in genomics technologies, a vast diversity of 'atypical' phages, that is, those with single-stranded DNA or RNA genomes, is being uncovered from different ecosystems. Though these efforts have revealed the existence and prevalence of these nonmodel phages, computational approaches often fail to associate these phages with their specific bacterial host(s), while the lack of methods to isolate these phages has limited our ability to characterize infectivity pathways and new gene function. In this review, we call for the development of generalizable experimental methods to better capture this understudied viral diversity via isolation and to study it through gene-level characterization and engineering. Establishing a diverse set of new 'atypical' phage model systems has the potential to provide many new biotechnologies, including potential uses of these atypical phages in halting the spread of antibiotic resistance and engineering of microbial communities for beneficial outcomes.

Standardized and accessible multi-omics bioinformatics workflows through the NMDC EDGE resource

(2024)

Accessible and easy-to-use standardized bioinformatics workflows are necessary to advance microbiome research from observational studies to large-scale, data-driven approaches. Standardized multi-omics data enables comparative studies, data reuse, and applications of machine learning to model biological processes. To advance broad accessibility of standardized multi-omics bioinformatics workflows, the National Microbiome Data Collaborative (NMDC) has developed the Empowering the Development of Genomics Expertise (NMDC EDGE) resource, a user-friendly, open-source web application (https://nmdc-edge.org). Here, we describe the design and main functionality of the NMDC EDGE resource for processing metagenome, metatranscriptome, natural organic matter, and metaproteome data. The architecture relies on three main layers (web application, orchestration, and execution) to ensure flexibility and expansion to future workflows. The orchestration and execution layers leverage best practices in software containers and accommodate high-performance computing and cloud computing services. Further, we have adopted a robust user research process to collect feedback for continuous improvement of the resource. NMDC EDGE provides an accessible interface for researchers to process multi-omics microbiome data using production-quality workflows to facilitate improved data standardization and interoperability.

AI-readiness for Biomedical Data: Bridge2AI Recommendations

(2024)

Biomedical research and clinical practice are in the midst of a transition toward significantly increased use of artificial intelligence (AI) and machine learning (ML) methods. These advances promise to enable qualitatively deeper insight into complex challenges formerly beyond the reach of analytic methods and human intuition while placing increased demands on ethical and explainable artificial intelligence (XAI), given the opaque nature of many deep learning methods. The U.S. National Institutes of Health (NIH) has initiated a significant research and development program, Bridge2AI, aimed at producing new "flagship" datasets designed to support AI/ML analysis of complex biomedical challenges, elucidate best practices, develop tools and standards in AI/ML data science, and disseminate these datasets, tools, and methods broadly to the biomedical community. An essential set of concepts to be developed and disseminated in this program along with the data and tools produced are criteria for AI-readiness of data, including critical considerations for XAI and ethical, legal, and social implications (ELSI) of AI technologies. NIH Bridge to Artificial Intelligence (Bridge2AI) Standards Working Group members prepared this article to present methods for assessing the AI-readiness of biomedical data and the data standards perspectives and criteria we have developed throughout this program. While the field is rapidly evolving, these criteria are foundational for scientific rigor and the ethical design and application of biomedical AI methods.

Metabolites from intact phage-infected Synechococcus chemotactically attract heterotrophic marine bacteria

(2024)

Chemical cues mediate interactions between marine phytoplankton and bacteria, underpinning ecosystem-scale processes including nutrient cycling and carbon fixation. Phage infection alters host metabolism, stimulating the release of chemical cues from intact plankton, but how these dynamics impact ecology and biogeochemistry is poorly understood. Here we determine the impact of phage infection on dissolved metabolite pools from marine cyanobacteria and the subsequent chemotactic response of heterotrophic bacteria using time-resolved metabolomics and microfluidics. Metabolites released from intact, phage-infected Synechococcus elicited strong chemoattraction from Vibrio alginolyticus and Pseudoalteromonas haloplanktis, especially during early infection stages. Sustained bacterial chemotaxis occurred towards live-infected Synechococcus, contrasted by no discernible chemotaxis towards uninfected cyanobacteria. High-throughput microfluidics identified 5'-deoxyadenosine and 5'-methylthioadenosine as key attractants. Our findings establish that, before lysis, phage-infected picophytoplankton release compounds that attract motile heterotrophic bacteria, suggesting a mechanism for resource transfer that might impact carbon and nutrient fluxes across trophic levels.

Binary vector copy number engineering improves Agrobacterium-mediated transformation

(2024)

The copy number of a plasmid is linked to its functionality, yet there have been few attempts to optimize higher-copy-number mutants for use across diverse origins of replication in different hosts. We use a high-throughput growth-coupled selection assay and a directed evolution approach to rapidly identify origin of replication mutations that influence copy number and screen for mutants that improve Agrobacterium-mediated transformation (AMT) efficiency. By introducing these mutations into binary vectors within the plasmid backbone used for AMT, we observe improved transient transformation of Nicotiana benthamiana in four diverse tested origins (pVS1, RK2, pSa and BBR1). For the best-performing origin, pVS1, we isolate higher-copy-number variants that increase stable transformation efficiencies by 60-100% in Arabidopsis thaliana and 390% in the oleaginous yeast Rhodosporidium toruloides. Our work provides an easily deployable framework to generate plasmid copy number variants that will enable greater precision in prokaryotic genetic engineering, in addition to improving AMT efficiency.

High-throughput protein characterization by complementation using DNA barcoded fragment libraries

(2024)

Our ability to predict, control, or design biological function is fundamentally limited by poorly annotated gene function. This can be particularly challenging in non-model systems. Accordingly, there is motivation for new high-throughput methods for accurate functional annotation. Here, we used complementation of auxotrophs and DNA barcode sequencing (Coaux-Seq) to enable high-throughput characterization of protein function. Fragment libraries from eleven genetically diverse bacteria were tested in twenty different auxotrophic strains of Escherichia coli to identify genes that complement missing biochemical activity. We recovered 41% of expected hits, with effectiveness varying by source genome, and observed success even with distant E. coli relatives like Bacillus subtilis and Bacteroides thetaiotaomicron. Coaux-Seq provided the first experimental validation for 53 proteins, of which 11 are less than 40% identical to an experimentally characterized protein. Among the unexpected functions identified were a sulfate uptake transporter, an O-succinylhomoserine sulfhydrylase for methionine synthesis, and an aminotransferase. We also identified instances of cross-feeding wherein protein overexpression and nearby non-auxotrophic strains enabled growth. Altogether, these results demonstrate the utility of Coaux-Seq, with future applications in ecology, health, and engineering.

Community standards and future opportunities for synthetic communities in plant–microbiota research

(2024)

Harnessing beneficial microorganisms is seen as a promising approach to enhance sustainable agricultural production. Synthetic communities (SynComs) are increasingly being used to study relevant microbial activities and interactions with the plant host. Yet, the lack of community standards limits the efficiency and progress in this important area of research. To address this gap, we recommend three actions: (1) defining reference SynComs; (2) establishing community standards, protocols and benchmark data for constructing and using SynComs; and (3) creating an infrastructure for sharing strains and data. We also outline opportunities to develop SynCom research through technical advances, linking to field studies, and filling taxonomic blind spots to move towards fully representative SynComs.

VISTA Enhancer browser: an updated database of tissue-specific developmental enhancers

(2024)

Regulatory elements (enhancers) are major drivers of gene expression in mammals and harbor many genetic variants associated with human diseases. Here, we present an updated VISTA Enhancer Browser (https://enhancer.lbl.gov), a database of transgenic enhancer assays conducted in developing mouse embryos in vivo. Since the original publication in 2007, the database has grown nearly 20-fold from 250 to over 4500 experiments and currently harbors over 23 500 images. The updated database provides structured information on experiments conducted at different stages of embryonic development, including enhancer activities of human pathogenic and synthetic variants and sequences derived from a variety of species. In addition to manually curated results of thousands of individual experiments, the new database also features hundreds of manually curated comparisons between alleles. The VISTA Enhancer Browser provides a crucial resource for the study of human genetic variation, gene regulation and developmental biology.