
Article

Discovery and annotation of small proteins using genomics, proteomics and computational approaches

LBL Publications (2011)

Small proteins (10–200 amino acids [aa] in length) encoded by short open reading frames (sORFs) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained 2.6 million expressed sequence tag (EST) reads from the Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10–200 aa in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: (1) coding-potential prediction, (2) evolutionary conservation between P. deltoides and other plant species, and (3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1469 genes was obtained. Analysis of protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. Within the high-confidence set, known protein domains were identified in 1282 genes (the higher-confidence sORF candidate set), of which 611 genes, designated the highest-confidence sORF candidate set, were supported by proteomics data. Of these 611 highest-confidence candidate sORF genes, 56 were new to the current Populus genome annotation. This study not only demonstrates that sequenced genomes still contain sORF candidates awaiting annotation, but also presents an efficient strategy for discovering sORFs in species for which no genome annotation is yet available.
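The first step described above, calling an initial set of sORFs encoding 10–200 aa proteins from reconstructed transcripts, can be illustrated with a minimal six-frame ORF scan. This is a sketch under stated assumptions, not the authors' pipeline: the abstract does not give the ORF-calling rules, so ATG-only starts, the standard stop codons TAA/TAG/TGA, taking the first ATG after each stop (the longest ORF per stop), and discarding ORFs with no in-frame stop are all assumptions here. The names `find_sorfs` and `SORF` are hypothetical.

```python
from typing import Iterator, NamedTuple

# Assumption: standard genetic code stops and ATG-only starts.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class SORF(NamedTuple):
    strand: str     # "+" for the transcript as given, "-" for its reverse complement
    frame: int      # reading frame (0, 1 or 2) on the scanned strand
    start: int      # 0-based position of the ATG on the scanned strand
    end: int        # 0-based exclusive end, including the stop codon
    aa_length: int  # encoded protein length, stop codon excluded


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq: str, strand: str, min_aa: int, max_aa: int) -> Iterator[SORF]:
    """Yield ATG-to-stop ORFs on one strand whose protein length is in range."""
    for frame in range(3):
        start = None  # first ATG seen since the previous in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                aa_length = (i - start) // 3
                if min_aa <= aa_length <= max_aa:
                    yield SORF(strand, frame, start, i + 3, aa_length)
                start = None
        # ORFs still open at the end of the transcript are dropped (assumption).


def find_sorfs(transcript: str, min_aa: int = 10, max_aa: int = 200) -> list[SORF]:
    """Six-frame scan for short ORFs encoding min_aa..max_aa residues."""
    seq = transcript.upper()
    forward = list(scan_strand(seq, "+", min_aa, max_aa))
    reverse = list(scan_strand(reverse_complement(seq), "-", min_aa, max_aa))
    return forward + reverse


if __name__ == "__main__":
    # Toy transcript: ATG + 11 GCT codons + TAA encodes a 12 aa peptide.
    print(find_sorfs("ATG" + "GCT" * 11 + "TAA"))
```

In a real workflow the candidates from a scan like this would then go through the three enrichment filters listed in the abstract (coding-potential prediction, cross-species conservation, gene family clustering); those steps are not sketched here.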


Article
Peer Reviewed

Bioenergy Underground: Challenges and opportunities for phenotyping roots and the microbiome for sustainable bioenergy crop production

UC Berkeley Previously Published Works (2022)

Bioenergy production often focuses on the aboveground feedstock production for conversion to fuel and other materials. However, the belowground component is crucial for soil carbon sequestration, greenhouse gas fluxes, and ecosystem function. Roots maximize feedstock production on marginal lands by acquiring soil resources and mediating soil ecosystem processes through interactions with the microbial community. This belowground world is challenging to observe and quantify; however, current methodologies offer unprecedented opportunities to bring roots, microbes, and soil into focus. These opportunities allow not only breeding for increased feedstock production but also breeding for improved soil health and carbon sequestration. A recent workshop hosted by the USDOE Bioenergy Research Centers highlighted these challenges and opportunities while creating a roadmap for increased collaboration and data interoperability through standardization of methodologies and data using F.A.I.R. principles. This article provides a background on the need for belowground research in bioenergy cropping systems, a primer on root system properties of major U.S. bioenergy crops, and an overview of the roles of root chemistry, exudation, and microbial interactions in sustainability. Crucially, we outline how to adopt standardized measures and databases that address the most pressing methodological needs, accelerating root, soil, and microbial research to meet the major societal challenges of the century.

Article
Peer Reviewed

Plant Biosystems Design Research Roadmap 1.0

LBL Publications (2020)

Human life intimately depends on plants for food, biomaterials, health, energy, and a sustainable environment. Various plants have been genetically improved mostly through breeding, along with limited modification via genetic engineering, yet they are still not able to meet the ever-increasing needs, in terms of both quantity and quality, resulting from the rapid increase in world population and expected standards of living. A step change that may address these challenges would be to expand the potential of plants using biosystems design approaches. This represents a shift in plant science research from relatively simple trial-and-error approaches to innovative strategies based on predictive models of biological systems. Plant biosystems design seeks to accelerate plant genetic improvement using genome editing and genetic circuit engineering or create novel plant systems through de novo synthesis of plant genomes. From this perspective, we present a comprehensive roadmap of plant biosystems design covering theories, principles, and technical methods, along with potential applications in basic and applied plant biology research. We highlight current challenges, future opportunities, and research priorities, along with a framework for international collaboration, towards rapid advancement of this emerging interdisciplinary area of research. Finally, we discuss the importance of social responsibility in utilizing plant biosystems design and suggest strategies for improving public perception, trust, and acceptance.


Article

Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

LBL Publications (2009)

Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses), provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled with its exceptional promise as a model system for grass research, will support the development of new energy and food crops.


Article
Peer Reviewed

Thousands of small, novel genes predicted in global phage genomes

UC Berkeley Previously Published Works (2022)

Small genes (<150 nucleotides) have been systematically overlooked in phage genomes. We employ a large-scale comparative genomics approach to predict >40,000 small-gene families in ∼2.3 million phage genome contigs. We find that small genes in phage genomes are approximately 3-fold more prevalent than in host prokaryotic genomes. Our approach enriches for small genes that are translated in microbiomes, suggesting the small genes identified are coding. More than 9,000 families encode potentially secreted or transmembrane proteins, more than 5,000 families encode predicted anti-CRISPR proteins, and more than 500 families encode predicted antimicrobial proteins. By combining homology and genomic-neighborhood analyses, we reveal substantial novelty and diversity within phage biology, including small phage genes found in multiple host phyla, small genes encoding proteins that play essential roles in host infection, and small genes that share genomic neighborhoods and whose encoded proteins may share related functions.


Article
Peer Reviewed

Genome sequencing and analysis of the model grass Brachypodium distachyon

UC Davis Previously Published Works (2010)