eScholarship
Open Access Publications from the University of California

Rapid annotation of nifH gene sequences using classification and regression trees facilitates environmental functional gene analysis.

Author(s): Frank, Ildiko E; Turk-Kubo, Kendra A; Zehr, Jonathan P; et al.
Abstract

The nifH gene is a widely used molecular proxy for studying nitrogen fixation. Phylogenetic classification of nifH gene sequences is an essential step in diazotroph community analysis, and it requires a fast, automated solution because environmental sequence libraries are growing and high-throughput technologies yield increasing numbers of nifH sequences. We present a novel approach that rapidly classifies nifH amino acid sequences into well-defined phylogenetic clusters, providing a common platform for comparative analysis across studies. Phylogenetic group membership can be accurately predicted with decision tree-type statistical models that identify and use signature residues in the amino acid sequences. Our classification models were trained and evaluated with a publicly available, manually curated nifH gene database containing cluster annotations. Model-independent sequence sets from diverse ecosystems were used to further assess the models' prediction accuracy. We demonstrated the utility of this sequence binning approach in a comparative study in which joint treatment of diazotroph assemblages from a wide range of habitats identified habitat-specific and widely distributed diazotrophs and revealed a marine-terrestrial distinction in community composition. Our rapid, automated phylogenetic cluster assignment circumvents extensive phylogenetic analysis of nifH sequences and therefore saves substantial time and resources in nitrogen fixation studies.
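The idea of predicting cluster membership from signature residues can be illustrated with a minimal sketch of a classification tree over aligned amino acid positions. This is not the authors' model: the abstract does not describe the implementation, so the splitting rule (Gini impurity with "residue at position p equals r" tests), the function names, and the five-residue toy sequences with their cluster labels are all assumptions made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(seqs, labels):
    """Return the (position, residue) test that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    n = len(seqs)
    for pos in range(len(seqs[0])):
        for res in sorted({s[pos] for s in seqs}):
            yes = [l for s, l in zip(seqs, labels) if s[pos] == res]
            no = [l for s, l in zip(seqs, labels) if s[pos] != res]
            if not yes or not no:
                continue  # test does not separate anything
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if score < best_score - 1e-12:
                best, best_score = (pos, res), score
    return best

def build_tree(seqs, labels):
    """Grow a tree of nested (pos, res, yes_branch, no_branch) tuples."""
    split = best_split(seqs, labels)
    if split is None:
        # Leaf: no test improves purity, so predict the majority label.
        return Counter(labels).most_common(1)[0][0]
    pos, res = split
    yes = [(s, l) for s, l in zip(seqs, labels) if s[pos] == res]
    no = [(s, l) for s, l in zip(seqs, labels) if s[pos] != res]
    return (pos, res, build_tree(*zip(*yes)), build_tree(*zip(*no)))

def predict(tree, seq):
    """Follow residue tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        pos, res, yes, no = tree
        tree = yes if seq[pos] == res else no
    return tree

# Hypothetical aligned fragments and cluster labels (invented, not real nifH data).
training = [
    ("ACDKG", "I"), ("ACDRG", "I"),
    ("SCEKG", "II"), ("SCERG", "II"),
    ("ACEKT", "III"), ("ACERT", "III"),
]
seqs = [s for s, _ in training]
labels = [l for _, l in training]

tree = build_tree(seqs, labels)
print(tree)
print(predict(tree, "ACDKT"))  # a fragment not seen during training
```

On this toy set the tree tests only alignment positions 0 and 2 (0-based), which play the role of signature residues, and ignores the uninformative positions. A real application would need properly aligned sequences, a curated training set, and held-out data to estimate accuracy.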

