eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Previously Published Works

AncestralClust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees

Published Web Location

https://nielsen-lab.github.io/pdfs/papers/clusteringsupp.pdf
No data is associated with this publication.
Creative Commons 'BY-NC-ND' version 4.0 license
Abstract

Motivation

Clustering is a fundamental task in the analysis of nucleotide sequences. Despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. Traditional clustering methods have mostly focused on high-speed clustering of highly similar sequences. We develop a phylogenetic clustering method that infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.
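
To make the greedy step concrete, the following is a minimal Python sketch and not the AncestralClust implementation. It assumes the sequences are already aligned to equal length, and it substitutes a per-column majority consensus for the phylogenetic ancestral sequence reconstruction described above. The function names (`hamming_fraction`, `consensus`, `greedy_assign`) and the `max_dist` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter


def hamming_fraction(a, b):
    """Fraction of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def consensus(seqs):
    """Per-column majority consensus.

    Assumption: a crude stand-in for ancestral sequence reconstruction
    on a phylogenetic tree, used here only to get one representative
    sequence per initial cluster.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def greedy_assign(initial_clusters, remaining, max_dist=0.3):
    """Greedily place each remaining sequence in the closest cluster.

    A sequence joins the cluster whose representative is nearest, provided
    the distance is at most max_dist (hypothetical threshold); otherwise it
    starts a new cluster and becomes that cluster's representative.
    """
    clusters = [list(c) for c in initial_clusters]
    reps = [consensus(c) for c in clusters]
    for seq in remaining:
        dists = [hamming_fraction(seq, r) for r in reps]
        if dists:
            best = min(range(len(reps)), key=dists.__getitem__)
            if dists[best] <= max_dist:
                clusters[best].append(seq)
                continue
        # No representative is close enough: open a new cluster.
        clusters.append([seq])
        reps.append(seq)
    return clusters


if __name__ == "__main__":
    initial = [["AAAA", "AAAT", "AAAA"], ["CCCC", "CCCG", "CCCC"]]
    for cluster in greedy_assign(initial, ["AATA", "CCGC", "GGGG"]):
        print(cluster)
```

In this toy input, the two sequences that differ from a representative at one position join the existing clusters, while the sequence that matches neither representative opens a cluster of its own.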

Results

We describe AncestralClust, a program developed for clustering divergent sequences. We compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. We show that, in divergent datasets, AncestralClust has higher accuracy and more even cluster sizes than currently popular methods.

Availability and implementation

AncestralClust is an open-source program available at https://github.com/lpipes/ancestralclust.

Supplementary information

Supplementary data are available at Bioinformatics online.
