

UC San Francisco Previously Published Works

Orchid: a novel management, annotation and machine learning framework for analyzing cancer mutations


Motivation

As whole-genome tumor sequence and biological annotation datasets grow in size, number and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments and machine learning algorithms, there is also a need for the integration of functionality across frameworks.

Results

We present orchid, a Python-based software package for the management, annotation and machine learning analysis of cancer mutations. Building on parallel workflow execution, in-memory database storage and machine learning analytics, orchid efficiently handles millions of mutations and hundreds of features while remaining easy to use. We describe the implementation of orchid and demonstrate its ability to distinguish the tissue of origin across 12 tumor types, based on 339 features, using a random forest classifier.
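
The abstract names a random forest classifier but does not describe how orchid builds or configures it. The sketch below illustrates the technique itself with a minimal random forest in pure Python 3. Each tree is grown on a bootstrap sample and considers a random subset of features at each split (about the square root of the feature count). Split thresholds are chosen by Gini impurity, and the forest predicts by majority vote. This is not orchid's code or API. All function names, the defaults (25 trees, depth 5) and the two-feature toy data labeled tissue_A and tissue_B are hypothetical stand-ins for the study's 339 features and 12 tumor types.

```python
# Minimal random forest sketch (hypothetical; NOT orchid's implementation).
# Bootstrap-sampled decision trees, a random feature subset at each split,
# Gini impurity for choosing splits, and a majority vote across trees.
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (weighted_gini, feature, threshold) for the best split, or None."""
    best = None
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, max_depth, n_sub, rng):
    """Leaves are labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if max_depth == 0 or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_sub)  # random feature subset
    split = best_split(X, y, features)
    if split is None:  # no sampled feature varies here, so stop growing
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, n_sub, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, n_sub, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=5, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(#features), a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 max_depth, n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote of the individual trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data (not from the orchid database): two features per tumor.
    X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.20, 0.30],
         [0.90, 0.80], [0.80, 0.90], [0.85, 0.75], [0.95, 0.70]]
    y = ["tissue_A"] * 4 + ["tissue_B"] * 4
    forest = fit_forest(X, y)
    print(predict_forest(forest, [0.12, 0.18]))  # prints tissue_A
```

A real analysis would more likely use an established library implementation, such as scikit-learn's `RandomForestClassifier`, than this from-scratch version. The sketch is meant only to make the mechanics visible. The majority vote works unchanged with more than two tissue labels.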

Availability and implementation

Orchid and our annotated tumor mutation database are freely available at
Software is implemented in Python 2.7 and uses MySQL or MemSQL databases. Groovy 2.4.5 is required only for the optional parallel workflow execution.


Supplementary information

Supplementary data are available at Bioinformatics online.
