Skip to main content
Download PDF
- Main
GBZ file format for pangenome graphs
- Sirén, Jouni;
- Paten, Benedict
- Editor(s): Martelli, Pier Luigi
Published Web Location
https://doi.org/10.1093/bioinformatics/btac656Abstract
Motivation
Pangenome graphs representing aligned genome assemblies are being shared in the text-based Graphical Fragment Assembly format. As the number of assemblies grows, there is a need for a file format that can store the highly repetitive data space efficiently.Results
We propose the GBZ file format based on data structures used in the Giraffe short-read aligner. The format provides good compression, and the files can be efficiently loaded into in-memory data structures. We provide compression and decompression tools and libraries for using GBZ graphs, and we show that they can be efficiently used on a variety of systems.Availability and implementation
C++ and Rust implementations are available at https://github.com/jltsiren/gbwtgraph and https://github.com/jltsiren/gbwt-rs, respectively.Supplementary information
Supplementary data are available at Bioinformatics online.Many UC-authored scholarly publications are freely available on this site because of the UC's open access policies. Let us know how this access is important for you.
Main Content
For improved accessibility of PDF content, download the file to your device.
Enter the password to open this PDF file:
File name:
-
File size:
-
Title:
-
Author:
-
Subject:
-
Keywords:
-
Creation Date:
-
Modification Date:
-
Creator:
-
PDF Producer:
-
PDF Version:
-
Page Count:
-
Page Size:
-
Fast Web View:
-
Preparing document for printing…
0%