Use solid k-mers in minHash-based genome distance estimation
- Zheng, An
- Advisor(s): Pevzner, Pavel
Abstract
MinHash is a popular method for genome distance estimation. However, it is sensitive to input data quality. Its accuracy deteriorates when the input sequences come from sequencers with high error rates, especially long-read sequencers. To address this problem, this thesis feeds solid (frequently occurring) k-mers, rather than all k-mers, into MinHash. It then demonstrates the effectiveness of this solid k-mer powered MinHash by comparing its genome distance estimates with those of regular MinHash. We also discuss how to select the optimal threshold (minimum occurrence count) for solid k-mers in order to make the most of solid k-mer powered MinHash.
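The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation. It assumes Python 3, canonical k-mers, a 64-bit BLAKE2b hash, and a bottom-s MinHash sketch with Mash-style Jaccard and distance estimation. All function names and the `threshold` parameter are hypothetical. Sequencing errors mostly produce k-mers that occur only once, so counting k-mers and keeping only those that reach the threshold removes most erroneous k-mers before sketching.

```python
import hashlib
import heapq
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)

def solid_kmers(reads, k, threshold):
    """Count canonical k-mers across all reads; keep those seen >= threshold times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[canonical(read[i:i + k])] += 1
    return {km for km, c in counts.items() if c >= threshold}

def h64(kmer):
    """Hash a k-mer to a 64-bit integer (BLAKE2b is an arbitrary choice here)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")

def bottom_sketch(kmer_set, s):
    """Bottom-s MinHash sketch: the s smallest hash values of the set."""
    return heapq.nsmallest(s, (h64(km) for km in kmer_set))

def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate Jaccard similarity from the s smallest hashes of the sketches' union."""
    a, b = set(sketch_a), set(sketch_b)
    union_sketch = heapq.nsmallest(s, a | b)
    if not union_sketch:
        return 0.0
    shared = sum(1 for h in union_sketch if h in a and h in b)
    return shared / len(union_sketch)

def mash_distance(j, k):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); undefined (infinite) when j == 0."""
    if j == 0:
        return float("inf")
    return -math.log(2 * j / (1 + j)) / k

# Usage: build solid k-mer sketches for two read sets and compare them.
if __name__ == "__main__":
    k, s, threshold = 5, 100, 2
    reads_a = ["ACGTACGGTCAGT"] * 3 + ["ACGTACCGTCAGT"]  # last read carries an error
    reads_b = ["ACGTACGGTCAGT"] * 3
    sa = bottom_sketch(solid_kmers(reads_a, k, threshold), s)
    sb = bottom_sketch(solid_kmers(reads_b, k, threshold), s)
    j = jaccard_estimate(sa, sb, s)
    print(j, mash_distance(j, k))
```

Raising `threshold` discards more error-induced k-mers, but it can also drop genuine k-mers from low-coverage regions. That trade-off is why the choice of threshold matters.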