Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM.
- Author(s): Li, Liqi;
- Yu, Sanjiu;
- Xiao, Weidong;
- Li, Yongsheng;
- Huang, Lan;
- Zheng, Xiaoqi;
- Zhou, Shiwen;
- Yang, Hua
- et al.
Published Web Location: https://doi.org/10.1186/1471-2105-15-340
Background: Identification of recombination hot/cold spots is critical for understanding the mechanism of recombination as well as the genome evolution process. However, experimental identification of recombination spots is both time-consuming and costly. An accurate and automated method for reliably and quickly identifying recombination spots is thus urgently needed.
Results: Here we propose a novel approach that fuses features from pseudo nucleic acid composition (PseNAC), including NAC, n-tier NAC and pseudo dinucleotide composition (PseDNC). Recursive feature extraction by a linear kernel support vector machine (SVM) was then used to rank the integrated feature vectors and extract the optimal features. An SVM was adopted for identifying recombination spots based on these optimal features. To evaluate the performance of the proposed method, a jackknife cross-validation test was employed on a benchmark dataset. The overall accuracy of this approach was 84.09%, which was 0.37% to 3.79% higher than those of state-of-the-art tools.
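As a rough illustration of two ingredients named above, the following Python sketch computes a fused k-mer composition vector (a simple stand-in for the NAC part of the feature set) and runs a jackknife (leave-one-out) evaluation. It is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the PseDNC correlation terms and the recursive feature ranking are not covered, and a nearest-centroid rule is swapped in for the SVM so that the example needs only the standard library.

```python
from itertools import product

ALPHABET = "ACGT"

def kmer_composition(seq, k):
    """Normalized frequencies of all 4**k k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows with ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]

def fused_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer compositions for several k into one vector."""
    vec = []
    for k in ks:
        vec.extend(kmer_composition(seq, k))
    return vec

def nearest_centroid(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_accuracy(X, y, fit_predict):
    """Hold out each sample once, train on the rest, score the prediction."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += int(fit_predict(train_X, train_y, X[i]) == y[i])
    return hits / len(X)

# Toy usage with made-up sequences (illustrative only, not benchmark data).
seqs = ["GCGCGCGG", "GGCCGCGC", "CGCGGCCG", "ATATATTA", "TTAATATA", "AATTATAT"]
labels = [1, 1, 1, 0, 0, 0]
X = [fused_features(s) for s in seqs]
accuracy = jackknife_accuracy(X, labels, nearest_centroid)
```

In a fuller pipeline, the `fit_predict` argument is where a linear kernel SVM would be plugged in, and feature ranking would run on the fused vectors before the final evaluation.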
Conclusions: Comparison results suggested that linear kernel SVM is a useful vehicle for identifying recombination hot/cold spots.