Before RNA can be sequenced using next-generation sequencing (NGS) technologies, it must first be converted into cDNA (RNA-Seq). In 2016, Oxford Nanopore Technologies released its direct RNA nanopore sequencing technology, which circumvents the requirement for cDNA. The native RNA is sequenced continuously from the 3' end through to the 5' end. This approach has two limitations: ambiguity in discriminating between full-length and truncated reads, and the requirement for a known invariable 3' end, such as the poly(A) tail.
In collaboration with New England Biolabs, we developed a technique to identify full-length native RNA nanopore reads by specifically labeling capped RNA 5' ends with a nanopore-detectable sequence. Using this strategy, we aimed to identify individual high-confidence full-length human mRNA isoform scaffolds among ~4 million nanopore poly(A)-selected RNA reads. First, we exchanged the biological 5' m7G cap for a modified cap bearing a 45-nucleotide oligomer. This oligomer improved 5' end sequencing and ensured identification of capped strands. Second, among these capped reads, we screened for 3' ends consistent with documented polyadenylation sites. This gave 185,434 high-confidence mRNA scaffolds, including 4,262 that represented isoforms absent from GENCODE. Most of these had transcription start sites internal to longer, previously identified mRNA isoforms. Combined with orthogonal data, these mRNA scaffolds provide decisive evidence for full-length mRNA isoforms.
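The two-step screen described above can be illustrated with a minimal sketch. This is not the pipeline used in this work: the `Read` record, the `select_scaffolds` function, the per-strand site dictionary, and the 10-nucleotide matching window are all hypothetical choices made for illustration. The sketch assumes each read has already been aligned and annotated with whether the 45-nucleotide cap oligomer was detected at its 5' end.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical per-read summary; field names are illustrative only."""
    read_id: str
    chrom: str
    strand: str
    three_prime_end: int   # genomic coordinate of the aligned 3' end
    has_cap_adapter: bool  # True if the 5' cap oligomer was detected


def near_known_site(pos, sorted_sites, window):
    """Return True if pos lies within `window` nt of any site in sorted_sites."""
    i = bisect_left(sorted_sites, pos)
    # Only the sites immediately left and right of pos can be the nearest.
    neighbors = sorted_sites[max(i - 1, 0): i + 1]
    return any(abs(pos - site) <= window for site in neighbors)


def select_scaffolds(reads, polya_sites, window=10):
    """Keep reads that pass both filters.

    Step 1: the read carries the 5' cap oligomer (evidence of a capped 5' end).
    Step 2: the read's 3' end falls near a documented polyadenylation site.

    polya_sites maps (chrom, strand) to a list of site coordinates.
    The default window of 10 nt is an assumed value, not one from the text.
    """
    index = {key: sorted(sites) for key, sites in polya_sites.items()}
    return [
        r for r in reads
        if r.has_cap_adapter
        and near_known_site(
            r.three_prime_end, index.get((r.chrom, r.strand), []), window
        )
    ]
```

Sorting the sites once per chromosome and strand lets each read be checked with a binary search, which matters when screening millions of reads against a large polyadenylation site catalog.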
In collaboration with the Ares lab, we developed a technique to label native RNA 3' ends with polyinosine. This step permits sequencing adapters to be ligated to both poly(A) and non-poly(A) RNA in a single sequencing experiment. Polyinosine tails are not known to occur naturally, and they produce a recognizable signal in nanopore ionic current data. These two features make polyinosine tailing well suited to adapting a variety of RNA types while preserving native RNA 3' end sequence information. We implemented a Hidden Markov Model that identifies the polyinosine tail signal on the RNA 3' ends with 98.46% accuracy. This classifier can be used to filter the reads for a particular RNA 3' end type (e.g. separate nascent RNA from mature mRNA