eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

Modeling of Biochemical States of DNA Replication Using Hidden Markov Models

Abstract

In nanopore experiments, DNA replication facilitated by $\phi 29$ DNA polymerase (DNAP) can be observed at the single-molecule level. The biochemical state of the DNA-DNAP complex was studied by setting the complex atop an $\alpha$-hemolysin nanopore and applying a voltage. The movement of the DNA strand relative to the nanopore was observed at the single-base-pair level through the ionic current blockade. The time trace of the recorded ionic current amplitude from these experiments was used to study the biochemical states. Because the recorded amplitude of the ionic current was an indirect measurement of the true amplitude level, which in turn was an indirect measurement of the true biochemical state, the experiments were modeled as a Hidden Markov Chain (HMC). When two biochemical states place the DNA at the same position relative to the nanopore, they yield the same current amplitude level. To extract the transition rates between biochemical states that are not distinguishable in amplitude level, two methodologies were applied to study the HMC. The first was a fully Bayesian model, for which Markov chain Monte Carlo (MCMC) simulations were used to infer the reaction rates in a system of three biochemical states with two observed amplitude levels. The second model adopted concepts of Viterbi training (the segmental k-means algorithm) to find point estimates of the transition rates. Given the low transition probabilities, the properties of the second model led to a substantial bias in inference. The bias was addressed in two steps. First, a meta-model was used to describe the relation between the generating transition rates and the biased inference. Then the inverse problem of the meta-model was solved to reduce the bias in the inference. The meta-model was a fully Bayesian Gaussian process model, built by creating a series of computer-generated datasets around the given dataset. Improved inference was obtained by drawing posterior samples from the inverse of the meta-model. Since both the MCMC simulations and the bias reduction technique resulted in simulated posteriors, the two methods were used to confirm each other. In the comparison of these two methods, the second model combined with bias reduction had the advantage of achieving similar inference accuracy at a much lower computational cost.
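The setting described in the abstract, three hidden states seen through only two amplitude levels, can be illustrated with a minimal sketch. It is not the author's implementation. The transition matrix `TRANS`, the amplitude levels `LEVEL`, the noise scale `SIGMA`, and all function names are hypothetical values chosen for illustration. The sketch uses a discrete-time chain and performs a single Viterbi decoding pass with known parameters, whereas Viterbi training would alternate decoding with re-estimation of the parameters.

```python
import math
import random

# Hypothetical parameters chosen only for illustration (not from the thesis).
TRANS = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]
LEVEL = [0.0, 0.0, 1.0]   # states 0 and 1 share one amplitude level
SIGMA = 0.2               # standard deviation of the measurement noise


def simulate(n_steps, rng):
    """Draw a hidden state path and noisy amplitude observations."""
    state = 0
    states, obs = [], []
    for _ in range(n_steps):
        states.append(state)
        obs.append(rng.gauss(LEVEL[state], SIGMA))
        state = rng.choices(range(3), weights=TRANS[state])[0]
    return states, obs


def log_emission(y, state):
    """Log density of a Gaussian centred on the state's amplitude level."""
    z = (y - LEVEL[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))


def viterbi(obs):
    """Most probable hidden path under TRANS, LEVEL, SIGMA (uniform start)."""
    n = len(TRANS)
    log_t = [[math.log(p) for p in row] for row in TRANS]
    score = [math.log(1.0 / n) + log_emission(obs[0], s) for s in range(n)]
    back = []
    for y in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            # Best predecessor of state s at this time step.
            best = max(range(n), key=lambda r: prev[r] + log_t[r][s])
            pointers.append(best)
            score.append(prev[best] + log_t[best][s] + log_emission(y, s))
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    path = [max(range(n), key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def count_transitions(path, n_states=3):
    """Transition counts along a path; row-normalising gives point estimates."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1
    return counts


rng = random.Random(0)
true_states, observations = simulate(2000, rng)
decoded = viterbi(observations)
print(count_transitions(true_states))   # counts along the simulated path
print(count_transitions(decoded))       # counts along the decoded path
```

Comparing the two printed count matrices shows how estimates taken from a single decoded path can differ from the counts along the generating path, in particular for transitions between states 0 and 1, which emit the same amplitude level.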
