The LITSE Algorithm: Theory and Application
In this dissertation, we present a novel method, the Learning with Iteration and Tree-based Search Estimation (LITSE) algorithm, for estimating the malarial haplotype composition of one or more individuals and the corresponding haplotype population frequencies, focusing in particular on the case where individuals have been infected by more than one strain. The estimation must be made from pooled readings of the genetic composition of the parasites present.
The approach combines a parameterized tree-based combinatorial search with a refinement phase that incorporates the Expectation Maximization (EM) algorithm. The EM algorithm is particularly attractive because it is designed for situations involving both observed and unobserved information.
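To illustrate how an EM iteration handles observed and unobserved information together, the following is a minimal sketch and not the LITSE implementation itself. It assumes that each sample has already been reduced to a short list of candidate haplotype compositions consistent with its pooled reading, and that the probability of a composition is simply the product of the population frequencies of its haplotypes. The function name `em_haplotype_frequencies`, the input layout, and the haplotype labels are all hypothetical.

```python
from collections import defaultdict
from math import prod


def em_haplotype_frequencies(samples, n_iter=100, tol=1e-9):
    """Estimate haplotype population frequencies by EM (illustrative sketch).

    samples: list with one entry per individual; each entry is a list of
    candidate compositions, and each composition is a tuple of haplotype
    labels. Which candidate is the true one is the unobserved information.
    """
    haplotypes = sorted({h for cands in samples for comp in cands for h in comp})
    # Assumption: start from uniform frequencies.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}

    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: posterior weight of each candidate under current frequencies.
        for cands in samples:
            weights = [prod(freq[h] for h in comp) for comp in cands]
            total = sum(weights)
            if total == 0.0:
                continue  # no candidate has support; skip this sample
            for comp, w in zip(cands, weights):
                for h in comp:
                    counts[h] += w / total  # expected count of haplotype h
        # M-step: renormalize expected counts into frequencies.
        norm = sum(counts.values())
        new_freq = {h: counts[h] / norm for h in haplotypes}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Two single-strain samples and one ambiguous mixed sample.
example = [[("A",)], [("B",)], [("A", "B"), ("C", "C")]]
estimated = em_haplotype_frequencies(example)
```

In this toy input the third sample is ambiguous between a mixture of A and B and a double infection by C; because A and B are each observed unambiguously elsewhere, the iteration shifts weight toward the first explanation.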
Tests of an implementation of the algorithm on simulated data demonstrate that it estimates haplotype compositions accurately, both before and after the refinement phase. With its effectiveness established, the algorithm is then applied to a set of laboratory-produced malarial strain data.
The algorithm has also been made available to other researchers through a dedicated website that accepts submissions and allows results to be downloaded.
Although the current research focuses on applying the method to malarial parasites, the method is general enough to be applied to infections by other organisms.
Finally, the dissertation presents several suggestions for future work: enhancing the algorithm both computationally and statistically, and extending its scope to related research topics.