With the advent of massively parallel high-throughput sequencing, geneticists have the technology to address many open questions. What we lack are analytical tools. As the volume of data from these sequencers continues to overwhelm many of the current analytical tools, we must develop more efficient methods of analysis. One potentially useful tool is the MM (majorize-minimize or minorize-maximize) algorithm.
The MM algorithm is an optimization method suitable for high-dimensional problems. It can avoid large matrix inversions, linearize problems, and separate parameters. Additionally, it handles constraints gracefully and can turn a non-differentiable problem into a smooth one. These benefits come at the cost of iteration.
In this thesis we apply the MM algorithm to three optimization problems. The first problem is an extension of the random graph theory of Erdős. We extend the model by relaxing two of its three underlying assumptions: any number of edges may form between two nodes, and the number of edges between two nodes is Poisson distributed with a mean that depends on the two nodes. The resulting model is aptly named a random multigraph.
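As a rough illustration of how an MM iteration can separate parameters in a model of this kind, the following Python sketch assumes the simplest possible form for the Poisson mean, namely the product p_i p_j of two node propensities. That product form, the function names, and the toy edge counts are assumptions made for illustration; the text above states only that the mean depends on the two nodes. Under this assumption, minorizing the term -p_i p_j by the arithmetic-geometric mean inequality gives a surrogate in which each p_i can be updated on its own, with no matrix inversion.

```python
import math

def log_likelihood(x, p):
    """Poisson log-likelihood over unordered node pairs, assuming mean p[i] * p[j]."""
    n = len(p)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            mu = p[i] * p[j]
            if x[i][j] > 0:
                total += x[i][j] * math.log(mu)
            total -= mu + math.lgamma(x[i][j] + 1)
    return total

def mm_fit(x, iters=500):
    """MM updates p_i <- sqrt(p_i * d_i / sum_{j != i} p_j), with d_i the degree of node i.

    Assumes a symmetric count matrix x with a zero diagonal and at least two nodes.
    """
    n = len(x)
    degree = [sum(x[i][j] for j in range(n) if j != i) for i in range(n)]
    p = [1.0] * n  # hypothetical starting value
    history = [log_likelihood(x, p)]
    for _ in range(iters):
        s = sum(p)
        # Every coordinate is updated from the previous iterate only.
        p = [math.sqrt(p[i] * degree[i] / (s - p[i])) for i in range(n)]
        history.append(log_likelihood(x, p))
    return p, history

# Toy symmetric edge counts on four nodes (made-up data).
edges = [
    [0, 3, 1, 2],
    [3, 0, 2, 4],
    [1, 2, 0, 1],
    [2, 4, 1, 0],
]
p, history = mm_fit(edges)
```

Each update uses only the current propensities and the node degrees, and the log-likelihood recorded in `history` should never decrease from one iteration to the next, which is the ascent property that MM algorithms are built around.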
The next problem extends random multigraphs to include clustering. As before, any number of edges can form between two nodes. The difference is that the number of edges between two nodes is now Poisson distributed with a mean that depends on both the two nodes and their clusters.
For our last problem we place individuals on a map using their genetic information. Using a binomial model with a nearest neighbor penalty, we estimate allele frequency surfaces for a region. With these allele frequency surfaces, we calculate the posterior probability that an individual comes from each location by a simple application of Bayes' rule and place the individual at the most probable location. Furthermore, with an additional model we estimate admixture coefficients of individuals across a pixellated landscape.
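The placement step can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used in the thesis: it assumes diploid genotypes coded as 0, 1, or 2 copies of an allele, independence across markers, allele frequencies strictly between 0 and 1, and a uniform prior over pixels unless one is supplied. The function name and the toy frequency surfaces are hypothetical.

```python
import math

def posterior_over_pixels(genotype, freq, prior=None):
    """Posterior probability of each pixel given one individual's genotypes.

    genotype: allele counts (0, 1, or 2), one per marker.
    freq: freq[k][m] is the allele frequency of marker m at pixel k, in (0, 1).
    prior: optional prior over pixels; uniform if omitted (an assumption).
    """
    n_pixels = len(freq)
    if prior is None:
        prior = [1.0 / n_pixels] * n_pixels
    log_post = []
    for k in range(n_pixels):
        value = math.log(prior[k])
        for g, f in zip(genotype, freq[k]):
            # Binomial(2, f) log-probability of observing g copies.
            value += math.log(math.comb(2, g))
            value += g * math.log(f) + (2 - g) * math.log(1.0 - f)
        log_post.append(value)
    # Normalize on the log scale to avoid underflow with many markers.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]

# Toy example: three pixels, three markers (made-up frequencies).
freq = [
    [0.9, 0.8, 0.1],
    [0.5, 0.5, 0.5],
    [0.1, 0.2, 0.9],
]
genotype = [2, 2, 0]
post = posterior_over_pixels(genotype, freq)
best = max(range(len(post)), key=post.__getitem__)  # most probable pixel
```

Working on the log scale matters in practice because the product of binomial probabilities over many markers quickly underflows in floating point.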
Each of these problems contains an underlying optimization problem, which we solve using the MM algorithm. To demonstrate the utility of the models, we applied them to various genetic datasets, including POPRES, OMIM, gene expression, protein-protein interactions, and gene-gene interactions. Each example yielded interesting results in a reasonable amount of time.