Efficient inference algorithms for near-deterministic systems
- Author(s): Chatterjee, Shaunak
- Advisor(s): Russell, Stuart J
- et al.
This thesis addresses the problem of performing probabilistic inference in stochastic systems where the probability mass is far from uniformly distributed among all possible outcomes. Such near-deterministic systems arise in several real-world applications. For example, in human physiology, the widely varying evolution rates of physiological variables make certain trajectories much more likely than others; in natural language, a very small fraction of all possible word sequences accounts for a disproportionately high amount of probability under a language model. In such settings, it is often possible to obtain significant computational savings by focusing on the outcomes where the probability mass is concentrated. This contrasts with existing algorithms in probabilistic inference---such as junction tree, sum product, and belief propagation algorithms---which are well-tuned to exploit conditional independence relations.
The first topic addressed in this thesis is the
structure of discrete-time temporal graphical models of
near-deterministic stochastic processes. We show how the structure
depends on the ratios between the size of the time step and the
effective rates of change of the variables. We also prove that accurate
approximations can often be obtained by sparse structures even for very
large time steps. Besides providing an intuitive reason for causal sparsity in discrete temporal models, the sparsity also speeds up inference.
The next contribution is an eigenvalue algorithm for a linear factored system (e.g., dynamic Bayesian network), where existing algorithms do not scale since the size of the system is exponential in the number of variables. Using a combination of graphical model inference algorithms and numerical methods for spectral analysis, we propose an approximate spectral algorithm which operates in the factored representation and is exponentially faster than previous algorithms.
The third contribution is a temporally abstracted Viterbi (TAV) algorithm. Starting with a spatio-temporally abstracted coarse representation of the original problem, the TAV algorithm iteratively refines the search space for the Viterbi path via spatial and temporal refinements. The algorithm is guaranteed to converge to the optimal solution with the use of admissible heuristic costs in the abstract levels and is much faster than the Viterbi algorithm for near-deterministic systems.
The fourth contribution is a hierarchical image/video segmentation algorithm, that shares some of the ideas used in the TAV algorithm. A supervoxel tree provides the abstraction hierarchy for this application. The algorithm starts working with the coarsest level supervoxels, and refines portions of the tree which are likely to have multiple labels. Several existing segmentation algorithms can be used to solve the energy minimization problem in each iteration, and admissible heuristic costs once again guarantee optimality. Since large contiguous patches exist in images and videos, this approach is more computationally efficient than solving the problem at the finest level of supervoxels.
The final contribution is a family of Markov Chain Monte Carlo (MCMC) algorithms for near-deterministic systems when there exists an efficient algorithm to sample solutions for the corresponding deterministic problem. In such a case, a generic MCMC algorithm's performance worsens as the problem becomes more deterministic despite the existence of the efficient algorithm in the deterministic limit. MCMC algorithms designed using our methodology can bridge this gap.
The computational speedups we obtain through the various new algorithms presented in this thesis show that it is indeed possible to exploit near-determinism in probabilistic systems. Near-determinism, much like conditional independence, is a potential (and promising) source of computational savings for both exact and approximate inference. It is a direction that warrants more understanding and better generalized algorithms.