We present a generative graphical model and stochastic filtering algorithm for simultaneous tracking of 3D rigid and nonrigid motion, object texture, and background texture from single-camera video. The inference procedure exploits the conditionally Gaussian nature of the model through Rao-Blackwellized particle filtering, which combines Monte Carlo sampling of the nonlinear component of the process with exact filtering of the linear Gaussian component. The smoothness of image sequences in time and space is exploited using Gauss-Newton optimization and Laplace's method to generate proposal distributions for importance sampling. Our system encompasses an entire continuum from optic flow to template-based tracking, elucidating the conditions under which each method is optimal and introducing a related family of new tracking algorithms. We demonstrate an application of the system to 3D nonrigid face tracking. We also introduce a new method for collecting ground truth information about the position of facial features while filming an unmarked subject, and present a data set created using this technique.
We develop a neurally plausible method for learning the models used for 3D face tracking, a method related to learning factorial codes. Factorial representations play a fundamental role in cognitive psychology, computational neuroscience, and machine learning. Independent component analysis pursues a form of factorization proposed by Barlow [1994] as a model for coding in sensory cortex. Morton proposed a different form of factorization that fits a wide variety of perceptual data [Massaro, 1987]. Recently, Hinton [2002] proposed a new class of models that exhibit yet another form of factorization. Hinton also proposed an objective function, contrastive divergence, that is particularly effective for training models of this class.
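The Rao-Blackwellized particle filtering idea (sample the nonlinear part, filter the linear Gaussian part exactly) can be illustrated on a toy conditionally Gaussian model. This is a minimal sketch under stated assumptions, not the tracker itself: a scalar nonlinear state `u` is sampled, a scalar linear state `x` is Kalman-filtered separately inside each particle, and the observation gain `c_of_u` is a hypothetical choice. It also swaps in a plain bootstrap proposal (sampling `u` from its prior dynamics) in place of the Gauss-Newton/Laplace proposals described above, so it demonstrates only the Rao-Blackwellization.

```python
import numpy as np


def c_of_u(u):
    """Hypothetical nonlinear observation gain; the toy model is linear in x given u."""
    return 1.0 + 0.5 * np.sin(u)


def rbpf(y, n_particles=500, a=0.95, q_x=0.1, q_u=0.01, r=0.1, u0_std=1.0, seed=0):
    """Rao-Blackwellized particle filter for the assumed toy model

        u_t = u_{t-1} + N(0, q_u)        (nonlinear part: sampled)
        x_t = a * x_{t-1} + N(0, q_x)    (linear Gaussian part: Kalman-filtered)
        y_t = c(u_t) * x_t + N(0, r)

    with priors u_0 ~ N(0, u0_std^2), x_0 ~ N(0, 1).
    Returns the filtered posterior means of x_t and u_t.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, u0_std, n_particles)  # samples of the nonlinear state
    m = np.zeros(n_particles)                 # per-particle Kalman mean of x
    P = np.ones(n_particles)                  # per-particle Kalman variance of x
    x_hat, u_hat = [], []
    for y_t in y:
        # 1. Sample the nonlinear component from its dynamics (bootstrap proposal).
        u = u + rng.normal(0.0, np.sqrt(q_u), n_particles)
        # 2. Kalman prediction of x, conditioned on each particle's u-trajectory.
        m_pred = a * m
        P_pred = a * a * P + q_x
        c = c_of_u(u)
        # 3. Weight by p(y_t | u_{1:t}, y_{1:t-1}), which is Gaussian because
        #    x has been integrated out analytically.
        s = c * c * P_pred + r
        resid = y_t - c * m_pred
        log_w = -0.5 * (np.log(2.0 * np.pi * s) + resid**2 / s)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # 4. Exact Kalman update of the linear component in every particle.
        K = P_pred * c / s
        m = m_pred + K * resid
        P = (1.0 - K * c) * P_pred
        # 5. Posterior means of the resulting mixture of Gaussians.
        x_hat.append(np.sum(w * m))
        u_hat.append(np.sum(w * u))
        # 6. Multinomial resampling.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        u, m, P = u[idx], m[idx], P[idx]
    return np.array(x_hat), np.array(u_hat)
```

A useful sanity check on this design: when `u` is known exactly (`u0_std=0`, `q_u=0`), every particle carries the same Kalman filter, so the estimate of `x` reduces to that of an ordinary Kalman filter. Only the uncertainty in `u` is handled by sampling.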
We analyze factorial codes within the context of diffusion networks, a stochastic version of continuous-time, continuous-state recurrent neural networks. We demonstrate that a particular class of linear diffusion networks models precisely the same class of observable distributions as factor analysis. This suggests novel nonlinear generalizations of factor analysis and independent component analysis that could be implemented using interactive noisy circuitry. We train diffusion networks on a database of 3D faces by minimizing contrastive divergence, and explain how diffusion networks can learn 3D deformable models from 2D data.
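One direction of the factor-analysis correspondence can be checked numerically in a small sketch. We assume, purely for illustration, a linear diffusion network over hidden units h and observable units v with dynamics dZ = -A Z dt + σ dW, Z = (h, v), and a symmetric positive-definite drift matrix A; this parameterization and the helper names `fa_to_diffusion_drift` and `stationary_covariance` are ours, not necessarily the construction used in the thesis. The stationary covariance S of such a process solves the Lyapunov equation A S + S Aᵀ = σ² I, so S = (σ²/2) A⁻¹. Setting A proportional to the joint precision of the factor-analysis model h ~ N(0, I), v | h ~ N(Λh, diag(ψ)) therefore gives a network whose observable marginal has covariance ΛΛᵀ + diag(ψ).

```python
import numpy as np


def fa_to_diffusion_drift(Lam, psi, sigma=1.0):
    """Hypothetical helper: drift A for dZ = -A Z dt + sigma dW, Z = (h, v), whose
    stationary joint equals the factor-analysis joint
    h ~ N(0, I), v | h ~ N(Lam h, diag(psi))."""
    k = Lam.shape[1]
    Psi_inv = np.diag(1.0 / psi)
    # Joint precision of (h, v) under factor analysis.
    J = np.block([
        [np.eye(k) + Lam.T @ Psi_inv @ Lam, -Lam.T @ Psi_inv],
        [-Psi_inv @ Lam, Psi_inv],
    ])
    # Stationary covariance is (sigma^2 / 2) A^{-1}; choose A = (sigma^2 / 2) J.
    return 0.5 * sigma**2 * J


def stationary_covariance(A, sigma=1.0):
    """Solve A S + S A^T = sigma^2 I via the column-major identity
    vec(A S + S A^T) = (I kron A + A kron I) vec(S)."""
    n = A.shape[0]
    I = np.eye(n)
    M = np.kron(I, A) + np.kron(A, I)
    s = np.linalg.solve(M, (sigma**2 * I).flatten(order="F"))
    return s.reshape((n, n), order="F")


# Usage: 2 hidden units, 4 observable units.
rng = np.random.default_rng(0)
Lam = rng.normal(size=(4, 2))
psi = np.array([0.5, 1.0, 0.2, 0.8])
A = fa_to_diffusion_drift(Lam, psi)
S = stationary_covariance(A)
print(np.allclose(S[2:, 2:], Lam @ Lam.T + np.diag(psi)))  # True
```

In this construction the observable-observable block of A is diagonal (it equals (σ²/2) diag(1/ψ)), so the observable units have no lateral connections. The sketch shows only that every factor-analysis model can be realized this way; it does not by itself establish the converse.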