Deep model-based reinforcement learning methods offer a conceptually simple approach to the decision-making and control problem: use learning to estimate an approximate dynamics model, and offload the rest of the work to classical trajectory optimization. However, this combination has a number of empirical shortcomings that limit the usefulness of model-based methods in practice. The dual purpose of this thesis is to study the reasons for these shortcomings and to propose solutions to the problems uncovered. We begin by generalizing the dynamics model itself, replacing the standard single-step formulation with a model that predicts over probabilistic latent horizons. The resulting model, trained with a generative reinterpretation of temporal difference learning, leads to infinite-horizon variants of the procedures central to model-based control, including the model rollout and model-based value estimation.
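To make the latent-horizon formulation concrete, consider the following minimal sketch, in which the model \mu_\theta, dynamics p, policy \pi, discount \gamma, and reward r are notation chosen here for exposition rather than fixed by this abstract. Such a model can be read as a predictor of the discounted occupancy over future states, and the generative reinterpretation of temporal difference learning corresponds to a Bellman-style consistency between its single-step and bootstrapped forms:

```latex
% Sketch: a dynamics model over probabilistic latent horizons, read as a
% discounted state-occupancy predictor (notation is illustrative).
\begin{align}
  \mu_\theta(s_e \mid s, a)
    &\approx (1 - \gamma) \sum_{\Delta t = 1}^{\infty} \gamma^{\Delta t - 1}\,
       p(s_{t + \Delta t} = s_e \mid s_t = s,\, a_t = a) \\
  % Generative temporal difference consistency: one real step plus a
  % bootstrapped model step.
  \mu_\theta(s_e \mid s, a)
    &= (1 - \gamma)\, p(s_e \mid s, a)
     + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s, a),\; a' \sim \pi(\cdot \mid s')}
       \big[ \mu_\theta(s_e \mid s', a') \big] \\
  % Infinite-horizon value estimation, assuming a state-dependent reward r.
  Q(s, a)
    &\approx \tfrac{1}{1 - \gamma}\,
       \mathbb{E}_{s_e \sim \mu_\theta(\cdot \mid s, a)}\big[ r(s_e) \big]
\end{align}
```

Under this reading, a single sample from the model already carries a random, geometrically distributed horizon, which is what permits infinite-horizon analogues of the model rollout and of model-based value estimation.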
Next, we show that the poor predictive accuracy of commonly used deep dynamics models is a major bottleneck to effective planning, and describe how to use high-capacity sequence models to overcome this limitation. Framing reinforcement learning as sequence modeling simplifies a range of design decisions, allowing us to dispense with many of the components normally integral to reinforcement learning algorithms. However, despite their predictive accuracy, such sequence models are limited by the search algorithms in which they are embedded. We therefore demonstrate how to fold the entire trajectory optimization pipeline into the generative model itself, such that sampling from the model and planning with it become nearly identical. The culmination of this endeavor is a method that improves its planning capabilities, and not just its predictive accuracy, with more data and experience. Along the way, we highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, classifier-guided sampling, and image inpainting, can be reinterpreted as viable planning strategies for reinforcement learning problems.
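As one concrete illustration of this reinterpretation, the following is a minimal sketch of beam search used as a planner: candidate trajectories are extended with a sequence model and ranked by modeled cumulative reward rather than by log-likelihood. The `TrajectoryModel` interface and its `propose` method are hypothetical stand-ins for an autoregressive trajectory model, not an API defined in this thesis.

```python
# Sketch: beam search reinterpreted as a planning strategy. Candidate
# trajectories are expanded by a (hypothetical) sequence model and kept
# or discarded according to their modeled return.
from dataclasses import dataclass
from typing import List, Protocol, Tuple

Transition = Tuple[float, ...]  # e.g. a flattened (state, action, reward) tuple


class TrajectoryModel(Protocol):
    def propose(self, prefix: List[Transition], n: int) -> List[Transition]:
        """Sample n candidate next transitions given a trajectory prefix."""
        ...


@dataclass
class Beam:
    prefix: List[Transition]  # trajectory so far
    total_reward: float       # cumulative modeled reward along the prefix


def beam_search_plan(model: TrajectoryModel, start: Transition,
                     horizon: int, beam_width: int, branch: int) -> List[Transition]:
    """Return the highest-return trajectory found among the expanded beams."""
    beams = [Beam(prefix=[start], total_reward=0.0)]
    for _ in range(horizon):
        candidates = []
        for beam in beams:
            for transition in model.propose(beam.prefix, branch):
                reward = transition[-1]  # assume reward is the last entry
                candidates.append(
                    Beam(beam.prefix + [transition], beam.total_reward + reward))
        # Rank by modeled return instead of log-likelihood: this single
        # change turns a decoding heuristic into a planner.
        beams = sorted(candidates, key=lambda b: b.total_reward,
                       reverse=True)[:beam_width]
    return max(beams, key=lambda b: b.total_reward).prefix
```

The only departure from beam search as used in language generation is the scoring function; classifier-guided sampling and inpainting admit analogous readings, with the guidance signal and the inpainted tokens playing the roles of the objective and the constraints, respectively.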