Building autonomous agents that learn to make predictions and take actions in sequential environments is a central problem in artificial intelligence, with applications as diverse as personalized medicine, self-driving cars, finance, and scientific discovery. Despite impressive successes in areas such as natural language, games, and robotics, sequential prediction and decision-making remains challenging when known models, accurate environment simulators, short-range dependencies, or large and diverse datasets are unavailable.
In this thesis, we formulate problems that capture challenging yet prevalent settings encountered in the real world. Given these formulations, we design reliable and efficient learning algorithms, leveraging recent advances in statistics and optimization. In the first part of the thesis, we consider the problem of learning to make predictions in unknown and only partially observed linear dynamical systems. In contrast to prior predictive models, which fail in the presence of long-range dependencies, we design an algorithm that provably returns near-optimal predictions regardless of the system's degree of stability and forecast memory.
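As an illustration of this setting, a partially observed linear dynamical system is commonly written in the following generic form (the notation below is a standard textbook parameterization, not necessarily the one adopted in the thesis):
% Illustrative state-space model; A, C and the noise covariances are unknown
% to the learner, who sees only inputs u_t and outputs y_t.
\begin{align*}
  x_{t+1} &= A\,x_t + B\,u_t + w_t, && w_t \sim \mathcal{N}(0, W),\\
  y_t     &= C\,x_t + v_t,          && v_t \sim \mathcal{N}(0, V),
\end{align*}
% The hidden state x_t is never observed; the goal is to predict y_{t+1} from
% past inputs and outputs. Forecast memory grows as the dynamics approach
% marginal stability, which is where short-memory predictors break down.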
In the second part, we shift our attention to reinforcement learning (RL), the problem of learning to make decisions in an unknown sequential environment. We start by focusing on the offline setting, where the agent is only provided with a previously collected dataset of interactions and does not have further access to the environment. We propose a new framework to study offline learning problems given datasets of any composition, ranging from expert-only to uniform coverage, thus unifying two main offline learning paradigms: imitation learning and vanilla offline RL. Equipped with this framework, we design an algorithm based on pessimism in the face of uncertainty and prove that it is nearly optimal for any, possibly unknown, dataset composition.
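A common way to instantiate pessimism in the face of uncertainty, shown here as an illustrative template rather than the exact construction used in the thesis, is to penalize estimated values by an uncertainty width before planning:
% Pessimistic value iteration sketch: b(s,a) is a bonus that shrinks with the
% number of times (s,a) appears in the offline dataset.
\begin{align*}
  \widehat{Q}^{\,\mathrm{pess}}(s,a)
    &= \widehat{r}(s,a)
       + \gamma\,\widehat{\mathbb{E}}_{s' \mid s,a}\!\bigl[\widehat{V}^{\,\mathrm{pess}}(s')\bigr]
       - b(s,a),\\
  \widehat{V}^{\,\mathrm{pess}}(s)
    &= \max_{a}\,\widehat{Q}^{\,\mathrm{pess}}(s,a).
\end{align*}
% Subtracting b(s,a) steers the learned policy away from poorly covered regions,
% which is what allows a single algorithm to behave like imitation learning on
% expert-only data and like standard offline RL on well-covered data.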
We then turn to the online setting, where the agent learns while interacting with the environment. In this setting, the agent faces a dilemma at each step: whether it should exploit its current knowledge and select a seemingly optimal action, or whether it should explore and visit different regions of the environment. We propose a framework that unifies common exploration methods by adding an adaptive regularizer to the standard RL objective. We show that a particular regularizer design yields a simple optimistic exploration strategy that enjoys fast optimization and efficient exploration, achieving state-of-the-art performance in several locomotion and navigation tasks when combined with deep neural networks.
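Schematically, such a regularized objective can be written as follows (this is a generic template; the specific regularizer design is the contribution of this part of the thesis):
% Regularized RL objective sketch: Omega(pi) encodes an exploration preference
% (e.g., an entropy or optimism term) and lambda_k adapts during learning.
\[
  \max_{\pi}\;
  \mathbb{E}_{\pi}\!\Bigl[\textstyle\sum_{t}\gamma^{t}\,r(s_t,a_t)\Bigr]
  \;+\; \lambda_k\,\Omega(\pi).
\]
% Different choices of Omega recover familiar exploration strategies, while an
% adaptive weight lambda_k trades off exploitation against exploration over time.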