Learning to Navigate in Visual Environments
Artificially intelligent agents with some degree of autonomy in the real world must learn to complete visual navigation tasks. In this dissertation, we consider the learning problem of visual navigation, as well as implementation issues facing agents that use learned visual perception systems. We begin by formulating visual navigation tasks in the setting of deep reinforcement learning under partial observation. Previous approaches to deep reinforcement learning do not adequately address partial observation while remaining sample-efficient. Our first contribution is a novel deep reinforcement learning algorithm, advantage-based regret minimization (ARM), which learns robust policies for visual navigation tasks in the presence of partial observability. Next, we are motivated by performance bottlenecks that arise in large-scale supervised learning for training visual perception systems. Previous distributed training approaches suffer from synchronization or communication bottlenecks that limit their scaling to multiple compute nodes. Our second contribution is a distributed training algorithm, gossiping SGD, which avoids both synchronization and centralized communication. Finally, we consider how to train deep convolutional neural networks when input and activation tensors have high spatial resolution and do not easily fit in GPU memory. Previous approaches to reducing the memory usage of deep convnets involve a trade-off between computation and memory usage. Our third and final contribution is an implementation of spatially parallel convolutions, which partition activation tensors along the spatial axes among multiple GPUs and achieve nearly linear strong scaling.
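To make the idea of partitioning an activation tensor along a spatial axis concrete, the following is a minimal single-process sketch, not the dissertation's implementation. It assumes a single-channel 2D input, a stride-1 "valid" cross-correlation, and a split along the height axis only; the "devices" are simply list shards, and the neighbor communication (a halo exchange of the kernel height minus one rows) is simulated by list concatenation. All function names here are hypothetical, chosen for illustration.

```python
from typing import List

Matrix = List[List[int]]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """Single-device 2D cross-correlation, stride 1, 'valid' padding."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def split_rows(x: Matrix, num_shards: int) -> List[Matrix]:
    """Partition rows (the height axis) into contiguous shards, one per simulated GPU."""
    h = len(x)
    bounds = [h * p // num_shards for p in range(num_shards + 1)]
    return [x[bounds[p]:bounds[p + 1]] for p in range(num_shards)]


def spatially_parallel_conv2d(x: Matrix, k: Matrix, num_shards: int) -> Matrix:
    """Compute conv2d_valid(x, k) shard by shard, using a simulated halo exchange."""
    halo = len(k) - 1  # extra rows each shard needs from the shards below it
    shards = split_rows(x, num_shards)
    outputs: Matrix = []
    for p, shard in enumerate(shards):
        # Simulated halo exchange: append the first `halo` rows that follow this
        # shard (they may span several later shards if those shards are thin).
        following = [row for later in shards[p + 1:] for row in later][:halo]
        local = shard + following
        # Each shard produces exactly the output rows whose receptive field
        # starts inside it; too-short local blocks contribute nothing.
        if len(local) >= len(k):
            outputs.extend(conv2d_valid(local, k))
    return outputs
```

Because each shard computes a disjoint block of output rows and only exchanges a thin halo with its neighbors, per-device memory for the activation shrinks roughly in proportion to the number of shards. For example, `spatially_parallel_conv2d(x, k, 4)` returns the same rows as `conv2d_valid(x, k)`, computed in four independent pieces.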