Adaptive Learning Algorithms for Transferable Visual Recognition
Understanding visual scenes is a crucial component of many artificial intelligence applications, ranging from autonomous vehicles and household robotic navigation to automatic image captioning for the blind. Reliably extracting high-level semantic information from the visual world in real time is key to solving these critical tasks safely and correctly. Existing approaches based on specialized recognition models are prohibitively expensive or intractable due to limitations in dataset collection and annotation. Sharing learned information between recognition models can make these applications tractable: multiple tasks can regularize one another, redundant information can be reused, and novel tasks can be learned faster and more easily.
In this thesis, I present algorithms for transferring learned information between visual data sources and across visual tasks, all with limited human supervision. I analyze, both formally and empirically, the adaptation of visual models within the classical domain adaptation setting, and I extend the use of adaptive algorithms to facilitate information transfer between visual tasks and across image modalities.
Most visual recognition systems learn concepts directly from a large collection of manually annotated images or videos. Training a model that detects pedestrians, for example, requires a human to go through thousands or millions of images and mark every instance of a pedestrian. Such a model is nevertheless susceptible to biases in the labeled data and often fails to generalize to new scenarios: a detector trained in Palo Alto may have degraded performance in Rome, and a detector trained in sunny weather may fail in the snow. Rather than requiring human supervision for each new task or scenario, this work draws on deep learning, transformation learning, and convex-concave optimization to produce novel optimization frameworks that transfer information from large curated databases to real-world scenarios.