Deep learning has made great progress in solving many computer vision tasks for which labeled data is plentiful. However, progress has been limited for tasks where labels are difficult or impossible to obtain. In this thesis, we propose alternative forms of supervision that do not require direct labels. Intuitively, although we do not know what the labels are, we often know various properties they should satisfy. The key idea is to formulate these properties as objectives for supervising the target task. We show that such “meta-supervision,” which specifies how the output should behave rather than what it is, turns out to be surprisingly effective for learning a variety of vision tasks.
The thesis is organized as follows. Part I proposes to use the concept of cycle-consistency as supervision for learning dense semantic correspondence. Part II proposes to use the task of view synthesis as supervision for learning different representations of scene geometry. Part III proposes to use adversarial supervision for learning gradual image transformations. Finally, we discuss the general concept of meta-supervision and how it can be applied to tasks beyond those presented in this thesis.