Recent advances in unsupervised representation learning have led to a host of widely used AI tools, such as ChatGPT and Stable Diffusion. These tools are the result of applying relatively simple training algorithms to massive models on large GPU clusters, training on even larger amounts of unlabeled data, and tuning the algorithms on a host of labeled evaluation tasks. In this dissertation, we present methodologies for training representation learning models when each of these components is constrained, i.e., limited compute, limited training data, and limited evaluation data. This dissertation contains four main chapters that focus on data- and label-efficient representation learning.
Data-efficient representation learning focuses on learning useful representations from less data (labeled or unlabeled), which, as discussed throughout this dissertation, is particularly important for applications with limited data availability. Label-efficient representation learning focuses on learning useful representations with little or no human annotation of the training data. As will be discussed, this is important for applications where accurately labeled data is difficult or impossible to obtain, such as in privacy-sensitive fields or in applications with highly ambiguous label definitions.
The four chapters of this dissertation that address these topics are: (1) SelfAugment: Automatic Augmentation Policies for Self-Supervised Learning, which explored how to develop augmentation policies for self-supervised learning pipelines using little or no labeled training data and a small amount of unlabeled data. (2) Data Efficient Self-Supervised Representation Learning, which explored how a form of hierarchical pretraining enables up to 80x more data-efficient pretraining. (3) Region Similarity Representation Learning, which explored one of the first methods for learning region-level representations by performing contrastive learning at the region (patch) level, and which led to substantial improvements on downstream tasks such as object detection and segmentation when few labeled examples were available. (4) Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning, which explored methods for leveraging known scale information for geospatial representation learning.