Active Learning and Confidence-rated Prediction
- Author(s): Zhang, Chicheng;
- Advisor(s): Chaudhuri, Kamalika;
This thesis studies active learning and confidence-rated prediction, and the interplay between these two notions.
Active learning is a machine learning paradigm in which the learner interactively queries the labels of examples. The goal of active learning is to learn an accurate classifier using as few label queries as possible. In this thesis, we advance the study of active learning, with a focus on active learning with complex queries. Specifically:
1. We study the problem of active learning with both weak and strong labelers. We present a statistically consistent algorithm whose query cost, under certain conditions, is lower than that of learning with the strong labeler alone.
2. We consider active learning with a novel type of query, namely search queries. We show that in the setting of model selection, using search queries can substantially reduce the labeling effort of active learning.
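To make the label savings of interactive querying concrete, here is a minimal, self-contained sketch (not an algorithm from the thesis) of active learning for a one-dimensional threshold classifier: labels are +1 iff x is at or above an unknown threshold, and binary search over the unlabeled pool recovers the threshold with roughly log2(n) queries instead of n. The oracle and pool below are hypothetical.

```python
# Illustrative sketch (not from the thesis): actively learning a 1-D
# threshold classifier. Labels are +1 iff x >= theta. Binary search over
# the sorted unlabeled pool locates the decision boundary with
# ~log2(len(pool)) label queries, versus len(pool) for passive labeling.

def active_learn_threshold(pool, query_label):
    """Return (threshold, n_queries) using ~log2(len(pool)) label queries."""
    pool = sorted(pool)
    lo, hi = 0, len(pool)          # invariant: first +1 example lies in pool[lo:hi]
    n_queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        n_queries += 1
        if query_label(pool[mid]) == +1:
            hi = mid               # boundary is at or before mid
        else:
            lo = mid + 1           # boundary is strictly after mid
    threshold = pool[lo] if lo < len(pool) else float("inf")
    return threshold, n_queries

# Hypothetical labeling oracle with true threshold 0.6
oracle = lambda x: +1 if x >= 0.6 else -1
pool = [i / 100 for i in range(100)]
theta_hat, n_queries = active_learn_threshold(pool, oracle)
```

With a pool of 100 points, the learner recovers the threshold 0.6 with only a handful of queries, which is the basic phenomenon active learning exploits.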
Confidence-rated prediction considers the learning setting in which the learned classifier is allowed to abstain, i.e., to predict ``I don't know''. In this setting, abstaining may be preferable to making an ill-informed classification decision. In this thesis, we study confidence-rated prediction in both the batch and online settings, and advance the state of the art. Specifically:
1. In the batch setting, we propose a linear-programming-based algorithm that enjoys certain optimality properties and outperforms previous approaches.
2. In the online setting, we propose an algorithm that achieves a minimax-optimal tradeoff between its performance measures, and we introduce a novel combinatorial measure, the Extended Littlestone's Dimension, that characterizes this tradeoff.
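The abstention option can be sketched in a few lines. The following toy predictor (not an algorithm from the thesis; the threshold value is an assumption for illustration) abstains whenever the model's estimated class probability is too close to 1/2, trading coverage for higher accuracy on the predictions it does make.

```python
# Illustrative sketch (not the thesis algorithm): a confidence-rated
# predictor that abstains when the estimated class probability is
# insufficiently far from 1/2.

def confidence_rated_predict(score, threshold=0.75):
    """score: estimated P(y = +1 | x). Returns +1, -1, or None (abstain)."""
    if score >= threshold:
        return +1
    if score <= 1 - threshold:
        return -1
    return None  # "I don't know": not confident in either label

# Hypothetical scores: confident positive, uncertain, confident negative, uncertain
preds = [confidence_rated_predict(s) for s in (0.95, 0.50, 0.10, 0.70)]
```

Here the predictor commits only on the two confident examples and abstains on the rest, which is exactly the coverage/accuracy tradeoff studied in this setting.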
Furthermore, we propose confidence-based active learning, establishing a connection between active learning and confidence-rated prediction. We show that our confidence-based active learning algorithm is statistically consistent, works for general hypothesis classes and data distributions, and has a lower label complexity than state-of-the-art active learning algorithms.
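The general idea behind connecting the two notions can be illustrated with a toy sketch (emphatically not the thesis's actual algorithm, and the predictor and oracle below are hypothetical): spend label queries only on examples where a confidence-rated predictor abstains, and trust its confident predictions everywhere else.

```python
# Toy sketch of confidence-based active learning (not the thesis's
# algorithm): query the labeling oracle only where a confidence-rated
# predictor abstains; elsewhere, accept its confident prediction for free.

def label_pool(pool, confident_predict, query_label):
    """Return (labels, n_queries): labels for every x in pool, querying
    the oracle only where confident_predict abstains (returns None)."""
    labels, n_queries = [], 0
    for x in pool:
        pred = confident_predict(x)
        if pred is None:           # abstention: pay for a label query
            pred = query_label(x)
            n_queries += 1
        labels.append(pred)
    return labels, n_queries

# Hypothetical setup: the predictor abstains near the boundary at 0.5,
# so only the uncertain points cost a label query.
confident = lambda x: +1 if x > 0.6 else (-1 if x < 0.4 else None)
oracle = lambda x: +1 if x >= 0.5 else -1
pool = [0.1, 0.45, 0.55, 0.9]
labels, n_queries = label_pool(pool, confident, oracle)
```

In this toy run only the two uncertain points trigger queries; the label savings come precisely from confident predictions standing in for oracle calls.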