This dissertation develops and analyzes active learning algorithms for binary classification problems. In passive (non-active) learning, a learner uses a random sample of labeled examples from a fixed distribution to select a hypothesis with low error. In active learning, a learner receives only a sample of unlabeled data, but has the option to query the label of any of these data points. The hope is that the active learner needs to query the labels of just a few, carefully chosen points in order to produce a hypothesis with low error. The first part of this dissertation develops algorithms based on maintaining a version space---the set of hypotheses still in contention to be selected. The version space is specifically designed to tolerate arbitrary label noise and model mismatch in the agnostic learning model. The algorithms maintain the version space using a reduction to a special form of agnostic learning that allows for example-based constraints; this represents a computational improvement over previous methods. The generalization behavior of one of these algorithms is rigorously analyzed using a quantity called the disagreement coefficient. This algorithm is shown to have label complexity that improves over that of previous methods, and matches known label complexity lower bounds in certain cases. The second part of this dissertation develops algorithms based on simpler reductions to agnostic learning that more closely match the standard abstraction of supervised learning procedures. The generalization behavior of these algorithms are also analyzed in the agnostic learning model, and are shown to have label complexity similar to the version space methods. Therefore, these algorithms represent qualitative improvements over version space methods, as strict version space methods can be risky to deploy in practice. The first of these algorithms is based on a relaxation of a version space method, and the second is based on an importance weighting technique. The second algorithm is also shown to automatically adapt to various noise conditions that imply a tighter label complexity analysis. Experiments using this algorithm are also presented to illustrate some of the promise of the method