In this thesis we study the following basic machine learning task: Given a fixed set of n input points in a d-dimensional linear regression problem, we wish to predict a hidden response value for each of the points. We can only afford to attain the responses for a small subset of the points that are then used to construct linear predictions for all points in the dataset. The performance of the predictions is evaluated by the total square loss on all responses. We show that a good approximate solution to this least squares problem can be obtained from just dimension d many responses by using a joint sampling technique called volume sampling. Moreover, the least squares solution obtained for the volume sampled subproblem is an unbiased estimator of optimal solution based on all n responses. This unbiasedness is a desirable property that is not shared by standard subset selection techniques.
Motivated by these basic properties, we develop a theoretical framework for studying volume sampling, which leads to a number of new expectation formulas and statistical guarantees which are of importance not only to least squares regression but also numerical linear algebra in general. Our methods lead to several novel extensions of volume sampling, including a regularized variant, and we propose the first efficient algorithms which make this technique a practical tool in the machine learning toolbox. Finally, we provide experimental evidence which confirms our theoretical findings.