The parallel problems server: an interactive tool for large scale machine learning
- Author(s): Husbands, Parry
- et al.
Imagine that you wish to classify data consisting of tens of thousands of examples residing in a twenty-thousand-dimensional space. How can one apply standard machine learning algorithms? We describe the Parallel Problems Server (PPServer) and MATLAB*P. In tandem they allow users of networked computers to work transparently on large data sets from within Matlab. This work is motivated by the desire to bring the many benefits of scientific computing algorithms and computational power to machine learning researchers. We demonstrate the usefulness of the system on a number of tasks. For example, we perform independent components analysis on very large text corpora consisting of tens of thousands of documents, making minimal changes to the original Bell and Sejnowski Matlab source (Bell and Sejnowski, 1995). Applying ML techniques to data previously beyond their reach leads to interesting analyses of both data and algorithms.