eScholarship
Open Access Publications from the University of California

The parallel problems server: an interactive tool for large scale machine learning

Abstract

Imagine that you wish to classify data consisting of tens of thousands of examples residing in a twenty-thousand-dimensional space. How can one apply standard machine learning algorithms? We describe the Parallel Problem Server (PPServer) and MATLAB*P. In tandem, they allow users of networked computers to work transparently on large data sets from within Matlab. This work is motivated by the desire to bring the many benefits of scientific computing algorithms and computational power to machine learning researchers. We demonstrate the usefulness of the system on a number of tasks. For example, we perform independent components analysis on very large text corpora consisting of tens of thousands of documents, making minimal changes to the original Bell and Sejnowski Matlab source (Bell and Sejnowski, 1995). Applying ML techniques to data previously beyond their reach leads to interesting analyses of both data and algorithms.
