Content-based music classification systems attempt to predict musical attributes of songs directly from their audio content. Commonly, the ground-truth labels are explicitly annotated traits, such as the genre of a piece of music. Ground-truth annotations can also be derived implicitly from user listening patterns, which gives rise to content-based music recommendation systems.
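To make the implicit-feedback setup concrete, the following is a minimal sketch of deriving latent song factors from a play-count matrix via weighted matrix factorization (WMF; Hu et al., 2008), a common choice for this task. The function name, hyperparameters, and the choice of WMF itself are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

def wmf_als(R, n_factors=50, reg=1e-2, alpha=40.0, n_iters=10, seed=0):
    """Weighted matrix factorization via alternating least squares on a
    dense play-count matrix R (users x songs). Returns user and song
    factor matrices; the song factors serve as regression targets for
    the audio model. Sizes here are illustrative only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.01, size=(n_users, n_factors))  # user factors
    Y = rng.normal(scale=0.01, size=(n_items, n_factors))  # song factors
    P = (R > 0).astype(float)   # binary preference derived from listens
    C = 1.0 + alpha * R         # confidence grows with play count
    I = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Solve for user factors with song factors fixed, then swap roles.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y
```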
We improve upon previous work in content-based music recommendation in two ways. First, we match the Million Song Dataset (MSD) to the recent LFM-1b dataset, which is much larger than the standard Taste Profile Subset. Second, we train deep convolutional neural networks to predict latent factors directly from raw audio waveforms, rather than following the standard practice of training on an intermediate time-frequency representation. We also evaluate the effectiveness of latent factor prediction as a source task for tag prediction in a transfer-learning setting.
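As a rough illustration of the raw-audio approach, the PyTorch sketch below applies strided 1-D convolutions directly to waveform samples and regresses a latent factor vector. All layer sizes, the 50-dimensional factor target, and the 3-second 16 kHz input are assumptions made for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RawAudioFactorNet(nn.Module):
    """1-D convolutions over raw waveform samples; the final linear
    layer regresses the latent factor vector."""
    def __init__(self, n_factors=50):
        super().__init__()
        self.features = nn.Sequential(
            # A strided first layer acts as a learned filterbank,
            # replacing the fixed time-frequency transform.
            nn.Conv1d(1, 128, kernel_size=256, stride=256), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=4, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(128, 256, kernel_size=4, padding=2), nn.ReLU(),
            nn.MaxPool1d(4),
        )
        self.pool = nn.AdaptiveAvgPool1d(1)    # collapse the time axis
        self.head = nn.Linear(256, n_factors)  # latent factor regression

    def forward(self, wav):                    # wav: (batch, 1, n_samples)
        h = self.features(wav)
        h = self.pool(h).squeeze(-1)
        return self.head(h)

model = RawAudioFactorNet(n_factors=50)
pred = model(torch.randn(8, 1, 16000 * 3))    # batch of 3 s, 16 kHz clips
loss = nn.functional.mse_loss(pred, torch.randn(8, 50))  # targets: WMF factors
```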
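The transfer-learning step can then be sketched as reusing the convolutional trunk pretrained on factor prediction and swapping the regression head for a tag classifier. The tag count and the decision to freeze the trunk are again illustrative assumptions, not the paper's protocol.

```python
# Replace the factor-regression head with a multi-label tag classifier.
n_tags = 50
model.head = nn.Linear(256, n_tags)   # new task-specific head
for p in model.features.parameters():
    p.requires_grad = False           # optionally freeze the pretrained trunk

logits = model(torch.randn(8, 1, 16000 * 3))
tag_loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.randint(0, 2, (8, n_tags)).float())
```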