eScholarship
Open Access Publications from the University of California

Exploiting tag correlations to improve multilabel learning

Abstract

This thesis treats the tagging of songs as a multilabel learning problem. We focus on the CAL500 dataset, which summarizes 1704 student reviews into tags for 502 songs. Because this summarization loses information, we create the CAL1700 dataset, which uses each student review to generate a single multilabel. We develop a two-layer technique to exploit tag correlations. The first layer makes tag predictions based on data features. The second layer applies correlation information to these predictions to create the final prediction. Our two-layer technique differs from previous stacking methods because it trains both layers on the full training set. Our second layer is a weighted combination of first-layer predictions. We also look at learning tag correlations using linear-chain conditional random field (CRF) models and at using these CRFs as the first layer in our two-layer technique. We exceed previously published results for CAL500 data using our two-layer technique with a linear-chain CRF first layer. We train this CRF using CAL1700 labels. We achieve 0.288 F1-macro (F1M) and 0.401 mean average precision (MAP) scores using 100 data features. The previous best was 0.207 F1M and 0.394 MAP. Our F1M results are significantly better than previously published results.
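
The two-layer idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the first-layer model for this case or how the second-layer weights are learned, so the sketch assumes ridge regression for both layers, and the names `fit_ridge`, `fit_two_layer`, `predict_two_layer`, the regularization values, the 0.5 threshold, and the toy data are all hypothetical. What it does follow from the abstract is that both layers are fit on the full training set and that the second layer is a weighted combination of first-layer predictions.

```python
import numpy as np

def fit_ridge(X, Y, lam=1.0):
    """Closed-form ridge regression: returns W so that X @ W approximates Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def fit_two_layer(X, Y, lam1=1.0, lam2=1.0):
    # Layer 1 (assumed model): map data features to one score per tag.
    W1 = fit_ridge(X, Y, lam1)
    # First-layer predictions on the FULL training set (no held-out split).
    P = X @ W1
    # Layer 2: each final tag score is a weighted combination of all
    # first-layer tag scores, so correlated tags can inform each other.
    W2 = fit_ridge(P, Y, lam2)
    return W1, W2

def predict_two_layer(X, W1, W2):
    return (X @ W1) @ W2

# Toy data (hypothetical): 4 tags, where tag 1 always co-occurs with tag 0.
rng = np.random.default_rng(0)
n, d, k = 200, 10, 4
X = rng.normal(size=(n, d))
Y = np.zeros((n, k))
Y[:, 0] = X[:, 0] > 0
Y[:, 1] = Y[:, 0]                 # perfectly correlated with tag 0
Y[:, 2] = X[:, 1] > 0
Y[:, 3] = rng.random(n) > 0.5     # unrelated to the features

W1, W2 = fit_two_layer(X, Y)
scores = predict_two_layer(X, W1, W2)
pred = scores >= 0.5              # assumed threshold for binary tag decisions
```

Here `W2` is a tags-by-tags matrix: entry `W2[i, j]` is the weight that the first-layer score for tag `i` contributes to the final score for tag `j`, which is one simple way to encode tag correlations.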
