The thesis studies the problem of multi-label text classification and argues that it can benefit from being cast as a language-understanding problem. Specifically, rather than limiting the use of annotated labels to providing classification supervision, we also treat them as auxiliary information that guides the learning of an effective representation aligned with the downstream task. Two approaches are discussed: (a) learning a label-word attention layer that composes word embeddings into document vectors; (b) learning a high-level latent abstraction via an auto-encoder generative model with structured priors conditioned on the labels. We introduce two designs of label-enhanced representation learning, the Label-Embedding Attention Model (LEAM) and the Conditional Variational Document Model (CVDM), and apply them to real-world datasets to demonstrate that they improve classification performance while offering better interpretability.
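
To make approach (a) concrete, the following is a minimal sketch of label-word attention pooling; the function name, the cosine-compatibility scoring, the max-over-labels reduction, and the toy dimensions are illustrative assumptions for this sketch, not the thesis's exact LEAM architecture.

```python
import numpy as np

def label_attention_compose(word_embs, label_embs):
    """Compose word embeddings into a document vector via
    label-word attention (a simplified LEAM-style pooling).

    word_embs:  (seq_len, d) embeddings of the document's words
    label_embs: (num_labels, d) embeddings of the label set
    returns:    (d,) attention-weighted document vector
    """
    # Cosine compatibility between every word and every label
    w = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    c = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    compat = w @ c.T                     # (seq_len, num_labels)

    # Collapse the label axis: each word is scored by its best label match
    scores = compat.max(axis=1)          # (seq_len,)

    # Softmax over words yields the attention weights
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Document vector = attention-weighted sum of word embeddings
    return weights @ word_embs

# Toy usage: a 5-word document, 3 labels, 8-dimensional embeddings
rng = np.random.default_rng(0)
doc_vec = label_attention_compose(rng.normal(size=(5, 8)),
                                  rng.normal(size=(3, 8)))
print(doc_vec.shape)  # (8,)
```

Because the attention weights are computed against label embeddings, they can be inspected directly to see which words drive each label decision, which is the source of the interpretability gains claimed above.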