UC San Diego Electronic Theses and Dissertations

Learning Structured Distributions: Power-Law and Low-Rank

Abstract

Utilizing the structure of a probabilistic model can significantly increase its compression efficiency and learning speed. We consider these potential improvements under two ubiquitous, naturally occurring structures.

Power-Law: English words and many other natural phenomena are well known to follow a power-law distribution. Yet this ubiquitous structure has never been shown to help compress or predict these phenomena. It is known that the class of unrestricted distributions over an alphabet of size k and blocks of length n can never be compressed with diminishing per-symbol redundancy when k > n. We show that under the power-law structure, in expectation we can compress with diminishing per-symbol redundancy even when k grows sub-exponentially in n.
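For concreteness, the standard notions behind this claim (in my notation, not necessarily the dissertation's exact formulation) are a power-law source over k symbols and the worst-case expected redundancy of a compressor q over a class of i.i.d. sources; diminishing per-symbol redundancy means the latter, divided by the block length n, vanishes:

\[
  p_\alpha(i) \;\propto\; i^{-\alpha}, \qquad i = 1,\dots,k,\ \ \alpha > 0,
\]
\[
  R_n(\mathcal{P}) \;=\; \min_{q}\,\max_{p\in\mathcal{P}}\;
  \mathbb{E}_{X^n\sim p}\!\left[\log\frac{p(X^n)}{q(X^n)}\right],
  \qquad \frac{R_n(\mathcal{P})}{n} \;\longrightarrow\; 0 \ \text{as } n \to \infty.
\]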

For learning a power-law distribution, we rigorously explain the efficacy of the absolute-discount estimator using notions less pessimistic than worst-case analysis. We show that (1) it is adaptive to an effective dimension and (2) it is strongly related to the Good-Turing estimator and inherits its competitive properties.
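As a reference point, here is the textbook absolute-discount estimator in a minimal Python sketch. The discount value d and the uniform treatment of unseen symbols are generic choices of mine, not taken from the dissertation; the total discounted mass it frees plays a role analogous to the Good-Turing estimate of the unseen probability.

from collections import Counter

def absolute_discount(samples, alphabet_size, d=0.75):
    # Textbook absolute discounting: each observed symbol keeps (count - d)/n
    # probability mass; the freed mass d * (#distinct seen)/n is spread
    # uniformly over the unseen symbols.  (Generic sketch; d = 0.75 is an
    # arbitrary illustrative choice.)
    counts = Counter(samples)
    n = len(samples)
    seen = len(counts)
    unseen = alphabet_size - seen
    freed = d * seen / n                        # total discounted mass
    probs = {x: (c - d) / n for x, c in counts.items()}
    if unseen > 0:
        unseen_share = freed / unseen           # uniform share per unseen symbol
    else:
        unseen_share = 0.0                      # all symbols seen: return mass to them
        for x in probs:
            probs[x] += freed / seen
    return probs, unseen_share

# Example over the alphabet {0, ..., 9}:
probs, unseen_p = absolute_discount([0, 0, 0, 1, 1, 2, 3, 0, 1, 2], alphabet_size=10)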

Low-Rank: We study learning low-rank conditional probability matrices under expected KL-risk. This choice accentuates smoothing, the careful handling of low-probability elements. We define a loss function, determine a sample-complexity bound for its global minimizer, and show that this bound is optimal up to logarithmic terms. We propose an iterative algorithm that extends classical non-negative matrix factorization to naturally incorporate additive smoothing, and we prove that it converges to the stationary points of our loss function.
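To illustrate the kind of procedure involved, the sketch below runs generic multiplicative-update NMF under KL divergence on an add-beta smoothed count matrix and row-normalizes the result into a low-rank conditional probability matrix. This is not the dissertation's loss or algorithm; the rank, beta, and iteration count are arbitrary illustrative choices.

import numpy as np

def smoothed_kl_nmf(counts, rank, beta=0.1, iters=200, seed=0):
    # Lee-Seung style multiplicative updates for KL-divergence NMF, applied to
    # an additively smoothed count matrix.  Rows of the returned matrix are
    # conditional distributions (one per context).
    rng = np.random.default_rng(seed)
    V = counts + beta                              # add-beta smoothing of raw counts
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + 1e-3
    H = rng.random((rank, n_cols)) + 1e-3
    for _ in range(iters):
        WH = W @ H + 1e-12
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + 1e-12)
        WH = W @ H + 1e-12
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + 1e-12)
    P = W @ H
    return P / P.sum(axis=1, keepdims=True)        # row-normalize to probabilities

# Example: 20 contexts x 50 outcomes, rank-3 fit of synthetic counts
P_hat = smoothed_kl_nmf(np.random.poisson(1.0, size=(20, 50)).astype(float), rank=3)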

Power-Law and Low-Rank: We consider learning distributions in the presence of both low-rank and power-law structures. We study Kneser-Ney smoothing, a successful estimator for N-gram language models, through the lens of competitive distribution estimation. We first establish some competitive properties for the contextual probability estimation problem. This leads to Partial Low Rank, a powerful generalization of Kneser-Ney that we conjecture to have even stronger competitive properties. Empirically, it significantly improves performance on language modeling, even matching feed-forward neural models, and gives similar gains on the task of predicting attack types for the Global Terrorism Database.
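For readers unfamiliar with the baseline, the sketch below gives interpolated Kneser-Ney in its standard bigram form (a textbook presentation with a fixed discount d assumed for simplicity); the Partial Low Rank generalization studied in the dissertation is not reproduced here.

from collections import Counter, defaultdict

def kneser_ney_bigram(tokens, d=0.75):
    # Interpolated Kneser-Ney for bigrams:
    #   P(w | v) = max(c(v,w) - d, 0)/c(v) + lambda(v) * P_cont(w)
    # where lambda(v) reserves mass proportional to the number of distinct
    # continuations of v, and P_cont(w) counts the distinct contexts w completes.
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])           # c(v): occurrences of v as a context
    continuations = defaultdict(set)                # distinct left-contexts of each w
    followers = defaultdict(set)                    # distinct right-continuations of each v
    for (v, w) in bigrams:
        continuations[w].add(v)
        followers[v].add(w)
    total_bigram_types = len(bigrams)

    def p(w, v):
        p_cont = len(continuations[w]) / total_bigram_types
        c_v = context_counts[v]
        if c_v == 0:
            return p_cont                           # unseen context: back off entirely
        discounted = max(bigrams[(v, w)] - d, 0.0) / c_v
        lam = d * len(followers[v]) / c_v           # mass reserved for the backoff term
        return discounted + lam * p_cont
    return p

# Example:
p = kneser_ney_bigram("the cat sat on the mat the cat ran".split())
print(p("cat", "the"))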
