The alignment problem in curriculum learning

Abstract

In curriculum learning, teaching involves the cooperative selection of sequences of data, guided by plans, to facilitate efficient and effective learning. One-off cooperative selection of data has been mathematically formalized as entropy-regularized optimal transport, and the limiting behavior of myopic sequential interactions has been analyzed, both yielding theoretical and practical guarantees. We recast sequential cooperation with curriculum planning in a reinforcement learning framework and analyze performance mathematically and by simulation. We prove that infinite-length plans are equivalent to not planning under certain assumptions on the method of planning, and isolate instances where monotonicity, and hence convergence in the limit, holds, as well as cases where it does not. We also demonstrate through simulations that argmax data selection is the same across planning horizons, and show problem-dependent sensitivity of learning to the teacher's planning horizon. Thus, we find that planning ahead yields efficiency at the cost of effectiveness.
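
As a concrete illustration of the one-off case, the sketch below shows how cooperative selection of a single datum can be computed by alternating normalizations (Sinkhorn-style scaling) of a shared data-by-hypothesis consistency matrix, the construction underlying the entropy-regularized optimal transport formalization mentioned above. This is a minimal sketch, not the paper's implementation; the example matrix, function names, and tolerances are illustrative assumptions.

```python
import numpy as np

def cooperative_selection(M, n_iters=200, tol=1e-10):
    """Alternating (Sinkhorn-style) normalization of a data-by-hypothesis
    consistency matrix M. Row normalization plays the role of the learner's
    inference P(h | d); column normalization plays the role of the teacher's
    data selection P(d | h). The limit is the mutually consistent fixed point
    that the entropy-regularized optimal-transport view describes."""
    P = np.asarray(M, dtype=float).copy()
    for _ in range(n_iters):
        prev = P.copy()
        P = P / P.sum(axis=1, keepdims=True)  # learner: rows sum to 1
        P = P / P.sum(axis=0, keepdims=True)  # teacher: columns sum to 1
        if np.max(np.abs(P - prev)) < tol:
            break
    return P

# Toy consistency matrix: 3 candidate data points x 2 hypotheses
# (entries are made up for illustration).
M = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
P = cooperative_selection(M)

# Greedy ("argmax") teaching of hypothesis 0: pick the datum with the
# highest teaching probability in that column.
best_datum_for_h0 = int(P[:, 0].argmax())
```

At the fixed point, the teacher's column-normalized selection distribution and the learner's row-normalized posterior agree, and greedy (argmax) selection simply reads off the most probable datum for the target hypothesis; the paper's sequential and planning-based settings build on this one-off interaction.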
