Shafto, Patrick; Sheller, Benjamin

The alignment problem in curriculum learning

2024

Creative Commons 'BY' version 4.0 license

Abstract

In curriculum learning, teaching involves cooperative selection of sequences of data via plans to facilitate efficient and effective learning. One-off cooperative selection of data has been mathematically formalized as entropy-regularized optimal transport, and the limiting behavior of myopic sequential interactions has been analyzed, both yielding theoretical and practical guarantees. We recast sequential cooperation with curriculum planning in a reinforcement learning framework and analyze performance mathematically and by simulation. We prove that infinite-length plans are equivalent to not planning under certain assumptions on the method of planning, and isolate instances where monotonicity, and hence convergence in the limit, holds, as well as cases where it does not. We also demonstrate through simulations that argmax data selection is the same across planning horizons and that the sensitivity of learning to the teacher's planning horizon is problem-dependent. Thus, we find that planning ahead yields efficiency at the cost of effectiveness.
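
As an illustration of the entropy-regularized optimal transport formalization mentioned above, the following is a minimal sketch of Sinkhorn scaling, the standard iterative method for that problem. It is not taken from the paper: the function name `sinkhorn`, the cost matrix, the marginals, and the regularization strength `epsilon` are all hypothetical choices made for the example.

```python
import math

def sinkhorn(cost, row_marginal, col_marginal, epsilon=0.5, iterations=500):
    """Approximate an entropy-regularized optimal transport plan by Sinkhorn scaling.

    All argument names and defaults are illustrative assumptions.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: K[i][j] = exp(-cost[i][j] / epsilon).
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iterations):
        # Rescale rows so the plan's row sums match row_marginal.
        u = [row_marginal[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        # Rescale columns so the plan's column sums match col_marginal.
        v = [col_marginal[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    # The plan is diag(u) * K * diag(v).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical example: 2 hypotheses (rows) and 3 data points (columns).
cost = [[0.0, 1.0, 2.0],
        [2.0, 1.0, 0.0]]
plan = sinkhorn(cost, [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3])
```

Each row of `plan`, once normalized, can be read as a distribution over data points for one hypothesis. Smaller values of `epsilon` concentrate mass on low-cost pairs, while larger values spread it more evenly.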

Proceedings of the Annual Meeting of the Cognitive Science Society
