In curriculum learning, teaching involves cooperative, planned selection of sequences of data to facilitate efficient and effective learning.
One-off cooperative selection of data has been mathematically formalized as entropy-regularized optimal transport, and the limiting behavior of myopic sequential interactions has been analyzed; both yield theoretical and practical guarantees.
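For concreteness, the one-off formalism can be read as alternating row and column normalization (Sinkhorn scaling) of a shared data-by-hypothesis consistency matrix, whose fixed point gives the teacher's selection distribution and the learner's posterior. The sketch below is a minimal illustration under that reading; the matrix `M`, the function name `cooperative_selection`, and the toy example are our own illustrative assumptions, not artifacts of the cited work.

```python
import numpy as np

def cooperative_selection(M, n_iters=200, tol=1e-10):
    """Sinkhorn-style scaling of a data-by-hypothesis consistency matrix M[d, h].

    Alternates column normalization (teacher step) and row normalization
    (learner step) until convergence, then returns the teacher's selection
    distribution T (columns sum to 1: P(d | h)) and the learner's posterior
    L (rows sum to 1: P(h | d)) at the fixed point.
    """
    A = M.astype(float).copy()
    for _ in range(n_iters):
        prev = A.copy()
        A /= A.sum(axis=0, keepdims=True)   # teacher: normalize over data for each hypothesis
        A /= A.sum(axis=1, keepdims=True)   # learner: normalize over hypotheses for each datum
        if np.abs(A - prev).max() < tol:
            break
    T = A / A.sum(axis=0, keepdims=True)
    L = A / A.sum(axis=1, keepdims=True)
    return T, L

# toy example: 3 data points, 2 hypotheses
M = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
T, L = cooperative_selection(M)
```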
We recast sequential cooperation with curriculum planning in a reinforcement learning framework and analyze performance mathematically and by simulation.
We prove that, under certain assumptions on the method of planning, infinite-length plans are equivalent to not planning at all, and we isolate instances where monotonicity, and hence convergence in the limit, holds, as well as cases where it does not. Through simulations, we also show that argmax data selection is the same across planning horizons, and that learning exhibits problem-dependent sensitivity to the teacher's planning horizon. Thus, we find that planning ahead yields efficiency at the cost of effectiveness.
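As a rough sketch of the sequential setting with planning, one can picture a teacher who, at each step, scores each candidate datum by the learner's belief in the target hypothesis after a greedy rollout of a fixed horizon and then selects the argmax. The code below is an illustrative assumption rather than the paper's formulation: a naive Bayesian learner stands in for the cooperative learner, and the names `learner_update`, `plan_value`, `teach`, and `likelihood` are ours.

```python
import numpy as np

def learner_update(belief, likelihood, d):
    """Posterior over hypotheses after a Bayesian learner observes datum d."""
    post = belief * likelihood[d]
    return post / post.sum()

def plan_value(belief, likelihood, target, d, horizon):
    """Belief in the target hypothesis after showing d and then `horizon`
    further greedily chosen data points (horizon 0 recovers myopic teaching)."""
    belief = learner_update(belief, likelihood, d)
    for _ in range(horizon):
        nxt = max(range(likelihood.shape[0]),
                  key=lambda e: learner_update(belief, likelihood, e)[target])
        belief = learner_update(belief, likelihood, nxt)
    return belief[target]

def teach(likelihood, target, horizon, n_steps):
    """Greedy argmax teacher with a fixed planning horizon."""
    n_data, n_hyp = likelihood.shape
    belief = np.ones(n_hyp) / n_hyp
    curriculum = []
    for _ in range(n_steps):
        d = max(range(n_data),
                key=lambda c: plan_value(belief, likelihood, target, c, horizon))
        curriculum.append(d)
        belief = learner_update(belief, likelihood, d)
    return curriculum, belief

# toy problem: 3 data points, 2 hypotheses; hypothesis 1 is the target
likelihood = np.array([[0.5, 0.5],
                       [0.9, 0.1],
                       [0.2, 0.8]])
print(teach(likelihood, target=1, horizon=2, n_steps=3))
```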