Meta Learning for Control
In this thesis, we discuss meta learning for control:
policy learning algorithms that can themselves generate learning algorithms highly customized to a particular domain of tasks.
The generated algorithms can be orders of magnitude faster than human-designed, general-purpose algorithms.
We begin with a thorough review of existing policy learning algorithms for control, which motivates the need for better algorithms that can solve complicated tasks with affordable sample complexity.
Then, we discuss two formulations of meta learning.
The first formulation is meta learning for reinforcement learning, where the task is specified through a reward function, and the agent needs to improve its performance by acting in the environment, receiving scalar reward signals, and adjusting its strategy according to the information it receives.
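The meta reinforcement learning formulation above can be sketched with a minimal, self-contained example. This is a hypothetical illustration (not the thesis's actual algorithm): each task is a one-step control problem whose reward is $-(a - \text{goal})^2$ for an unknown goal, and a MAML-style outer loop learns an initialization from which a single inner gradient step on observed reward already adapts well. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of meta reinforcement learning: each task is a
# one-step problem with reward r(a) = -(a - goal)^2 for an unknown goal.
# The meta-learner finds an initialization theta0 such that one inner-loop
# gradient step of adaptation yields high reward on a newly sampled task.

rng = np.random.default_rng(0)
inner_lr, outer_lr = 0.3, 0.05
theta0 = 0.0  # meta-learned initialization

def adapt(theta, goal, lr=inner_lr):
    # One inner-loop step of gradient ascent on the reward -(theta - goal)^2.
    grad = -2.0 * (theta - goal)       # d reward / d theta
    return theta + lr * grad

for _ in range(2000):                  # outer (meta) loop over tasks
    goal = rng.uniform(-2.0, 2.0)      # sample a task from the distribution
    adapted = adapt(theta0, goal)
    # Meta-gradient of the post-adaptation reward w.r.t. theta0; for this
    # quadratic reward the chain rule gives d(adapted)/d(theta0) = 1 - 2*inner_lr.
    meta_grad = -2.0 * (adapted - goal) * (1.0 - 2.0 * inner_lr)
    theta0 += outer_lr * meta_grad
```

In this toy setting the learned initialization sits near the center of the task distribution, so a single reward-driven gradient step moves the agent much closer to any sampled goal than the initialization alone.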
The second formulation is meta learning for imitation learning, where the task is specified through an expert demonstration, and the agent needs to mimic the expert's behavior to achieve good performance in new situations of the same task, as measured by the underlying objective of the expert (which is not directly given to the agent).
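The imitation formulation differs from the first only in what the inner loop consumes: a demonstration rather than reward. A minimal hypothetical sketch (again, not the thesis's actual method; all names and hyperparameters are assumptions): each task's expert is a linear policy $a^* = w \cdot s$ with unknown $w$, the agent takes one gradient step on a behavioral-cloning loss over the demonstration, and adaptation quality is measured by the expert's underlying objective on fresh states.

```python
import numpy as np

# Hypothetical sketch of meta imitation learning: each task has an expert
# linear policy a* = w * s with unknown w. The agent sees a short
# demonstration (state, expert-action pairs), takes one gradient step on an
# imitation (behavioral-cloning) loss, and is evaluated on new states.

rng = np.random.default_rng(1)
inner_lr, outer_lr = 0.05, 0.02
theta0 = 0.0  # meta-learned initial policy weight

def adapt(theta, states, actions, lr=inner_lr):
    # One gradient step on the imitation loss mean((theta*s - a)^2).
    grad = np.mean(2.0 * (theta * states - actions) * states)
    return theta - lr * grad

for _ in range(3000):                          # outer (meta) loop over tasks
    w = rng.uniform(0.5, 1.5)                  # sample a task (expert weight)
    demo_s = rng.normal(size=5)                # demonstration states
    demo_a = w * demo_s                        # expert actions
    adapted = adapt(theta0, demo_s, demo_a)
    # Chain rule through the inner step (linear case):
    d_adapted = 1.0 - inner_lr * np.mean(2.0 * demo_s ** 2)
    # Meta-objective: the expert's underlying objective on held-out states,
    # which the meta-learner can query even though the agent itself cannot.
    test_s = rng.normal(size=5)
    meta_grad = np.mean(2.0 * (adapted * test_s - w * test_s) * test_s) * d_adapted
    theta0 -= outer_lr * meta_grad
```

Note the asymmetry that defines this formulation: the inner loop only ever sees demonstrations, while the expert's true objective appears solely in the outer, meta-level update.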
We present practical algorithms for both formulations, and show that these algorithms can acquire sophisticated learning behaviors on par with learning algorithms designed by human experts, and can scale to complex, high-dimensional tasks.
We also analyze their current limitations, including challenges associated with long horizons and imperfect demonstrations, which suggest important avenues for future work.
Finally, we conclude with several promising future directions of meta learning for control.