In recent years, deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains, but they remain constrained by their demand for large amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways and can exploit structure in the training domain. We unpack these points in five proof-of-concept experiments, each of which examines a key aspect of deep meta-RL.
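To make the architectural idea concrete, the sketch below shows one minimal way such a recurrent agent could be set up: the network receives, at each step, its previous action and reward alongside the current observation, so that the recurrent state can integrate the agent's own interaction history while an outer RL algorithm trains the weights. The framework (PyTorch), the class name MetaRLAgent, and all dimensions are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MetaRLAgent(nn.Module):
    """Illustrative recurrent policy for deep meta-RL (hypothetical sketch).

    The outer loop trains the weights with a standard RL algorithm; within a
    task, the LSTM's hidden-state dynamics can come to implement a separate,
    learned RL procedure adapted to the structure of the training domain.
    """

    def __init__(self, obs_dim, num_actions, hidden_dim=48):
        super().__init__()
        # Input = current observation + one-hot of previous action + previous
        # reward, so the recurrent state can track the action-reward history.
        self.lstm = nn.LSTM(obs_dim + num_actions + 1, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs, prev_action_onehot, prev_reward, hidden):
        # obs: (1, batch, obs_dim); prev_action_onehot: (1, batch, num_actions);
        # prev_reward: (1, batch, 1); hidden: LSTM (h, c) state carried across
        # steps within a task and reset at task boundaries.
        x = torch.cat([obs, prev_action_onehot, prev_reward], dim=-1)
        out, hidden = self.lstm(x, hidden)
        logits = self.policy_head(out)   # action preferences for the policy
        value = self.value_head(out)     # value estimate used by the outer RL loss
        return logits, value, hidden
```

Under this reading, "learning to reinforcement learn" amounts to the outer algorithm shaping recurrent dynamics that adapt behavior within a task purely through the hidden state, without further weight updates.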