How do people explore in order to gain rewards in uncertain dynamical systems? Within a reinforcement learning paradigm, control normally involves trading off between exploration (i.e., trying out actions in order to gain more knowledge about the system) and exploitation (i.e., using current knowledge of the system to maximize reward). We study a novel control task in which participants must steer a boat on a grid, aiming to follow a path of high reward whilst learning how their actions affect the boat’s position. We find that participants explore strategically yet conservatively, exploring more when mistakes are less costly and practicing actions that will be required later on.