Learning sequential actions is an essential human ability, for most daily activities are sequential. We modify the serial reaction time (SRT) task, originally used to teach people a consistent sequence of button presses by cueing them with the next target response, to record mouse movements, collecting continuous response trajectories. Further, we introduce a reinforcement learning version of the paradigm in which the next target is not cued. Instead, learners must explore response alternatives, and receive a penalty for each incorrect response, as well as a reward for a correct response. Participants are not told that they are to learn a single deterministic sequence of responses, nor that it will repeat (nor how often), nor how long it is. Given the difficulty of the task, it is unsurprising that some learners performed poorly. However, many learners performed remarkably well, and some acquired the full 10-item sequence within 10 repetitions. We compare the high- and low-performers’ detailed results in this reinforcement learning (RL) task with a cued trajectory SRT task, finding both similarities and discrepancies. Finally, we note that humans in this task outperform three standard RL models and have different patterns of errors that suggest future modeling directions.
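
As a rough illustration of the uncued paradigm, the sketch below simulates a repeating deterministic sequence in which each incorrect response is penalized and each correct response is rewarded. It is not the experimental software or any of the three RL models compared in the paper. The class and function names, the number of response alternatives (4), the reward and penalty values (+1 and -1), and the rule that the task advances only after a correct response are all assumptions made for illustration. The learner shown is an idealized baseline that is given the sequence length, which the participants were not.

```python
import random


class UncuedSequenceTask:
    """Hypothetical sketch of the uncued task: a fixed sequence that repeats."""

    def __init__(self, sequence, reward=1, penalty=-1):
        self.sequence = list(sequence)
        self.reward = reward      # assumed value; not given in the abstract
        self.penalty = penalty    # assumed value; not given in the abstract
        self.position = 0         # index of the current (uncued) target

    def respond(self, response):
        """Score one response. Assumption: advance only when it is correct."""
        if response == self.sequence[self.position]:
            self.position = (self.position + 1) % len(self.sequence)
            return self.reward
        return self.penalty


def run_memorizing_learner(task, n_alternatives, n_repetitions, rng):
    """Idealized baseline: try alternatives in random order at each position,
    then reuse whichever response was rewarded there.

    Unlike the participants, this learner is told the sequence length.
    Returns the number of incorrect responses in each repetition.
    """
    length = len(task.sequence)
    known = {}  # position -> response that was rewarded there
    errors_per_repetition = []
    for _ in range(n_repetitions):
        errors = 0
        for pos in range(length):
            if pos in known:
                task.respond(known[pos])
                continue
            candidates = list(range(n_alternatives))
            rng.shuffle(candidates)
            for choice in candidates:
                if task.respond(choice) == task.reward:
                    known[pos] = choice
                    break
                errors += 1
        errors_per_repetition.append(errors)
    return errors_per_repetition


if __name__ == "__main__":
    rng = random.Random(0)
    # A 10-item sequence, matching the length reported in the abstract.
    target = [rng.randrange(4) for _ in range(10)]
    task = UncuedSequenceTask(target)
    print(run_memorizing_learner(task, n_alternatives=4, n_repetitions=10, rng=rng))
```

Because this baseline never forgets and knows where the sequence restarts, it makes all of its errors in the first repetition. It therefore marks a ceiling for pure trial-and-error learning under these assumptions, not a model of human behavior.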