Working memory is widely assumed to underlie multi-step planning, in which representations of possible future actions and rewards are iteratively updated before a choice is made. But most working memory research focuses on settings in which stimuli are presented simultaneously and the value of encoding each stimulus is independent of the others. It remains unclear how working memory functions in planning scenarios where the rewards of future actions unfold over time, must be retained in working memory, and must be integrated for plan selection. To bridge this gap, we adapted a version of the "mouselab task" in which participants sequentially observe the reward at each node in a decision tree before selecting a plan that maximizes cumulative reward. We specified a theoretical model to characterize the optimal encoding and maintenance strategy for this task given working memory constraints; this strategy trades off the cost of storing information against the potential benefit of informing later choices. The model more often encoded rewards on choice-relevant plans, in particular rewards on the best and (to a lesser extent) worst plans. We then tested this hypothesis in human participants, who showed the same pattern in the accuracy of their explicit recall. Our study thus establishes an empirical and theoretical foundation for models of how people encode and maintain information during planning.