Deep reinforcement learning agents such as AlphaZero have achieved superhuman strength in complex combinatorial games. By contrast, the cognitive science of planning has mostly focused on simple tasks for experimental and computational tractability. Using a board game that strikes a balance between complexity and tractability, we find that AlphaZero agents improve in value function quality and planning depth through learning, similar to humans in previous modeling work. Moreover, these metrics reflect causal contributions to AlphaZero's playing strength, with policy quality being the strongest contributor. The decrease in policy entropy also drives the increase in planning depth, although the contribution of planning depth to performance lessens late in training. These results contribute to a joint understanding of machine and human planning, providing an interpretable account of AlphaZero's learning and strength while generating novel hypotheses about human planning.