How humans use experienced outcomes and actions to adapt to a constantly changing and noisy world is an important research question. We investigate the role of the pupil-linked arousal system in adaptive value-based decision-making in an uncertain and changing environment, using a two-armed bandit task with occasional changes in reward contingencies. We find that fluctuations in pupil size encode reward- and uncertainty-related values across trials. Moreover, pupil size reflects how these variables contribute to learning and decision-making, depending on the upcoming choice: larger pupil encoding of reward prediction error (RPE) promotes reward-driven switches in choice, while larger pupil encoding of estimation uncertainty (EU) promotes uncertainty-driven switches in choice. Furthermore, individual differences in pupil encoding of RPE and EU correlate with individual variability in choice bias and task performance. Given the relationship of pupil size to noradrenergic and cholinergic modulation, these results provide insight into the computational and neural processes underlying adaptive decision-making.