Humans frequently overestimate the likelihood of desirable events while underestimating the likelihood of undesirable ones: a phenomenon known as unrealistic optimism. Previously, it was suggested that unrealistic optimism arises from asymmetric belief updating, with a relatively reduced coding of undesirable information. Prior studies have shown that a reinforcement learning (RL) model with asymmetric learning rates (greater for a positive prediction error than for a negative prediction error) could account for unrealistic optimism in a bandit task, in particular the tendency of human subjects to persistently choose a single option when there are multiple equally good options. Here, we propose an alternative explanation of such persistent behavior, by modeling human behavior using a Bayesian hidden Markov model, the Dynamic Belief Model (DBM). We find that DBM captures human choice behavior better than the previously proposed asymmetric RL model. Whereas asymmetric RL attains a measure of optimism by giving better-than-expected outcomes higher learning weights than worse-than-expected outcomes, DBM does so by progressively devaluing the unchosen options, thus placing a greater emphasis on choice history independent of reward outcome (e.g., an oft-chosen option might continue to be preferred even if it has not been particularly rewarding), which has broadly been shown to underlie sequential effects in a variety of behavioral settings. Moreover, previous work showed that the devaluation of unchosen options in DBM helps to compensate for a default assumption of environmental non-stationarity, thus allowing the decision-maker both to be more adaptive in changing environments and to still obtain near-optimal performance in stationary environments. Thus, the current work suggests both a novel rationale and a novel mechanism for persistent behavior in bandit tasks.
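
To make the asymmetric RL account concrete, the following is a minimal sketch of a value update whose learning rate depends on the sign of the prediction error. It is an illustration under stated assumptions, not the model fitted in the prior studies: the function names, the initial value `q0`, and the learning rates `alpha_pos` and `alpha_neg` are all hypothetical choices made for this example.

```python
def asymmetric_update(q, reward, alpha_pos=0.4, alpha_neg=0.1):
    """Update one option's value estimate from a single outcome.

    The learning rates are illustrative assumptions; the only property
    taken from the text is alpha_pos > alpha_neg.
    """
    delta = reward - q  # prediction error
    # Better-than-expected outcomes get the larger learning rate.
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta


def simulate(rewards, q0=0.5, alpha_pos=0.4, alpha_neg=0.1):
    """Apply the update to a sequence of rewards and return the final value."""
    q = q0
    for r in rewards:
        q = asymmetric_update(q, r, alpha_pos, alpha_neg)
    return q


# An option that pays off on every other trial (true mean reward 0.5).
rewards = [1, 0] * 50
optimistic = simulate(rewards)                               # asymmetric rates
symmetric = simulate(rewards, alpha_pos=0.25, alpha_neg=0.25)  # equal rates
print(optimistic > symmetric)
```

With these assumed rates, the asymmetric learner's estimate after each win-loss pair settles above the true mean reward of 0.5, whereas the symmetric learner's settles below it. An inflated value for the currently chosen option is one way a learner could come to persist with it among equally good alternatives.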