Abstract. We test the current generation of global chemistry–climate models in their ability to simulate observed, present-day surface ozone. Models are evaluated against hourly surface ozone observations from 4217 stations in North America and Europe, averaged over 1° × 1° grid cells to allow commensurate model–measurement comparison. Models are generally biased high during all hours of the day and in all regions. Most models simulate the shape of regional summertime diurnal and annual cycles well, correctly matching the timing of hourly (~ 15:00 local time (LT)) and monthly (mid-June) peak surface ozone abundance. The amplitude of these cycles is less successfully matched. The observed summertime diurnal range (~ 25 ppb) is underestimated in all regions by about 7 ppb, and the observed seasonal range (~ 21 ppb) is underestimated by about 5 ppb except in the most polluted regions, where it is overestimated by about 5 ppb. The models generally match the pattern of the observed summertime ozone enhancement, but they overestimate its magnitude in most regions. Most models capture the observed distribution of extreme episode sizes, correctly showing that about 80 % of individual extreme events occur in large-scale, multi-day episodes of more than 100 grid cells. The models also match the observed linear relationship between episode size and a measure of episode intensity, which shows increases in ozone abundance of up to 6 ppb for larger episodes. We conclude that the skill of the models evaluated here provides confidence in their projections of future surface ozone.
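As an illustration of the gridding step and the diurnal-cycle metrics mentioned above, the following minimal Python sketch bins station records into 1° × 1° cells, averages them by hour, and derives the diurnal range and peak hour for one cell. It is a sketch under stated assumptions rather than the procedure used in this study: the record layout, the function names (`grid_cell`, `grid_average`, `diurnal_range`, `peak_hour`), the unweighted mean over stations, and the sample values are all hypothetical.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, size=1.0):
    """Return the (lat, lon) index of the size-degree cell containing a point.

    The index is the floor of each coordinate divided by the cell size, so
    the cell labelled (40, -106) spans 40-41 N and 106-105 W for size=1.0.
    """
    return (math.floor(lat / size), math.floor(lon / size))


def grid_average(records, size=1.0):
    """Average station ozone per grid cell and hour of day.

    records: iterable of (lat, lon, hour, ozone_ppb) tuples. This layout is
    an assumption made for the example, not a documented data format.
    Returns {cell: {hour: mean ozone in ppb}} using an unweighted mean over
    all records that fall in the cell (also an assumption).
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for lat, lon, hour, o3 in records:
        acc = sums[grid_cell(lat, lon, size)][hour]
        acc[0] += o3  # running sum of ozone values
        acc[1] += 1   # number of values contributing
    return {
        cell: {hour: total / count for hour, (total, count) in hours.items()}
        for cell, hours in sums.items()
    }


def diurnal_range(hourly_means):
    """Difference between the largest and smallest hourly mean (ppb)."""
    return max(hourly_means.values()) - min(hourly_means.values())


def peak_hour(hourly_means):
    """Hour of day at which the hourly mean is largest."""
    return max(hourly_means, key=hourly_means.get)


# Made-up values for two hypothetical stations sharing one grid cell.
sample_records = [
    (40.2, -105.3, 5, 20.0),
    (40.2, -105.3, 15, 50.0),
    (40.7, -105.9, 5, 30.0),
    (40.7, -105.9, 15, 60.0),
]

gridded = grid_average(sample_records)
cell_cycle = gridded[(40, -106)]
print(diurnal_range(cell_cycle), peak_hour(cell_cycle))
```

Applying the same functions to model output sampled on the same grid would give directly comparable ranges and peak times, which is the sense in which gridding makes the comparison commensurate.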