As droughts have widespread social and ecological impacts, it is critical to develop long-term adaptation and mitigation strategies to reduce drought vulnerability. Climate models are important in quantifying drought changes. Here, we assess the ability of 285 CMIP6 historical simulations, from 17 models, to reproduce drought duration and severity in three observational data sets using the Standardized Precipitation Index (SPI). We used summary statistics beyond the mean and standard deviation, and devised a novel probabilistic framework, based on the Hellinger distance, to quantify the difference between observed and simulated drought characteristics. Results show that many simulations have less than ±10% error in reproducing the observed drought summary statistics. The hypothesis that simulations and observations are described by the same distribution cannot be rejected for more than 80% of the grids based on our H distance framework. No single model stood out as demonstrating consistently better performance over large regions of the globe. The variance in drought statistics among the simulations is higher in the tropics compared to other latitudinal zones. Though the models capture the characteristics of dry spells well, there is considerable bias in low precipitation values. Good model performance in terms of SPI does not imply good performance in simulating low precipitation. Our study emphasizes the need to probabilistically evaluate climate model simulations in order to both pinpoint model weaknesses and identify a subset of best-performing models that are useful for impact assessments.