We discuss various statistical distributions of earthquake numbers. Previously we derived several discrete distributions to describe earthquake numbers for the branching model of earthquake occurrence: these distributions are the Poisson, geometric, logarithmic, and the negative binomial (NBD). The theoretical model is the `birth and immigration' population process. The first three distributions above can be considered special cases of the NBD. In particular, a point branching process along the magnitude (or log seismic moment) axis with independent events (immigrants) explains the magnitude/moment-frequency relation and the NBD of earthquake counts in large time/space windows, as well as the dependence of the NBD parameters on the magnitude threshold (magnitude of an earthquake catalogue completeness). We discuss applying these distributions, especially the NBD, to approximate event numbers in earthquake catalogues. There are many different representations of the NBD. Most can be traced either to the Pascal distribution or to the mixture of the Poisson distribution with the gamma law. We discuss advantages and drawbacks of both representations for statistical analysis of earthquake catalogues. We also consider applying the NBD to earthquake forecasts and describe the limits of the application for the given equations. In contrast to the one-parameter Poisson distribution so widely used to describe earthquake occurrence, the NBD has two parameters. The second parameter can be used to characterize clustering or over-dispersion of a process. We determine the parameter values and their uncertainties for several local and global catalogues, and their subdivisions in various time intervals, magnitude thresholds, spatial windows, and tectonic categories. The theoretical model of how the clustering parameter depends on the corner (maximum) magnitude can be used to predict future earthquake number distribution in regions where very large earthquakes have not yet occurred.