Open Access Publications from the University of California

Small sample statistics for classification error rates II: confidence intervals and significance tests


This paper explores several techniques for estimating the range of uncertainty of estimated error rates and for assessing the significance of observed differences in error rates. Textbook formulas that assume a large test set (i.e., a normal distribution) are commonly used to approximate the confidence limits of error rates or as an approximate significance test for comparing error rates. Expressions for determining more exact limits and significance levels for small samples are given here, along with criteria for deciding when these more exact methods should be used. The assumed normal distribution gives a poor approximation to the confidence interval in most cases, but is usually adequate for significance tests when the proper mean and variance expressions are used. A commonly used ±2σ significance test relies on an improper expression for σ that is too low, leading to a high likelihood of Type I errors. Common machine learning methods for estimating significance from observations on a single sample may be unreliable.
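The paper's own expressions are not reproduced on this page, but the contrast it describes can be sketched with the standard exact (Clopper-Pearson) binomial interval versus the textbook normal approximation. All function names below are illustrative, and the exact interval stands in for whatever "more exact limits" the paper derives:

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _bisect(f, a, b, iters=100):
    """Find a root of f in [a, b] by bisection (f must change sign on [a, b])."""
    fa = f(a)
    for _ in range(iters):
        m = 0.5 * (a + b)
        if fa * f(m) <= 0:
            b = m
        else:
            a, fa = m, f(m)
    return 0.5 * (a + b)

def exact_ci(errors, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval for an error rate:
    invert the binomial tail probabilities at alpha/2 each side."""
    lo = 0.0 if errors == 0 else _bisect(
        lambda p: (1 - binom_cdf(errors - 1, n, p)) - alpha / 2, 0.0, 1.0)
    hi = 1.0 if errors == n else _bisect(
        lambda p: binom_cdf(errors, n, p) - alpha / 2, 0.0, 1.0)
    return lo, hi

def normal_ci(errors, n, z=1.96):
    """Textbook large-sample (normal-approximation) interval for the same rate."""
    p = errors / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# 3 errors on a 20-example test set: the normal interval dips below zero,
# while the exact interval stays inside [0, 1] and is markedly asymmetric.
print(normal_ci(3, 20))
print(exact_ci(3, 20))
```

On small test sets like this, the normal interval is both infeasible (a negative lower limit) and mis-centered, which is the abstract's point that the normal approximation to the confidence interval is poor even when it remains serviceable for significance testing.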
