Many problems in statistics are inherently discrete. When one of these problems also contains an optimization component, integer programming may be used to facilitate a solution to the statistical problem. We use integer programming techniques to help solve problems in the following areas: optimal blocking of a randomized controlled experiment with several treatment categories and statistical auditing using stratified random samples.
We develop a new method for blocking in randomized experiments that works for an arbitrary number of treatments. We analyze the following problem: given a threshold for the minimum number of units to be contained in a block, and given a distance measure between any two units in the finite population, block the units so that the maximum distance between any two units within a block is minimized. This blocking criterion can minimize covariate imbalance, which is a common goal in experimental design. Finding an optimal blocking is an NP-hard problem. However, using ideas from graph theory, we provide the first polynomial time approximately optimal blocking algorithm for when there are more than two treatment categories. In the case of just two such categories, our approach is more efficient than existing methods. We derive the variances of estimators for sample average treatment effects under the Neyman-Rubin potential outcomes model for arbitrary blocking assignments and an arbitrary number of treatments.
In addition, statistical election audits can be used to collect evidence that the set of winners (the outcome) of an election according to the machine count is correct---that it agrees with the outcome that a full hand count of the audit trail would show. The strength of evidence is measured by the p<\italic>-value of the hypothesis that the machine outcome is wrong. Smaller p<\italic>-values are stronger evidence that the outcome is correct. Most states that have election audits of any kind require audit samples stratified by county for contests that cross county lines. Previous work on p<\italic>-values for stratified samples based on the largest weighted overstatement of the margin used upper bounds that can be quite weak. Sharper $p$-values than those found by previous work can be found by solving a 0-1 knapsack problem. We also give algorithms for choosing how many batches to draw from each stratum to reduce the counting burden.