Online Controlled Experiment Design: Trade-off between Statistical Uncertainty and Cumulative Reward
Online experiment is widely used in online advertising, web development to compare effects, e.g. click through rate, conversion rate, of different versions. Among all the designs, A/B testing is the most popular one. It randomly segments users into two groups with equal probability and shows them different versions. This method is easy to implement. However the shortcoming is also obvious: to measure both versions it cannot expose all users to the best version, which leads to potential loss of rewards, e.g. clicks and conversions. Though this loss is inevitable in experiment, it can be reduced somehow. Reducing the loss is essentially equivalent to maximizing cumulative reward, which is also the goal of typical multi-armed bandit problem. Thus, multi-armed bandit algorithms are proposed to reduce potential loss in experiment. Compared with A/B testing, multi-armed bandit algorithms produce more cumulative reward during experiment. However, they suffer from high statistical uncertainty: e.g. they need more users than A/B testing to reach particular statistical significance level.
To solve this problem, this paper aims at building a model to analyze two conflicting goals: reducing statistical uncertainty and maximizing cumulative reward. We develop an algorithm for online experiment to balance the trade-off between these two goals. Right now our analysis focuses on one kind of online experiment: batch updating binomial experiment. We first discuss several statistical uncertainty criterion and propose corresponding algorithms to optimize these criterion for experiment. Then we extend some multi-armed bandit algorithms to maximizing cumulative reward for batch updating problem. Besides that, we propose an new algorithm: sequential two stages (STS) to solve this problem. After that, an improved performance evaluation method, which integrates statistical uncertainty with cumulative reward, is put forwarded. Instead of simply combining two objective functions, this new measure, virtual future measure (VFM) establishes connection between statistical uncertainty and cumulative reward directly through virtual future reward. Compared with other method, our proposed algorithm STS is adaptable to optimize VFM.