Offline Evaluation of Multi-Armed Bandit Algorithms using Bootstrapped Replay on Expanded Data
- Author(s): Dai, Jin
- Advisor(s): Xu, Hongquan
Online split experiments, most notably A/B tests, are now a standard tool at internet companies for measuring differences between variations of a digital product. However, A/B testing has limitations that make it poorly suited to some modern business problems. A significant drawback is resource allocation: for the duration of an experiment, a fixed share of traffic continues to be routed to inferior variations, which costs companies substantial time and money every year. Much of this cost can be avoided by adopting alternative techniques such as bandit algorithms. In this thesis, we first review how classic A/B testing works and why it fails on certain modern business problems. We then introduce the multi-armed bandit framework, explain how it addresses the explore-exploit dilemma, and survey several widely used strategies. We also discuss the Replay methodology and an improved, bootstrap-based version of it that addresses the offline evaluation problem. In the final section, we apply the method to a real-world dataset and compare the performance of different bandit algorithms.
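To make the offline evaluation idea concrete, the following is a minimal sketch, not the thesis's actual implementation: the Replay method keeps only the logged events where the candidate policy's choice agrees with the arm actually shown, and the bootstrapped variant resamples the log with replacement to build expanded replicates before replaying. The event format, the `epsilon_greedy` policy, and all parameter names here are illustrative assumptions.

```python
import random

def replay_evaluate(logged_events, policy, seed=0):
    """Replay evaluation: feed logged (arm_shown, reward, candidate_arms)
    events to the policy; an event is usable only when the policy's choice
    matches the logged arm. The mean reward over matched events estimates
    the policy's online performance (assuming uniformly random logging)."""
    rng = random.Random(seed)
    counts, sums = {}, {}          # per-arm pull counts and reward totals
    matched_rewards = []
    for arm_shown, reward, candidate_arms in logged_events:
        chosen = policy(candidate_arms, counts, sums, rng)
        if chosen == arm_shown:    # only matching events reveal a reward
            matched_rewards.append(reward)
            counts[chosen] = counts.get(chosen, 0) + 1
            sums[chosen] = sums.get(chosen, 0.0) + reward
    return sum(matched_rewards) / max(len(matched_rewards), 1)

def bootstrap_replay(logged_events, policy, n_boot=50, seed=0):
    """Bootstrapped replay: resample the log with replacement to create
    expanded replicates, replay each one, and average the estimates."""
    rng = random.Random(seed)
    n = len(logged_events)
    estimates = []
    for b in range(n_boot):
        replicate = [logged_events[rng.randrange(n)] for _ in range(n)]
        estimates.append(replay_evaluate(replicate, policy, seed=b))
    return sum(estimates) / n_boot

def epsilon_greedy(arms, counts, sums, rng, eps=0.1):
    """A simple bandit policy for illustration: explore with probability
    eps, otherwise pick the arm with the highest empirical mean reward."""
    if rng.random() < eps or not counts:
        return rng.choice(arms)
    return max(arms, key=lambda a: sums.get(a, 0.0) / max(counts.get(a, 0), 1))
```

A usage example: replay an epsilon-greedy agent against a synthetic log in which arms were shown uniformly at random, which is the setting under which the Replay estimate is unbiased.

```python
rng = random.Random(1)
log = []
for _ in range(500):
    shown = rng.choice(["a", "b"])                # uniform logging policy
    log.append((shown, 1.0 if shown == "a" else 0.0, ["a", "b"]))
print(replay_evaluate(log, epsilon_greedy, seed=2))
print(bootstrap_replay(log, epsilon_greedy, n_boot=20, seed=3))
```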