This paper applies various statistical techniques with the goal of maximizing model performance for the task of classification on a dataset with heavily imbalanced classes. The dataset is assembled by combining several sources.
Exploratory data analysis will be performed to understand the available factors, their
corresponding distributions and relationship to the outcome variable. Then steps will be taken to prepare the data
for the task of classification. Next, a collection of training set sampling strategies will be outlined using
methods such as Random Oversampling, Random Undersampling, and the Synthetic Minority Oversampling
Technique (SMOTE). Machine learning models such as Random Forest Classifiers will be fitted for each of the sets of
parameters, and each fitted model will be evaluated on the test set to provide insight into the differences among
the sampling techniques in the imbalanced classification task. Metrics used to evaluate model fit will include
traditional statistical measures as well as other measures that more closely align with the specific business
problem.
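
To make the two simplest sampling strategies named above concrete, the following is a minimal sketch of random oversampling and random undersampling applied to a training set. It is an illustration under stated assumptions, not the implementation used in this paper: the function names, the toy data (95 majority rows, 5 minority rows), and the choice to balance all classes to equal size are hypothetical. SMOTE is not shown; it differs in that it creates new minority points by interpolating between existing minority neighbors rather than duplicating rows.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows (with replacement) in each smaller
    class until every class is as large as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset (without replacement) of each larger class
    so that every class is as small as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical imbalanced training set: 95 negatives, 5 positives.
train_rows = [[float(i)] for i in range(100)]
train_labels = [0] * 95 + [1] * 5

over_rows, over_labels = random_oversample(train_rows, train_labels)
under_rows, under_labels = random_undersample(train_rows, train_labels)

over_counts = Counter(over_labels)
under_counts = Counter(under_labels)
print(over_counts, under_counts)
```

Only the training set is resampled in this sketch; the test set is assumed to keep its original class distribution so that evaluation reflects the imbalance the model would face in practice.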