Open Access Publications from the University of California


UCLA Electronic Theses and Dissertations

Statistical Analysis of The 2016 and 2017 NCAA Division-I Swimming Championships


This paper uses web scraping to combine eight separate competition results datasets into a single new dataset. Broad exploratory analysis is performed to identify the measurable reasons why most swimmers perform worse at the fastest collegiate competition in the nation. Additionally, forward and backward stepwise variable selection is used to study the impact of various factors on the outcome variable, time difference. Machine learning algorithms such as ridge regression and the lasso are used to build models that predict the time difference between a swimmer's entry time and final time in a race. The mean squared error is used to evaluate the overall performance of the models. Although many variables are created and used to fit the final ridge regression model, there are unmeasurable factors that must also be taken into account to accurately describe what affects how fast a swimmer goes at the competition.
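As a rough illustration of the modeling step described above, the following sketch fits a ridge regression in closed form and scores it with mean squared error on held-out rows. It is not the thesis's code: the two predictors, the synthetic data, the candidate penalty values, and all function names (`solve`, `ridge_fit`, `predict`, `mse`, `demo`) are assumptions made for the example, and the lasso is omitted because it has no closed-form solution.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression: solve (Xc^T Xc + lam * I) beta = Xc^T yc on centered data.

    Centering keeps the intercept out of the penalty.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    beta = solve(A, rhs)
    intercept = y_mean - sum(beta[j] * x_mean[j] for j in range(p))
    return intercept, beta


def predict(intercept, beta, X):
    """Predicted time difference for each row of X."""
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def demo():
    # Synthetic stand-in data (NOT the thesis dataset): two hypothetical
    # predictors and a time difference (final time minus entry time) in seconds.
    rng = random.Random(0)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [0.4 + 0.8 * r[0] - 0.5 * r[1] + rng.gauss(0, 0.3) for r in X]
    X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]
    # Compare a few assumed penalty values by held-out MSE.
    for lam in (0.0, 1.0, 10.0, 100.0):
        b0, beta = ridge_fit(X_train, y_train, lam)
        print(lam, mse(y_test, predict(b0, beta, X_test)))


if __name__ == "__main__":
    demo()
```

With the penalty set to zero the fit reduces to ordinary least squares; larger penalty values shrink the coefficients toward zero, and the held-out mean squared error gives one way to choose among them.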
