eScholarship
Open Access Publications from the University of California

Housing Sale Price Prediction Using Machine Learning Algorithms

  • Author(s): Zhou, Yichen
  • Advisor(s): Wu, Yingnian
  • et al.
Abstract

In this thesis, I explore how predictive modeling can be applied to housing sale price prediction by analyzing a housing dataset with machine learning models. I fit four models: linear regression, lasso regression, random forest, and XGBoost. Because the data contain 79 explanatory variables with many missing values, much of the work goes into data preparation: I carry out exploratory data analysis and feature engineering before model fitting, and then use RMSE and R-squared to measure model performance. The linear regression model does not meet the assumption of equal variances (homoscedasticity), so I do not consider it a candidate for the final model. The lasso regression gives comparatively poor RMSE and R-squared values. The random forest has a very good R-squared on the training set but a relatively low R-squared on the test set, which suggests that it overfits somewhat. The XGBoost model performs well on all of these measures, so I use it as my final model to predict housing sale prices. The XGBoost model also shows which variables have important effects on sale price.
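
The two evaluation measures named in the abstract, and the train/test comparison used to judge overfitting, can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the thesis: the function names (rmse, r_squared, overfit_gap) and the small price lists are hypothetical and chosen only to show the arithmetic.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def overfit_gap(train_r2, test_r2):
    """Difference between training and test R-squared.

    A large positive gap is one informal sign of overfitting.
    """
    return train_r2 - test_r2

# Hypothetical sale prices (in thousands) and predictions, for illustration only.
train_true, train_pred = [200.0, 250.0, 300.0, 350.0], [202.0, 249.0, 301.0, 348.0]
test_true, test_pred = [210.0, 260.0, 310.0, 360.0], [230.0, 240.0, 330.0, 335.0]

gap = overfit_gap(r_squared(train_true, train_pred),
                  r_squared(test_true, test_pred))
print(f"test RMSE: {rmse(test_true, test_pred):.2f}, R-squared gap: {gap:.3f}")
```

A model whose training R-squared is much higher than its test R-squared, as described for the random forest above, would show a large gap here; what counts as "large" is a judgment call that depends on the data.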
