Income Prediction Using Machine Learning Techniques
- Jo, Kahyun
- Advisor(s): Schoenberg, Frederic P.
Abstract
This thesis studies income prediction with machine learning techniques, specifically predicting whether an individual earns more than $50,000 per year from demographic and socio-economic predictor variables such as capital gain, education level, relationship, occupation, and capital loss. Predicting income levels is important for understanding economic disparities and informing policy decisions. The study uses the Adult Income dataset from the UCI Machine Learning Repository and thoroughly evaluates each model’s performance. The methodology begins with a preprocessing stage to ensure data quality, followed by the application of several machine learning algorithms, including Logistic Regression, k-Nearest Neighbors, Decision Trees, Random Forests, Support Vector Machines, and Neural Networks. Particular attention is given to systematic hyperparameter tuning, especially for the more complex Neural Network and Random Forest models. The findings indicate that Random Forest models perform best on most of the metrics considered: accuracy, sensitivity, precision, specificity, F1 score, AUC, and RMSE. The Baseline Random Forest achieves the best accuracy (86.410%), specificity (88.600%), and RMSE (0.315), suggesting strong overall performance and well-calibrated probabilities. The Tuned Random Forest achieves the highest AUC (94.964%) and F1 score (82.057%), indicating strong class discrimination and an effective balance between precision and recall.
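The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the function name `binary_metrics`, the 0.5 decision threshold, the toy labels and probabilities, and the choice to compute RMSE on predicted probabilities rather than on hard labels are all assumptions made for illustration.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute common binary-classification metrics (hypothetical helper).

    y_true: list of 0/1 labels (1 = income above $50,000, by assumption).
    y_prob: list of predicted probabilities for class 1.
    """
    # Convert probabilities to hard predictions at the assumed threshold.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    n = len(y_true)
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if (precision + sensitivity)
        else 0.0
    )

    # RMSE between predicted probabilities and 0/1 labels (assumed definition).
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n)

    # AUC as the fraction of (positive, negative) pairs in which the positive
    # example receives the higher score; ties count as half a win.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if pos and neg:
        wins = sum(
            1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg
        )
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "auc": auc,
        "rmse": rmse,
    }


# Toy data, invented purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
m = binary_metrics(y_true, y_prob)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Accuracy, sensitivity, specificity, precision, and F1 depend on the chosen threshold, whereas AUC and the probability-based RMSE do not. This is one reason a model can lead on AUC while another leads on accuracy.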