Stock Price Prediction using Adaptive Time Series Forecasting and Machine Learning Algorithms
- Author(s): Chen, Lumeng
- Advisor(s): Wu, Yingnian
- et al.
In this thesis, ARIMA, Long Short-Term Memory (LSTM), and Extreme Gradient Boosting (XGBoost) models were developed to predict the daily adjusted close price of selected stocks from January 3, 2017 to April 24, 2020. The daily stock price data include columns for open, close, adjusted close, high, low, and volume. In the ARIMA and LSTM models, the only features used as model inputs were the previous N days’ stock prices: the prediction for day N+1 was calculated from the previous N values. RMSE and MAPE were calculated by comparing this rolling forecast with the actual prices in the test dataset. The optimal parameters were chosen as the setting that yielded the lowest RMSE. Residual diagnostics were performed to check the model assumptions of the final ARIMA model. In the XGBoost model, feature engineering was used to create two additional features from the open, close, high, and low prices. As in the LSTM model, the features from the previous N days were used as inputs for predicting day N+1. In both the LSTM and XGBoost models, the training dataset was scaled for model fitting. Features and outputs in the cross-validation and test datasets were also scaled, based on the previous N days’ values. The predictions were then converted back to the original scale before the RMSE and MAPE scores were calculated.
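The rolling one-step forecast and its evaluation can be illustrated with a minimal sketch. It is not the thesis code. The function names, the toy prices, and the `last_value` placeholder predictor are hypothetical. The abstract also does not state which scaling method was used, so standardizing each window by its own mean and standard deviation is an assumption made here for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def scale_window(window):
    """Standardize one N-day window by its own mean and std (assumed scaling scheme)."""
    mu = sum(window) / len(window)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in window) / len(window)) or 1.0
    return [(x - mu) / sigma for x in window], mu, sigma

def rolling_forecast(prices, n, predict_scaled):
    """Predict each day from the previous n days, then revert to the original scale."""
    preds = []
    for t in range(n, len(prices)):
        scaled, mu, sigma = scale_window(prices[t - n:t])
        preds.append(predict_scaled(scaled) * sigma + mu)  # undo the scaling
    return preds

def last_value(scaled_window):
    """Placeholder predictor (hypothetical): repeat the most recent scaled value."""
    return scaled_window[-1]

prices = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]  # invented toy data
n = 2
preds = rolling_forecast(prices, n, last_value)
actual = prices[n:]
print(f"RMSE = {rmse(actual, preds):.3f}, MAPE = {mape(actual, preds):.2f}%")
```

In practice `last_value` would be swapped for a fitted model's prediction function. The scaling, the inverse transform, and the metric calculation would stay the same.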
In conclusion, judging by the plots of predicted versus actual stock price for each stock and by the RMSE and MAPE scores, all three models produced good forecasts of the next day’s stock price. However, during periods of high volatility, the lag between the forecast and the actual value was more noticeable. In our models, the previous N days’ stock prices on their own provided a relatively accurate prediction of the stock price on day N+1. For the XGBoost model in particular, N=2 yielded better RMSE and MAPE (%) results than larger values of N, and prediction accuracy decreased as N increased. The XGBoost feature importance analysis showed that the most important predictor of today’s stock price is yesterday’s price. Although the final ARIMA model achieved the lowest RMSE, the grid search over parameters for the one-step ARIMA forecast took the longest computing time, whereas the XGBoost model, which had the second-lowest RMSE, required the least time for parameter tuning and forecasting.
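Selecting the window length N by lowest RMSE can be sketched as follows. This is an illustration under stated assumptions, not the thesis procedure: a simple moving average stands in for the fitted models, the names `moving_average_forecast` and `select_window` are hypothetical, and the prices are invented. Every candidate N is scored on the same target days so that the RMSE values are comparable.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def moving_average_forecast(prices, n, start):
    """Stand-in model: predict each day from `start` on as the mean of the previous n prices."""
    return [sum(prices[t - n:t]) / n for t in range(start, len(prices))]

def select_window(prices, candidates):
    """Return the candidate N with the lowest RMSE, plus all scores."""
    start = max(candidates)  # score every N on the same target days
    actual = prices[start:]
    scores = {n: rmse(actual, moving_average_forecast(prices, n, start))
              for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores

prices = [100.0, 101.0, 103.0, 106.0, 110.0, 115.0, 121.0, 128.0]  # invented toy data
best_n, scores = select_window(prices, [1, 2, 3])
for n, score in sorted(scores.items()):
    print(f"N={n}: RMSE={score:.3f}")
print("selected N:", best_n)
```

On this steadily rising toy series, shorter windows track the trend more closely, so the error grows with N. The result describes the stand-in model on made-up data only and should not be read as reproducing the thesis findings.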