Prediction of Firms’ Annual and Quarterly Return Using NLP Techniques
In this thesis, we investigate whether changes in the language of firms’ annual and quarterly reports are useful signals for predicting firms’ future returns. We create a dictionary of all firm filings submitted to the EDGAR website from 1993 to the present. The filings are then parsed and stripped so that useful information can be extracted from each document. To measure the change in reporting language between filings, we introduce several similarity measures and compute future-return indicators from them. We then merge this information with COMPUSTAT data, which contains firms’ book values, and create features to train a regressor. To train the regressor to predict future returns, we also use the CRSP data set, which provides each firm’s return for each month. The idea comes from the paper “Lazy Prices” by Cohen, Malloy, and Nguyen.
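
As a rough illustration of what a similarity measure between two filings can look like, the sketch below computes cosine and Jaccard similarity on simple bag-of-words representations. The choice of these two measures, the lowercase alphabetic tokenizer, and the function names `tokenize`, `cosine_similarity`, and `jaccard_similarity` are assumptions made for this example; the text above does not specify which measures or preprocessing the thesis uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between the word-count vectors of two documents."""
    counts_a = Counter(tokenize(doc_a))
    counts_b = Counter(tokenize(doc_b))
    # Dot product over the words the two documents share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # at least one document has no tokens
    return dot / (norm_a * norm_b)


def jaccard_similarity(doc_a, doc_b):
    """Jaccard similarity between the sets of distinct words of two documents."""
    words_a = set(tokenize(doc_a))
    words_b = set(tokenize(doc_b))
    union = words_a | words_b
    if not union:
        return 0.0  # both documents are empty
    return len(words_a & words_b) / len(union)


if __name__ == "__main__":
    # Hypothetical snippets standing in for the same section of two consecutive filings.
    previous = "We expect revenue growth to continue in the coming year."
    current = "We expect revenue growth to slow in the coming year."
    print("cosine :", cosine_similarity(previous, current))
    print("jaccard:", jaccard_similarity(previous, current))
```

Under this setup, a low similarity between a firm's current filing and its previous comparable filing would indicate a large change in reporting language, which could then serve as one input feature for the regressor.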