Open Access Publications from the University of California

UC Irvine Electronic Theses and Dissertations

Prediction of Firms’ Annual and Quarterly Return Using NLP Techniques


In this thesis, we investigate how changes in the language of firms’ annual and quarterly reports can serve as signals for predicting their future returns. We build a dictionary of all firm filings submitted to the SEC’s EDGAR system from 1993 to the present. The filings are then parsed and cleaned so that useful information can be extracted from each document. To measure changes in reporting language, we introduce several similarity measures and use them to compute indicators of future returns. We then merge this information with COMPUSTAT data, which contains firms’ book values, and construct features for training a regressor. To train the regressor to predict future returns, we also use the CRSP dataset, which provides each firm’s monthly returns. The idea comes from the paper “Lazy Prices” by Cohen, Malloy, and Nguyen.
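The abstract does not name the similarity measures it uses. As a minimal sketch, assuming each filing is already available as cleaned plain text, the Python below computes two common document-similarity measures: cosine similarity over term-frequency vectors and Jaccard similarity over word sets. The function names and the simple regex tokenizer are illustrative assumptions, not the thesis’s implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens.

    Assumption: a deliberately simple tokenizer chosen for illustration.
    """
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two documents."""
    tf_a, tf_b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    # Dot product over the words the two documents share.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat an empty document as having no overlap.
    return dot / (norm_a * norm_b)


def jaccard_similarity(doc_a: str, doc_b: str) -> float:
    """Shared vocabulary size divided by combined vocabulary size."""
    set_a, set_b = set(tokenize(doc_a)), set(tokenize(doc_b))
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    # Hypothetical excerpts from two consecutive filings of the same firm.
    last_year = "Revenue increased due to strong demand for our products."
    this_year = "Revenue decreased due to weak demand and pending litigation."
    print(f"cosine:  {cosine_similarity(last_year, this_year):.3f}")
    print(f"jaccard: {jaccard_similarity(last_year, this_year):.3f}")
```

Both scores lie between 0 and 1. A lower score between a firm’s consecutive filings indicates that more of the reporting language has changed.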
