Open Access Publications from the University of California

Bayesian Nonparametric Models on Big Data

  • Author(s): Ozcan, Fulya
  • Advisor(s): Poirier, Dale J., et al.
Creative Commons Attribution 4.0 International Public License

This thesis focuses on the role that investor type and sentiment play in financial markets, using data from social media. The first paper investigates how the interaction between asset maturity and liquidity restrictions contributes to the "on-the-run" phenomenon, using asset markets with search frictions. In the presence of search frictions, investors would prefer not to hold assets with a very short time to maturity, since every time their assets mature they must return to the market and search for a counterparty to buy new assets, incurring a search cost. However, they also would not want to hold assets with a very long time to maturity, because of liquidity considerations. An asset search model is set up to determine the asset choices of investors with different liquidity preferences. The model considers two assets that differ in their maturities and two investor types that differ in their liquidity preferences. The main finding is that liquidity costs matter in the presence of search frictions: the model predicts a separating equilibrium in which high-type agents choose the long-term asset and low-type agents choose the short-term asset. When the two assets have the same time to maturity, the same separating equilibrium is obtained. The spread of the long-term asset is found to be higher than that of the short-term asset, which is consistent with the data. The paper therefore shows that the "on-the-run" phenomenon can be explained by higher search frictions in the off-the-run markets together with investors who have different liquidity preferences.

The second paper predicts intra-day foreign exchange rates from Twitter trending topics, using a sentiment-based topic clustering algorithm. Twitter trending topics are a good source of high-frequency information, which can improve short-term or intra-day exchange rate predictions. The project uses an online dataset in which worldwide trending topics have been fetched from Twitter every ten minutes since July 2013. First, each trending topic is assigned a sentiment (negative, positive, or uncertain) using a sentiment lexicon. Then the trending topics are clustered with a continuous Dirichlet process mixture model, regardless of whether they are explicitly related to the currency under consideration. This approach makes it possible to capture the general sentiment among users, which implicitly affects the currencies. Finally, the exchange rates are estimated using a linear model that includes the topic-based sentiment series and the lagged values of the currencies, and a VAR model on the topic-based sentiment time series. The main variables of interest are the Euro/USD, GBP/USD, Swiss Franc/USD, and Japanese Yen/USD exchange rates. The linear model with the topic sentiments and the lagged values of the currencies is found to perform better than the benchmark AR(1) model. Incorporating sentiment from tweets also resulted in better predictions of currency values after unexpected events.
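
The comparison between a lagged-value benchmark and a linear model augmented with a sentiment series can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (not the Twitter or exchange rate data), a single lag is used, and the helper names `design`, `fit_ols`, and `out_of_sample_rmse` are hypothetical rather than taken from the thesis.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def design(y, sentiment=None):
    """Regressors for predicting y[t] from y[t-1] and, optionally, sentiment[t-1]."""
    cols = [np.ones(len(y) - 1), y[:-1]]
    if sentiment is not None:
        cols.append(sentiment[:-1])
    return np.column_stack(cols), y[1:]

def out_of_sample_rmse(y, sentiment=None, train_frac=0.7):
    """Fit on the first train_frac of the sample, report RMSE on the remainder."""
    X, target = design(y, sentiment)
    split = int(train_frac * len(target))
    coef = fit_ols(X[:split], target[:split])
    resid = target[split:] - X[split:] @ coef
    return float(np.sqrt(np.mean(resid ** 2)))

# Synthetic series in which lagged sentiment truly drives the outcome (assumption).
rng = np.random.default_rng(0)
n = 2000
sentiment = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * sentiment[t - 1] + rng.normal(scale=0.5)

rmse_ar1 = out_of_sample_rmse(y)              # benchmark: intercept + own lag only
rmse_sent = out_of_sample_rmse(y, sentiment)  # adds the lagged sentiment regressor
print(f"AR(1) RMSE: {rmse_ar1:.3f}, with sentiment: {rmse_sent:.3f}")
```

Because the synthetic series is constructed so that sentiment matters, the augmented model has the lower error by design; the sketch shows only the mechanics of the comparison, not the thesis's empirical result.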

The third paper investigates the behavior of users of Reddit's news subreddit and the relationship between their sentiment and exchange rates. Using graphical models and natural language processing, hidden online communities among Reddit users are discovered. The data used in this project are a mixture of text and categorical data from a news website. They include the titles of the news pages and a few user characteristics, in addition to users' comments. This dataset is an excellent resource for studying user reactions to news, since the comments are directly linked to the webpage contents. The model considered in this paper is a hierarchical mixture model, a generative model that detects overlapping networks using the sentiment from the user-generated content. The advantage of this model is that the communities (or groups) are assumed to follow a Chinese restaurant process, so the model can detect and cluster the communities automatically. The hidden variables and the hyperparameters of this model can be obtained using Gibbs sampling.
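
As a rough illustration of why a Chinese restaurant process prior lets the number of communities be determined by the data, the sketch below draws group assignments from the prior alone. It is not the thesis's hierarchical model or its Gibbs sampler; the function name `sample_crp` and the concentration value `alpha = 1.0` are assumptions chosen for the example.

```python
import numpy as np

def sample_crp(n_customers, alpha, rng):
    """Draw group assignments for n_customers from a Chinese restaurant process.

    Customer i (0-indexed) joins existing group k with probability
    counts[k] / (i + alpha) and opens a new group with probability
    alpha / (i + alpha).
    """
    assignments = []
    counts = []  # counts[k] = number of customers currently in group k
    for i in range(n_customers):
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(counts):
            counts.append(1)   # a new group is opened
        else:
            counts[k] += 1     # an existing group grows
        assignments.append(k)
    return assignments, counts

rng = np.random.default_rng(0)
assignments, counts = sample_crp(100, alpha=1.0, rng=rng)
print(f"{len(counts)} groups formed; sizes: {counts}")
```

The number of groups is not fixed in advance: it grows with the sample, and larger groups attract new members with higher probability. In a full mixture model this prior would be combined with a likelihood for the observed content and updated by Gibbs sampling.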
