Owing to its immense scale, the web has become a huge content repository. Web users seek content of interest primarily through search engines and social media. Online content, ranging from professionally produced material to user-generated posts, varies greatly in quality. This variation can leave users confused, lead them to suboptimal decisions, or leave them dissatisfied with their choices. It is therefore important to develop learning models that can automatically discover high-quality content for web users.
This thesis explores two general schemes toward this goal: (1) learning to discover high-quality content and deliver it to users, and (2) learning to identify domain authorities who generate high-quality content, so that users can obtain quality content from these authorities. Under these two schemes, we propose a range of Bayesian statistical models, each designed for a specific application in social media or web search engines. These models discover high-quality information by statistically analyzing the online content and the users of these systems.
In the social media domain, we introduce a range of Bayesian models designed to identify topic-specific influencers or experts in microblogs and content-sharing websites. In the search engine domain, we propose two Bayesian models that analyze search users and the underlying database. The first builds a recommender system over a knowledge base that suggests related entities to search users. The second infers the demographics of users, which can be used to enhance their search experience. Extensive experiments on real-world data confirm the effectiveness of all the proposed models.