Effective Exploration of Web and Social Network Data
- Author(s): Cheng, Shiwen
- Advisor(s): Christidis, Evangelos et al.
The amount of Internet data is rapidly increasing due to the growth of the Web and the success of Online Social Networks. However, it is challenging for users to effectively explore this dynamic and massive data.
Search engines offer a convenient way for users to explore Web and Social data through keyword query interfaces.
However, it is not rare for a keyword search query to return many results that are not relevant to the user's information need in terms of content, time, or structure. It is hard for a search engine to understand a user's information need purely from the query keywords, especially when the query is ambiguous in nature.
In Social Networks, in addition to ad-hoc search, subscription is another common way to consume data. For example, a user on Facebook or Twitter can subscribe to other users to have their real-time updates shown in her timeline. However, a user can be overloaded by the large number of posts from her subscriptions due to the high post rate.
In this dissertation we develop solutions to help users effectively explore Web and Social Network data:
First, we study the role of document creation time in Web search queries in relation to the freshness requirement of a query. Second, we study the structural ambiguity of a query and estimate the hardness of keyword queries over structured data. A search engine can use this technique to decide when to employ query suggestion and query reformulation techniques to offer a better user experience.
Third, we propose context-aware ranking models that improve the search of medical literature by leveraging user query sessions. A user's query session can help a search engine understand the user's information need.
Fourth, we apply novel diversification techniques to Online Social Network data to alleviate the information overload problem and help users explore social data more effectively through search interfaces or a post timeline.