Resolution, Recommendation, and Explanation in Richly Structured Social Networks
- Author(s): Kouki, Pigi
- Advisor(s): Getoor, Lise
- et al.
There is an ever-increasing amount of richly-structured data from online social media. Making effective use of such data for recommendations and decisions requires methods capable of extracting knowledge from both the content and the structure of such networks. Utilizing richly-structured networks derived from real-world data involves three major challenges that I address in this dissertation: 1) matching multiple references that correspond to the same entity (a problem known as entity resolution), 2) exploiting the heterogeneous nature of the data to provide accurate recommendations, and 3) given the complexity and heterogeneity of the data, explaining the recommendations to users. My goal in this work is to address these challenges and improve both accuracy and user experience for resolution and recommendation over richly-structured social data.
In the first part of this work, I introduce a collective approach for the problem of entity resolution in familial networks that can incorporate statistical signals, relational information, logical constraints, and predictions from other algorithms. Moreover, the method is capable of using training data to learn the weights of different similarity scores and relational features. In experiments on real-world data, I show the importance of supporting mutual exclusion and different types of transitive relational rules that can model the complex familial relationships. Furthermore, I show that the collective model improves performance on the entity resolution task compared to state-of-the-art models that use relational features but lack the ability to perform collective reasoning.
In the second part of this work, I present a general-purpose, extensible hybrid recommender system that can incorporate and reason over a wide range of social data sources. Such sources include multiple user-user and item-item similarity measures, content, and social information. Additionally, the framework automatically learns to balance these different information signals when making predictions. I experimentally evaluate my approach on two popular recommendation datasets, showing that the proposed framework can effectively combine multiple information types for improved performance, and can significantly outperform existing state-of-the-art approaches.
In the third part of this work, I show how to generate personalized, hybrid explanations from the output of a hybrid recommender system. Next, I conduct two large crowd-sourced user studies to explore different ways explanations can be presented to users: one non-personalized and one personalized. In the first, non-personalized study, I evaluate explanations for hybrid algorithms in a variety of textual and visual formats. I find that people do not have a specific preference among different versions of textual formats. At the same time, my analysis indicates that, among a variety of visualization formats, people prefer Venn diagrams. In the second, personalized study, I ask users to evaluate the persuasiveness of different explanation styles and find that users prefer item-based and content-based styles over socio-based explanations. I also study whether the number of explanation styles can affect the persuasiveness of the explanation. My analysis indicates that users lose interest after being shown three to four different explanation styles. Finally, I experiment with a variety of formats in which hybrid explanations can be presented to users, such as textual or visual, and find that textual explanations are perceived as most persuasive.
I formulate the problems of entity resolution, recommendation, and explanation as inference in a graphical model. To create my models and reason over the graphs, I build upon a statistical relational learning framework called probabilistic soft logic. My models, which allow for scalable, collective inference, show improved performance over state-of-the-art methods by leveraging richly-structured data, i.e., relational features (such as user similarities), complex relationships (such as mutual exclusion), a variety of similarity measures, as well as other heterogeneous data sources (such as predictions from other algorithms).
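As a rough illustration of how a weighted logical rule enters inference in probabilistic soft logic, the Python sketch below computes the penalty that a single ground rule contributes when truth values are relaxed to the interval [0, 1] using the Lukasiewicz conjunction. The transitivity rule, the predicate name Same, the weight, and the truth values are all hypothetical and chosen only for illustration; they are not taken from the models in this dissertation, and the sketch does not use the PSL software itself.

```python
# Illustrative sketch only: the rule, predicate names, weight, and truth
# values below are hypothetical and are not taken from the dissertation.

def soft_and(*truths):
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))


def distance_to_satisfaction(body_truths, head_truth):
    """How far a ground rule `body -> head` is from being satisfied.

    The rule is fully satisfied (distance 0) whenever the head is at least
    as true as the conjunction of the body atoms.
    """
    return max(0.0, soft_and(*body_truths) - head_truth)


def weighted_penalty(weight, body_truths, head_truth, squared=False):
    """Hinge-style penalty contributed by one ground rule."""
    d = distance_to_satisfaction(body_truths, head_truth)
    return weight * (d ** 2 if squared else d)


# Hypothetical transitivity rule for entity resolution:
#   Same(a, b) & Same(b, c) -> Same(a, c)
same_ab, same_bc, same_ac = 0.9, 0.8, 0.4
penalty = weighted_penalty(2.0, [same_ab, same_bc], same_ac)
print(f"penalty for the ground rule: {penalty:.2f}")
```

Inference then amounts to choosing the unobserved truth values (here, the value of Same(a, c)) that minimize the sum of such penalties over all ground rules, which is what allows decisions about related references to be made collectively rather than one pair at a time.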