Understanding and predicting users' behavior is crucial to decision-making processes in various applications. With the growing size of user populations in numerous domains, autonomous approaches such as Artificial Intelligence (AI) and algorithmic designs have become an essential part of many user behavior analyses. Intelligent systems can outperform humans on specific tasks and facilitate human decisions by improving their perception. For practical, social, and in many cases legal reasons, the adoption of intelligent systems in many domains hinges on their ability to provide explanations to developers, users, and regulators. Example domains include financial and e-commerce services, medical diagnosis, social media, and government policy. Such explanations increase users' trust by clarifying why and how decisions are made, and they enable designers to improve the robustness and fairness of these systems and to reduce bias and discrimination.
Data-driven systems often use logs of human interactions with applications, services, and systems for their user behavior analyses. These records capture naturalistic human behavior, uninfluenced by observers, and typically cover a diverse, worldwide user base. However, they usually reveal what people do rather than why they do it. Some data-driven AI techniques, such as neural networks, remain largely black boxes despite their strong performance. It is therefore desirable to design models that simultaneously achieve high performance in prediction tasks and explain the reasoning behind those predictions.
This dissertation aims to design predictive models that bridge the gap between performance in predicting user behavior and the interpretability of the model in multiple application domains. We investigate several data-driven approaches, from stochastic models to neural networks. We propose effective methods that offer explanations along with performance and discuss their limitations.
The domains studied in this dissertation are:
- First, we look at recommender systems. These systems play a critical role by offering items of interest to users, thereby narrowing down a vast search space that comprises hundreds of thousands of products. We propose an architecture that relies on both common patterns and individual behaviors to tailor its recommendations to each person. Simulations in a controlled environment show that our proposed model learns interpretable, personalized user behaviors. Our empirical results on the Nielsen Consumer Panel dataset indicate that the proposed approach achieves up to a 27.9% performance improvement over state-of-the-art methods.
- Second, we investigate how human behavior in the presence of Non-Pharmaceutical Interventions (NPIs), such as limits on public gatherings and mask mandates, affects the spread of a contagious disease. A deeper understanding of how policies affect human behavior and, in turn, disease containment allows more accurate forecasts of disease spread when NPIs are partially loosened and gives policymakers better data for making informed decisions. We adapt the Susceptible–Exposed–Infected–Recovered (SEIR) model of disease propagation in a network of interconnected humans to incorporate human movement and social distancing. Although the proposed SEIR model estimates disease propagation accurately, on its own it does not explain the effect of NPIs on disease spread. We therefore measure the impact of NPIs on human behavior, and thus on mitigating COVID-19 spread, by exploiting the spatio-temporal variation in policy measures across the 16 states of Germany. Our model finds that German policies mandating contact restrictions (e.g., limited movement in public spaces) and the closure of educational institutions are associated with the sharpest drops in movement within and across states. While this quasi-experiment does not allow for causal identification, the estimated effect of each policy on reducing disease spread still provides meaningful insights. By combining the SEIR model with a model that measures each policy's contribution to mobility reduction, we forecast scenarios for relaxing various types of NPIs. In a separate study, we use a reduced-form econometric model to relate population-wide changes in mask-wearing to the growth rate of airborne disease infections in the presence of other NPIs. We then use the estimated growth rate to predict COVID-19 spread across 24 countries with a Susceptible–Infected–Recovered (SIR) network model.
- Lastly, we study the propagation of malicious user activity in Online Social Networks (OSNs). OSNs spread content widely and rapidly among users; thus, they can be used as vectors to disseminate malicious content such as spam. We analyze malicious behaviors through case studies on Pinterest, a content-driven OSN. Based on the insights gained from these analyses, we develop learning-based models that detect whether posted content is malicious. We investigate the role of various features in this prediction task and show that properties observable at the time content is posted can be used to protect users from potential risks.
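To make the compartmental modeling in the second domain more concrete, here is a minimal sketch of a standard, homogeneous-mixing SEIR model. It integrates dS/dt = -βSI/N, dE/dt = βSI/N - σE, dI/dt = σE - γI, and dR/dt = γI with forward Euler, and it scales the transmission rate β by a daily mobility multiplier. This is not the dissertation's network SEIR model or its policy-contribution model. The names `SEIRParams` and `simulate_seir`, the `mobility` callable (a hypothetical stand-in for NPI-driven mobility reduction), and all parameter values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

State = Tuple[float, float, float, float]  # (S, E, I, R) as population fractions


@dataclass
class SEIRParams:
    beta: float   # baseline transmission rate per day (illustrative value)
    sigma: float  # E -> I rate, i.e. 1 / mean incubation period in days
    gamma: float  # I -> R rate, i.e. 1 / mean infectious period in days


def simulate_seir(
    params: SEIRParams,
    s0: float, e0: float, i0: float, r0: float,
    days: int,
    mobility: Optional[Callable[[int], float]] = None,
    dt: float = 0.1,
) -> List[State]:
    """Integrate a homogeneous-mixing SEIR model with forward Euler.

    `mobility` is a hypothetical stand-in for NPI effects: it maps a day
    index to a multiplier in [0, 1] that scales the transmission rate.
    Returns one (S, E, I, R) tuple per day, including day 0.
    """
    s, e, i, r = s0, e0, i0, r0
    n = s + e + i + r
    history: List[State] = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(days):
        multiplier = mobility(day) if mobility is not None else 1.0
        beta = params.beta * multiplier
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt     # flow S -> E
            new_infectious = params.sigma * e * dt  # flow E -> I
            new_recovered = params.gamma * i * dt   # flow I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((s, e, i, r))
    return history


if __name__ == "__main__":
    params = SEIRParams(beta=0.5, sigma=1 / 5, gamma=1 / 7)
    baseline = simulate_seir(params, 0.999, 0.0, 0.001, 0.0, days=200)
    # Hypothetical NPI: mobility drops to 40% of normal on days 30 to 89.
    restricted = simulate_seir(
        params, 0.999, 0.0, 0.001, 0.0, days=200,
        mobility=lambda d: 0.4 if 30 <= d < 90 else 1.0,
    )
    print("peak infected (baseline):  ", max(st[2] for st in baseline))
    print("peak infected (restricted):", max(st[2] for st in restricted))
```

In this toy setting, the temporary reduction in mobility lowers the peak infected fraction compared with the unmitigated run. Forward Euler with a small step keeps the sketch dependency-free. A more careful implementation would typically use an ODE solver such as `scipy.integrate.solve_ivp` and, as in the dissertation, a network of interacting populations rather than a single well-mixed one.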