eScholarship
Open Access Publications from the University of California

UC Santa Cruz

UC Santa Cruz Electronic Theses and Dissertations

A Latent Space Approach for Cognitive Social Structures Modeling and Graphical Record Linkage

  • Author(s): Sosa, Juan
  • Advisor(s): Rodriguez, Abel
  • et al.
Abstract

Latent space models have proven useful in a wide range of applications involving social network data because of their flexibility and interpretability. This dissertation extends this class of models to other fields where relational data play a fundamental role.

First, we introduce a novel approach for modeling cognitive social structure (CSS) data. We rely on a generalized linear model that incorporates a bilinear structure to model transitivity effects within networks, and a hierarchical specification on the bilinear effects to borrow information across networks. The model allows us to formally evaluate the accuracy of actors' perceptions of their own position in social space, and to obtain a consensus representation of the social space that accounts for differential perceptual acuity among actors. The performance of the model is evaluated using simulated data as well as two real CSSs reported in the literature.

Next, we study the integration of databases in the context of online social networks (OSNs). We propose a model for discovering multiple profiles of a single user using both profile and network data. Our proposal can handle multiple networks and makes it straightforward to propagate record linkage uncertainty into later analyses. We illustrate our methodology in two different settings, namely re-identification and identity resolution, using real data from three popular OSNs.

Finally, we study the impact of combining profile and network data in a de-duplication setting. We also assess the influence of a range of prior distributions on the linkage structure, including our proposal. Our proposed prior makes it straightforward to specify prior beliefs and naturally enforces the microclustering property. Furthermore, we explore stochastic gradient Hamiltonian Monte Carlo methods as a faster alternative for obtaining samples of the network parameters. Our methodology is evaluated using the RLdata500 dataset, which is popular in the record linkage literature.
