Multi-relational Representation Learning and Knowledge Acquisition
- Author(s): Chen, Muhao
- Advisor(s): Zaniolo, Carlo
- et al.
Multi-relational representation learning methods encode the entities or concepts of a knowledge graph in a continuous, low-dimensional vector space, where relational inference over entities (or concepts) is modeled as simple vector algebra. Although such knowledge representations are crucial to a wide range of knowledge-driven applications, state-of-the-art methods are limited to learning embeddings for simple relation facts in a single knowledge graph. In this dissertation, we pursue the goal of comprehensively capturing the multifaceted relational knowledge in various types of knowledge bases, and toward that goal we contribute on three fronts: (i) we introduce the first multi-relational representation learning framework that learns to transfer embeddings across multiple knowledge bases; (ii) we propose techniques for preserving relational facts with complex properties in the embedding space, including facts that enforce relational properties, form hierarchies, or carry uncertainty; (iii) we investigate large-scale relational learning based on other modalities of data, with the aim of acquiring knowledge to enrich the knowledge bases.
Each of these three research problems presents a series of key challenges, which we address in turn. For transferring embeddings, we develop joint learning of relational structure encoders that confront the heterogeneity of content across knowledge graphs, together with diverse types of alignment models that learn to transfer on the basis of simple, hierarchical, or fuzzy alignment information. In addition, we extend the joint learning framework with semi-supervised co-training on entity descriptions and proactive score propagation for fuzzy alignment, so as to handle scenarios where alignment information is scarce. To capture complex relation facts, we focus first on the relational properties that cause non-linearity in embedding structures, for which we leverage non-linear component-specific mappings of embeddings to eliminate the conflicts, and strengthen the learning process with hierarchical regularization. For uncertain relation facts, we preserve the uncertainty by using Probabilistic Soft Logic to guide a non-linear regressor that is jointly trained with the structure encoder. We further study relational learning based on sequence data. We propose generic neural sequence pair models to support large-scale relation detection, in which we incorporate different sequence encoders for heterogeneous data such as structured articles, amino acid sequences, and lexicographic knowledge.
The methods proposed in this dissertation extend the reach of multi-relational embeddings and improve a wide spectrum of applications in different domains. These include knowledge alignment, monolingual and cross-lingual knowledge graph completion, semantic search, entity typing, paraphrase identification, uncertain relation prediction, protein-protein interaction prediction, protein binding affinity estimation, single-cell RNA-sequence imputation, and Web-scale sub-article matching.
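As a rough illustration of what modeling relational inference as simple vector algebra can look like, the sketch below uses translation-based scoring, a well-known approach in which a relation acts as a vector offset from head entity to tail entity. This abstract does not state which scoring functions the dissertation adopts, so the choice of technique is an assumption made purely for illustration. The names `translation_score` and `rank_tails`, as well as the hand-picked 2-dimensional embeddings, are likewise hypothetical; real embeddings would be learned from data and have far more dimensions.

```python
import math

def translation_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail.

    Higher (closer to zero) means the triple is considered more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Hypothetical, hand-picked 2-d embeddings (illustration only, not learned).
entities = {
    "Paris": [1.0, 2.0],
    "France": [2.0, 4.0],
    "Berlin": [3.0, 1.0],
    "Germany": [4.0, 3.0],
}
capital_of = [1.0, 2.0]  # hypothetical relation vector

def rank_tails(head_name, relation):
    """Rank all other entities as candidate tails for (head, relation, ?)."""
    head = entities[head_name]
    scores = {
        name: translation_score(head, relation, vec)
        for name, vec in entities.items()
        if name != head_name
    }
    return sorted(scores, key=scores.get, reverse=True)

print(rank_tails("Paris", capital_of))
print(rank_tails("Berlin", capital_of))
```

With these toy vectors, the top-ranked candidate for each query is the entity whose embedding lies closest to the head embedding plus the relation vector, which is the sense in which answering a relational query reduces to vector addition followed by a nearest-neighbor lookup.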