eScholarship
Open Access Publications from the University of California

UC Santa Barbara

UC Santa Barbara Electronic Theses and Dissertations

Towards Effort-Saving Knowledge Mining and Reasoning over the Web

Abstract

The web exposes people to an ever-growing body of information about the world, and knowledge has been democratized: it now reaches a much wider population than a privileged elite. However, because so much knowledge is buried deep in diverse corners of the Internet, the potential of this democratization is not fully realized: knowledge can be hard to find and digest. Knowledge mining and reasoning techniques extract such knowledge from unstructured and structured data to meet people's demand for it. They push the democratization of knowledge forward, making a broader range of knowledge accessible to a wider part of the world.

However, building such a system and presenting it to end users still requires considerable human effort throughout the process. (1) From the system side, human supervision must be provided: in a new domain, the high cost of data collection and the data-hungry nature of mainstream approaches such as neural networks both pose challenges for label efficiency, i.e., reducing the need for human supervision. (2) From the user side, human intelligence is required: it takes time for users to digest, understand, and accept the returned results. This suggests that the system should provide a global picture and be explainable, so that the effort demanded of users is reduced. (3) Users may want intelligent human-machine interaction. They often begin with only a vague idea of their query and need some exploration before it becomes clear. This calls for a human-in-the-loop intelligent system that supports iterative querying, exploration, refinement, and navigation.

In this dissertation, we propose complementary approaches that target these aspects, working towards effort-saving knowledge mining and reasoning. We begin with knowledge mining, which harvests knowledge directly from massive unstructured text. We formulate and mine a graph that gives a global picture of scientific development, using weak supervision that is available for free. We also design a human-in-the-loop system to ease query development and facilitate intelligent exploration of a large text repository. Next, we propose a general-purpose textual relation embedding that is transferable to downstream relation-involving tasks. Finally, we focus on knowledge reasoning, leveraging strong, large pre-trained language models. We propose using a pre-trained language model to incorporate both the structural and the textual information of a knowledge graph. We also apply a constrained decoding strategy to the pre-trained language model, successfully using the generative model for commonsense knowledge base completion. Altogether, these contributions enable more effort-saving knowledge mining and reasoning, which accelerates the democratization of knowledge.
