eScholarship
Open Access Publications from the University of California

Mining Disparate Sources for Question Answering

  • Author(s): Sun, Huan
  • Advisor(s): Yan, Xifeng
  • et al.
Abstract

Today's paradigm of information search is in the midst of a significant transformation. Question answering (QA) techniques that can directly and precisely answer user questions are increasingly desired, in contrast to traditional search engines that retrieve lengthy web pages. The big data age is endowed with large-scale, diversified information sources, such as structured knowledge bases (KBs), unstructured texts, semi-structured tables, and human networks including social and expert networks. How can such large-scale and disparate sources be mined to advance question answering? In this dissertation, we systematically investigate this problem from the perspectives of text mining, network analysis, and human behavior understanding. Specifically, our research covers three areas:

(1) Text-based question answering. We recognize that KBs are usually far from complete and that the information required to answer a question may not always exist in a KB. Our framework therefore jointly utilizes web texts and knowledge bases: it mines answers directly from large-scale web corpora while employing KBs as a significant auxiliary to boost QA performance.

(2) Table-based question answering. Tables are prevalent on the Web, cover a large diversity of topics, and are distinct from both KBs and texts, so we explore them as a source for question answering. Specifically, we investigate how, given millions of tables, to precisely retrieve the table cells that answer a user question, and we propose a table cell search framework for this problem. We compare the framework with state-of-the-art KB-based QA systems. Experimental results validate the hypothesis that web tables provide rich knowledge missing from existing KBs and thus serve as a good complement.

(3) Expert-based question answering. The intelligence possessed by current machines is still limited in many aspects. Human intelligence, contributed through crowdsourcing platforms and collaborative networks, should be exploited to complement machine-aided question answering and problem solving. We are among the first to quantitatively analyze the task/question routing behaviors of experts in real collaborative networks, with the aim of detecting efficiency bottlenecks and optimizing human collaboration.

The methodologies and frameworks developed in this dissertation pave the way for building intelligent systems that can utilize an array of complementary knowledge sources, including knowledge bases, texts, tables, and human networks, to directly answer user questions in various domains, discover novel knowledge, and thereby assist problem solving and decision making.
