eScholarship
Open Access Publications from the University of California

UC Davis Electronic Theses and Dissertations

Building Intelligent and Reliable Summarization Systems

Abstract

Data in various formats surrounds us in every part of daily life, from education to entertainment and media. In the era of big data, the amount of textual data on the web has grown exponentially over the past decade. This growth creates information overload, where individuals are exposed to more information than they can process. Thus the need arises for automatic text summarization (ATS) systems, which can automatically distill this vast amount of raw information into smaller, digestible pieces that capture the key points.

ATS systems extract or generate a concise, readable summary while preserving the salient information of the original documents. Developing intelligent systems that produce concise, fluent, and reliable summaries has been a long-standing goal in natural language processing (NLP). Significant progress has been made in recent years, thanks to breakthroughs such as the pre-trained language models BERT and GPT. Text summarization nevertheless remains a complex and multifaceted task. Mirroring the cognitive process humans follow when crafting summaries, it requires a machine to first semantically understand the content of a document, then identify and extract its salient information, and finally generate an accurate and faithful summary.

This dissertation presents several distinct approaches to tackle the three critical steps of building ATS systems. Specifically, I first present my work on improving the modeling of long documents for extractive summarization. I introduce HEGEL, a hypergraph neural network for long-document summarization that captures high-order cross-sentence relations. HEGEL updates and learns effective sentence representations with hypergraph transformer layers and fuses different types of sentence dependencies, including latent topics, keywords, coreference, and section structure. Extensive experiments on two benchmark datasets demonstrate the effectiveness and efficiency of HEGEL for long-document modeling and extractive summarization.
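
To make the hypergraph idea concrete, the following is a minimal Python sketch of one round of hypergraph message passing, where sentences are nodes and each hyperedge groups sentences that share a topic, keyword, coreference chain, or section. It is an illustration only: HEGEL itself uses attention-based hypergraph transformer layers, and the names, shapes, and toy scoring step below are assumptions rather than the dissertation's implementation.

    import numpy as np

    def hypergraph_layer(X, H):
        # X: (n_sentences, d) sentence features
        # H: (n_sentences, n_edges) incidence matrix; H[i, j] = 1 if sentence i
        #    belongs to hyperedge j (shared topic, keyword, coreference, section)
        edge_size = np.maximum(H.sum(axis=0, keepdims=True), 1)
        E = (H.T @ X) / edge_size.T           # aggregate sentences into hyperedge features
        node_deg = np.maximum(H.sum(axis=1, keepdims=True), 1)
        return (H @ E) / node_deg             # scatter hyperedge features back to sentences

    # Toy usage: 5 sentences, 2 hyperedges, then score sentences for extraction.
    X = np.random.randn(5, 8)
    H = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1]], dtype=float)
    scores = hypergraph_layer(X, H) @ np.random.randn(8)
    extracted = np.argsort(-scores)[:2]

Because every hyperedge connects a whole group of sentences at once, a single update step already mixes information across sentences that are far apart in the document, which is the appeal of hypergraphs for long inputs.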

Then I move on to the holistic extraction of salient information from documents. To address the limitation of predicting labels for individual sentences in existing extractive summarization systems, I propose a novel extractive summarization paradigm named DiffuSum. DiffuSum directly generates the desired summary sentence representations with diffusion models and extracts sentences based on sentence representation matching. Additionally, DiffuSum jointly optimizes a contrastive sentence encoder with a matching loss for sentence representation alignment and a multi-class contrastive loss for representation diversity. I also introduce a new holistic framework for unsupervised multi-document extractive summarization. The method pairs a holistic beam search inference procedure with a holistic measure, the Subset Representative Index (SRI), which balances the importance and diversity of a subset of source sentences and can be computed in both unsupervised and adaptive manners.
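
Two of these ideas can be sketched compactly. The first snippet shows the representation-matching step of the DiffuSum paradigm: given summary-sentence representations produced by a diffusion model (abstracted away here), source sentences are extracted by nearest-neighbor matching. The second shows, in toy form, a subset score that trades importance against diversity in the spirit of SRI; the actual SRI definition differs, so treat the function names and formulas as illustrative assumptions.

    import numpy as np
    from itertools import combinations

    def extract_by_matching(source_reps, generated_reps):
        # source_reps: (n, d) encoded source sentences
        # generated_reps: (k, d) summary-sentence representations from the
        #                 (omitted) diffusion model
        S = source_reps / np.linalg.norm(source_reps, axis=1, keepdims=True)
        G = generated_reps / np.linalg.norm(generated_reps, axis=1, keepdims=True)
        sims = G @ S.T                          # (k, n) cosine similarities
        chosen = []
        for row in sims:
            for i in np.argsort(-row):          # best not-yet-chosen source sentence
                if int(i) not in chosen:
                    chosen.append(int(i))
                    break
        return chosen

    def subset_score(subset, sim, lam=0.5):
        # Toy holistic score over a whole subset of sentence indices:
        # reward centrality (importance) and penalize intra-subset similarity
        # (redundancy), given a precomputed sentence-similarity matrix `sim`.
        importance = sim[list(subset)].mean(axis=1).sum()
        redundancy = max((sim[i, j] for i, j in combinations(subset, 2)), default=0.0)
        return importance - lam * redundancy

A holistic beam search then grows candidate subsets and keeps the best few under this kind of whole-set score, rather than labeling each sentence independently.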

Next, I demonstrate my work on improving the quality and faithfulness of generated summaries. While text summarization systems have made significant progress in recent years, they typically generate a summary in a single step. This one-shot setting is sometimes inadequate, as the generated summary may contain hallucinations or overlook essential details related to the reader's interests. To address this, I propose SummIt, an iterative text summarization framework based on large language models (LLMs) such as ChatGPT. SummIt lets the model refine the generated summary iteratively through self-evaluation and feedback, resembling the iterative process humans follow when drafting and revising summaries. Furthermore, I explore the potential benefits of integrating knowledge and topic extractors into the framework to enhance summary faithfulness and controllability. Both automatic evaluation and human studies on three benchmark summarization datasets validate the effectiveness of the iterative refinements and identify potential issues of over-correction.
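
The refinement loop can be illustrated with a short sketch. Here llm stands for any text-in, text-out call to a model such as ChatGPT; the prompts and the stopping condition are placeholders of my own rather than the ones used in SummIt, and the knowledge and topic extractors mentioned above are omitted.

    def iterative_summarize(document, llm, max_rounds=3):
        # Draft a summary, then repeatedly self-evaluate and revise it.
        summary = llm(f"Summarize the following document:\n{document}")
        for _ in range(max_rounds):
            feedback = llm(
                "Critique this summary: note missing key points and any claims "
                f"not supported by the document.\nDocument:\n{document}\nSummary:\n{summary}"
            )
            if "no issues" in feedback.lower():     # placeholder stopping signal
                break
            summary = llm(
                f"Revise the summary to address the feedback.\nFeedback:\n{feedback}\n"
                f"Document:\n{document}\nSummary:\n{summary}"
            )
        return summary

Capping the number of rounds, as in this sketch, is one simple guard against the over-correction behavior noted above.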

Finally, as the emergence of large language models reshapes NLP research, I present a thorough evaluation of ChatGPT's performance on extractive summarization and compare it with traditional fine-tuning methods on various benchmark datasets. The experimental analysis reveals that ChatGPT trails existing supervised systems in terms of ROUGE scores, while achieving higher scores under LLM-based evaluation metrics. I also explore the effectiveness of in-context learning and chain-of-thought reasoning for enhancing its performance, and I propose an extract-then-generate pipeline with ChatGPT that can yield significant improvements in summary faithfulness over abstractive baselines. These observations highlight two-stage approaches as a promising direction for more faithful summarization with ChatGPT.
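
In sketch form, the extract-then-generate pipeline is a two-stage call: select salient sentences first, then ask the model to write a summary grounded only in those sentences. The lead-k extractor and the prompt below are stand-ins assumed for illustration; any of the extractive systems above could supply the first stage.

    def extract_then_generate(sentences, llm, k=5):
        # Stage 1: pick salient sentences (trivial lead-k stand-in here).
        extracted = sentences[:k]
        # Stage 2: generate an abstractive summary grounded only in that evidence.
        evidence = "\n".join(extracted)
        return llm(
            "Write a concise summary using only the information in these "
            f"sentences:\n{evidence}"
        )

Restricting the generation step to explicitly extracted evidence is what drives the faithfulness gains observed for this pipeline.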

In summary, by demonstrating and examining these systems and solutions, I aim to highlight the three critical yet challenging steps in building intelligent and reliable summarization systems, which are also crucial steps towards advancing the design of a more powerful and trustworthy AI assistant. I hope future research endeavors will continue to advance along these directions.
