eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Adaptive Entity Normalization for Biomedical Text Mining

Abstract

Entity normalization is an essential but challenging task in constructing knowledge bases by text mining the scientific literature. The task is related to entity linking and word sense disambiguation, and models for it usually depend either on the surface text of the entity phrases or on the coherence of the entities within their context. In this paper, we show that NormCo, a deep neural network normalization model, can switch between a phrase model and a coherence model. Specifically, we tested this model on the tasks of normalizing bacteria and disease entities extracted from the scientific literature. These two entity types are important for constructing a knowledge base of associations between diseases and the human microbiome, an emerging development in biotechnology. We show that NormCo switched to either the phrase model or the coherence model to achieve the best performance for each entity type. We revised NormCo with a dynamic document-level switch, tested it with novel embedding techniques, and obtained encouraging results. We also organized and consolidated available lexical resources and annotated corpora for bacteria entity tagging and normalization, which revealed a high level of discrepancy among these resources. Our results with these resources suggest that, because the distribution of biomedical entity mentions is skewed, frequently mentioned entities may require different normalization approaches than long-tail ones.
