Open Access Publications from the University of California

Using Text Mining to Accelerate Automatic Curation of Biomedical Databases


Numerous publicly available biomedical databases derive their data by curating the literature. However, using curated data in machine learning is challenging, because curated records lack the exact mentions and their locations in the text. This thesis describes a general approach to using curated data as training examples for information extraction. The idea is to formulate the problem as cost-sensitive learning from noisy labels, where the cost is estimated by a committee of classifiers that consider both the curated data and the text.
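
To make the idea of cost-sensitive learning from noisy labels concrete, here is a minimal sketch in Python. It is not the thesis's algorithm. It assumes that each candidate mention has a noisy label obtained by matching against curated records, that each committee member casts a 0/1 vote, and that the cost of an example is the fraction of members agreeing with its noisy label. The function names, the toy features, and the weighted logistic regression are all hypothetical choices made for illustration.

```python
import math

def committee_weights(noisy_labels, committee_votes):
    """Assumed cost estimate: fraction of committee votes that agree with the noisy label."""
    weights = []
    for label, votes in zip(noisy_labels, committee_votes):
        agree = sum(1 for v in votes if v == label)
        weights.append(agree / len(votes))
    return weights

def train_weighted_logreg(features, labels, weights, epochs=200, lr=0.5):
    """Logistic regression trained by SGD, with each example's gradient scaled by its weight."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y, c in zip(features, labels, weights):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = c * (p - y)  # a low-weight (likely mislabeled) example barely moves the model
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Return 1 if the linear score is positive, else 0."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z > 0 else 0

# Toy data (hypothetical): two binary features per candidate mention.
features = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]]
# Noisy labels from matching against curated records; the last one is wrong.
noisy_labels = [1, 1, 0, 0, 0]
# Votes from a hypothetical three-member committee for each example.
committee_votes = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0], [1, 1, 1]]

weights = committee_weights(noisy_labels, committee_votes)
w, b = train_weighted_logreg(features, noisy_labels, weights)
```

In this toy setup the committee unanimously disagrees with the last label, so that example receives weight 0 and has no influence on training. Real cost estimates would come from classifiers trained on both the curated data and the text, as the abstract describes.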
