UC San Diego
Towards Extracting Protein-Compound Interactions from BioChemical Patents
- Author(s): Pister, Kaiser Stefan
- Advisor(s): Bergen, Leon
- et al.
We present a protein entity tagging and normalization process focused on data extraction from biochemical patents. The project acts as a single stage in a pipeline for general chemical interaction extraction. Novel to this work is the character-embedding approach to mention identification and normalization. Additionally, this is the first work to use a siamese network and a prototypical network to augment protein database normalization. Our results show that character embeddings provide a reasonable approach to protein entity extraction, achieving up to 6% better results than previous work, and that normalization tasks can be improved significantly with a learned embedding space.
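To make the normalization idea concrete, the following is a minimal sketch of prototype-based matching, not the thesis implementation. It assumes a fixed, hashed character-trigram embedding as a stand-in for the learned character-level encoder. The function names, the embedding size, and the database entry identifiers are all hypothetical. Each database entry gets a prototype (the mean embedding of its known names), and a mention is normalized to the entry with the nearest prototype.

```python
import hashlib
import math

DIM = 64  # embedding size; an arbitrary choice for this sketch


def embed(mention, dim=DIM):
    """Stand-in encoder: hashed character-trigram counts, L2-normalized.

    Assumption: a real system would use a learned character-level network.
    """
    text = f"##{mention.lower()}##"  # pad so short names still yield trigrams
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        digest = hashlib.md5(text[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_prototypes(support):
    """Average the embeddings of each database entry's known names."""
    prototypes = {}
    for entry_id, names in support.items():
        vectors = [embed(name) for name in names]
        prototypes[entry_id] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return prototypes


def normalize(mention, prototypes):
    """Return the entry whose prototype is closest in Euclidean distance."""
    query = embed(mention)
    return min(prototypes, key=lambda key: math.dist(query, prototypes[key]))


# Hypothetical database entries; the identifiers are placeholders.
support = {
    "ENTRY_A": ["epidermal growth factor receptor", "EGF receptor", "EGFR"],
    "ENTRY_B": ["tumor necrosis factor alpha", "TNF-alpha", "TNF alpha"],
}
prototypes = build_prototypes(support)
predicted_entry = normalize("EGF-receptor", prototypes)
```

In a trained prototypical network the encoder would be optimized so that names of the same protein cluster around their prototype. A siamese network would instead be trained on pairs of names to pull matching pairs together and push non-matching pairs apart.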