We present a protein entity tagging and normalization process focused on data extraction from biochemical patents. The project acts as a single stage in a general chemical interaction extraction pipeline. Novel to this work is the character-embedding approach to mention identification and normalization. Additionally, this is the first work to use a siamese network and a prototypical network to augment protein database normalization. Our results show that character embeddings provide a reasonable approach to protein entity extraction, achieving up to 6\% better results than previous work, and that normalization tasks can be improved significantly with a learned embedding space.