- Cavalleri, E;
- Soto-Gomez, M;
- Pashaeibarough, A;
- Malchiodi, D;
- Caufield, H;
- Reese, J;
- Mungall, CJ;
- Robinson, PN;
- Casiraghi, E;
- Valentini, G;
- Mesiti, M
Relation extraction from the scientific literature to comply with a domain ontology is a well-known problem in natural language processing and is particularly critical in precision medicine. The advent of large language models (LLMs) has paved the way for the development of new effective approaches to this problem, but the extracted relations can be affected by issues such as hallucination, which must be minimized. In this paper, we present the initial design and preliminary experimental validation of SPIREX, an extension of the SPIRES-based system for the extraction of RDF triples from scientific literature involving RNA molecules. Our system exploits schema constraints in the formulations of LLM prompts along with our RNA-based KG, RNA-KG, for evaluating the plausibility of the extracted triples. RNA-KG contains more than 9M edges representing different kinds of relationships in which RNA molecules can be involved. Initial experimental results on a controlled data set are quite encouraging.