For over a decade, drug-induced liver injury (DILI) has posed significant challenges in the synthesis and development of drugs, and it remains a consequential concern. Because existing preclinical models have had only limited success in predicting it, DILI is one of the main causes of drug withdrawal from the market or termination of development, and these failures often occur during the late stages of drug development (Kullak-Ublick, 2017). Since DILI is difficult to diagnose and treat, it has become an obstacle in drug production that affects clinicians, pharmaceutical companies, and consumers alike. We propose a method for learning features of DILI-positive drugs based on the graph relationships and patterns they exhibit within a network of biological databases. We then train several statistical and machine learning models on these learned features to classify drugs as DILI-positive or DILI-negative. Our models include random forest, neural network, and logistic regression classifiers. To validate our results and assess the accuracy of our featurization and models, we use labeled DILI-positive and DILI-negative datasets developed by the FDA and the National Center for Toxicological Research, as well as additional literature datasets (Thakkar, 2020).
Keywords: liver toxicity, hepatotoxic drug analysis, drug classification, FDA clinical trials, graph databases, data processing, graph embeddings, classification models, machine-learning featurization, model comparison.
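As a rough illustration of the pipeline summarized in the abstract (graph-derived drug features fed to a classifier), the following minimal sketch may help. It is not the method used in this work: the toy graph, the drug and entity names, and the labels are all hypothetical; per-relation edge counts stand in for learned graph embeddings; and a small hand-written logistic regression stands in for the models compared here.

```python
import math
from collections import Counter

# Hypothetical toy knowledge graph as (drug, relation, entity) edges.
# Names and edges are illustrative only, not drawn from any real database.
EDGES = [
    ("drug_a", "causes", "side_effect_1"),
    ("drug_a", "causes", "side_effect_2"),
    ("drug_a", "targets", "gene_1"),
    ("drug_b", "causes", "side_effect_1"),
    ("drug_b", "causes", "side_effect_3"),
    ("drug_b", "targets", "gene_1"),
    ("drug_c", "targets", "gene_2"),
    ("drug_d", "targets", "gene_2"),
    ("drug_d", "targets", "gene_3"),
]

# Hypothetical labels: 1 = DILI-positive, 0 = DILI-negative.
LABELS = {"drug_a": 1, "drug_b": 1, "drug_c": 0, "drug_d": 0}

# Fixed feature order: one feature per relation type.
RELATIONS = sorted({rel for _, rel, _ in EDGES})


def featurize(drug):
    """Count the drug's outgoing edges per relation type.

    This is a simple stand-in (an assumption) for a learned graph embedding.
    """
    counts = Counter(rel for d, rel, _ in EDGES if d == drug)
    return [float(counts[rel]) for rel in RELATIONS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights with stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(z) - yi  # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Return 1 (DILI-positive) if the predicted probability is >= 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


drugs = sorted(LABELS)
X = [featurize(d) for d in drugs]
y = [LABELS[d] for d in drugs]
w, b = train_logistic(X, y)
predictions = {d: predict(w, b, x) for d, x in zip(drugs, X)}
```

In practice the count features would be replaced by embeddings learned from the full network of biological databases, and the classifier would be evaluated on held-out labeled drugs rather than on its own training set.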