The elucidation of a protein’s interaction/association network is important for defining its biological function. Mass spectrometry-based proteomic approaches have emerged as
powerful tools for identifying protein-protein interactions (PPIs) and protein-protein
associations (PPAs). However, interactome/association experiments are difficult to
interpret given the complexity and volume of the data they generate. Although
tools have been developed to quantitatively identify protein interactions/associations,
there is still a pressing need for easy-to-use tools that allow users to contextualize their
results.
To address this, we developed CANVS, a computational pipeline that cleans, analyzes, and visualizes mass spectrometry-based interactome/association data. CANVS is
wrapped as an interactive Shiny dashboard, allowing users to easily interface with the
pipeline. With simple requirements, users can analyze complex experimental data and create PPI/A networks. The application integrates systems biology databases like
BioGRID and CORUM to contextualize the results. Furthermore, CANVS features a Gene
Ontology tool that allows users to identify relevant GO terms in their results and create
visual networks with proteins associated with relevant GO terms. As examples, we
recently used the analytical framework included in CANVS to study the PPI/A networks
of DUSP7, which helped to define its regulation of ERK2 during mitosis, and to
analyze the PPA networks of core spindle assembly checkpoint proteins. Overall, CANVS
is an easy-to-use application that benefits all researchers, especially those who lack an
established bioinformatic pipeline and are interested in studying interactome/association
data.
Additionally, we describe a supervised machine learning method that incorporates annotated data from the contaminant repository for affinity purification data (CRAPome)
to predict contaminants in affinity and proximity purification data. The method involves
first calculating amino acid content, sequence order, hydrophobicity, and hydrophilicity
from the protein sequence, then balancing the data using data augmentation methods, and finally
measuring precision and accuracy using protein-protein interaction/association data. The
results suggest that our supervised method can predict contaminants in
protein-protein interaction/association data with 90% accuracy.
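
The feature calculation and balancing steps of the contaminant prediction method can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, hydrophobicity is represented by the mean Kyte-Doolittle hydropathy value (one possible scale among several), sequence-order and hydrophilicity features are omitted for brevity, and simple random oversampling stands in for the data augmentation methods, which are not specified here.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed here as the hydrophobicity measure).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def mean_hydrophobicity(sequence):
    """Return the mean Kyte-Doolittle value over standard residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


def sequence_features(sequence):
    """Build a 21-element feature vector: composition plus mean hydrophobicity."""
    return amino_acid_composition(sequence) + [mean_hydrophobicity(sequence)]


def oversample_minority(rows, labels, seed=0):
    """Balance classes by resampling smaller classes with replacement."""
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced_rows, balanced_labels = [], []
    for label, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            balanced_rows.append(row)
            balanced_labels.append(label)
    return balanced_rows, balanced_labels
```

The balanced feature vectors could then be passed to any supervised classifier. Balancing should be applied to the training split only, so that reported precision and accuracy reflect the original class distribution.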