- Ciszek, Robert;
- Ndode-Ekane, Xavier Ekolle;
- Gomez, Cesar Santana;
- Casillas-Espinosa, Pablo M;
- Ali, Idrish;
- Smith, Gregory;
- Puhakka, Noora;
- Lapinlampi, Niina;
- Andrade, Pedro;
- Kamnaksh, Alaa;
- Immonen, Riikka;
- Paananen, Tomi;
- Hudson, Matthew R;
- Brady, Rhys D;
- Shultz, Sandy R;
- O'Brien, Terence J;
- Staba, Richard J;
- Tohka, Jussi;
- Pitkänen, Asla
The Epilepsy Bioinformatics Study for Antiepileptogenic Therapy (EpiBioS4Rx) is an international, multidisciplinary Centers-Without-Walls study funded by the National Institute of Neurological Disorders and Stroke and aimed at preventing epileptogenesis. Preclinical biomarker discovery in EpiBioS4Rx uses a multicenter design so that the number of animals required for adequate statistical power can be studied efficiently. The use of multiple centers also mimics the clinical trial setting and therefore potentially increases the chance of successful clinical translation of the study outcomes. Successful implementation requires harmonization of procedures and data analyses among the three contributing centers in Finland, Australia, and the USA. The objective of the present analysis was to develop metrics for assessing the success of procedural harmonization, both to guide further data analyses and to plan future multicenter preclinical studies. This interim analysis is based on data from 212 rats with lateral fluid-percussion injury or sham operation that were included in the biomarker discovery study by April 30, 2018. The details of the protocols, including production of injury, post-injury follow-up, blood sampling, electroencephalogram recording, and magnetic resonance imaging, are presented in the accompanying manuscripts in this Supplement. Implementation of the protocols at the EpiBioS4Rx participating centers was visualized in 2D using t-distributed stochastic neighbor embedding (t-SNE). The protocol applied to each rat was represented as a feature vector of procedure-related variables (e.g., impact pressure, anesthesia time). A total of 112 protocol features were linked to each rat. Missing data were accounted for in the visualization by imputing missing values and adding the number of missing values as a third dimension to the 2D t-SNE plot, resulting in a 3D overview of the protocol data.
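The missing-data handling described above can be illustrated with a minimal sketch. It assumes column-mean imputation, which the text does not specify, and it takes the 2D embedding as a given input rather than computing t-SNE. The function names, the toy protocol values, and the placeholder embedding coordinates are all hypothetical.

```python
from statistics import mean


def impute_and_count_missing(rows):
    """Mean-impute None entries per column and count missing values per row.

    Column-mean imputation is an assumption made for illustration only.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        col_means.append(mean(observed) if observed else 0.0)
    imputed = [
        [col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]
    n_missing = [sum(v is None for v in r) for r in rows]
    return imputed, n_missing


def add_missing_axis(embedding_2d, n_missing):
    """Append the per-animal missing-value count as a third coordinate."""
    return [(x, y, float(m)) for (x, y), m in zip(embedding_2d, n_missing)]


# Hypothetical protocol vectors: (impact pressure, anesthesia time, one more feature).
protocol = [
    [2.8, 35.0, None],
    [3.0, None, 1.0],
    [2.6, 45.0, 3.0],
]
imputed, n_missing = impute_and_count_missing(protocol)

# Placeholder 2D coordinates standing in for a t-SNE embedding of `imputed`.
embedding_2d = [(0.0, 0.0), (1.0, 0.5), (-1.0, 2.0)]
points_3d = add_missing_axis(embedding_2d, n_missing)
```

In practice the imputed vectors would be passed to a t-SNE implementation, and the third axis would show at a glance which animals have the least complete protocol records.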
The intraclass correlation coefficient (ICC) based on Euclidean distances and the area under the receiver operating characteristic curve (AUC) of a k-nearest neighbor (KNN) classifier were used to quantify the degree of clustering by center. Both data subsets with incomplete protocol vectors omitted and data subsets with missing protocol data imputed were assessed. Clustering by center was visible in all t-SNE plots except that for day 7 neuroscores. Both ICC and AUC indicated clustering by center in all protocol variable subsets except the unimputed day 7 neuroscores (ICC 0.04 and AUC 0.6). For the imputed set of all protocol-related variables, the ICC was 0.1 and the KNN AUC was 0.92. In conclusion, both ICC and AUC indicated differences in protocol between the EpiBioS4Rx participating centers, which need to be taken into account in data analysis. Importantly, the majority of the observed differences are recoverable, as they relate to insufficient updates in record keeping. Although the AUC of the KNN classifier is a more sensitive measure of protocol harmonization than ICC for data that display complex, splintered clustering, ICC and AUC provide complementary measures of the degree of procedural harmonization. This experience should be helpful for other groups planning multicenter post-traumatic epileptogenesis studies in the future.
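The two clustering metrics can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the KNN score is computed leave-one-out for one center against the rest with k = 3, the AUC uses the rank-based (Mann-Whitney) formulation, and the ICC is a one-way ANOVA ICC generalized to vectors through squared Euclidean distances to centroids. The function names and the toy data are hypothetical.

```python
import math


def knn_scores(points, labels, target, k=3):
    """Leave-one-out fraction of the k nearest neighbors that belong to `target`."""
    scores = []
    for i, p in enumerate(points):
        neighbors = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )[:k]
        scores.append(sum(lab == target for _, lab in neighbors) / k)
    return scores


def auc(scores, positives):
    """Rank-based AUC: probability a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, positives) if is_pos]
    neg = [s for s, is_pos in zip(scores, positives) if not is_pos]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(pts):
    return [sum(c) / len(pts) for c in zip(*pts)]


def distance_icc(points, labels):
    """One-way ICC with sums of squares taken from squared Euclidean distances.

    Assumed formulation; the exact ICC definition used in the study may differ.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n_total, n_groups = len(points), len(groups)
    grand = centroid(points)
    ss_between = sum(
        len(g) * math.dist(centroid(g), grand) ** 2 for g in groups.values()
    )
    ss_within = sum(
        math.dist(p, centroid(g)) ** 2 for g in groups.values() for p in g
    )
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Effective group size, which reduces to the common size for balanced groups.
    sum_sq_sizes = sum(len(g) ** 2 for g in groups.values())
    n0 = (n_total - sum_sq_sizes / n_total) / (n_groups - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Toy data: two hypothetical centers whose protocol vectors are fully separated.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = ["A"] * 4 + ["B"] * 4

scores = knn_scores(points, labels, target="A", k=3)
auc_value = auc(scores, [lab == "A" for lab in labels])
icc_value = distance_icc(points, labels)
```

With fully separated centers, both metrics approach 1. When each center splits into several small, scattered clusters, the centroid-based ICC can stay low while the neighbor-based AUC remains high, which is consistent with the reported ICC of 0.1 alongside a KNN AUC of 0.92.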