Purpose
To evaluate the potential for artificial intelligence-based video analysis to determine the characteristics of surgical instruments as they move through the three-dimensional vitreous space.
Methods
We designed and manufactured a model eye in which we recorded choreographed videos of multiple surgical instruments moving throughout the eye. We labeled each frame of the videos with four surgical tool characteristics: tool type, location, depth, and insertional laterality. We trained two deep learning models (a classification model and a close-up detection model) to predict these characteristics and evaluated model performance on a subset of images.
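The abstract does not specify the classification architecture. As a minimal sketch, assuming a shared convolutional backbone (ResNet-18 here, an assumption) with one classification head per labeled characteristic, a frame-level multi-output classifier might look like the following; the class counts and all names are hypothetical, not the authors' configuration.

# A minimal sketch of a frame-level multi-output classifier: a shared CNN
# backbone with one head per tool characteristic. The backbone choice and
# class counts (9 x-y regions, 3 depth levels, 4 tool types, 2 insertion
# sides) are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

class ToolCharacteristicClassifier(nn.Module):
    def __init__(self, n_regions=9, n_depths=3, n_tools=4, n_sides=2):
        super().__init__()
        backbone = models.resnet18(weights=None)
        n_features = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep pooled features only
        self.backbone = backbone
        # One linear head per labeled characteristic.
        self.region = nn.Linear(n_features, n_regions)
        self.depth = nn.Linear(n_features, n_depths)
        self.tool = nn.Linear(n_features, n_tools)
        self.side = nn.Linear(n_features, n_sides)

    def forward(self, frames):
        features = self.backbone(frames)
        return {
            "region": self.region(features),
            "depth": self.depth(features),
            "tool": self.tool(features),
            "side": self.side(features),
        }

# Training would minimize the summed cross-entropy over the four heads.
model = ToolCharacteristicClassifier()
criterion = nn.CrossEntropyLoss()
frames = torch.randn(8, 3, 224, 224)  # a dummy batch of video frames
labels = {k: torch.randint(0, 2, (8,)) for k in ("region", "depth", "tool", "side")}
outputs = model(frames)
loss = sum(criterion(outputs[k], labels[k]) for k in outputs)
loss.backward()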
Results
On the training set, the classification model's accuracy was 84% for x-y region, 97% for depth, 100% for instrument type, and 100% for insertional laterality; on the validation set, the corresponding accuracies were 83%, 96%, 100%, and 100%. The close-up detection model ran at 67 frames per second, achieving precision above 75% for most instruments and a mean average precision of 79.3%.
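For concreteness, the per-characteristic accuracies above could be computed as sketched below, assuming the hypothetical multi-head classifier from the previous sketch and a data loader yielding (frames, labels) pairs; this is illustrative, not the authors' evaluation code.

# A minimal sketch of per-characteristic accuracy evaluation, assuming the
# hypothetical multi-head classifier above and a loader yielding (frames,
# labels) pairs, where labels is a dict keyed by characteristic.
import torch

@torch.no_grad()
def evaluate_accuracy(model, loader, device="cpu"):
    model.eval()
    heads = ("region", "depth", "tool", "side")
    correct = {k: 0 for k in heads}
    total = 0
    for frames, labels in loader:
        outputs = model(frames.to(device))
        total += frames.size(0)
        for k in heads:
            preds = outputs[k].argmax(dim=1).cpu()
            correct[k] += (preds == labels[k]).sum().item()
    # Fraction of frames classified correctly, per characteristic.
    return {k: c / total for k, c in correct.items()}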
Conclusions
We demonstrated that trained models can track surgical instrument movement in three-dimensional space and determine instrument depth, tip location, insertional laterality, and instrument type. Inference is nearly real-time, justifying further investigation into application to real-world surgical videos.
Translational relevance
Deep learning offers the potential for software-based safety feedback mechanisms during surgery and for extracting metrics of surgical technique that can direct research toward optimizing surgical outcomes.