Automated Pain Detection in Facial Videos using Transfer Learning
Accurately determining pain levels is difficult, even for trained professionals. Facial activity provides sensitive and specific information about pain, and computer vision algorithms have been developed to automatically detect facial activities such as Facial Action Units (AUs) defined by the Facial Action Coding System (FACS). Previous work on automated pain detection from facial expressions has primarily focused on frame-level objective pain metrics, such as the Prkachin and Solomon Pain Intensity (PSPI). However, the current gold-standard pain metric is the visual analog scale (VAS), which is self-reported at the video level. In this thesis, we propose machine learning models that estimate VAS directly from video.
First, we study the relationship between sequence-level metrics and frame-level metrics. Specifically, we explore an extended multitask learning model that predicts VAS from human-labeled AUs, aided by other sequence-level pain measurements during training. This model consists of two parts: a multitask learning neural network that predicts multidimensional pain scores, and an ensemble learning model that linearly combines those multidimensional scores to best approximate VAS. Starting from human-labeled AUs, the model outperforms the human sequence-level pain estimates provided with the dataset.
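The second, ensemble stage described above can be illustrated with a toy sketch. Everything here is synthetic and hypothetical: the multitask network's outputs are stood in for by random multidimensional scores, and the VAS targets are generated from a hidden linear rule purely so the example has a consistent answer; the real model's scores and labels are, of course, not linear like this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multidimensional sequence-level pain scores for 8 videos, standing in
# for the multitask network's outputs (columns are hypothetical pain
# dimensions, e.g. VAS-like, observer-rated, sensory, affective scores).
scores = rng.uniform(0, 10, size=(8, 4))

# Ground-truth VAS for the same videos; generated from a hidden linear rule
# only so this toy example has an exactly recoverable target.
true_w = np.array([0.5, 0.2, 0.2, 0.1])
vas = scores @ true_w

# Ensemble step: least-squares weights that linearly combine the
# multidimensional pain scores to best approximate VAS.
w, *_ = np.linalg.lstsq(scores, vas, rcond=None)
vas_hat = scores @ w
```

Because the toy targets really are a linear combination of the scores, the least-squares fit recovers the hidden weights exactly; on real data the linear combination is only an approximation chosen to minimize squared error.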
Second, we explore ways to learn sequence-level metrics from frame-level, automatically predicted AUs. We start with iMotions, an AU prediction software package. Because its codings differ from manual FACS codings, we apply transfer learning: we train another machine learning model to map iMotions AU codings into a subspace of the manual AU codings, which enables more robust pain recognition when only automatically coded AUs are available for the test data. We then build our own AU prediction system, a multitask learning model based on the VGGFace neural network.
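The mapping step can be sketched as a simple least-squares regression on a calibration set where both coding types exist. The dimensions and the linear relation below are illustrative assumptions, not the actual AU sets or the model used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Calibration frames where both automatic and manual AU codings are
# available: 300 frames, 10 automatically coded AUs, 6 manually coded AUs
# (all dimensions are hypothetical).
auto_aus = rng.uniform(0, 5, size=(300, 10))
mapping = rng.normal(size=(10, 6)) * 0.3
manual_aus = auto_aus @ mapping  # assumed linear relation, for illustration only

# Transfer step: learn a least-squares map from the automatic codings
# into the subspace spanned by the manual codings.
W, *_ = np.linalg.lstsq(auto_aus, manual_aus, rcond=None)

def to_manual_subspace(a):
    """Project automatic AU codings into the manual-AU subspace."""
    return np.asarray(a) @ W
```

A downstream pain model trained on manual codings can then consume `to_manual_subspace(auto_aus)` at test time, when only automatic codings are available.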
Third, we propose to improve our model using individual models and uncertainty estimation. For a new test video, we jointly consider which individual models generalize well across videos and which are most similar, and therefore likely most accurate, for this particular video, in order to choose the combination of individual models that performs best. Our framework achieves state-of-the-art performance on two datasets.
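One minimal way to realize this joint weighting is to score each individual model on both criteria and weight its prediction by their product. The function below is a hypothetical sketch of that idea, not the thesis's actual combination rule; the generalization and similarity scores are assumed to be given, normalized values.

```python
import numpy as np

def combine_individual_models(preds, gen_scores, sims):
    """Combine per-individual model predictions for one test video.

    preds      -- each individual model's VAS prediction for the video
    gen_scores -- hypothetical held-out generalization score per model, in [0, 1]
    sims       -- hypothetical similarity of each model to this test video, in [0, 1]
    """
    # Joint weight: a model must both generalize well AND match this video.
    w = np.asarray(gen_scores, dtype=float) * np.asarray(sims, dtype=float)
    w = w / w.sum()  # normalize so the weights form a convex combination
    return float(np.dot(w, preds))
```

For example, `combine_individual_models([2.0, 5.0, 3.0], [0.9, 0.6, 0.8], [0.5, 0.9, 0.7])` down-weights the first model despite its high generalization score, because it is a poor match for the test video.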