Chen, Zhimin

The Role of Visual Context in Emotion Recognition

2020

Advisor(s): Whitney, David

Abstract

Emotion recognition is an essential human ability critical for social functioning. It is widely assumed that identifying facial expressions is the key to this ability, and models of emotion recognition have mainly focused on facial and bodily features in unnatural, static, or decontextualized conditions. However, an individual’s face and body are usually perceived within a meaningful context, not in isolation. The visual context, therefore, may provide useful or even necessary information when interpreting emotion. Here, we investigated the role of visual context in dynamic emotion recognition. First, we developed a novel method, “inferential affective tracking (IAT)”, to reveal and quantify the contribution of visual context to affect (valence and arousal) perception. We show that when characters’ faces and bodies were masked in silent videos, viewers inferred the affect of the invisible characters successfully and in high agreement, based solely on visual context. We further show that the context is not only sufficient but also necessary to accurately perceive human affect over time, as it provides a substantial and unique contribution beyond the information available from face and body. Next, we tested the efficiency of IAT by measuring the speed of recognizing emotion from contextual information alone. Using cross-correlation analyses, we found that inferring affect based on visual context alone is just as fast as tracking affect with all available information, including face and body. We further demonstrated with empirical evidence that this approach has high precision in detecting sub-second temporal lags. Finally, we extended and adapted the IAT method to test categorical emotion perception rather than affect. Because this method closely parallels IAT, we call it “inferential emotion tracking (IET)”. Using IET, we show that the presence of visual context can override the emotion categories interpreted from face and body information. Strikingly, we find that visual context determines perceived emotion nearly as much and as often as face and body information does. Taken together, these experiments reveal that emotion recognition is, at its heart, as much about context as it is about faces. Seemingly complex context-based emotion perception is far more efficient than previously assumed.

UC Berkeley
