eScholarship
Open Access Publications from the University of California

Two-Stream Vision Swin Transformer for Video-based Eye Movement Detection

Abstract

Eye movement detection plays a crucial role in various fields, including eye tracking applications and the study of human perception and cognitive states. Existing detection methods typically rely on gaze positions predicted by gaze estimation algorithms, which may introduce cumulative errors. Video-based methods that classify behaviours directly from videos have been introduced to address this issue, but they are limited because they primarily focus on detecting blinks. In this paper, we propose a video-based two-stream framework designed to detect four eye movement behaviours (fixations, saccades, smooth pursuits, and blinks) from infrared near-eye videos. To explicitly capture motion information, we introduce optical flow as the input for one stream. Additionally, we propose a spatio-temporal feature fusion module to combine information from the two streams. The framework is evaluated on a large-scale eye movement dataset and achieves excellent results.
