The human eye exhibits remarkable optical and mechanical characteristics that can be exploited to determine where a person is looking. While IR-based devices can model these characteristics closely, webcam-based geometric methods for determining gaze often suffer from low accuracy due to their sensitivity to the estimated physical parameters. Over the past several years, a number of data-driven gaze tracking algorithms have been proposed and shown to outperform classic model-based methods in gaze direction accuracy. These algorithms leverage recent developments in sophisticated CNN architectures, as well as the availability of large gaze datasets captured under varied conditions. One shortcoming of black-box, end-to-end methods, though, is that any unexpected behavior is difficult to explain. In addition, there is always the risk that a system trained on one dataset may not perform well when tested on data from a different source (the “domain gap” problem). In this work, we propose a novel method to embed eye geometry information in an end-to-end gaze estimation network by means of an “analytic layer”. Our experimental results show that our system outperforms other state-of-the-art methods in cross-dataset evaluation, while producing competitive performance in within-dataset tests. In addition, the proposed system is able to extrapolate gaze angles outside the range covered by the training data.
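To make the idea concrete, the following is a minimal sketch of what such an analytic layer might look like, assuming a simple geometric parameterization in which the CNN backbone regresses a 3-D eyeball center and pupil center; the layer then computes the gaze direction in closed form. The names and the eye model here are illustrative, not the exact implementation described in this work.

```python
import torch
import torch.nn as nn

class AnalyticGazeLayer(nn.Module):
    """Differentiable layer mapping estimated eye geometry to gaze angles.

    Sketch only: assumes the backbone predicts a 3-D eyeball center and
    pupil center; the optical axis between them is converted to
    (pitch, yaw) with closed-form trigonometry, so gradients can flow
    back into the geometric estimates.
    """

    def forward(self, eyeball_center: torch.Tensor,
                pupil_center: torch.Tensor) -> torch.Tensor:
        # Optical axis: unit vector from eyeball center to pupil center.
        axis = pupil_center - eyeball_center
        axis = axis / axis.norm(dim=-1, keepdim=True).clamp(min=1e-8)
        x, y, z = axis.unbind(dim=-1)
        # Convert the unit vector to pitch (vertical) and yaw (horizontal).
        pitch = torch.asin(y.clamp(-1.0, 1.0))
        yaw = torch.atan2(x, z)
        return torch.stack([pitch, yaw], dim=-1)

# Toy usage: six regressed values per sample, split into the two 3-D points.
backbone_out = torch.rand(4, 6, requires_grad=True)   # hypothetical CNN output
eye_c, pupil_c = backbone_out[:, :3], backbone_out[:, 3:]
gaze = AnalyticGazeLayer()(eye_c, pupil_c)             # (B, 2), differentiable
```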
For many gaze-related tasks, such as eye landmark detection, obtaining manual annotations for deep learning models is labor-intensive and error-prone. Consequently, such models can be trained on synthetic datasets rendered using computer-graphics techniques. However, a model trained on these datasets might not perform well on eye images captured in the real world because of significant photometric differences. We propose a domain adaptation algorithm that translates images from the synthetic domain to the target domain of real-world images through an intermediate segmentation mask, while preserving the annotations from the synthetic domain. Our method outperforms previous domain adaptation techniques in maintaining the annotations, which is critical for training deep learning models for downstream tasks.
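A minimal sketch of the mask-mediated translation idea is shown below; the network shapes and class count are illustrative stand-ins, not the architecture used here. A segmentation network turns the synthetic eye image into a part mask, and a generator renders a real-looking image from that mask. Because the mask is pixel-aligned with the input, the synthetic annotations remain valid for the translated image.

```python
import torch
import torch.nn as nn

NUM_PARTS = 4  # hypothetical eye-region classes: skin, sclera, iris, pupil

seg_net = nn.Sequential(                  # synthetic image -> part-mask logits
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, NUM_PARTS, 3, padding=1),
)
mask_to_real = nn.Sequential(             # part mask -> real-style eye image
    nn.Conv2d(NUM_PARTS, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
)

def translate(synthetic_img: torch.Tensor):
    mask = seg_net(synthetic_img).softmax(dim=1)  # domain-independent intermediate
    realistic_img = mask_to_real(mask)            # photometric style of the real domain
    return realistic_img, mask                    # the mask carries the annotations

img = torch.rand(1, 3, 64, 96)                    # toy synthetic eye crop
real_style, part_mask = translate(img)
```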
We further extend this approach to control the gaze and other attributes of the generated eye image. We cast the problem as style-based eye image synthesis and separately train a gaze redirector network that manipulates the gaze of the segmentation mask. An eye image with the target gaze is thus obtained by altering the gaze of the corresponding mask and then generating the eye image from the modified mask while preserving the style. Since the segmentation masks are domain-independent, the whole pipeline requires no gaze-labeled real-world data for training, yet it remains competitive with previous state-of-the-art algorithms trained on real-world gaze-annotated datasets.
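The sketch below illustrates mask-level gaze redirection under the same illustrative assumptions as above; the actual redirector and style-based generator may differ. The redirector edits the part mask so that the iris and pupil match a target gaze, and the mask-to-image generator from the previous stage then renders the final image, with a style code extracted from the source image preserving its appearance.

```python
import torch
import torch.nn as nn

class GazeRedirector(nn.Module):
    """Sketch: edit a part mask so it depicts a desired (pitch, yaw) gaze."""

    def __init__(self, num_parts: int = 4):
        super().__init__()
        # Condition on the target gaze by tiling it as two extra input channels.
        self.net = nn.Sequential(
            nn.Conv2d(num_parts + 2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_parts, 3, padding=1),
        )

    def forward(self, mask: torch.Tensor, target_gaze: torch.Tensor) -> torch.Tensor:
        b, _, h, w = mask.shape
        gaze_map = target_gaze.view(b, 2, 1, 1).expand(b, 2, h, w)
        return self.net(torch.cat([mask, gaze_map], dim=1)).softmax(dim=1)

redirector = GazeRedirector()
mask = torch.rand(1, 4, 64, 96).softmax(dim=1)   # part mask from the source image
target = torch.tensor([[0.1, -0.3]])             # desired (pitch, yaw) in radians
redirected_mask = redirector(mask, target)       # input to the mask-to-image generator
```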