eScholarship
Open Access Publications from the University of California

UC Santa Cruz Electronic Theses and Dissertations

Learning Visual-based Head-based Pointing

Abstract

Devices such as computers, smartphones, tablets, and other smart gadgets have become integral to daily life, enhancing productivity and connectivity. A fundamental aspect of Human-Computer Interaction is pointing: selecting and manipulating objects on a display using a pointing device such as a mouse, touchpad, or stylus. However, such devices require fine control of the wrist and fingers. People with limited upper limb mobility or control can instead use technologies such as eye gaze tracking or head tracking for pointing. In particular, head tracking requires no dedicated device, since it can be accomplished by analyzing images of the user taken by a screen camera, and it has been shown to be more effective and acceptable than eye gaze tracking.

This dissertation first studies visual-based head-based pointing in different settings. It begins by proposing a mobile head-based pointing method in the context of online shopping. The study is then extended with a more general mobile head-based pointing method, evaluated through Fitts' Law studies with people with motor impairments. User studies demonstrated the method's robustness across different environments. These initial evaluations also suggest that head-based pointing solutions with predefined control mechanisms can be limited: for example, the pointer motion may not accurately reflect the user’s intent, forcing users to move their heads in “non-natural” ways to accomplish specific pointing tasks.
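
Pointing performance in Fitts' Law studies of this kind is commonly summarized by an index of difficulty and a per-trial throughput. As a minimal illustrative sketch in Python (the Shannon formulation below is an assumption; the dissertation does not state which variant it used):

    import math

    def index_of_difficulty(distance: float, width: float) -> float:
        """Shannon formulation of Fitts' index of difficulty, in bits."""
        return math.log2(distance / width + 1.0)

    def throughput(distance: float, width: float, movement_time_s: float) -> float:
        """Throughput in bits per second for a single pointing trial."""
        return index_of_difficulty(distance, width) / movement_time_s

    # Example: a 512 px reach to a 64 px target completed in 1.8 s:
    # ID = log2(512/64 + 1) = log2(9) ≈ 3.17 bits, throughput ≈ 1.76 bits/s
    print(throughput(512.0, 64.0, 1.8))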

The second phase of the dissertation proposes a user-centric approach to creating a flexible pointing algorithm that adapts to the user’s intent. Data were collected by asking participants to follow with their heads the motion of an on-screen target moving along pre-defined trajectories, while a screen camera captured images of their heads. The recorded videos were analyzed using computer vision algorithms, considering several types of features, such as facial landmarks and head pose. An affine transformation mapping these features to pointer locations on the screen was computed using least squares regression. The analysis revealed unique head movement patterns among individuals performing similar pointing tasks, indicating that the relationship between head position and desired pointer motion is complex and conditional, with biases depending on the location and direction of pointing. To improve pointing precision, I also implemented fully connected neural networks (FCNs) and recurrent neural networks (RNNs) operating on the features extracted from video frames, and evaluated different visual feature sets and personalization techniques. The results show the potential of advanced transformation models and carefully selected feature sets to improve the accuracy of head-based pointing systems, and they underscore the importance of personalization and of choosing appropriate mapping types and feature sets when designing efficient and user-friendly head-based pointing. As technology advances, these insights could pave the way for more intuitive and accessible human-computer interaction mechanisms.
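
To make the mapping step concrete, fitting an affine transformation from per-frame head features to 2-D pointer locations by least squares can be sketched as below. The feature layout and array shapes are illustrative assumptions, not the dissertation's actual pipeline:

    import numpy as np

    def fit_affine_map(features: np.ndarray, targets: np.ndarray) -> np.ndarray:
        """Fit an affine map from head features to 2-D pointer positions.

        features: (n_frames, d) array, e.g. flattened facial landmarks or head pose.
        targets:  (n_frames, 2) array of on-screen target coordinates.
        Returns a (d + 1, 2) matrix whose last row is the translation term.
        """
        # Append a constant column so the solve also recovers a bias/offset.
        X = np.hstack([features, np.ones((features.shape[0], 1))])
        # Ordinary least squares: minimize ||X @ A - targets||^2 over A.
        A, *_ = np.linalg.lstsq(X, targets, rcond=None)
        return A

    def apply_affine_map(A: np.ndarray, features: np.ndarray) -> np.ndarray:
        """Predict pointer coordinates for new feature vectors."""
        X = np.hstack([features, np.ones((features.shape[0], 1))])
        return X @ A

The FCN variant replaces this linear map with a learned nonlinear regressor over the same features. A minimal stand-in follows (layer sizes and framework choice are assumptions, not the dissertation's architecture):

    import torch
    import torch.nn as nn

    class PointerFCN(nn.Module):
        """Small fully connected network mapping head features to (x, y)."""

        def __init__(self, feature_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feature_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 2),  # predicted pointer coordinates
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)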
