This dissertation seeks to enable intelligent vehicles to see, to predict intentions, and to understand and model the state of the driver.
We developed a state-of-the-art, vision-based, non-contact gaze estimation framework by carefully designing submodules that build up to continuous and robust estimation. Key modules in this system include: face detection using deep convolutional neural networks; landmark estimation using cascaded regression models; head pose estimation from geometric correspondences mapping 2-D points in the image plane to 3-D points in a head model; a horizontal gaze surrogate based on a geometric formulation of the eyeball and iris position; a vertical gaze surrogate based on the openness of the upper eyelids and an appearance descriptor; and, finally, 9-class gaze zone estimation using a random forest algorithm trained on naturalistic driving data.
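To make the composition of these submodules concrete, the following is a minimal sketch of one pipeline iteration in Python. The face detector, landmark model, and zone classifier are assumed to be pre-trained callables, and the landmark indices used for the surrogates are illustrative placeholders rather than the actual implementation.

```python
# Hypothetical pipeline sketch; detector/landmark/classifier objects and
# landmark indices are illustrative assumptions, not the actual system.
import numpy as np
import cv2

def horizontal_surrogate(iris_x, corner_left_x, corner_right_x):
    # Normalized iris position between the eye corners: a simplified
    # stand-in for the geometric eyeball/iris formulation.
    return (iris_x - corner_left_x) / max(corner_right_x - corner_left_x, 1e-6)

def vertical_surrogate(upper_lid_y, lower_lid_y, eye_width):
    # Eye-openness ratio: a simplified stand-in for the eyelid-based cue.
    return (lower_lid_y - upper_lid_y) / max(eye_width, 1e-6)

def gaze_zone(frame, detect_face, fit_landmarks, zone_forest,
              model_points_3d, camera_matrix):
    box = detect_face(frame)                 # deep-CNN face detection
    lm = fit_landmarks(frame, box)           # cascaded-regression landmarks, (N, 2)

    # Head pose from 2-D image points to 3-D head-model points (PnP);
    # the first six landmarks are assumed to match model_points_3d.
    _, rvec, tvec = cv2.solvePnP(model_points_3d,
                                 lm[:6].astype(np.float64),
                                 camera_matrix, None)

    h = horizontal_surrogate(lm[6, 0], lm[7, 0], lm[8, 0])
    v = vertical_surrogate(lm[9, 1], lm[10, 1], lm[8, 0] - lm[7, 0])

    # Random forest maps head pose and gaze surrogates to one of 9 zones.
    feats = np.concatenate([rvec.ravel(), tvec.ravel(), [h, v]])
    return zone_forest.predict(feats[None, :])[0]
```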
We developed a framework to model a driver's gaze behavior by representing the scanpath over a time period using glance durations and transition frequencies. As a use case, we explore drivers' scanpath patterns during maneuvers executed in freeway driving, namely left lane changes, right lane changes, and lane keeping. We show that condensing the temporal scanpath into glance durations and glance transition frequencies reveals recurring patterns tied to driver activities. Furthermore, modeling these patterns provides predictive power for maneuver detection up to a few seconds in advance and shows promise for developing gaze guidance during take-over requests in highly automated vehicles.
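As a sketch of this representation, the snippet below condenses a per-frame gaze-zone sequence into zone-wise glance durations and a glance-transition frequency matrix; the frame rate, zone labels, and feature layout are assumptions for illustration, not the exact formulation used.

```python
import numpy as np

def scanpath_features(zone_seq, n_zones, fps=30.0):
    # Condense a per-frame gaze-zone sequence (integer zone IDs) into
    # per-zone glance durations and a zone-to-zone transition count matrix.
    durations = np.zeros(n_zones)
    transitions = np.zeros((n_zones, n_zones))
    for i, z in enumerate(zone_seq):
        durations[z] += 1.0 / fps                 # accumulate dwell time (s)
        if i > 0 and zone_seq[i - 1] != z:        # count only zone switches
            transitions[zone_seq[i - 1], z] += 1
    return np.concatenate([durations, transitions.ravel()])

# Example: a 30 Hz sequence, mostly zone 0 (road) with one glance to zone 1.
seq = [0] * 45 + [1] * 15 + [0] * 30
print(scanpath_features(seq, n_zones=3, fps=30.0))
```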
We introduce a framework to model the spatio-temporal movements of the head, eyes, and hands from naturalistic driving data looking in at the driver, for any events or tasks of interest. As a use case, we explore the temporal coordination of these modalities on data of drivers executing maneuvers at stop-controlled intersections: going straight, turning left, and turning right. By training classifiers that quantify the discriminative quality of their input variables over sequentially increasing time windows, the experimental study at intersections shows which types of distinguishable preparatory movements occur, when they occur, and for how long, in the range of a few milliseconds to a few seconds.
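A minimal sketch of this experimental design follows, assuming features are stacked per time step and a random forest's impurity-based importances serve as the measure of discriminative quality; the actual classifier and feature construction may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def importances_over_windows(X, y, window_ends, feats_per_step):
    # For sequentially growing windows preceding the maneuver, train a
    # classifier and record which input variables (head/eye/hand features)
    # carry discriminative preparatory movement, and from when.
    out = []
    for t in window_ends:
        Xt = X[:, : t * feats_per_step]           # features up to time step t
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(Xt, y)
        out.append((t, clf.feature_importances_))
    return out

# Synthetic illustration: 120 events, 10 time steps, 3 modalities per step,
# labels standing in for go straight / turn left / turn right.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10 * 3))
y = rng.integers(0, 3, size=120)
for t, imp in importances_over_windows(X, y, [2, 5, 10], feats_per_step=3):
    print(t, imp.round(3)[:6])
```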
We introduce one part of the Vision for Intelligent Vehicles and Applications (VIVA) challenge, namely the VIVA-face challenge. VIVA is a platform designed to share naturalistic driving data with the community in order to present the issues and challenges of vision under real-world driving conditions, benchmark existing vision approaches using proper metrics, and advance the development of future vision algorithms. With a special focus on the challenges of looking inside at the driver's face, we describe how the data are acquired and annotated, and how methods are benchmarked, compared, and shared on leaderboards.
Finally, we propose de-identification filters that protect the privacy of drivers while preserving sufficient detail to infer driver behavior, such as gaze direction, in naturalistic driving videos. We implement and compare de-identification filters that combine preserved eye regions with a distorted background, with promising results. With such filters, researchers may be more inclined to publicly share de-identified naturalistic driving data. The research community can then benefit tremendously from large amounts of naturalistic driving data and focus on the analysis of human factors in the design and evaluation of intelligent vehicles.
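As one plausible instance of such a filter, the sketch below Gaussian-blurs the entire frame and restores the detected eye regions; the eye bounding boxes are assumed to come from an upstream detector, and the boxes in the toy usage are hypothetical.

```python
import numpy as np
import cv2

def deidentify(frame, eye_boxes, ksize=(51, 51)):
    # Distort the background while preserving eye regions: blur the whole
    # frame, then copy the original pixels back inside each eye box.
    blurred = cv2.GaussianBlur(frame, ksize, 0)
    for (x, y, w, h) in eye_boxes:
        blurred[y:y + h, x:x + w] = frame[y:y + h, x:x + w]
    return blurred

# Toy usage on a synthetic frame with two hypothetical eye boxes.
frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
out = deidentify(frame, [(80, 100, 40, 20), (180, 100, 40, 20)])
```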