Audio localization in the Automatic Cameraman
- Author(s): Ettinger, Evan Ira
- et al.
This dissertation studies the audio localization component of a touchless interactive display located in the CSE building at UC San Diego. The display, named The Automatic Cameraman (TAC), consists of four large television displays, a PTZ camera, and a microphone array. In this work, we propose a simple solution to the problem of accurately pointing the PTZ camera at speaking humans who are interacting with TAC. The focus of this dissertation is a novel audio localization and tracking algorithm based on what we call the coordinate-free approach. Previous approaches to localization assume a precisely known geometry for the microphone array, expressed through a coordinate system for the room with an exact position for each microphone element. As a result, arrays are typically built so that microphone positions can be known easily, e.g., as linear or planar arrays with fixed spacing. The coordinate-free method we propose requires no knowledge of such a coordinate system, allowing for an ad hoc placement of microphones. Our coordinate-free localization algorithm takes a statistical approach, learning a mapping from observed time-delays between microphone pairs directly to a pan and tilt directive for the PTZ camera. In addition, we explicitly exploit the fact that the training set of time-delay vectors lies on a low-dimensional structure, namely a three-dimensional structure governed by the sound source's true spatial location. We explore various regressor models, with special attention to those that are known to exploit this intrinsic low dimensionality. We follow this with a study of a particle-filtering-based tracker of the time-delays between microphones. Our tracker employs a novel approach to the particle filtering problem based on online learning. It introduces a new, practically useful particle resampling scheme, and it is more robust to model misspecification than traditional particle filters.
In the final part of the dissertation, we examine a MEMS digital-microphone-based array that we recently implemented on an FPGA. We explore how this digital array will alleviate many of the technical deficiencies of the current analog array in TAC.
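The coordinate-free idea described in the abstract, learning a map from pairwise time-delays directly to pan and tilt, can be illustrated with a small simulation. The sketch below is not the dissertation's method: it uses a plain k-nearest-neighbor average as a stand-in regressor, noise-free simulated delays, and a hypothetical room layout (microphone positions, camera position, and source region are all invented for illustration). The microphone coordinates are used only by the simulator to generate training data; the regressor itself never sees them.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def time_delays(source, mics):
    """Time-delay vector over all microphone pairs (seconds)."""
    dists = np.linalg.norm(mics - source, axis=1)
    i, j = np.triu_indices(len(mics), k=1)
    return (dists[i] - dists[j]) / SPEED_OF_SOUND


def pan_tilt(source, camera):
    """Pan and tilt angles (degrees) that point the camera at the source."""
    d = source - camera
    pan = np.degrees(np.arctan2(d[1], d[0]))
    tilt = np.degrees(np.arctan2(d[2], np.hypot(d[0], d[1])))
    return np.array([pan, tilt])


def knn_predict(train_x, train_y, query, k=5):
    """Average the pan/tilt labels of the k nearest training delay vectors."""
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return train_y[nearest].mean(axis=0)


rng = np.random.default_rng(0)

# Hypothetical ad hoc microphone placement on a wall facing the +x direction.
# These coordinates are known only to the simulator, not to the regressor.
mics = rng.uniform([-0.2, -1.0, 0.5], [0.2, 1.0, 2.5], size=(7, 3))
camera = np.array([0.0, 0.0, 2.0])  # hypothetical camera position


def sample_sources(n):
    """Random speaker positions in front of the display (hypothetical region)."""
    return rng.uniform([1.0, -2.0, 0.5], [4.0, 2.0, 2.0], size=(n, 3))


# Training set: (time-delay vector, pan/tilt) pairs, as if collected by
# pointing the camera at a sound source placed at many locations.
train_src = sample_sources(2000)
train_x = np.array([time_delays(s, mics) for s in train_src])
train_y = np.array([pan_tilt(s, camera) for s in train_src])

# Evaluate on held-out source positions.
test_src = sample_sources(200)
errors = np.array([
    np.abs(knn_predict(train_x, train_y, time_delays(s, mics)) - pan_tilt(s, camera))
    for s in test_src
])
mean_error = errors.mean(axis=0)
print("mean absolute pan/tilt error (degrees):", mean_error)
```

With seven microphones the delay vector has 21 entries, yet it is determined by the three coordinates of the source, which is the low-dimensional structure the abstract refers to. A nearest-neighbor average is only one simple way to take advantage of that structure.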