Social robots are becoming a part of everyday life: from smart household companions and education and healthcare assistants to assembly robots in factories and museum guides, they adopt different appearances and exhibit a range of capabilities and duties. These robots benefit from machine perception systems that automatically recognize people, their facial expressions, and the surrounding environment. In this thesis, we focus on RUBI, a social robot designed to interact with toddlers in the classroom and enrich early childhood education environments. We develop perception algorithms that allow RUBI to recognize faces while interacting with toddlers in the classroom. We use face recognition, along with facial expression recognition, to analyze the social structure of the classroom and to monitor the emotional development of the toddlers. In the first two chapters of this thesis, we show that RUBI discovers useful aspects of the social structure of the toddler group, as well as the children's preferences for different activities and their preferences to play with, or to avoid playing with, specific other children. These studies illustrate that social robots may become a useful tool in early childhood education for discovering socio-emotional patterns over time and monitoring toddlers' development.
In the following chapters, we focus on active object recognition for RUBI. While interacting with children, RUBI can teach them object names, extending their vocabulary and monitoring their learning. In chapter 3, we introduce GERMS, a dataset designed to accelerate progress on active object recognition in the context of human-robot interaction. In chapter 4, we use a deep neural network for joint prediction of the object label and the next action; a generative model of object similarities based on the Dirichlet distribution is proposed and embedded in the network to encode the state of the system. In chapter 5, we propose a method for supervised learning of active object recognition, in which we train a Long Short-Term Memory (LSTM) network to predict the best next action from rollouts on the training set. We show improved recognition performance by optimizing the observation function and retraining the supervised LSTM network.
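To make the chapter 4 idea concrete, the following is a minimal sketch, not the actual architecture from that chapter, of how a Dirichlet-style belief over object labels can serve as the system state: softmax outputs from successive views are accumulated as pseudo-counts, and the normalized belief is used to score candidate actions. The class count, action set, and action-scoring weights here are hypothetical placeholders.

```python
import numpy as np

class DirichletBelief:
    """Toy belief state over object labels, updated from per-view
    softmax outputs treated as Dirichlet pseudo-counts.
    Illustrative sketch only, not the exact model from chapter 4."""

    def __init__(self, num_classes, prior=1.0):
        # Symmetric Dirichlet prior over object labels.
        self.alpha = np.full(num_classes, prior, dtype=np.float64)

    def update(self, softmax_probs):
        # Accumulate the classifier's softmax output as evidence.
        self.alpha += softmax_probs

    def state(self):
        # Expected label distribution under the Dirichlet; this is the
        # state vector fed to the action-selection module.
        return self.alpha / self.alpha.sum()

def choose_next_action(belief_state, action_weights):
    """Pick the action with the highest score given the current belief.
    action_weights (num_actions x num_classes) stands in for the
    network's action head."""
    scores = action_weights @ belief_state
    return int(np.argmax(scores))

# Example: 10 object classes, 3 candidate manipulation actions.
rng = np.random.default_rng(0)
belief = DirichletBelief(num_classes=10)
for _ in range(5):                        # five simulated views
    view_softmax = rng.dirichlet(np.ones(10))
    belief.update(view_softmax)
action_weights = rng.normal(size=(3, 10))
print("belief:", belief.state())
print("next action:", choose_next_action(belief.state(), action_weights))
```

In this sketch the belief update is a simple additive pseudo-count rule; the point is only that the state summarizes all views seen so far, which is what makes action selection over the sequence possible.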
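Similarly, the supervised approach of chapter 5 can be illustrated with a small sketch: an LSTM consumes the sequence of per-view features and is trained with cross-entropy to imitate the best next action recorded in training-set rollouts. The feature dimension, hidden size, number of actions, and the random stand-in data below are all hypothetical.

```python
import torch
import torch.nn as nn

class NextActionLSTM(nn.Module):
    """Toy LSTM that maps a sequence of per-view features to a
    distribution over next actions (illustrative sketch only)."""

    def __init__(self, feat_dim=128, hidden_dim=64, num_actions=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, features):
        # features: (batch, time, feat_dim)
        outputs, _ = self.lstm(features)
        # Predict an action at every time step of the rollout.
        return self.action_head(outputs)

# Supervised training on (feature sequence, best-next-action) rollouts.
model = NextActionLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(8, 10, 128)           # 8 rollouts, 10 views each
best_actions = torch.randint(0, 3, (8, 10))  # action labels from rollouts

optimizer.zero_grad()
logits = model(features)                     # (8, 10, 3)
loss = loss_fn(logits.reshape(-1, 3), best_actions.reshape(-1))
loss.backward()
optimizer.step()
print("training loss:", loss.item())
```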