In the context of computer vision and artificial intelligence, scene understanding is a complex task that combines image analysis, pattern recognition, and context modeling, and often relies on advanced machine learning techniques. Scene understanding has applications in fields such as autonomous navigation, robotics, surveillance, and augmented reality, where a deeper understanding of visual scenes is crucial for making informed decisions and interacting effectively with the environment. This dissertation aims to contribute to two such applications: assistive robotics and precision agriculture.
The first part of the dissertation focuses on the emerging research area of infant action recognition, specifically targeting the task of reaching, a significant developmental milestone. Existing action recognition models cater primarily to adults, leaving a gap in pediatric applications. To bridge this gap, BabyNet, a lightweight network, is introduced. It uses annotated bounding boxes to capture spatial and temporal relationships and to detect reaching onset, offset, and complete actions. BabyNet outperforms larger networks and detects reaching more accurately than smaller ones. Challenges remain, however: the dataset offers limited viewpoints, and performance depends on the underlying detector network, which can hinder recognition in certain scenarios. To overcome these limitations, E-BabyNet is proposed. It employs LSTM and Bidirectional LSTM layers to assess reaching actions and deliver precise onset and offset keyframes, handling transitions between actions effectively. Evaluation against other models demonstrates that E-BabyNet accurately detects reaching actions while minimizing false detections.
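Because this summary does not detail E-BabyNet's layer configuration, the following is only a minimal sketch of how a bidirectional LSTM over per-frame bounding-box features might emit per-frame onset/offset labels; the class name, the 4-D box input, and all layer sizes are illustrative assumptions rather than the dissertation's actual design.

```python
import torch
import torch.nn as nn

class ReachingKeyframeNet(nn.Module):
    """Sketch: a bidirectional LSTM over per-frame bounding-box features
    that assigns each frame a label (background / onset / reaching / offset).
    All sizes here are illustrative assumptions."""

    def __init__(self, in_dim=4, hidden=64, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, boxes):
        # boxes: (batch, frames, 4) normalized [x, y, w, h] per frame
        feats, _ = self.lstm(boxes)   # (batch, frames, 2 * hidden)
        return self.head(feats)       # per-frame class logits


# Example: score a 90-frame clip and read off per-frame labels.
model = ReachingKeyframeNet()
clip = torch.rand(1, 90, 4)           # dummy bounding-box track
labels = model(clip).argmax(dim=-1)   # (1, 90) predicted frame labels
```

In such a design, the bidirectional pass lets each frame's label draw on both earlier and later frames, which is what makes transitions between consecutive actions (the offset of one reach and the onset of the next) easier to pin down.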
The second part focuses on automating leaf sampling, a practice known to provide highly accurate and timely information to growers. The first step in such an automated procedure is to identify and cut a leaf from a tree. For this purpose, we present a novel approach for leaf detection and localization using 3D point clouds to support robotic plant phenotyping. Preliminary results demonstrate successful leaf detection and localization both indoors and outdoors. To better assess the approach, an actuation-perception framework was integrated with two distinct end-effectors, one electrically and one pneumatically actuated. Both end-effectors were first evaluated in an indoor environment. Based on the obtained results, further experiments were conducted in the field using the pneumatic end-effector exclusively. The findings demonstrate the adaptability of the perception module to different end-effector designs, yielding precise detection and localization of candidate leaves. In addition, the design improvements led to better performance in leaf-clustered scenarios and to higher capture and cutting success rates.
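To make the perception step concrete, the following is a minimal sketch of one way candidate leaves could be detected and localized in a 3D point cloud. The use of DBSCAN clustering, the parameter values, and the function name are illustrative assumptions; the dissertation's actual pipeline is not specified in this summary.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def localize_leaf_candidates(points, eps=0.02, min_samples=50):
    """Sketch: cluster an (N, 3) point cloud and return each cluster's
    centroid as a candidate leaf location (a 3D cutting target).
    DBSCAN and all parameters are illustrative assumptions."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    centroids = []
    for label in set(labels) - {-1}:        # -1 marks noise points
        cluster = points[labels == label]
        centroids.append(cluster.mean(axis=0))
    return np.array(centroids)

rng = np.random.default_rng(0)
# Dummy cloud: three tight blobs standing in for leaves, plus noise.
blobs = [c + 0.01 * rng.standard_normal((200, 3)) for c in
         ([0.2, 0.5, 1.0], [0.6, 0.4, 1.1], [0.4, 0.7, 0.9])]
cloud = np.vstack(blobs + [rng.uniform(0.0, 1.5, (500, 3))])
targets = localize_leaf_candidates(cloud)   # ~3 candidate centroids
```

A centroid-per-cluster scheme like this degrades when leaves overlap, which is consistent with the emphasis above on design improvements for leaf-clustered scenarios.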