Building Efficient Vision Models for Ecological and Earth Observation Studies
- Kumar, Satish
- Advisor(s): Manjunath, B.S.
Abstract
Numerous large vision models for natural images, such as SAM, Florence-2, and GPT-4, have achieved state-of-the-art (SOTA) performance, largely owing to the vast amounts of image and text data available online. Smaller models such as EfficientSAM and CLIP have also shown that strong results can be achieved with comparatively less data. However, real-world scientific problems, particularly in remote sensing, present unique challenges due to the complexity of the data and the scarcity of annotations. These problems often require data from multiple sources, such as airborne hyperspectral sensors and satellite multispectral sensors, which are expensive and time-consuming to acquire.

This dissertation addresses a key question: how can large vision models be built and trained effectively under data constraints? The proposed solution integrates domain-specific knowledge into large vision models, specifically vision transformers, to improve their performance and training efficiency. Using core signal processing techniques, domain-specific knowledge is encoded as prior information that guides feature extraction and refines randomly initialized queries through a query refiner module. This approach accelerates convergence when training data are limited.

Three key applications are explored: (1) methane detection from aerial remote sensing imagery, (2) animal detection and classification in large grasslands for ecological studies, and (3) estimation of physiological signals such as ECG and ISTI for stress assessment in biomedical contexts. This research establishes a methodology for embedding domain-specific knowledge into deep learning models, thereby enhancing performance in data-limited environments. It provides insights for improving the applicability of vision transformer-based models across various domains, contributing to computer vision research and its practical real-world applications.
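The abstract describes domain knowledge being encoded as prior information that refines randomly initialized queries through a query refiner module. The dissertation's actual architecture is not specified here, so the following is only a minimal sketch under stated assumptions: the prior tokens are assumed to have already been produced by some signal-processing step (the hypothetical `priors` list), and refinement is assumed to be single-head dot-product cross-attention with a residual update and no learned projections. The names `refine_queries` and `priors` are illustrative, not the author's code.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def refine_queries(queries, priors, steps=1):
    """Hypothetical query refiner (assumption, not the dissertation's module).

    On each step, every query attends over the prior tokens with scaled
    dot-product attention, and the attention-weighted prior is added back
    to the query as a residual update.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    for _ in range(steps):
        refined = []
        for q in queries:
            # Similarity between this query and each prior token.
            scores = [scale * sum(qi * pi for qi, pi in zip(q, p)) for p in priors]
            weights = softmax(scores)
            # Attention-weighted combination of prior tokens.
            update = [sum(w * p[k] for w, p in zip(weights, priors)) for k in range(d)]
            refined.append([qi + ui for qi, ui in zip(q, update)])
        queries = refined
    return queries


# Toy setup: small random queries and two assumed prior tokens.
random.seed(0)
d = 4
queries = [[random.gauss(0.0, 0.02) for _ in range(d)] for _ in range(3)]
priors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
refined = refine_queries(queries, priors, steps=2)
```

Because the toy priors are zero in the last two dimensions, refinement moves the queries only along the first two axes, pulling the randomly initialized queries toward the prior information. A trained model would typically add learned query, key, and value projections and multiple heads; those are omitted here to keep the idea visible.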