Scholarly Works (19 results)

Thesis
Peer Reviewed

On Hybrid Methods that Blend Computer Vision and Physics

Ba, Yunhao
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2023)

Deep learning has exhibited remarkable performance on various computer vision tasks. However, these models often generalize poorly when the training sets are not sufficiently large or diverse. Human intelligence, on the other hand, is capable of learning from a few samples. One potential reason is that we use prior knowledge to generalize to new environments and unseen data, as opposed to learning everything from the provided training sets. We aim to give machines this capability. More specifically, we focus on integrating different types of prior physical knowledge and inductive biases into neural networks for various computer vision applications.

The core idea is to exploit physical models as inductive biases and design specific strategies to blend them with the neural network learning process. This problem is difficult since we need to consider both the fidelity of our prior knowledge and the quality of the training samples. To validate the effectiveness of the proposed blending strategies, extensive experiments have been conducted on multiple computer vision tasks, such as Shape from Polarization (SfP), remote photoplethysmography (rPPG), and single-image rain removal.

Thesis
Peer Reviewed

Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

Jiang, Sicheng
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2023)

3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance Fields (NeRF) are versatile for traditional tasks such as novel view synthesis. Recent work aims to extend the functionality of NeRF beyond view synthesis to semantically aware tasks, such as editing and segmentation, using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts that reduce feature quality. Recently, 3D Gaussian Splatting (3DGS) has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework leads to warp-level divergence. We propose architectural and training changes to efficiently avert this problem. Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, this is the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Project website at: https://feature-3dgs.github.io/.

Thesis
Peer Reviewed

Diverse R-PPG: Contactless Smartphone Camera-based Heart Rate Estimation for Diverse Skin Tones and Scenes

Kabra, Krish
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2021)

Heart rate (HR) is an essential clinical measure for the assessment of cardiorespiratory instability. The growing telemedicine market creates an urgent need for scalable yet affordable remote HR estimation. Smartphones that use built-in camera modules to measure HR from facial videos offer a more economical solution in comparison to mass deployment of wearable sensors. However, existing computer vision methods that estimate HR from facial videos exhibit biased performance against dark skin tones. This is a major concern, since communities of color are disproportionately affected by both COVID-19 and cardiovascular disease. We identify the origin of this bias and present a novel physics-driven algorithm that boosts performance on darker skin tones in our reported data. We assess the performance of our method through the creation of the first telemedicine-focused remote vital signs dataset, the VITAL dataset. 472 videos (~944 minutes) of 59 subjects with diverse skin tones are recorded under realistic scene conditions with corresponding vital sign data. Our method reduces errors due to lighting changes, shadows, and specular highlights and imparts unbiased performance gains across skin tones, setting the stage for making non-contact HR sensing technologies a viable reality for patients across skin tones, using just smartphone cameras.

Thesis
Peer Reviewed

SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting

Muttukuru, Sairisheek
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2024)

The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few-shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views is reduced. This work proposes a method to enable training coherent 3DGS-based radiance fields of 360° scenes from sparse training views. Depth priors are integrated with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that this method outperforms base 3DGS by 6.4% in LPIPS and by 12.2% in PSNR, and NeRF-based methods by at least 17.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost. Project website at: https://tinyurl.com/sparsegs.

Thesis
Peer Reviewed

Discovering the Invisible from Visual Data

Talegaonkar, Chinmay
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2021)

This thesis attempts to discover features from visual data that cannot be obtained or observed directly using standard computer vision algorithms. We refer to these as invisible features. In particular, we focus on two types of invisible features. The first is the physics of a scene, which governs the visual cues for the objects in the scene. In this part of the thesis, we teach a machine to discover the laws of physics from video streams. We assume no prior knowledge of physics, beyond a temporal stream of bounding boxes. The problem is very difficult because a machine must learn not only a governing equation (e.g. projectile motion) but also the existence of governing parameters (e.g. velocities). The second is texture information that is invisible in standard RGB images but can be seen in other imaging modalities such as polarization. Texture in some scenes is challenging for intensity images to capture. Imagine a black car in shadow or an oil slick on a road. We exploit this adversity of contrast in the intensity domain to adapt a new representation for polarization cues, proposing a new degree of linear polarization (DOLP) that has favorable statistical properties. The new representation of DOLP we obtain is not only more robust to noise, but also preserves the scientific information in the original DOLP that allows geometry and photoelastic effects to be discerned. We hope this work lays a foundation for the future of a Polarized ISP process, particularly for sensor fusion applications.
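
For readers unfamiliar with the term, the conventional DOLP mentioned in this abstract can be sketched as follows. This is the textbook definition from linear Stokes parameters, not the new representation the thesis proposes. The four-angle capture (polarizer at 0°, 45°, 90°, 135°) and the function names are assumptions made for illustration only.

```python
import math


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind a polarizer at four angles.

    Assumes an ideal four-angle capture (hypothetical setup for this sketch).
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # +45 deg vs. -45 deg component
    return s0, s1, s2


def dolp(i0, i45, i90, i135, eps=1e-12):
    """Conventional degree of linear polarization: sqrt(S1^2 + S2^2) / S0."""
    s0, s1, s2 = stokes_from_intensities(i0, i45, i90, i135)
    if s0 <= eps:
        # Dark pixel: the ratio is undefined, so report 0 in this sketch.
        return 0.0
    return math.hypot(s1, s2) / s0


if __name__ == "__main__":
    print(dolp(1.0, 1.0, 1.0, 1.0))  # unpolarized light
    print(dolp(2.0, 1.0, 0.0, 1.0))  # light fully polarized at 0 degrees
```

Because the ratio divides by total intensity, it becomes unstable in dark regions, which is one plausible reason a noise-robust representation is of interest.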

Thesis
Peer Reviewed

Combining Physics with Machine Learning: Case Study of Shape from Polarization

Ba, Yunhao
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2019)

Shape from Polarization (SfP) recovers an object's shape from polarized photographs of the scene. In previous works, the SfP algorithms use idealized physical equations to recover the shape. These previous approaches are error-prone when real-world conditions deviate from the idealized physics. In this thesis, we propose a physics-based neural network to address the SfP problem. Our algorithm fuses deep learning with synthetic renderings (derived from physics) to exceed the quality of all previous SfP methods. A two-stage encoder is used to resolve the longstanding problem of ambiguities. Our results of surface normal recovery are an improvement upon methods that utilize physics-based solutions alone.

Thesis
Peer Reviewed

Robust and Equitable Non-Contact Health Sensing

Vilesov, Alexander
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2023)

Contactless vital sensing is gaining prominence with applications in disease control, health monitoring, and medicine: airports are beginning to use infrared thermometers to screen for fevers, automobile companies are researching how cars can wirelessly detect drowsy drivers, and the medical field is exploring how cameras can be used to remotely monitor neonates or detect conditions such as atrial fibrillation or sleep apnea. However, prior research has largely not explored when remote vital sensing methods fail and whether they disadvantage some groups more than others, for example by age, weight, or gender. New methods in the field should strive to determine the impact of these variables and, where possible, rectify the sensing inaccuracies that result. This work explores how skin tone can adversely impact heart-rate detection with cameras and temperature evaluation with thermal cameras. Multimodal fusion and algorithmic techniques are proposed to improve skin tone equity while improving performance of contactless vital sensing methods.

Thesis
Peer Reviewed

Polarization-Informed Non-Line-of-Sight Imaging on Diffuse Surfaces

Hassan, Bakari
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2019)

Non-line-of-sight (NLOS) imaging has relevance in search & rescue, medical imaging, remote sensing, and robotics. Although NLOS methods are maturing, NLOS with normal cameras generally requires special occluders in the scene to remove light transport ambiguity. In this paper, it is shown that polarization reveals unique information about occluded environments, and computation in the polarization domain has sparsity benefits that aid the inverse problem. This is demonstrated via non-line-of-sight imaging on rough, everyday surfaces such as office/home walls. If successful, it has the potential to enable direct and indirect occluded light source discrimination and passive shape recovery of hidden objects via shape from polarization.

Thesis
Peer Reviewed

Diverse Patient Heart Rate Monitoring Using Consumer Camera Systems

Chari, Pradyumna
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2021)

Real-world scenes and objects have diverse visual appearance. Such diversity stems from the fundamental physics of how light interacts with matter, across different weather conditions, object types, and even people. These appearance variations mesmerize human beings, but puzzle artificial vision systems, which cannot generalize to such diversity. Through this thesis, we look at one such case of biased performance over diversity: camera-based remote heart rate (HR) estimation. HR is an essential clinical measure for the assessment of cardiorespiratory instability. The growing telemedicine market creates an urgent need for scalable yet affordable remote HR estimation. However, existing computer vision methods that estimate HR from facial videos exhibit biased performance against dark skin tones. This is a major concern, since communities of color are disproportionately affected by both COVID-19 and cardiovascular disease. We identify and model the origin of this bias and present a novel physics-driven algorithm that boosts performance on darker skin tones in our reported data. We assess the performance of our method through the creation of the first telemedicine-focused remote vital signs dataset, the VITAL dataset. 432 videos (~864 minutes) of 54 subjects with diverse skin tones are recorded under realistic scene conditions with corresponding vital sign data. Our method mitigates errors due to environmental conditions and imparts unbiased performance gains across skin tones, setting the stage for making non-contact HR sensing technologies a viable reality for patients across skin tones.

Thesis
Peer Reviewed

Deep Dive into Mitigating Bias in Remote Plethysmography: A Software and Hardware Approach

Kulkarni, Kimaya Milind
Advisor(s): Kadambi, Achuta

UCLA Electronic Theses and Dissertations (2023)

Camera Imaging-based plethysmography (iPPG) is a rapidly advancing field with great potential for non-invasive physiological monitoring. However, current methods using cameras for iPPG encounter challenges in maintaining consistent performance across various skin tones. This thesis tackles the inherent bias in iPPG and introduces a novel software-driven approach to mitigate this bias. We demonstrate the effectiveness of a well-designed spatial weighting scheme to enhance the quality of estimated plethysmography signals. However, the improvements observed in the software approach are constrained by the bias present in the RGB camera modality itself. In our efforts to address bias, we introduce a pioneering RGB+Radar sensing stack to mitigate bias in the sensing process. Our innovative approach significantly boosts the accuracy of remote plethysmography (rPPG) measurements for individuals with darker skin tones while simultaneously improving performance for those with lighter skin tones. This research represents a significant step towards achieving fair and dependable physiological monitoring across a wide range of skin tones, thereby expanding the influence and accessibility of rPPG technology.
