Physics-based vision attempts to model and invert light transport in order to extract information about a scene, such as 3D shape and reflectance properties, from one or more images. For this inversion to be tractable, many simplifying assumptions about the physics are made, and these may or may not hold in practice.
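As a concrete example of such a forward model (a textbook instance chosen for illustration, not one specific to this thesis), the Lambertian shading equation relates image intensity to shape and reflectance:

```latex
% Lambertian image formation: a canonical forward model that
% physics-based vision inverts to recover shape and reflectance.
% I(x): intensity at pixel x,   \rho(x): albedo,
% \mathbf{n}(x): unit surface normal,  \mathbf{l}: unit light direction,
% s: source intensity.
I(x) = s \, \rho(x) \, \max\bigl(0,\ \mathbf{n}(x) \cdot \mathbf{l}\bigr)
```

Recovering \(\rho\) and \(\mathbf{n}\) from observed \(I\) is under-constrained from a single image, which is why simplifying assumptions such as distant lighting, Lambertian reflectance, and the absence of inter-reflections are commonly imposed.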
Learning-based vision, on the other hand, ignores the underlying physics and instead models observations of the world statistically. A prime example is deep learning, which has recently revolutionized computer vision tasks such as classification, detection, and segmentation.
These two approaches to vision have traditionally been relatively disjoint but are beginning to overlap. This thesis extends the state of the art on both sides and brings them closer together.
First, the novel use of fluorescence imaging for 3D reconstruction via shape from shading and photometric stereo is proposed. This is achieved by leveraging the previously unexploited fact that fluorescence emission is isotropic, making it an ideal input for algorithms that assume Lambertian reflectance. In addition, fluorescence can be combined with reflectance to resolve the generalized bas-relief ambiguity in uncalibrated photometric stereo. Furthermore, it is observed that when a material fluoresces at a different color than it reflects, the fluorescence channel is free of inter-reflections, which typically cause problems for photometric stereo.
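To make the role of the Lambertian assumption concrete, below is a minimal NumPy sketch of classical photometric stereo (illustrative names and formulation, not the thesis's implementation). Because fluorescence emission is isotropic, fluorescence images satisfy this model even for materials whose reflectance does not.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Recover per-pixel albedo and surface normals under the
    Lambertian model I = L @ (rho * n).

    images: (k, h, w) intensities under k distant point lights.
    lights: (k, 3) unit light directions.
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                       # (k, h*w)
    # Least-squares solve for the scaled normal b = rho * n per pixel.
    b, *_ = np.linalg.lstsq(lights, I, rcond=None)  # (3, h*w)
    albedo = np.linalg.norm(b, axis=0)              # rho = |b|
    normals = b / np.maximum(albedo, 1e-8)          # n = b / rho
    return albedo.reshape(h, w), normals.reshape(3, h, w)
```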
Second, photometric stereo is extended to work in participating media by accounting for how scattering affects image formation. The first insight is that, in this setting, fluorescence can be used to optically remove backscatter, which significantly improves the signal-to-noise ratio compared to image-subtraction methods. Second, it is shown through extensive simulations that forward scatter from the light to the object can be calibrated out and effectively ignored. Finally, deconvolution is proposed to handle forward-scatter blur from the object to the camera, a phenomenon often ignored in computer vision.
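As an illustration of this last step, here is a minimal Wiener deconvolution sketch; the PSF acquisition and the regularization constant are assumptions made for illustration, and the thesis's actual procedure may differ.

```python
import numpy as np

def wiener_deconvolve(image, psf, k=0.01):
    """Frequency-domain Wiener deconvolution to undo forward-scatter
    blur from the object to the camera.

    image: blurred grayscale observation, shape (h, w).
    psf:   point spread function of the medium, same shape, centered
           (e.g., measured by imaging a point source through the water).
    k:     regularization constant (roughly the inverse SNR).
    """
    H = np.fft.fft2(np.fft.ifftshift(psf))
    G = np.fft.fft2(image)
    # conj(H) / (|H|^2 + k) approximates 1/H while suppressing noise
    # amplification at frequencies where the PSF carries little energy.
    F = np.conj(H) * G / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(F))
```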
Next, the problem of correcting dynamic refractive distortion from a single image is tackled. Previous work has attacked this problem with physics-based approaches and, as such, requires additional information, such as high-frame-rate video or known templates, to handle its under-constrained nature. Instead, it is proposed to use deep learning to learn image and distortion priors with which a single image can be undistorted. An initial attempt to train the model on synthetically generated data failed to generalize to real data, so a new large-scale dataset was collected for this problem.
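A minimal sketch of this idea is shown below: a hypothetical CNN predicts a per-pixel displacement field from a single distorted image and resamples the input with it. The architecture and names are placeholders, not the network used in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistortionNet(nn.Module):
    """Hypothetical sketch: a small CNN predicts a per-pixel
    displacement field (a learned distortion prior) and warps the
    distorted input back toward the undistorted image; pairs of
    distorted and flat images supervise the restoration."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),   # (dx, dy) per pixel
        )

    def forward(self, img):
        b, _, h, w = img.shape
        flow = self.net(img)                  # (b, 2, h, w), [-1, 1] units
        # Identity sampling grid in normalized [-1, 1] coordinates.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=img.device),
            torch.linspace(-1, 1, w, device=img.device),
            indexing="ij")
        grid = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)
        grid = grid + flow.permute(0, 2, 3, 1)   # shift sample locations
        return F.grid_sample(img, grid, align_corners=True)
```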
Finally, the failure of the synthetically trained model to generalize prompted an investigation of domain adaptation. A novel framework for unsupervised domain adaptation is proposed, building on the ideas of adversarial discriminative feature matching and image-to-image translation. Many previous works can be seen as special cases of this general framework. The method is validated by achieving state-of-the-art results on common domain adaptation benchmarks, and it may be particularly useful for traditionally physics-based problems where synthetic data is easy to generate but real data is hard to annotate.
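A minimal PyTorch sketch of the adversarial feature-matching component is given below; the architectures and names are placeholders, and the image-to-image translation part of the framework is omitted for brevity.

```python
import torch
import torch.nn as nn

# Placeholder architectures for illustration only.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
discriminator = nn.Sequential(nn.Linear(128, 64), nn.ReLU(),
                              nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def adversarial_step(src_x, tgt_x):
    """One step of adversarial feature matching: the discriminator
    learns to tell source features from target features, while the
    encoder learns to fool it, so that a classifier trained on labeled
    source features transfers to the unlabeled target domain."""
    f_src, f_tgt = encoder(src_x), encoder(tgt_x)

    # Discriminator loss: label source features 1, target features 0.
    d_loss = (bce(discriminator(f_src.detach()), torch.ones(len(src_x), 1))
              + bce(discriminator(f_tgt.detach()), torch.zeros(len(tgt_x), 1)))

    # Encoder loss: make target features indistinguishable from source.
    g_loss = bce(discriminator(f_tgt), torch.ones(len(tgt_x), 1))
    return d_loss, g_loss
```

In this setup, the discriminator and encoder losses are minimized alternately, in the style of GAN training; d_loss updates only the discriminator, while g_loss updates only the encoder.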