Convolutional neural networks (CNNs) have become the de facto standard for detecting the presence of objects in a scene, as portrayed by an image. CNNs are described as being "approximately invariant" to nuisance transformations such as planar translation, both by virtue of their convolutional architecture and by virtue of their approximation properties, which, given sufficient parameters and training data, could in principle yield discriminants that are insensitive to nuisance transformations of the data. The fact that contemporary deep convolutional architectures appear very effective in large-scale benchmarks at classifying images as containing a given object regardless of its position, scale, and aspect ratio suggests that the network can effectively marginalize such nuisance variability. We conduct an empirical study and show that, contrary to popular belief, at the current level of complexity of convolutional architectures and at the scale of the data sets used to train them, CNNs are not very effective at marginalizing nuisance variability.
This finding leaves researchers with a choice: invest more effort in designing models that are less sensitive to nuisances, or design better region proposal algorithms that predict where the objects of interest lie and center the model on those regions. In this thesis we take steps in both directions. First, we introduce DSP-CNN, which deploys domain-size pooling in order to render the network scale-invariant at the level of the convolution operator. Second, motivated by our empirical analysis, we propose novel sampling and pruning techniques for region proposal schemes that improve end-to-end performance in large-scale classification, detection, and wide-baseline correspondence to state-of-the-art levels. Additionally, since a proposal algorithm involves the design of a classifier whose results are fed to another classifier (a category CNN), it seems natural to leverage the latter to design the former. Thus, we introduce a method that leverages filters learned in the lower layers of CNNs to design a binary boosting classifier for generating class-agnostic proposals. Finally, we extend sampling to the temporal domain by designing a temporal hard-attention layer trained with reinforcement learning, with application to person re-identification in video sequences.
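To make the notion of domain-size pooling at the convolution level concrete, the following is a minimal illustrative sketch, not the DSP-CNN implementation developed in this thesis: a single set of convolutional filters is applied to several rescaled copies of the input (different "domain sizes"), and the responses are averaged at a common resolution. The module name, the choice of scales, and the use of average pooling are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainSizePooledConv2d(nn.Module):
    """Illustrative sketch (not the thesis implementation): apply one set of
    convolutional filters over several rescaled copies of the input ("domain
    sizes") and average the responses, yielding a response map that is less
    sensitive to the scale of the imaged object."""

    def __init__(self, in_channels, out_channels, kernel_size,
                 scales=(0.75, 1.0, 1.25)):  # scales chosen arbitrarily here
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        self.scales = scales

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = 0.0
        for s in self.scales:
            # Resample the input to a different domain size.
            xs = F.interpolate(x, scale_factor=s, mode='bilinear',
                               align_corners=False)
            ys = self.conv(xs)
            # Bring the response back to a common resolution before pooling.
            ys = F.interpolate(ys, size=(h, w), mode='bilinear',
                               align_corners=False)
            pooled = pooled + ys
        # Average (pool) the responses across domain sizes.
        return pooled / len(self.scales)

# Usage: drop-in replacement for a single convolutional layer.
layer = DomainSizePooledConv2d(3, 16, kernel_size=3)
out = layer(torch.randn(1, 3, 64, 64))   # -> torch.Size([1, 16, 64, 64])
```

The design choice illustrated here is that pooling happens over the size of the sampled domain rather than over spatial location, so the same filter bank contributes to the response regardless of the apparent scale of the object.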