Scholarly Works (738 results)

Thesis
Peer Reviewed

High resolution measurements of kinesin-driven microtubule motility

Malik, Fady

UC San Francisco Electronic Theses and Dissertations (1993)

Thesis
Peer Reviewed

Advances in Machine Learning: Nearest Neighbour Search, Learning to Optimize and Generative Modelling

Li, Ke
Advisor(s): Malik, Jitendra

UC Berkeley Electronic Theses and Dissertations (2019)

Machine learning is the embodiment of an unapologetically data-driven philosophy that has increasingly become one of the most important drivers of progress in artificial intelligence and beyond. Existing machine learning methods, however, entail making trade-offs in terms of computational efficiency, modelling flexibility and/or formulation faithfulness. In this dissertation, we will cover three different ways in which limitations along each axis can be overcome, without compromising on other axes.

Computational Efficiency

We start with limitations on computational efficiency. Many modern machine learning methods require performing large-scale similarity search under the hood. For example, classifying an input into one of a large number of classes requires comparing the weight vector associated with each class to the activations of the penultimate layer; attending to particular memory cells of a neural net requires comparing the key associated with each memory cell to the query; and sparse recovery requires comparing each dictionary element to the residual. Similarity search can in many cases be reduced to nearest neighbour search, which is both a blessing and a curse. On the plus side, the nearest neighbour search problem has been studied extensively for more than four decades. On the minus side, no exact algorithm developed over the past four decades can run faster than naive exhaustive search when the intrinsic dimensionality is high, which is almost certainly the case in machine learning. Given this state of affairs, should we give up any hope of doing better than the naive approach of exhaustively comparing each element one by one?
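As a hedged illustration of the reduction mentioned above, the sketch below applies a standard augmentation trick (not necessarily the one used in the dissertation) to the classification example: finding the class whose weight vector has the largest inner product with the penultimate-layer activations becomes a Euclidean nearest neighbour query. Each weight vector w gets an extra coordinate sqrt(M^2 - ||w||^2), where M is the largest weight norm, and the query gets a 0, so that ||q' - w'||^2 = ||q||^2 + M^2 - 2<q, w> and the nearest augmented weight is the one with the largest inner product. The function names are hypothetical, and the final exhaustive argmin stands in for any nearest neighbour index.

```python
import numpy as np

def augment_weights(weights):
    """Append sqrt(M^2 - ||w||^2) to each row, M being the largest row norm."""
    norms = np.linalg.norm(weights, axis=1)
    m = norms.max()
    extra = np.sqrt(np.maximum(m ** 2 - norms ** 2, 0.0))  # clip tiny negatives from rounding
    return np.hstack([weights, extra[:, None]])

def augment_query(q):
    """Append a zero so the extra coordinate does not interact with the query."""
    return np.append(q, 0.0)

def top_class_via_nn(weights, activations):
    # ||q' - w'||^2 = ||q||^2 + M^2 - 2 <q, w>, so the nearest row maximises <q, w>
    w_aug = augment_weights(weights)
    q_aug = augment_query(activations)
    return int(np.argmin(np.linalg.norm(w_aug - q_aug, axis=1)))
```

Because ||q||^2 + M^2 is the same for every class, ranking by distance in the augmented space is exactly ranking by inner product in the original one.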

It turns out this pessimism, while tempting, is unwarranted. We introduce a new family of exact randomized algorithms, known as Dynamic Continuous Indexing (DCI), which overcomes both the curse of ambient dimensionality and the curse of intrinsic dimensionality. More specifically, DCI simultaneously achieves a query time complexity with a linear dependence on ambient dimensionality, a sublinear dependence on intrinsic dimensionality and a sublinear dependence on dataset size. The key insight is that the curse of intrinsic dimensionality in many cases arises from space partitioning, a divide-and-conquer strategy used by most nearest neighbour search algorithms. While space partitioning makes intuitive sense and works well in low dimensions, we argue that it fundamentally fails in high dimensions, because it requires distances between each point and every possible query to be approximately preserved in the data structure. We develop a new indexing scheme that only requires the ordering of nearby points relative to distant points to be approximately preserved, and show that the number of out-of-place points after projecting to just a single dimension is sublinear in the intrinsic dimensionality. In practice, our algorithm achieves a 14x to 116x speedup and a 21x reduction in memory consumption compared to locality-sensitive hashing (LSH).
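A heavily simplified sketch of the indexing idea may help, assuming NumPy. It keeps a single group of random one-dimensional projections; DCI as described in the dissertation uses several composite indices, a priority ordering and specific stopping conditions, none of which are reproduced here. Points are retrieved from each sorted projection in order of projected distance to the query, and a point becomes a candidate only once it has been retrieved from every projection, so only the relative ordering of nearby and distant points matters. Exact distances are then computed for the candidates alone. The names `build_dci`, `query_dci`, `num_proj` and `max_candidates` are hypothetical.

```python
import numpy as np

def build_dci(data, num_proj=10, seed=0):
    """Project data onto random unit directions and sort each projection."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((num_proj, data.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = data @ dirs.T                      # (n, num_proj) projected values
    order = np.argsort(proj, axis=0)          # point ids sorted per projection
    sorted_proj = np.take_along_axis(proj, order, axis=0)
    return dirs, order, sorted_proj

def query_dci(data, index, q, k=1, max_candidates=50):
    dirs, order, sorted_proj = index
    n, m = order.shape
    qp = dirs @ q
    # frontier pointers on each side of the query's position in every sorted projection
    hi = [int(np.searchsorted(sorted_proj[:, j], qp[j])) for j in range(m)]
    lo = [h - 1 for h in hi]
    seen = np.zeros(n, dtype=int)
    candidates = []
    while len(candidates) < max_candidates:
        advanced = False
        for j in range(m):  # round-robin over projections (a simplification)
            can_lo, can_hi = lo[j] >= 0, hi[j] < n
            if not (can_lo or can_hi):
                continue
            # step outward to whichever frontier point is closer in this projection
            take_hi = can_hi and (not can_lo or
                                  sorted_proj[hi[j], j] - qp[j] <= qp[j] - sorted_proj[lo[j], j])
            if take_hi:
                pid = order[hi[j], j]
                hi[j] += 1
            else:
                pid = order[lo[j], j]
                lo[j] -= 1
            advanced = True
            seen[pid] += 1
            if seen[pid] == m:  # retrieved from every projection: now a candidate
                candidates.append(pid)
                if len(candidates) == max_candidates:
                    break
        if not advanced:  # every projection exhausted
            break
    cand = np.array(candidates)
    dists = np.linalg.norm(data[cand] - q, axis=1)  # exact distances, candidates only
    return cand[np.argsort(dists)[:k]]
```

Setting `max_candidates` to the dataset size makes the search exhaustive and therefore exact; smaller budgets trade accuracy for speed.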

Modelling Flexibility

Next we move on to probabilistic modelling, which is critical to realizing one of the central objectives of machine learning: modelling the uncertainty inherent in prediction. The community has wrestled with the problem of how to strike the right balance between modelling flexibility and computational efficiency. Simple models can often be learned straightforwardly and efficiently but are not expressive; complex models are expressive but in general cannot be learned both exactly and efficiently, often because learning requires evaluating some intractable integral. The success of deep learning has motivated the development of probabilistic models that can leverage the inductive bias and modelling power of deep neural nets, such as variational autoencoders (VAEs) and generative adversarial nets (GANs), which belong to a subclass of probabilistic models known as implicit probabilistic models. Implicit probabilistic models are defined by a procedure for drawing samples from them, rather than by an explicit specification of the probability density function. On the positive side, sampling is always easy by definition; on the negative side, learning is difficult because not even the unnormalized complete likelihood can be expressed analytically. These models must therefore be learned using likelihood-free methods, but none of these methods has been shown to be able to learn the underlying distribution with a finite number of samples.

Perhaps the most popular likelihood-free method is the GAN. Unfortunately, GANs suffer from the well-documented issue of mode collapse, where the learned model (the generator, in GAN parlance) cannot generate some modes of the true data distribution. We argue that this arises from the direction in which generated samples are matched to the real data. Under the GAN objective, each generated sample is made indistinguishable from some data example, so some data examples may not be chosen by any generated sample, resulting in mode collapse. We introduce a new likelihood-free method, known as Implicit Maximum Likelihood Estimation (IMLE), that overcomes mode collapse by inverting the direction: instead of ensuring that each generated sample has a similar data example, our method ensures that each data example has a similar generated sample. This can be shown to be equivalent to maximizing a lower bound on the log-likelihood when the model class is richly parameterized and the density is smooth in parameters and data, hence the name.
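The difference in matching direction can be made concrete with a toy one-dimensional sketch, assuming NumPy. The generator here is hypothetical and nonparametric: the generated samples themselves are the parameters, whereas IMLE as described trains a generator on freshly drawn samples. `sample_to_data_loss` stands in for the GAN-style direction (each sample near some data example) and is not an actual GAN objective; `imle_loss` and `imle_step` follow the IMLE direction (each data example near some sample).

```python
import numpy as np

def pairwise_sq_dists(a, b):
    # a: (n, d), b: (m, d) -> (n, m) squared Euclidean distances
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def imle_loss(data, samples):
    # IMLE direction: every data example should have a nearby generated sample
    return float(pairwise_sq_dists(data, samples).min(axis=1).mean())

def sample_to_data_loss(data, samples):
    # opposite direction: every generated sample should be near some data example
    return float(pairwise_sq_dists(data, samples).min(axis=0).mean())

def imle_step(data, samples, lr=0.5):
    # match each data example to its nearest sample, then pull that sample toward it
    j = pairwise_sq_dists(data, samples).argmin(axis=1)
    grad = np.zeros_like(samples)
    np.add.at(grad, j, 2 * (samples[j] - data) / len(data))  # accumulates repeated indices
    return samples - lr * grad
```

With two data modes at -5 and 5 and every sample collapsed onto 5, the sample-to-data loss is already zero while the IMLE loss is 50; a few IMLE steps pull one sample over to the uncovered mode.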

Compared to VAEs, which are not likelihood-free, IMLE eliminates the need for an approximate posterior and avoids the bias towards parameters where the true posteriors are less informative, a phenomenon known as "posterior collapse".

Formulation Faithfulness

Finally, we introduce a novel formulation that enables the automatic discovery of new iterative gradient-based optimization algorithms, which have become the workhorse of modern machine learning. This effectively allows us to apply machine learning to improve machine learning, which has been a dream of machine learning researchers since the early days of the field. The key challenge, however, is that it is unclear how to represent a complex object like an algorithm in a way that is amenable to machine learning. Prior approaches represent algorithms as imperative programs, i.e., sequences of elementary operations, and therefore induce a search space whose size is exponential in the length of the optimal program. Searching in this space is unfortunately intractable for anything but the simplest and shortest algorithms. Other approaches enumerate a small set of manually designed algorithms and search for the best algorithm within this set. Searching in this space is tractable, but the optimal algorithm may lie outside it. It remains an open question how to parameterize the space of possible algorithms in a way that is both complete and efficiently searchable.

We get around this issue by observing that an optimization algorithm can be uniquely characterized by its update formula: different iterative optimization algorithms differ only in their choice of the update formula. In gradient descent, for example, it is taken to be a scaled negative gradient, whereas in gradient descent with momentum, it is taken to be a scaled exponentially weighted average of the history of gradients. Therefore, if we can learn the update formula, we can automatically discover new optimization algorithms. The update formula can be formulated as a mapping from the history of gradients, iterates and objective values to the update step, which can be approximated with a neural net. We can then learn the optimization algorithm by learning the parameters of the neural net.
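The observation that optimizers differ only in their update formula can be sketched in Python. Below, an update formula is any function from the history of gradients and past steps to the next step; gradient descent and gradient descent with momentum are two such functions, and a hypothetical two-parameter formula, `parametric_update`, recovers both as special cases. In the approach described above, a neural net would take the place of this linear map and its parameters would be learned; that learning step is not shown, and all names are illustrative.

```python
from typing import Callable, List

# An update formula maps the history (past gradients, past steps) to the next step.
UpdateFn = Callable[[List[float], List[float]], float]

def run_optimizer(grad_fn: Callable[[float], float], x0: float,
                  update: UpdateFn, steps: int) -> float:
    x = x0
    grads: List[float] = []
    deltas: List[float] = []
    for _ in range(steps):
        grads.append(grad_fn(x))
        delta = update(grads, deltas)
        deltas.append(delta)
        x += delta
    return x

def gd_update(lr: float = 0.1) -> UpdateFn:
    # gradient descent: a scaled negative gradient
    return lambda grads, deltas: -lr * grads[-1]

def momentum_update(lr: float = 0.1, beta: float = 0.9) -> UpdateFn:
    def update(grads: List[float], deltas: List[float]) -> float:
        # scaled exponentially weighted average of all past gradients (starting from 0)
        avg = 0.0
        for g in grads:
            avg = beta * avg + (1 - beta) * g
        return -lr * avg
    return update

def parametric_update(theta) -> UpdateFn:
    # hypothetical learnable formula: step = -a * g_t + b * previous step
    a, b = theta
    def update(grads: List[float], deltas: List[float]) -> float:
        prev = deltas[-1] if deltas else 0.0
        return -a * grads[-1] + b * prev
    return update
```

On f(x) = x^2 / 2 (gradient x), `parametric_update((lr, 0.0))` reproduces gradient descent, and `parametric_update((lr * (1 - beta), beta))` reproduces the momentum update, because that update satisfies step_t = beta * step_{t-1} - lr * (1 - beta) * g_t.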

Thesis
Peer Reviewed

Learning to Reconstruct 3D Objects

Kar, Abhishek
Advisor(s): Malik, Jitendra

UC Berkeley Electronic Theses and Dissertations (2017)

Ever since the dawn of computer vision, 3D reconstruction has been a core problem, inspiring early seminal works and leading to numerous real-world applications. Much recent progress in the field, however, has been driven by visual recognition systems powered by statistical learning techniques, more recently by deep convolutional neural networks (CNNs). In this thesis, we attempt to bridge the worlds of geometric 3D reconstruction and learning-based recognition by learning to leverage various 3D perception cues from image collections for the task of reconstructing 3D objects.

In Chapter 2, we present a system that learns intra-category regularities in object shapes by building category-specific deformable 3D models from 2D recognition datasets, enabling fully automatic single-view 3D reconstruction of novel instances. In Chapter 3, we demonstrate how predicting the amodal extent of objects in images and reasoning about their co-occurrences can help us infer their real-world heights. Finally, in Chapter 4, we present Learnt Stereo Machines (LSM), an end-to-end learnt framework using convolutional neural networks that unifies a number of paradigms in 3D object reconstruction: single- and multi-view reconstruction, coarse and dense outputs, and geometric and semantic reasoning. We conclude with several promising future directions for learning-based 3D reconstruction.

Article
Peer Reviewed

Strategies for tone identification in observers with absolute pitch

Proceedings of the Annual Meeting of the Cognitive Science Society, Volume 33 (2011)

Thesis
Peer Reviewed

Bridging the Gap between Humans and Machines in 3D Object Perception

Collins, Jasmine
Advisor(s): Malik, Jitendra

UC Berkeley Electronic Theses and Dissertations (2023)

Humans possess a remarkable ability to extract general object representations from a single image, capturing not only shape and texture but also 3D form. In contrast, 3D reasoning in many computer vision systems is often limited. This thesis presents three efforts aimed at bridging this gap in 3D object perception. First, we introduce a new dataset that focuses on real-world, object-centered 3D understanding. The dataset provides a diverse set of objects corresponding to real household objects, with varying geometries and physically based rendering materials. It also includes additional annotations describing each object, making it a valuable resource for training and evaluating computer vision models. Next, we design a method for automatically inferring the articulation of 3D objects. The method enables interaction with 3D objects and can be used to generate more realistic and dynamic scenes. By understanding how different parts of an object move and interact, computer vision systems can better model and reason about complex 3D scenes in simulation. Finally, we investigate the effectiveness of contrastive learning with 3D data augmentation that generates multiple views of objects, a departure from the typical practice of training on single-view images. We show that generating multiple views of objects can help computer vision systems learn better representations and improve their overall object understanding in terms of classification and shape perception. These contributions represent efforts towards bridging the gap between human and machine 3D object perception, ultimately enabling machines to understand 3D objects from single images in ways that are more aligned with human perception.
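For readers unfamiliar with contrastive learning, the sketch below shows an InfoNCE-style loss in NumPy, assuming that row i of `z1` and row i of `z2` hold embeddings of two views of the same object (for instance, two renderings produced by 3D data augmentation) and that the other rows in the batch act as negatives. This is a generic formulation, not necessarily the loss used in the thesis; `info_nce` and `temperature` are hypothetical names.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    # L2-normalise so the logits are scaled cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature            # (batch, batch); diagonal = positive pairs
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy of picking the matching view of each object among the batch
    return float(-np.mean(np.diag(log_probs)))
```

Matched views give a loss near zero, while mismatching the pairs (for example, by rolling the rows of `z2`) gives a large loss.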

Article

A Machine Vision Based Surveillance System for California Roads

Working Papers (1995)

In this report we address the problem of automation of heavy-duty vehicles. After a brief description of the dynamic model used in our design and simulations, we develop nonlinear controllers with adaptation, first for speed control and then for vehicle follower longitudinal control. We consider both autonomous operation as well as intervehicle communication, and evaluate the performance of our controllers in several different scenarios through simulation.

Article

Robust Computation of Optical Flow in a Multi-Scale Differential Framework

Working Papers (1993)

We have developed a new algorithm for computing optical flow in a differential framework. The image sequence is first convolved with a set of linear, separable spatiotemporal filters similar to those that have been used in other early vision problems such as texture and stereopsis. Our analysis of the measurement errors leads us to develop an algorithm based on a robust version of total least squares. Each optical flow vector computed has an associated reliability measure which can be used in subsequent processing. The performance of the algorithm on the data set used by Barron et al. (CVPR 1992) compares favorably with other techniques. In addition to being separable, the filters used are also causal, incorporating only past time frames. The algorithm is fully parallel and has been implemented on a multiple processor machine.

Because the algorithm is fully parallel, it can be run in real time on an array of processors. In addition, the differential method is computationally less expensive than matching methods for computing visual motion. The output of the linear filters can also be used in other visual tasks such as stereo and recognition. Thus, this approach to motion detection can be part of a real-time vision application system in which linear filters provide a basis for visual tasks such as passive ranging and moving object detection. For vehicle surveillance, the system provides individual vehicle speeds and directions. For autonomous vehicles, the system would provide both stereo correspondence for range information and optical flow for collision avoidance in a single computational framework.
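The total least squares step mentioned in the first paragraph can be sketched as follows, assuming NumPy and that the spatial and temporal derivatives `ix`, `iy`, `it` have already been obtained (for example, from the spatiotemporal filters) at the pixels of one neighbourhood. Each pixel contributes a constraint ix*u + iy*v + it = 0, and TLS takes the right singular vector of the constraint matrix with the smallest singular value. Iterative reweighting with Cauchy weights is used here as one hypothetical way to make the fit robust; the report's own robust estimator and reliability measure are not reproduced, and returning the smallest singular value as a residual is an assumed stand-in.

```python
import numpy as np

def tls_flow(ix, iy, it, iters=5, c=1.0):
    """Robustified TLS flow for one neighbourhood; returns (flow, smallest singular value)."""
    a = np.stack([ix, iy, it], axis=1).astype(float)  # one constraint row per pixel
    w = np.ones(len(a))
    for _ in range(iters):
        # weighted TLS: right singular vector of sqrt(w) * A with the smallest singular value
        _, s, vt = np.linalg.svd(a * np.sqrt(w)[:, None], full_matrices=False)
        n = vt[-1]
        r = a @ n                          # signed residual of each constraint row
        w = 1.0 / (1.0 + (r / c) ** 2)     # Cauchy weights down-weight outlying rows
    flow = n[:2] / n[2]                    # (u, v) from the null direction (u, v, 1)
    return flow, s[-1]
```

When all gradients in the neighbourhood point in one direction (the aperture problem), the smallest singular value is not unique and the estimate should be treated as unreliable.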

Thesis
Peer Reviewed

Recognition Using Regions

Gu, Chunhui
Advisor(s): Malik, Jitendra

UC Berkeley Electronic Theses and Dissertations (2012)

Multi-scale window scanning has been popular in object detection, but it generalizes poorly to complex features (e.g., nonlinear SVM kernels), deformable objects (e.g., animals) and finer-grained tasks (e.g., segmentation). In contrast, regions are appealing as image primitives for recognition because (1) they encode object shape and scale naturally; (2) they are only mildly affected by background clutter; and (3) they significantly reduce the set of possible object locations in images.

In this dissertation, we propose three novel region-based frameworks that jointly detect and segment target objects, using the region detector of Arbelaez et al. (TPAMI 2010) as input. This detector produces a hierarchical region tree for each image, where each region is represented by a rich set of image cues (shape, color and texture). Our first framework introduces a generalized Hough voting scheme that generates hypotheses of object locations and scales directly from region matching. Each hypothesis is then passed to a verification classifier and a constrained segmenter. This simple yet effective framework is highly competitive in both detection and segmentation on the ETHZ Shape and Caltech 101 databases.
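A simplified version of the voting step might look like the sketch below, assuming axis-aligned boxes `(x, y, w, h)` and isotropic scaling. Each region match (a test-image region, the matched exemplar region and the exemplar's object box) predicts an object box, and votes are accumulated in a coarse grid over box centre and log-scale. The function names, box format and bin sizes are hypothetical choices rather than the thesis's parameters, and the verification classifier and constrained segmenter are not shown.

```python
import numpy as np
from collections import Counter, defaultdict

def predict_object_box(test_region, ex_region, ex_object):
    # boxes are (x, y, w, h); the scale comes from the region widths (isotropic assumption)
    s = test_region[2] / ex_region[2]
    ox = test_region[0] + s * (ex_object[0] - ex_region[0])
    oy = test_region[1] + s * (ex_object[1] - ex_region[1])
    return (ox, oy, s * ex_object[2], s * ex_object[3])

def hough_vote(matches, cell=10.0, scale_bin=0.25):
    """Return the mean predicted box of the most-voted (centre, log-scale) bin and its votes."""
    votes = Counter()
    boxes = defaultdict(list)
    for test_region, ex_region, ex_object in matches:
        box = predict_object_box(test_region, ex_region, ex_object)
        cx, cy = box[0] + box[2] / 2, box[1] + box[3] / 2
        key = (int(cx // cell), int(cy // cell), int(np.floor(np.log2(box[2]) / scale_bin)))
        votes[key] += 1
        boxes[key].append(box)
    best, count = votes.most_common(1)[0]
    return np.mean(boxes[best], axis=0), count
```

Two region matches that agree on the object's position and scale land in the same bin and outvote an inconsistent match, which is what lets hypotheses emerge directly from region matching.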

Our second framework encodes image context through the configuration of the region tree. We describe each leaf of the tree by the features of its ancestral set, the set of regions on the path linking the leaf to the root. This ancestral set consists of all regions containing the leaf and thus provides context in the form of inclusion relations. This property distinguishes our work from other approaches that encode context either by a global descriptor (e.g., GIST) or by pairwise neighboring relations (e.g., Conditional Random Fields).
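The ancestral-set description can be sketched with a parent array for the region tree, assuming NumPy. `ancestral_set` follows parent links from a leaf to the root; `ancestral_descriptor` then pairs the leaf's own features with the mean of its ancestors' features. That pooling, like the function names, is a hypothetical choice for illustration; the abstract does not specify how the ancestral features are combined.

```python
import numpy as np

def ancestral_set(parent, leaf):
    # parent[i] is the index of region i's parent in the tree; the root has parent -1
    path = []
    node = leaf
    while node != -1:
        path.append(node)
        node = parent[node]
    return path  # leaf first, root last: every region containing the leaf

def ancestral_descriptor(parent, feats, leaf):
    path = ancestral_set(parent, leaf)
    own = feats[leaf]
    # hypothetical pooling: mean of the ancestors' features (the leaf itself excluded)
    ctx = feats[path[1:]].mean(axis=0) if len(path) > 1 else np.zeros_like(own)
    return np.concatenate([own, ctx])
```

Because the context comes only from regions that contain the leaf, it captures inclusion relations rather than a global descriptor or pairwise adjacency.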

Intra-class variation has been one of the hardest barriers in category-level recognition, and we approach this problem in two steps. The first step explicitly studies one prominent type of intra-class variation, viewpoint variation. We propose to use a mixture of holistic templates and discriminative learning for joint viewpoint classification and category detection. A number of components are learned in the mixture, and they are associated with canonical viewpoints of the object through different levels of supervision. In addition, this approach extends naturally to continuous 3D viewpoint prediction by discriminatively learning a linear appearance model locally at each discrete view. Our systems significantly outperform the state of the art on two 3D databases in the discrete case, and on an everyday-object database that we collected ourselves in the continuous case.

The success of modeling object viewpoints motivates us to tackle the generic variation problem through component models, where each component characterizes not only a particular viewpoint of objects but also a particular subcategory or pose. Interestingly, this approach combines naturally with our region-based object proposals. In our third framework, we form visual clusters from training data that are tight in appearance and configuration spaces. We train individual classifiers for each component and then learn to aggregate them at the category level. Our multi-component approach obtains highly competitive results on the challenging PASCAL VOC 2010 database. Furthermore, our approach allows the transfer of finer-grained semantic information from the components, such as keypoint locations and segmentation masks.

Thesis
Peer Reviewed

Computational Sensorimotor Learning

Agrawal, Pulkit
Advisor(s): Malik, Jitendra

UC Berkeley Electronic Theses and Dissertations (2018)

Our fascination with human intelligence has historically led AI research toward directly building autonomous agents that can solve intellectually challenging problems such as chess and Go. The same philosophy of direct optimization has percolated into the design of systems for image/speech recognition and language translation. But the AI systems of today are brittle and very different from humans in the way they solve problems, as evidenced by their severely limited ability to adapt or generalize. Evolution took a very long time to evolve the necessary sensorimotor skills of an ape (approx. 3.5 billion years) and a relatively short time to develop apes into present-day humans (approx. 18 million years) that can reason and make use of language. There is probably a lesson to be learned here: by the time organisms with simple sensorimotor skills evolved, they had possibly also developed the apparatus that could readily support more complex forms of intelligence later on. In other words, by spending a long time solving simple problems, evolution prepared agents for more complex ones. The same principle is probably at play when humans rely on what they already know to find solutions to new challenges. The principle of incrementally increasing complexity, as evidenced in evolution, child development and the way humans learn, may therefore be vital to building human-like intelligence.

The currently prominent theory in developmental psychology suggests that seemingly frivolous play is a mechanism for infants to conduct experiments that incrementally increase their knowledge. Infants' experiments, such as throwing objects, hitting two objects against each other or putting them in their mouths, help them understand how forces affect objects, how objects feel, how different materials interact, and so on. In a way, such play prepares infants for future life by laying the foundation of a high-level framework of experimentation for quickly understanding how things work in new (and potentially non-physical/abstract) environments and for constructing goal-directed plans.

I have used ideas from infant development to build mechanisms that allow robots to learn about their environment by experimentation. Results show that such learning allows the agent to adapt to new environments and reuse its past knowledge to succeed at novel tasks quickly.

Thesis
Peer Reviewed

Opioid Overdose Screening and Naloxone Co-Prescribing to Curb Overdose Deaths

Malik, Shamsah
Advisor(s): Glasner, Suzette

UCLA Electronic Theses and Dissertations (2021)

Background: California’s naloxone law (AB 2760) mandates that all providers screen for overdose risk and co-prescribe naloxone to patients at risk of opioid overdose. However, naloxone co-prescription rates among general medicine providers remain low. This can be attributed to a lack of knowledge of the overdose risk criteria and the lack of a consistent, standardized approach to prescribing opioids.

Objective: This quality improvement (QI) project evaluated the impact of a naloxone provider education intervention, coupled with a standardized approach for identifying patients at risk of opioid overdose, on provider knowledge and on rates of naloxone co-prescription and overdose education when appropriate.

Design/Setting: The project was designed as a pre- and post-test analysis of change in opioid screening and naloxone prescriptions among hospitalist providers at a Los Angeles County (LAC) hospital.

Methods/Intervention: A survey assessing knowledge, attitudes and barriers to naloxone prescribing was administered at baseline and after the naloxone provider intervention. All providers received a one-hour educational intervention focused on a standardized 5-Step process to be adopted for opioid screening and naloxone prescribing. Data were collected from the medical charts of all patients discharged with an opioid prescription to determine whether (1) they met overdose risk criteria and (2) they were subsequently co-prescribed naloxone, both in the six months before and in the four months after the educational intervention. For the survey analysis, the Friedman test was used across three time points, and the Wilcoxon signed-rank test with Bonferroni correction was used for pairwise comparisons. Chi-square analysis was used to compare prescriber data pre- and post-intervention.

Results: More naloxone prescriptions were written during the post-education phase than at baseline (68% vs 10%). Providers also educated more patients about opioid overdose and naloxone in the post-education phase (44% vs 6%). Providers screened for overdose risk more consistently post-intervention than at baseline, and thus appropriately provided naloxone (85% vs 11%) and overdose education (55% vs 7%) to patients meeting the overdose risk criteria.

Conclusion: Provider education and the use of a standardized screening protocol customized for the provider context facilitated appropriate overdose screening, naloxone co-prescription and overdose education for patients.

Keywords: opioids, overdose, overdose-risk criteria, naloxone, overdose education
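The pairwise survey comparisons use a Bonferroni correction. As a worked example, assuming a conventional family-wise significance level of α = 0.05 (the abstract does not state the level used), the three time points yield three pairwise comparisons, so each Wilcoxon signed-rank test would be judged against:

```latex
\alpha_{\text{pair}} = \frac{\alpha}{\binom{3}{2}} = \frac{0.05}{3} \approx 0.0167
```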