
Scholarly Works (24 results)

Thesis
Peer Reviewed

Extracting Actionable Information From Bug Reports

Zhou, Bo
Advisor(s): Gupta, Rajiv

UC Riverside Electronic Theses and Dissertations (2016)

Finding and fixing bugs is a major but time- and effort-consuming task for software quality assurance in the software development process. When a bug is filed, valuable multi-dimensional information is captured by the bug report and stored in the bug tracking system. However, developers and researchers have so far used only part of this information (e.g., a detailed description of a failure and occasional hints at the location of the fault in the code), and for limited purposes, e.g., finding and fixing bugs, detecting duplicate bug reports, or improving bug triaging accuracy. We contend that this information is useful not only for software testing and debugging but also for product understanding, software evolution, and software management. This dissertation makes several advances in extracting actionable information from bug reports using data mining and natural language processing techniques. Both software developers and researchers can benefit from our approach.

We first focus on differences in bugs and bug-fixing processes between desktop and smartphone applications. Specifically, we focus on two main thrusts: a quantitative analysis to discover similarities and differences between desktop and smartphone bug reports/processes, and a qualitative analysis where we extract topics from bug reports to understand bugs' nature, categories, as well as differences between platforms.

Next, we present an approach whose focus is understanding the differences between concurrency and non-concurrency bugs, the differences among various concurrency bug classes, and predicting bug quantity, type, and location, from patches, bug reports and bug-fix metrics.

In addition, we found bugs of different severities have so far been put into the same category, but their characteristics differ significantly. Moreover, the nature of issues with the same severity, e.g., high-severity, differs markedly between desktops and smartphones. To understand these differences, we perform an empirical study on 72 Android and desktop projects. We study how severity changes, quantify the differences between classes in terms of bug-fixing attributes and analyze how the topics differ across classes on each platform over time.

Finally, we propose a novel delta debugging technique to reduce the length of event traces by using a record & replay scheme. When we capture the event sequence while executing the application, an event dependency graph (EDG) will be generated. Then we use the EDG to guide the delta debugging algorithm by eliminating irrelevant events. Therefore, the debugging process can be improved significantly if events that are irrelevant to the crash are filtered out.
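
The idea of pruning a recorded event trace with a dependency graph before running delta debugging can be sketched as follows. This is a minimal illustration and not the dissertation's implementation: the event names, the `edg` dictionary layout, the `crashes` replay predicate, and the simplified `ddmin` loop are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence, Set


def relevant_events(edg: Dict[str, Set[str]], crash_event: str) -> Set[str]:
    """Return crash_event plus every event it transitively depends on.

    edg maps an event to the set of events it depends on (hypothetical layout).
    """
    seen: Set[str] = set()
    stack = [crash_event]
    while stack:
        event = stack.pop()
        if event not in seen:
            seen.add(event)
            stack.extend(edg.get(event, ()))
    return seen


def ddmin(trace: List[str], crashes: Callable[[Sequence[str]], bool]) -> List[str]:
    """Simplified delta debugging: drop chunks while the crash still reproduces."""
    n = 2
    while len(trace) >= 2:
        chunk = max(1, len(trace) // n)
        reduced = False
        for start in range(0, len(trace), chunk):
            candidate = trace[:start] + trace[start + chunk:]
            if candidate and crashes(candidate):
                trace, n, reduced = candidate, max(n - 1, 2), True
                break
        if not reduced:
            if chunk == 1:
                break  # already at single-event granularity
            n = min(len(trace), n * 2)
    return trace


def crashes(events: Sequence[str]) -> bool:
    """Hypothetical replay oracle: the crash needs these three events in order."""
    it = iter(events)
    return all(step in it for step in ("open", "login", "tap_submit"))


# Hypothetical recorded trace and dependency graph.
trace = ["open", "scroll", "login", "rotate", "tap_submit"]
edg = {"tap_submit": {"login"}, "login": {"open"}}

keep = relevant_events(edg, "tap_submit")
pruned = [e for e in trace if e in keep]   # EDG-guided filtering
minimal = ddmin(pruned, crashes)           # delta debugging on the shorter trace
print(minimal)
```

Filtering first means the delta debugging loop starts from a shorter trace, so fewer replays are needed; how much is saved depends on how accurately the graph captures real dependencies.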


Thesis
Peer Reviewed

Novel Bayesian Methods in Neuroscience

Zhou, Bo
Advisor(s): Shahbaba, Babak

UC Irvine Electronic Theses and Dissertations (2015)

For an individual to successfully complete the task of decision-making, a set of temporally-organized events must occur: stimuli must be detected, potential outcomes must be evaluated, behaviors must be executed or inhibited, and outcomes (such as reward or punishment) must be experienced. Due to the complexity of this process, it is very likely the case that decision-making is encoded by the temporally-precise interactions among a population of neurons. Most existing statistical models, however, are inadequate for analyzing such a sophisticated phenomenon, as they either analyze a small number of neurons (e.g., pairwise analysis) or only provide an aggregated measure of interactions by assuming a constant dependence structure among neurons over time.

We start by proposing a scalable hierarchical semi-parametric Bayesian model to capture dependencies among multiple neurons by detecting their co-firing (possibly with some lag time). To this end, we model the spike train (a sequence of 1's (spike) and 0's (silence)) for each neuron using the logistic function of a continuous latent variable with a Gaussian Process prior. Then we model the joint probability distribution of multiple neurons as a function of their corresponding marginal distributions using a parametric copula model. Our approach provides a flexible framework for modeling the underlying firing rates of each neuron. It also allows us to make inferences regarding both contemporaneous and lagged synchrony. We evaluate our approach using several simulation studies and apply it to analyze real data collected from an experiment designed for investigating the role of the prefrontal cortex of rats in reward-seeking behaviors.

Next, we propose a non-stationary Bayesian model to capture the dynamic nature of neuronal activity (such as the time-varying strength of the interactions among neurons). Our proposed method yields results that provide new insights into the dynamic nature of population coding in the prefrontal cortex during decision making. In our analysis, we note that while some neurons in the prefrontal cortex do not synchronize their firing activity until the presence of a reward, a different set of neurons synchronize their activity shortly after the onset of stimulus. These differentially synchronizing sub-populations of neurons suggest a continuum of population representation of the reward-seeking task. Our analyses also suggest that the degree of synchronization differs between the rewarded and non-rewarded conditions.

Finally, we propose a novel statistical model for detecting neuronal communities involved in the decision-making process. Our method characterizes the non-stationary activity of multiple neurons during a basic cognitive task by modeling their joint probability distribution dynamically. Our proposed model can capture the time-varying dependence structure among neurons while allowing the neuronal activity to change over time. This way, we are able to identify time-varying neuronal communities. By identifying communities of neurons that vary under different decisions, we expect our method to provide insights into the decision-making process in particular as well as into a broad range of cognitive functions.


Article
Peer Reviewed

Spherical Hamiltonian Monte Carlo for Constrained Target Distributions.

UC Irvine Previously Published Works (2014)

Statistical models with constrained probability distributions are abundant in machine learning. Some examples include regression models with norm constraints (e.g., Lasso), probit models, many copula models, and Latent Dirichlet Allocation (LDA) models. Bayesian inference involving probability distributions confined to constrained domains could be quite challenging for commonly used sampling algorithms. For such problems, we propose a novel Markov Chain Monte Carlo (MCMC) method that provides a general and computationally efficient framework for handling boundary conditions. Our method first maps the D-dimensional constrained domain of parameters to the unit ball, then augments it to a D-dimensional sphere S^D such that the original boundary corresponds to the equator of S^D. This way, our method handles the constraints implicitly by moving freely on the sphere generating proposals that remain within boundaries when mapped back to the original space. To improve the computational efficiency of our algorithm, we divide the dynamics into several parts such that the resulting split dynamics has a partial analytical solution as a geodesic flow on the sphere. We apply our method to several examples including truncated Gaussian, Bayesian Lasso, Bayesian bridge regression, and a copula model for identifying synchrony among multiple neurons. Our results show that the proposed method can provide a natural and efficient framework for handling several types of constraints on target distributions.
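
The geometric step described in this abstract, lifting a point of the unit ball onto a sphere and moving along a great circle, can be illustrated with a short sketch. It covers only the augmentation and the geodesic flow, not the full sampler (no potential-energy updates or accept/reject step), and the function names are assumptions chosen for this example.

```python
import math


def augment(theta):
    """Lift a point of the D-dimensional unit ball onto the sphere S^D.

    The extra coordinate is sqrt(1 - ||theta||^2), so the ball's boundary
    (||theta|| = 1) lands on the equator, where the extra coordinate is 0.
    """
    sq_norm = sum(t * t for t in theta)
    if sq_norm > 1.0:
        raise ValueError("theta must lie inside the unit ball")
    return list(theta) + [math.sqrt(1.0 - sq_norm)]


def project_to_tangent(x, v):
    """Remove the component of v along x so that v is tangent to the sphere at x."""
    dot = sum(xi * vi for xi, vi in zip(x, v))
    return [vi - dot * xi for xi, vi in zip(x, v)]


def geodesic_flow(x, v, t):
    """Follow the great circle through x with tangent velocity v for time t."""
    speed = math.sqrt(sum(vi * vi for vi in v))
    if speed == 0.0:
        return list(x), list(v)
    c, s = math.cos(speed * t), math.sin(speed * t)
    new_x = [xi * c + (vi / speed) * s for xi, vi in zip(x, v)]
    new_v = [-xi * speed * s + vi * c for xi, vi in zip(x, v)]
    return new_x, new_v


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


x = augment([0.6, 0.0])                       # point on the upper hemisphere
v = project_to_tangent(x, [1.0, 0.5, -0.3])   # arbitrary velocity made tangent
x_new, v_new = geodesic_flow(x, v, t=2.0)
theta_new = x_new[:-1]                        # map back by dropping the extra coordinate
print(theta_new, norm(theta_new))
```

Because the flow never leaves the sphere, dropping the extra coordinate always yields a point inside the unit ball, which is how the constraint is respected without explicit boundary checks.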


Article
Peer Reviewed

Alternative NF-κB Isoforms in the Drosophila Neuromuscular Junction and Brain

UC San Diego Previously Published Works (2015)

The Drosophila NF-κB protein Dorsal is expressed at the larval neuromuscular junction, where its expression appears unrelated to known Dorsal functions in embryonic patterning and innate immunity. Using confocal microscopy with domain-specific antisera, we demonstrate that larval muscle expresses only the B isoform of Dorsal, which arises by intron retention. We find that Dorsal B interacts with and stabilizes Cactus at the neuromuscular junction, but exhibits Cactus-independent localization and an absence of detectable nuclear translocation. We further find that the Dorsal-related immune factor Dif encodes a B isoform, reflecting a conservation of B domains across a range of insect NF-κB proteins. Carrying out mutagenesis of the Dif locus via a site-specific recombineering approach, we demonstrate that Dif B is the major, if not sole, Dif isoform in the mushroom bodies of the larval brain. The Dorsal and Dif B isoforms thus share a specific association with nervous system tissues as well as an alternative protein structure.


Article
Peer Reviewed

Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome.

UC Santa Cruz Previously Published Works (2023)

The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as pan-conserved segment tags (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
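
A toy version of the k-mer indexing idea can be sketched as below. It assumes, for illustration only, that a tag is a k-mer occurring exactly once in every assembly; the abstract does not give the precise definition, and the sketch ignores reverse complements. The sequences, the value of `k`, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every k-mer in a sequence (forward strand only)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def conserved_tags(assemblies: Dict[str, str], k: int) -> Set[str]:
    """Return k-mers that occur exactly once in every assembly."""
    shared = None
    for seq in assemblies.values():
        unique = {kmer for kmer, n in kmer_counts(seq, k).items() if n == 1}
        shared = unique if shared is None else shared & unique
    return shared or set()


def interval_between(seq: str, left: str, right: str) -> str:
    """Return the sequence between two tags that each occur once in seq."""
    return seq[seq.index(left) + len(left):seq.index(right)]


# Hypothetical toy assemblies: "alt" carries a 3-base insertion.
assemblies = {
    "ref": "ACGTTGCAAGGC",
    "alt": "ACGTTCCCGCAAGGC",
}
tags = conserved_tags(assemblies, k=4)
print(sorted(tags))

# The interval between two flanking tags differs between assemblies,
# which is the signature of a structural polymorphism in this toy setup.
for name, seq in assemblies.items():
    print(name, repr(interval_between(seq, "CGTT", "GCAA")))
```

Anchoring on tags shared by every assembly gives a common coordinate frame, so an interval in one assembly can be compared directly with the matching interval in a reference.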


Article
Peer Reviewed

Circulant Arrays on Cyclic Subgroups of Finite Fields: Rank Analysis and Construction of Quasi-Cyclic LDPC Codes

UC Davis Previously Published Works (2010)

This paper consists of three parts. The first part presents a large class of new binary quasi-cyclic (QC)-LDPC codes with girth of at least 6 whose parity-check matrices are constructed based on cyclic subgroups of finite fields. Experimental results show that the codes constructed perform well over the binary-input AWGN channel with iterative decoding using the sum-product algorithm (SPA). The second part analyzes the ranks of the parity-check matrices of codes constructed based on finite fields with characteristic of 2 and gives combinatorial expressions for these ranks. The third part identifies a subclass of constructed QC-LDPC codes that have large minimum distances. Decoding of codes in this subclass with the SPA converges very fast.
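
The link between an array of circulants and a girth of at least 6 can be illustrated with a small sketch. It does not reproduce the paper's construction from cyclic subgroups of finite fields; instead it uses a simpler, well-known stand-in, shift exponents `i * j mod p` for a prime `p`, and checks the row-column constraint (no two rows share more than one column with a 1), which rules out cycles of length 4 in the Tanner graph. All names here are assumptions made for the example.

```python
from typing import List


def circulant_permutation(size: int, shift: int) -> List[List[int]]:
    """Identity matrix of the given size with each row cyclically shifted."""
    return [[1 if (r + shift) % size == c else 0 for c in range(size)]
            for r in range(size)]


def expand(shifts: List[List[int]], size: int) -> List[List[int]]:
    """Replace every shift exponent by its circulant permutation matrix."""
    H = []
    for row_shifts in shifts:
        blocks = [circulant_permutation(size, s) for s in row_shifts]
        for r in range(size):
            H.append([bit for block in blocks for bit in block[r]])
    return H


def satisfies_rc_constraint(H: List[List[int]]) -> bool:
    """True if no two rows have 1s in more than one common column."""
    supports = [{c for c, bit in enumerate(row) if bit} for row in H]
    return all(len(supports[i] & supports[j]) <= 1
               for i in range(len(H)) for j in range(i + 1, len(H)))


p = 5  # prime circulant size (illustrative choice)
shifts = [[(i * j) % p for j in range(p)] for i in range(3)]
H = expand(shifts, p)                 # 15 x 25 binary parity-check matrix
print(len(H), len(H[0]), satisfies_rc_constraint(H))

# A counterexample: identical shifts in a 2 x 2 array create 4-cycles.
bad = expand([[0, 0], [0, 0]], 3)
print(satisfies_rc_constraint(bad))
```

The quasi-cyclic structure means the whole matrix is described by the small table of shift exponents, which is what makes such codes attractive for hardware encoders and decoders.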

Article
Peer Reviewed

One-Shot Learning With Attention-Guided Segmentation in Cryo-Electron Tomography.

UC Irvine Previously Published Works (2020)

Cryo-electron Tomography (cryo-ET) generates 3D visualizations of cellular organization that allow biologists to analyze cellular structures in a near-native state with nano resolution. Recently, deep learning methods have demonstrated promising performance in classification and segmentation of macromolecule structures captured by cryo-ET, but training individual deep learning models requires large amounts of manually labeled and segmented data from previously observed classes. To perform classification and segmentation in the wild (i.e., with limited training data and with unseen classes), novel deep learning models need to be developed to classify and segment unseen macromolecules captured by cryo-ET. In this paper, we develop a one-shot learning framework, called cryo-ET one-shot network (COS-Net), for simultaneous classification of macromolecular structure and generation of the voxel-level 3D segmentation, using only one training sample per class. Our experimental results on 22 macromolecule classes demonstrated that our COS-Net could efficiently classify macromolecular structures with small amounts of samples and produce accurate 3D segmentation at the same time.


Article
Peer Reviewed

A Semiparametric Bayesian Model for Detecting Synchrony Among Multiple Neurons

UC Irvine Previously Published Works (2014)

We propose a scalable semiparametric Bayesian model to capture dependencies among multiple neurons by detecting their cofiring (possibly with some lag time) patterns over time. After discretizing time so there is at most one spike at each interval, the resulting sequence of 1s (spike) and 0s (silence) for each neuron is modeled using the logistic function of a continuous latent variable with a gaussian process prior. For multiple neurons, the corresponding marginal distributions are coupled to their joint probability distribution using a parametric copula model. The advantages of our approach are as follows. The nonparametric component (i.e., the gaussian process model) provides a flexible framework for modeling the underlying firing rates, and the parametric component (i.e., the copula model) allows us to make inferences regarding both contemporaneous and lagged relationships among neurons. Using the copula model, we construct multivariate probabilistic models by separating the modeling of univariate marginal distributions from the modeling of a dependence structure among variables. Our method is easy to implement using a computationally efficient sampling algorithm that can be easily extended to high-dimensional problems. Using simulated data, we show that our approach could correctly capture temporal dependencies in firing rates and identify synchronous neurons. We also apply our model to spike train data obtained from prefrontal cortical areas.
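
How a copula couples two binary spike marginals into a joint distribution can be shown with a small sketch. The abstract says only that a parametric copula is used; the Farlie-Gumbel-Morgenstern (FGM) family below, the latent values, and all function names are assumptions chosen for illustration. The latent values are fixed numbers here, whereas the model described above places a Gaussian process prior on them.

```python
import math
from typing import Dict, Tuple


def spike_probability(latent: float) -> float:
    """Logistic link: map a latent value to a per-bin spike probability."""
    return 1.0 / (1.0 + math.exp(-latent))


def fgm_copula(u: float, v: float, theta: float) -> float:
    """FGM copula C(u, v) = u v (1 + theta (1 - u)(1 - v)), theta in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must be in [-1, 1]")
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def joint_spike_table(p1: float, p2: float, theta: float) -> Dict[Tuple[int, int], float]:
    """Joint distribution of two 0/1 spike indicators with the given marginals."""
    q1, q2 = 1.0 - p1, 1.0 - p2          # marginal probabilities of silence
    p00 = fgm_copula(q1, q2, theta)      # both neurons silent
    return {
        (0, 0): p00,
        (0, 1): q1 - p00,                # neuron 1 silent, neuron 2 spikes
        (1, 0): q2 - p00,                # neuron 1 spikes, neuron 2 silent
        (1, 1): 1.0 - q1 - q2 + p00,     # both spike in the same bin
    }


p1 = spike_probability(-1.0)   # hypothetical latent values for one time bin
p2 = spike_probability(-0.5)
independent = joint_spike_table(p1, p2, theta=0.0)
dependent = joint_spike_table(p1, p2, theta=0.8)
print(independent[(1, 1)], dependent[(1, 1)])
```

With `theta = 0` the table factorizes into the product of the marginals; a positive `theta` raises the co-firing probability while leaving each neuron's marginal firing rate unchanged, which is the separation of marginals from dependence that the abstract describes.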


Article
Peer Reviewed

A Dynamic Bayesian Model for Characterizing Cross-Neuronal Interactions During Decision-Making

UC Irvine Previously Published Works (2016)

The goal of this paper is to develop a novel statistical model for studying cross-neuronal spike train interactions during decision making. For an individual to successfully complete the task of decision-making, a number of temporally-organized events must occur: stimuli must be detected, potential outcomes must be evaluated, behaviors must be executed or inhibited, and outcomes (such as reward or no-reward) must be experienced. Due to the complexity of this process, it is likely the case that decision-making is encoded by the temporally-precise interactions between large populations of neurons. Most existing statistical models, however, are inadequate for analyzing such a phenomenon because they provide only an aggregated measure of interactions over time. To address this considerable limitation, we propose a dynamic Bayesian model which captures the time-varying nature of neuronal activity (such as the time-varying strength of the interactions between neurons). The proposed method yielded results that reveal new insight into the dynamic nature of population coding in the prefrontal cortex during decision making. In our analysis, we note that while some neurons in the prefrontal cortex do not synchronize their firing activity until the presence of a reward, a different set of neurons synchronize their activity shortly after stimulus onset. These differentially synchronizing sub-populations of neurons suggest a continuum of population representation of the reward-seeking task. Our analyses also suggest that the degree of synchronization differs between the rewarded and non-rewarded conditions. Moreover, the proposed model is scalable to handle data on many simultaneously-recorded neurons and is applicable to analyzing other types of multivariate time series data with latent structure. Supplementary materials (including computer code) for our paper are available online.


Article
Peer Reviewed

Structure Detection in Three-Dimensional Cellular Cryoelectron Tomograms by Reconstructing Two-Dimensional Annotated Tilt Series.

UC Irvine Previously Published Works (2022)

The revolutionary technique cryoelectron tomography (cryo-ET) enables imaging of cellular structure and organization in a near-native environment at submolecular resolution, which is vital to subsequent data analysis and modeling. The conventional structure detection process first reconstructs the three-dimensional (3D) tomogram from a series of two-dimensional (2D) projections and then directly detects subcellular components found within the tomogram. However, this process is challenging due to potential structural information loss during the tomographic reconstruction and the limited scope of existing methods since most major state-of-the-art object detection methods are designed for 2D rather than 3D images. Therefore, in this article, as an alternative approach to complement the conventional process, we propose a novel 2D-to-3D framework that detects structures within 2D projection images before reconstructing the results back to 3D. We implemented the proposed framework as three specific algorithms for three individual tasks: semantic segmentation, edge detection, and object localization. As experimental validation of the 2D-to-3D framework for cryo-ET data, we applied the algorithms to the segmentation of mitochondrial calcium phosphate granules, detection of spherical edges, and localization of mitochondria. Quantitative and qualitative results show better performance for prediction tasks of segmentation on the 2D projections and promising performance on object localization and edge detection, paving the way for future studies in the exploration of cryo-ET for in situ structural biology.
