Scholarly Works (21 results)

Article
Peer Reviewed

Unlocking a dark past

UC Berkeley Previously Published Works (2018)

A transcription factor called SALL4 could be the missing link between thalidomide and the limb defects caused by the drug.

Creative Commons 'BY' version 4.0 license

Thesis
Peer Reviewed

Ubiquitin-Dependent Control of Myogenic Development: Mechanistic Insights into Getting Huge, and Staying Huge

Rodriguez Perez, Fernando
Advisor(s): Rape, Michael

UC Berkeley Electronic Theses and Dissertations (2020)

Metazoan development is dependent on the robust spatiotemporal execution of stem cell cell-fate determination programs. Although changes in transcriptional and translational landscapes have been well characterized throughout many differentiation paradigms, their regulatory mechanisms remain poorly understood. Ubiquitin has recently been found to be a key modulator of developmental programs. Ubiquitylation of target proteins occurs through a cascade of enzymatic reactions beginning with a ubiquitin activating enzyme (E1), which transfers the ubiquitin moiety to a ubiquitin conjugating enzyme (E2). The reaction is finalized by the transfer of ubiquitin to its target protein by a ubiquitin ligase (E3). Post-translational modification of proteins can lead to several different outcomes, depending on the context of the modification, known as the ubiquitin code. The precise spatiotemporal execution of ubiquitylation is critical for organismal development and homeostasis. Due to the modular and reversible nature of ubiquitylation, it is an ideal moiety in the control of a plethora of cellular processes.

Cell-cell fusion is a frequent and essential event during development, whose dysregulation causes diseases ranging from infertility to muscle weakness. Critical to this process, cells repeatedly need to remodel their plasma membrane through orchestrated formation and disassembly of cortical actin filaments. In Chapter 2, I describe the identification of a ubiquitin-dependent toggle switch that establishes reversible actin bundling during mammalian cell fusion. My work identified KCTD10 as a modulator of the EPS8-IRSp53 complex, which stabilizes cortical actin bundles at sites of cell contact to push fusing cells towards each other. This work highlights how cytoskeletal rearrangements during development are precisely controlled, raising the possibility of modulating the efficiency of cell fusion for therapeutic benefit.

Organismal development must rely on the timely and robust execution of quality control responses. However, how these responses modulate metazoan development is poorly understood. Showcasing the versatility of ubiquitin signaling, Chapters 3 and 4 provide insight into the role of ubiquitin in controlling stress and quality control responses. Chapter 3 describes the reductive stress response, in which FEM1B senses and reacts to persistent depletion of reactive oxygen species. Loss of ROS is detrimental for development, as it inhibits myogenesis. Concomitant to this stress response is the identification of multimerization quality control, regulated by BTBD9. MQC surveys multimeric BTB complex composition, ensuring that multimeric complexes contain the correct stoichiometries and compositions. MQC is critical for development, as loss of MQC also prevents myogenesis. These two chapters showcase the integration of ubiquitin signaling, stress/quality control pathways, and development. These writings provide a more holistic understanding of the robust regulatory underpinnings of organismal formation.

Article
Peer Reviewed

Jupyter: Thinking and Storytelling With Code and Data

UC Berkeley Previously Published Works (2021)

Project Jupyter is an open-source project for interactive computing widely used in data science, machine learning, and scientific computing. We argue that even though Jupyter helps users perform complex, technical work, Jupyter itself solves problems that are fundamentally human in nature. Namely, Jupyter helps humans to think and tell stories with code and data. We illustrate this by describing three dimensions of Jupyter: 1) interactive computing; 2) computational narratives; and 3) the idea that Jupyter is more than software. We illustrate the impact of these dimensions on a community of practice in earth and climate science.

Thesis
Peer Reviewed

Towards Scientifically Contextualized Computer Vision through Studies in the Cryosphere

UC Berkeley Electronic Theses and Dissertations (2024)

There is a growing need for methodologies that integrate machine learning with scientific domain expertise, enabling the construction of interpretable models while contextualizing rare phenomena within the broader scientific landscape. With the exponentially growing influx of publicly available Earth observation data, such methods can be deployed to help answer open questions in Earth's cryosphere, which will enable better constraints of future sea level rise in the coming century.

This research develops scientifically contextualized computer vision methods in low-shot learning regimes, helping to address critical questions at the ice-bed and ice-ocean interfaces. I focus on the classification of rare phenomena that are characterized by prehistoric and contemporary ice sheets. I develop a scientifically-driven filtering method to automate the detection of subglacial bedforms formed during the last glacial maximum, which were shaped by the dynamics of the ice sheets that once flowed above them. These bedforms compose ~2% of the overall training set, and the proposed prefiltering approach can be modularly inserted into existing pipelines, enabling the automatic detection of these bedforms from publicly available digital elevation models with up to 94% accuracy.

I present a concise tiling strategy that I developed to prepare large satellite imagery for use with machine learning algorithm training on GPUs. Tiling is the industry standard, but previous methods that attempted to preserve semantic context created redundancies within the training data, altering the model outcomes. By uniquely permuting every extension of the dataset, I eliminate redundancies, and find that, with distinct transformations, I can extend the dataset further than previous approaches, and that this extension doesn't alter the structure of the training data. Applying this preprocessing step alone, without any changes to model architecture or optimization, improves the performance on underrepresented classes by up to 15.8%.
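
The general idea of tiling without redundancy can be sketched in a few lines. The following is a minimal illustration, not the thesis's method: it assumes non-overlapping square tiles and uses the eight rotations and reflections of a square as the "distinct transformations", keeping only results that differ from one another. The function names `tile`, `rotate90`, `flip`, and `distinct_transforms` are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def tile(image: Grid, size: int) -> List[Grid]:
    """Split a 2D grid into non-overlapping size x size tiles.

    Rows and columns that do not fill a whole tile are dropped.
    """
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            tiles.append([row[c:c + size] for row in image[r:r + size]])
    return tiles


def rotate90(t: Grid) -> Grid:
    """Rotate a tile 90 degrees clockwise."""
    return [list(row) for row in zip(*t[::-1])]


def flip(t: Grid) -> Grid:
    """Mirror a tile left to right."""
    return [row[::-1] for row in t]


def distinct_transforms(t: Grid) -> List[Grid]:
    """Return the distinct images of a tile under rotations and reflections.

    Assumption for illustration: symmetric tiles would otherwise yield
    duplicate training samples, so identical results are kept only once.
    """
    seen = set()
    out = []
    current = t
    for _ in range(4):
        for candidate in (current, flip(current)):
            key = tuple(map(tuple, candidate))
            if key not in seen:
                seen.add(key)
                out.append(candidate)
        current = rotate90(current)
    return out


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    tiles = tile(image, 2)
    extended = [variant for t in tiles for variant in distinct_transforms(t)]
    print(len(tiles), len(extended))
```

In this sketch a tile with no internal symmetry contributes eight samples, while a uniform tile contributes only one, so the extended dataset contains no exact duplicates of a tile.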

I use this method to prepare manually labeled imagery to search for persistent polynyas, which are a rare phenomenon along the western coast of Antarctica. Polynyas are areas of open water within the sea ice, and when opened thermodynamically by warm water plumes that arise from under the marginal ice shelves, they can persist in the same location from year to year. They also offer a surface view into subsurface ice-ocean interactions that would be otherwise hard to monitor in satellite imagery. However, their small size and relative rarity mean that these polynyas make up a small fraction of the overall scene in satellite imagery and, when off-the-shelf machine learning methods are applied, are easily confounded with other areas of open water, which occur more frequently. Due to the physics of their formation pathways, such polynyas typically occur right at the ice front interface, and I use this geometric constraint to build a geophysically contextualized objective into the loss function of existing object detection architectures, allowing for the rapid detection of the persistent polynya census in the Amundsen Sea embayment. I recover all eleven previously characterized polynyas in the region, and find eight new polynyas. While this approach was specified in our particular model to the geomorphometry that informs persistent polynya formation, such an approach is easily generalized to aid in the detection of any underrepresented target for which prior knowledge of semantic contextualizations governs the geometry of the scene.

Thesis
Peer Reviewed

Statistical Characterization and Development of Snowpack Predictions

UC Berkeley Electronic Theses and Dissertations (2024)

Accurate high-resolution spatiotemporal data on environmental variables is critical for informing natural resource management and guiding climate change mitigation and adaptation practices. From a hydrological perspective, mountain snowpack is a primary source of freshwater for meeting societal drinking and agricultural needs in various regions. As a result of climate change, these water resources are becoming increasingly vulnerable and declining at a steady pace. To support informed decision-making around water resource planning, accurate data on regular fine-scale spatial and temporal intervals is crucial. Several such gridded data products on key snowpack properties such as snow water equivalent (SWE) and snow depth (SD) have been introduced and provide great utility to the scientific community and practitioners in the environmental space. However, errors and uncertainties in these gridded snow products are not well understood and assumptions underlying the generation of these products are inadequately investigated. In this dissertation, questions around uncertainty and representativeness of gridded snow data are examined and quantified, and a novel statistical approach to estimating SWE is introduced.

The first chapter introduces key terms, concepts, and datasets used throughout the dissertation. The second chapter quantifies error and uncertainty underlying a widely-used gridded SWE product. The third chapter discusses limitations in this gridded product and proposes an alternative framework for empirical SWE prediction that is intuitive, scalable, and statistically sound. Finally, the fourth chapter explores questions around representativeness of point measurements and gridded estimates of true SD and SWE, addressing concerns associated with standard point-to-grid comparisons commonly used in evaluation methodologies.

Thesis
Peer Reviewed

Physics-Informed Machine Learning for the Earth Sciences: Applications to Glaciology and Paleomagnetism

UC Berkeley Electronic Theses and Dissertations (2024)

This dissertation studies the application of machine learning in the fields of Glaciology and Paleomagnetism. In the past few years, there have been significant advances in introducing physical constraints in the form of inductive biases in data-driven approaches coming from statistics and machine learning. This gave rise to the field of physics-informed machine learning, which we will introduce in Chapter 1. Chapters 3 and 4 will cover the application of neural differential equations for ice flow modelling, showcasing how the differentiable programming techniques introduced in Chapter 2 have been successfully applied for the inversion and calibration of the internal ice viscosity of mountain glaciers with different climates. This led to the development of ODINN.jl, a multilanguage Julia-Python package for the modelling of global glacier-climate interactions. We will finalize our discussion in Chapter 5 with the quantification of errors involved in paleomagnetic sampling and further applications of non-parametric regression based on neural differential equations.

Article
Peer Reviewed

Teaching Computing with the IPython Notebook

UC Berkeley Previously Published Works (2014)

Article
Peer Reviewed

CoCoTools: open-source software for building connectomes using the CoCoMac anatomical database.

UC Berkeley Previously Published Works (2014)

Neuroanatomical tracer studies in the nonhuman primate macaque monkey are a valuable resource for cognitive neuroscience research. These data ground theories of cognitive function in anatomy, and with the emergence of graph theoretical analyses in neuroscience, there is high demand for these data to be consolidated into large-scale connection matrices ("macroconnectomes"). Because manual review of the anatomical literature is time consuming and error prone, computational solutions are needed to accomplish this task. Here we describe the "CoCoTools" open-source Python library, which automates collection and integration of macaque connectivity data for visualization and graph theory analysis. CoCoTools both interfaces with the CoCoMac database, which houses a vast amount of annotated tracer results from 100 years (1905-2005) of neuroanatomical research, and implements coordinate-free registration algorithms, which allow studies that use different parcellations of the brain to be translated into a single graph. We show that using CoCoTools to translate all of the data stored in CoCoMac produces graphs with properties consistent with what is known about global brain organization. Moreover, in addition to describing CoCoTools' processing pipeline, we provide worked examples, tutorials, links to on-line documentation, and detailed appendices to aid scientists interested in using CoCoTools to gather and analyze CoCoMac data.

Article
Peer Reviewed

A recurrent neural network for classification of unevenly sampled variable stars

UC Berkeley Previously Published Works (2018)

Astronomical surveys of celestial sources produce streams of noisy time series measuring flux versus time ('light curves'). Unlike in many other physical domains, however, large (and source-specific) temporal gaps in data arise naturally due to intranight cadence choices as well as diurnal and seasonal constraints [1-5]. With nightly observations of millions of variable stars and transients from upcoming surveys [4,6], efficient and accurate discovery and classification techniques on noisy, irregularly sampled data must be employed with minimal human-in-the-loop involvement. Machine learning for inference tasks on such data traditionally requires the laborious hand-coding of domain-specific numerical summaries of raw data ('features') [7]. Here, we present a novel unsupervised autoencoding recurrent neural network [8] that makes explicit use of sampling times and known heteroskedastic noise properties. When trained on optical variable star catalogues, this network produces supervised classification models that rival other best-in-class approaches. We find that autoencoded features learned in one time-domain survey perform nearly as well when applied to another survey. These networks can continue to learn from new unlabelled observations and may be used in other unsupervised tasks, such as forecasting and anomaly detection.
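
Two ingredients mentioned in the abstract, explicit sampling times and heteroskedastic noise, can be illustrated without any neural network code. The sketch below is an assumption about how such inputs and losses are commonly prepared, not the paper's implementation: each flux value is paired with the time elapsed since the previous observation, and reconstruction error is weighted by inverse variance so that noisier points count less. The names `to_sequence` and `weighted_mse` are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_sequence(times: Sequence[float], fluxes: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each flux with the time elapsed since the previous observation.

    The first observation has no predecessor, so its time delta is 0.0.
    """
    deltas = [0.0] + [t1 - t0 for t0, t1 in zip(times, times[1:])]
    return list(zip(deltas, fluxes))


def weighted_mse(
    observed: Sequence[float],
    reconstructed: Sequence[float],
    sigmas: Sequence[float],
) -> float:
    """Mean squared error with inverse-variance weights.

    Points with a larger measurement uncertainty (sigma) contribute less.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(
        w * (o - r) ** 2 for w, o, r in zip(weights, observed, reconstructed)
    )
    return total / sum(weights)


if __name__ == "__main__":
    # Irregular sampling: a one-day gap followed by a three-day gap.
    print(to_sequence([0.0, 1.0, 4.0], [10.0, 12.0, 11.0]))
    print(weighted_mse([1.0, 2.0], [1.0, 4.0], [1.0, 2.0]))
```

A sequence of (time delta, flux) pairs lets a recurrent model see the gaps directly instead of assuming evenly spaced samples.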

Article
Peer Reviewed

Double dissociation of two cognitive control networks in patients with focal brain lesions.

UC San Francisco Previously Published Works (2010)

Neuroimaging studies of cognitive control have identified two distinct networks with dissociable resting state connectivity patterns. This study, in patients with heterogeneous damage to these networks, demonstrates network independence through a double dissociation of lesion location on two different measures of network integrity: functional correlations among network nodes and within-node graph theory network properties. The degree of network damage correlates with a decrease in functional connectivity within that network while sparing the nonlesioned network. Graph theory properties of intact nodes within the damaged network show evidence of dysfunction compared with the undamaged network. The effect of anatomical damage thus extends beyond the lesioned area, but remains within the bounds of the existing network connections. Together this evidence suggests that networks defined by their role in cognitive control processes exhibit independence in resting data.
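
One of the two integrity measures named above, functional correlations among network nodes, is often summarized as the mean pairwise Pearson correlation of node time series. The sketch below assumes that summary for illustration; it is not taken from the study, and the function names and the toy node labels are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def within_network_connectivity(series_by_node: Dict[str, Sequence[float]]) -> float:
    """Mean pairwise correlation across all node pairs in one network."""
    pairs = list(combinations(series_by_node.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy time series for three hypothetical nodes of one network.
    network = {
        "node_a": [1.0, 2.0, 3.0, 4.0],
        "node_b": [2.0, 4.0, 6.0, 8.0],
        "node_c": [4.0, 3.0, 2.0, 1.0],
    }
    print(within_network_connectivity(network))
```

Computing this value separately for each network would allow a comparison between a lesioned and a nonlesioned network, under the assumption that each node has already been reduced to a single time series.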
