Article
Peer Reviewed

Revealing Fundamental Physics from the Daya Bay Neutrino Experiment using Deep Neural Networks

UC Irvine Previously Published Works (2016)

Experiments in particle physics produce enormous quantities of data that must be analyzed and interpreted by teams of physicists. This analysis is often exploratory, where scientists are unable to enumerate the possible types of signal prior to performing the experiment. Thus, tools for summarizing, clustering, visualizing and classifying high-dimensional data are essential. In this work, we show that meaningful physical content can be revealed by transforming the raw data into a learned high-level representation using deep neural networks, with measurements taken at the Daya Bay Neutrino Experiment as a case study. We further show how convolutional deep neural networks can provide an effective classification filter with greater than 97% accuracy across different classes of physics events, significantly better than other machine learning approaches.

Article
Peer Reviewed

A Pattern Recognition Algorithm for Quantum Annealers

UC Berkeley Previously Published Works (2020)

The reconstruction of charged particles will be a key computing challenge for the high-luminosity Large Hadron Collider (HL-LHC) where increased data rates lead to a large increase in running time for current pattern recognition algorithms. An alternative approach explored here expresses pattern recognition as a quadratic unconstrained binary optimization (QUBO), which allows algorithms to be run on classical and quantum annealers. While the overall timing of the proposed approach and its scaling has still to be measured and studied, we demonstrate that, in terms of efficiency and purity, the same physics performance of the LHC tracking algorithms can be achieved. More research will be needed to achieve comparable performance in HL-LHC conditions, as increasing track density decreases the purity of the QUBO track segment classifier.
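To make the QUBO formulation concrete, the sketch below minimizes a tiny objective of the form E(x) = sum_i a_i x_i + sum_{i<j} b_ij x_i x_j over binary variables, where each x_i marks whether a candidate track segment is kept. The four segments, their coefficients, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper, and the exhaustive search stands in for an annealer, which would be used at realistic problem sizes.

```python
from itertools import product

def qubo_energy(x, linear, quadratic):
    """Energy of a binary assignment x under a QUBO with the given coefficients."""
    energy = sum(linear[i] * x[i] for i in range(len(x)))
    energy += sum(w * x[i] * x[j] for (i, j), w in quadratic.items())
    return energy

def solve_qubo_brute_force(linear, quadratic):
    """Try all 2^n assignments; only feasible for very small n."""
    n = len(linear)
    return min(product((0, 1), repeat=n),
               key=lambda x: qubo_energy(x, linear, quadratic))

# Hypothetical toy problem: four candidate segments.
linear = [-1.0, -1.0, -1.0, -1.0]   # small reward for keeping any segment
quadratic = {
    (0, 1): -2.0,   # segments 0 and 1 are compatible: reward keeping both
    (1, 2): +4.0,   # segments 1 and 2 conflict: penalize keeping both
    (2, 3): -0.5,   # segments 2 and 3 are weakly compatible
}

best = solve_qubo_brute_force(linear, quadratic)
print(best, qubo_energy(best, linear, quadratic))
```

In this toy case the conflict penalty outweighs the weak reward on the last pair, so the minimum-energy assignment keeps segments 0, 1 and 3 and drops segment 2.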

Article

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

LBL Publications (2016)

Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based O(log n) algorithms have been proposed for computing KNN, linear algorithms are used in practice because of the inherent sequentiality of the kd-tree approach. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning the search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications (astrophysics, plasma physics, and particle physics), we show that our implementation can construct a kd-tree of 189 billion particles in 48 seconds using 50,000 cores. We also demonstrate computation of KNN of 19 billion queries in 12 seconds. We demonstrate almost linear speedup for both shared- and distributed-memory computers. Our algorithms outperform earlier implementations by more than an order of magnitude, thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems.
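For readers unfamiliar with the underlying technique, the following is a minimal single-threaded sketch of kd-tree construction and a k-nearest-neighbor query with search-space pruning. It is a textbook version under simplifying assumptions (points held in memory as tuples, median split by sorting), not the parallel, distributed implementation the abstract describes; all function names are illustrative.

```python
import heapq

def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])            # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                   # median point becomes the node
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _search(node, query, k, heap):
    """Depth-first search keeping the k best candidates in a max-heap."""
    if node is None:
        return
    point, axis, left, right = node
    d = sq_dist(point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, point))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, point))
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    _search(near, query, k, heap)
    # Prune: descend into the far side only if the splitting plane is
    # closer than the current k-th best distance (or the heap is not full).
    if len(heap) < k or diff * diff < -heap[0][0]:
        _search(far, query, k, heap)

def knn(tree, query, k):
    """Return the k nearest points to query, closest first."""
    heap = []
    _search(tree, query, k, heap)
    return [p for _, p in sorted(heap, key=lambda t: -t[0])]
```

A typical call builds the tree once with `build_kdtree(points)` and then runs many `knn(tree, query, k)` queries against it; the pruning test is what lets a query skip most of the tree on well-distributed data.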

Article
Peer Reviewed

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

LBL Publications (2016)

Article
Peer Reviewed

Interactive Distributed Deep Learning with Jupyter Notebooks

UC Berkeley Previously Published Works (2018)

Deep learning researchers are increasingly using Jupyter notebooks to implement interactive, reproducible workflows with embedded visualization, steering and documentation. Such solutions are typically deployed on small-scale (e.g. single server) computing systems. However, as the sizes and complexities of datasets and associated neural network models increase, high-performance distributed systems become important for training and evaluating models in a feasible amount of time. In this paper we describe our vision for Jupyter notebook solutions to deploy deep learning workloads onto high-performance computing systems. We demonstrate the effectiveness of notebooks for distributed training and hyper-parameter optimization of deep neural networks with efficient, scalable backends.

Peer Reviewed

The NERSC Cori HPC System

NERSC (2019)

Article

Experiences with the Burst Buffer at NERSC

LBL Publications (2016)

NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has over 6500 users in 750 different projects spanning a wide variety of scientific applications, including climate modeling, combustion, fusion, astrophysics, computational biology, and many more. The applications of the Burst Buffer at NERSC are therefore also considerable and diverse. We describe here experiences with the first year of the NERSC Burst Buffer. A number of research projects have had early access to the Burst Buffer and have exercised its different capabilities to enable new scientific advancements. We present in-depth performance results and lessons learned from these real applications, as well as benchmark results and system configuration experiences.

Article
Peer Reviewed

2019 Computing Sciences Strategic Plan

UC Berkeley Previously Published Works (2021)

Article
Peer Reviewed

Machine Learning in High Energy Physics Community White Paper

UC San Diego Previously Published Works (2018)

Machine learning is an important applied research area in particle physics, beginning with applications to high-level physics analysis in the 1990s and 2000s, followed by an explosion of applications in particle and event identification and reconstruction in the 2010s. In this document we discuss promising future research and development areas in machine learning in particle physics with a roadmap for their implementation, software and hardware resource requirements, collaborative initiatives with the data science community, academia and industry, and training the particle physics community in data science. The main objective of the document is to connect and motivate these areas of research and development with the physics drivers of the High-Luminosity Large Hadron Collider and future neutrino experiments and identify the resource needs for their implementation. Additionally we identify areas where collaboration with external communities will be of great benefit.

Article
Peer Reviewed

A Roadmap for HEP Software and Computing R&D for the 2020s

UC San Diego Previously Published Works (2019)

Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.
