eScholarship
Open Access Publications from the University of California

Regen: An object layout regenerator on large-scale production HPC systems

(2025)

This article proposes an object layout regenerator called Regen, which dynamically regenerates and removes object layouts to improve the read performance of applications. Regen first detects frequent access patterns from the I/O requests of the applications. Second, Regen reorganizes the objects and regenerates or preallocates new object layouts according to the identified access patterns. Finally, Regen removes or reuses the obsolete or regenerated object layouts as necessary. As a result, Regen accelerates access to objects by providing a flexible object layout. We implement Regen as a framework on top of Proactive Data Container (PDC) and evaluate it on the Cori supercomputer, a production-scale HPC system, using realistic HPC I/O benchmarks. The experimental results show that Regen improves I/O performance by up to 16.92× compared with an existing system.

Salk Institute for Biological Studies Requirements Analysis Report

(2024)

EPOC uses the Deep Dive process to discuss and analyze current and planned science, research, or education activities and the anticipated data output of a particular use case, site, or project to help inform the strategic planning of a campus or regional networking environment. This includes understanding future needs related to network operations, network capacity upgrades, and other technological service investments. A Deep Dive comprehensively surveys major research stakeholders’ plans and processes in order to investigate data management requirements over the next 5–10 years. Between February and March 2024, staff members from the Engagement and Performance Operations Center (EPOC) met with researchers and staff from the Salk Institute for Biological Studies (Salk) for the purpose of a Deep Dive into scientific and research drivers. The goal of this activity was to help characterize the requirements for a number of campus use cases, and to enable cyberinfrastructure support staff to better understand the needs of the researchers within the community. Material for this event included the written documentation from each of the profiled research areas, documentation about the current state of technology support, and a write-up of the discussion that took place via e-mail and video conferencing. The case studies highlighted the ongoing challenges and opportunities that the Salk Institute for Biological Studies has in supporting a cross-section of established and emerging research use cases. Each case study mentioned unique challenges which were summarized into common needs.

New York-Presbyterian and Columbia University Irving Medical Center Requirements Analysis Report

(2024)

EPOC uses the Deep Dive process to discuss and analyze current and planned science, research, or education activities and the anticipated data output of a particular use case, site, or project to help inform the strategic planning of a campus or regional networking environment. This includes understanding future needs related to network operations, network capacity upgrades, and other technological service investments. A Deep Dive comprehensively surveys major research stakeholders’ plans and processes in order to investigate data management requirements over the next 5–10 years. Between February and June 2024, staff members from the Engagement and Performance Operations Center (EPOC) met with researchers and staff from New York-Presbyterian (NYP), Columbia University Irving Medical Center (CUIMC), and NYSERNet for the purpose of a Deep Dive into scientific and research drivers. The goal of this activity was to help characterize the requirements for a number of campus use cases, and to enable cyberinfrastructure support staff to better understand the needs of the researchers within the community. Material for this event included the written documentation from each of the profiled research areas, documentation about the current state of technology support, and a write-up of the discussion that took place via e-mail and video conferencing. The case studies highlighted the ongoing challenges and opportunities that NYP and CUIMC have in supporting a cross-section of established and emerging research use cases. Each case study mentioned unique challenges which were summarized into common needs.

Nuclear Physics Network Requirements Review Final Report

(2024)

The Energy Sciences Network (ESnet) is the high-performance network user facility for the US Department of Energy (DOE) Office of Science (SC) and delivers highly reliable data transport capabilities optimized for the requirements of data-intensive science. In essence, ESnet is the circulatory system that enables the DOE science mission by connecting all its laboratories and facilities in the US and abroad. ESnet is funded and stewarded by the Advanced Scientific Computing Research (ASCR) program and managed and operated by the Scientific Networking Division at Lawrence Berkeley National Laboratory (LBNL). ESnet is widely regarded as a global leader in the research and education networking community. ESnet interconnects DOE national laboratories, user facilities, and major experiments so that scientists can use remote instruments and computing resources as well as share data with collaborators, transfer large datasets, and access distributed data repositories. ESnet is specifically built to provide a range of network services tailored to meet the unique requirements of the DOE’s data-intensive science. Between July 2023 and October 2023, ESnet and the Nuclear Physics program (NP) of the DOE SC organized an ESnet requirements review of NP-supported activities. Preparation for these events included identification of key stakeholders: program and facility management, research groups, and technology providers. Each stakeholder group was asked to prepare formal case study documents about its relationship to the NP program to build a complete understanding of the current, near-term, and long-term status, expectations, and processes that will support the science going forward.

Fusion Energy Sciences Network Requirements Review: Mid Cycle Update

(2024)

The Energy Sciences Network (ESnet) is the high-performance network user facility for the US Department of Energy (DOE) Office of Science (SC) and delivers highly reliable data transport capabilities optimized for the requirements of data-intensive science. In essence, ESnet is the circulatory system that enables the DOE science mission by connecting all its laboratories and facilities in the US and abroad. ESnet is funded and stewarded by the Advanced Scientific Computing Research (ASCR) program and managed and operated by the Scientific Networking Division at Lawrence Berkeley National Laboratory (LBNL). ESnet is widely regarded as a global leader in the research and education networking community. ESnet interconnects DOE national laboratories, user facilities, and major experiments so that scientists can use remote instruments and computing resources as well as share data with collaborators, transfer large datasets, and access distributed data repositories. ESnet is specifically built to provide a range of network services tailored to meet the unique requirements of the DOE’s data-intensive science. In May 2023, the Energy Sciences Network (ESnet) and the Fusion Energy Sciences program (FES) of the DOE SC organized an interim ESnet requirements review of FES-supported activities to follow up on the work started during the 2021 FES Network Requirements Review. Preparation for these events included checking back with the key stakeholders: program and facility management, research groups, and technology providers. Each stakeholder group was asked to prepare updates to its previously submitted case study documents, so that ESnet could update the understanding of any changes to the current, near-term, and long-term status, expectations, and processes that will support the science activities of the program.

High Energy Physics Network Requirements Review: Two-Year Update

(2024)

The Energy Sciences Network (ESnet) is the high-performance network user facility for the US Department of Energy (DOE) Office of Science (SC) and delivers highly reliable data transport capabilities optimized for the requirements of data-intensive science. In essence, ESnet is the circulatory system that enables the DOE science mission by connecting all its laboratories and facilities in the US and abroad. ESnet is funded and stewarded by the Advanced Scientific Computing Research (ASCR) program and managed and operated by the Scientific Networking Division at Lawrence Berkeley National Laboratory (LBNL). ESnet is widely regarded as a global leader in the research and education networking community. ESnet interconnects DOE national laboratories, user facilities, and major experiments so that scientists can use remote instruments and computing resources as well as share data with collaborators, transfer large datasets, and access distributed data repositories. ESnet is specifically built to provide a range of network services tailored to meet the unique requirements of the DOE’s data-intensive science. In July 2023, the Energy Sciences Network (ESnet) and the High Energy Physics program (HEP) of the DOE SC organized an interim ESnet requirements review of HEP-supported activities, to follow up on the work started during the 2020 HEP Network Requirements Review. Preparation for these events included checking back with the key stakeholders: program and facility management, research groups, and technology providers. Each stakeholder group was asked to prepare updates to their previously submitted case study documents, so that ESnet could update the understanding of any changes to the current, near-term, and long-term status, expectations, and processes that will support the science activities of the program.

Designing, Constructing, and Operating an IPv6 Network at SC23: A case study in implementing the IPv6 protocol on a heterogenous network that supports the SC23 conference

(2024)

IPv6 is the current version of IP, the protocol that is used to route traffic across internet connections. This standard was originally developed as a new approach to mitigate concerns about address exhaustion and allow for near-infinite scalability. While this protocol has gained significant support in mobile and broadband networks, as well as being the default for networks in emerging economies, it has yet to be fully adopted as a standard deployment model. Complications include legacy devices unable to support the proposed changes, as well as potential challenges that exist between devices that may not be able to fully implement current standards or configuration norms. The SCinet volunteers who deliver advanced networking to support the SC Conference set an ambitious goal of deploying an IPv6-only network at SC23. While the necessary technology is widely available and understood, deploying it to support more than 15,000 users, each with multiple devices of different operating environments and ages, presents a unique technology and policy challenge. This paper will highlight the effort put into designing, implementing, and operating this innovative IPv6-only environment.

FabFed: Tool-Based Network Federation for Testbed of Testbeds - Paradigm and Practice

(2024)

Approaching the end of the FABRIC project construction phase, many experimenters expressed a need for integrating heterogeneous types of resources from external testbed and cloud providers. This prompted research in cross-testbed federation paradigms, practically in pursuit of the vision of 'testbed of testbeds'. With past experience and lessons learned, we propose to adopt a 'tool-based federation paradigm' with the hypothesis that a tool-based federation approach is viable and performant for automating large, complex cross-testbed experiments. In this paper, we discuss the challenges and solutions in developing the FABRIC Federation Extension (FabFed), a software framework that implements the tool-based federation approach and enables FABRIC users to run large experiments across multiple testbed and cloud providers. We validate our approach through extensive use of FabFed to build complex experiments across both the FABRIC and partner testbeds. We also share our observations and draw insights about the usability and performance characteristics of our approach through real-world operations and experimental quantitative analysis.