
Article

Transport requirements for high performance network applications that are NOT FTP

LBL Publications (2003)

The exponential growth of high bandwidth global networks has rapidly exposed the design limitations of the TCP protocol. The congestion avoidance behavior of TCP has been implicated as a likely culprit for poor single-stream TCP throughput on high bandwidth-delay-product networks. The network research community has offered a wide range of alternatives to fix TCP's congestion avoidance behavior, but every available technique has focused on high throughput for bulk flows like file transfers. This paper focuses on the requirements of interactive visualization and control applications that have been neglected in the design of these new protocols.
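As a rough illustration of why congestion avoidance hurts a single stream on a high bandwidth-delay-product path, the sketch below computes the bandwidth-delay product and the time a standard additive-increase window needs to regrow after one loss halves it. The link speed, round-trip time, and segment size are assumed example values, not figures from the paper, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8.0


def recovery_time_s(bandwidth_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Time to regrow the window after one loss under classic congestion avoidance.

    Assumes the window is halved on loss and then grows by one segment per
    round trip, so recovery takes (window / 2) round trips.
    """
    window_segments = bdp_bytes(bandwidth_bps, rtt_s) / mss_bytes
    return (window_segments / 2.0) * rtt_s


# Assumed example path: 10 Gbit/s link, 100 ms round-trip time.
example_bdp = bdp_bytes(10e9, 0.1)
example_recovery = recovery_time_s(10e9, 0.1)
print(f"BDP: {example_bdp / 1e6:.0f} MB, recovery: {example_recovery / 60:.0f} minutes")
```

Under these assumptions a single loss costs on the order of an hour of reduced throughput, which is the kind of behavior that bulk-transfer protocol variants try to address.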


Article

Interactive Remote and Distributed Visualization of Fusion Simulation Results

LBL Publications (2004)

The NERSC center has unique resources that are especially tailored for interactive, high performance remote visualization. The central interactive visualization resource is Escher.nersc.gov, which is a large symmetric multiprocessor equipped with multiple gigabit interconnects to NERSC storage systems, as well as a large amount of main memory and substantial directly attached storage. Using Escher, we employ a pipelined architecture to support the demands of interactive, high-performance remote visualization. The pipelined architecture refers to an assembly-line organization of software "workers" that each contribute to an overall work flow. The first worker in the assembly line, the server, runs interactively in parallel on Escher. Parallel execution of the server provides substantial I/O and processing capabilities where they are needed: close to the data. The second worker in the assembly line runs on the scientist's workstation in their office, and performs 3D rendering at interactive rates. Visualization results (geometry) are transmitted between the two stages in the pipeline. This combination has proven effective at providing interactive 3D scientific visualization capabilities to remotely located NERSC users. Amortizing data I/O and visualization processing over parallel processors located close to the data provides capabilities that are simply not available on any desktop platform. These capabilities are an example of NERSC's commitment to providing the best possible tools and infrastructure to the computational science community.


Article
Peer Reviewed

Tiling as a Durable Abstraction for Parallelism and Data Locality

LBL Publications (2013)

Article
Peer Reviewed

Query-Driven Visualization of Large Data Sets

LBL Publications (2005)

We present a practical and general-purpose approach to large and complex visual data analysis where visualization processing, rendering and subsequent human interpretation are constrained to the subset of data deemed interesting by the user. In many scientific data analysis applications, "interesting" data can be defined by compound Boolean range queries of the form (temperature > 1000) AND (70 < pressure < 90). As data sizes grow larger, a central challenge is to answer such queries as efficiently as possible. Prior work in the visualization community has focused on answering range queries for scalar fields within the context of accelerating the search phase of isosurface algorithms. In contrast, our work describes an approach that leverages state-of-the-art indexing technology from the scientific data management community called "bitmap indexing." Our implementation, which we call "DEX" (short for dextrous data explorer), uses bitmap indexing to efficiently answer multivariate, multidimensional data queries to provide input to a visualization pipeline. We present an analysis overview and benchmark results that show bitmap indexing offers significant storage and performance improvements when compared to previous approaches for accelerating the search phase of isosurface algorithms. More importantly, since bitmap indexing supports complex multidimensional, multivariate range queries, it is more generally applicable to scientific data visualization and analysis problems. In addition to benchmark performance and analysis, we apply DEX to a typical scientific visualization problem encountered in combustion simulation data analysis. © 2005 IEEE.
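To make the idea of answering a compound range query with bitmaps concrete, here is a minimal toy sketch. It is not DEX or any production bitmap index: it uses uncompressed Python integers as bitsets, binned values, and a raw-value check for bins that only partly overlap the query. The class name, bin edges, and sample data are all assumptions chosen for illustration.

```python
from bisect import bisect_right


class BitmapIndex:
    """Toy binned bitmap index: one bitset (a Python int) per value bin."""

    def __init__(self, values, edges):
        self.values = list(values)
        self.edges = list(edges)  # sorted bin boundaries
        self.bitmaps = [0] * (len(self.edges) + 1)
        for row, v in enumerate(self.values):
            # Bin b holds values with edges[b-1] <= v < edges[b].
            self.bitmaps[bisect_right(self.edges, v)] |= 1 << row

    def query(self, lo, hi):
        """Return a bitset of rows whose value satisfies lo < value < hi."""
        result = 0
        for b, bitmap in enumerate(self.bitmaps):
            bin_lo = self.edges[b - 1] if b > 0 else float("-inf")
            bin_hi = self.edges[b] if b < len(self.edges) else float("inf")
            if bin_hi <= lo or bin_lo >= hi:
                continue  # bin lies entirely outside the query range
            if lo < bin_lo and bin_hi <= hi:
                result |= bitmap  # bin lies entirely inside: take every row
                continue
            # Boundary bin: check the raw value of each candidate row.
            rows = bitmap
            while rows:
                low = rows & -rows  # isolate the lowest set bit
                if lo < self.values[low.bit_length() - 1] < hi:
                    result |= low
                rows ^= low
        return result


def rows_of(bitset):
    """Expand a bitset into a sorted list of row numbers."""
    return [i for i in range(bitset.bit_length()) if (bitset >> i) & 1]


# Hypothetical sample data, one entry per grid cell.
temperature = [900, 1200, 1500, 800, 1100]
pressure = [75, 95, 80, 85, 60]
t_index = BitmapIndex(temperature, edges=[500, 1000, 1500, 2000])
p_index = BitmapIndex(pressure, edges=[50, 70, 90, 110])

# (temperature > 1000) AND (70 < pressure < 90) becomes a bitwise AND.
hits = t_index.query(1000, float("inf")) & p_index.query(70, 90)
print(rows_of(hits))
```

The point of the sketch is that each single-variable condition yields a bitset, and the Boolean combination of conditions reduces to bitwise operations on those bitsets, so only the selected rows need to be passed on to visualization.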


Article
Peer Reviewed

Auto-Tuning the 27-point Stencil for Multicore

LBL Publications (2009)

This study focuses on the key numerical technique of stencil computations, used in many different scientific disciplines, and illustrates how auto-tuning can be used to produce very efficient implementations across a diverse set of current multicore architectures.


Article

Science Driven Supercomputing Architectures: Analyzing Architectural Bottlenecks with Applications and Benchmark Probes

LBL Publications (2005)

There is a growing gap between the peak speed of parallel computing systems and the actual delivered performance for scientific applications. In general this gap is caused by inadequate architectural support for the requirements of modern scientific applications, as commercial applications, and the much larger market they represent, have driven the evolution of computer architectures. This gap has raised the importance of developing better benchmarking methodologies to characterize and to understand the performance requirements of scientific applications, and to communicate them efficiently to influence the design of future computer architectures. This improved understanding of the performance behavior of scientific applications will allow improved performance predictions, development of adequate benchmarks for identification of hardware and application features that work well or poorly together, and a more systematic performance evaluation in procurement situations. The Berkeley Institute for Performance Studies has developed a three-level approach to evaluating the design of high end machines and the software that runs on them: 1) A suite of representative applications; 2) A set of application kernels; and 3) Benchmarks to measure key system parameters. The three levels yield different types of information, all of which are useful in evaluating systems, and enable NSF and DOE centers to select computer architectures more suited for scientific applications. The analysis will further allow the centers to engage vendors in discussion of strategies to alleviate the present architectural bottlenecks using quantitative information. These may include small hardware changes or larger ones that may be of interest to non-scientific workloads. Providing quantitative models to the vendors allows them to assess the benefits of technology alternatives using their own internal cost-models in the broader marketplace, ideally facilitating the development of future computer architectures more suited for scientific computations. The three levels also come with vastly different investments: the benchmarking efforts require significant rewriting to effectively use a given architecture, which is much more difficult on full applications than on smaller benchmarks.


Peer Reviewed

Auto-tuning stencil computations on multicore and accelerators

UC Berkeley Previously Published Works (2010)

The recent transformation from an environment where gains in computational performance came from increasing clock frequency and other hardware engineering innovations, to an environment where gains are realized through the deployment of ever-increasing numbers of modest-performance cores has profoundly changed the landscape of scientific application programming. This exponential increase in core count represents both an opportunity and a challenge: access to petascale simulation capabilities and beyond will require that this concurrency be efficiently exploited. The problem for application programmers is further compounded by the diversity of multicore architectures that are now emerging [4]. From relatively complex out-of-order CPUs with complex cache structures, to relatively simple cores that support hardware multithreading, to chips that require explicit use of software-controlled memory, designing optimal code for these different platforms represents a serious impediment. An emerging solution to this problem is auto-tuning: the automatic generation of many versions of a code kernel that incorporate various tuning strategies, and the benchmarking of these to select the highest performing version. Typical tuning strategies might include: maximizing in-core performance with loop unrolling and restructuring; maximizing memory bandwidth by exploiting non-uniform memory access (NUMA) and engaging prefetch by directives; and minimizing memory traffic by cache blocking or array padding. Often a key parameter is associated with each tuning strategy (e.g., the amount of loop unrolling or the cache blocking factor), and these parameters must be explored in addition to the layering of the basic strategies themselves.
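The generate-benchmark-select loop described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: it tunes a single parameter (a cache blocking factor) for a simple 2D 5-point stencil rather than the kernels studied in the paper, and the function names, grid size, and candidate list are hypothetical. In interpreted Python the timings will not reflect real cache effects; the sketch shows the search structure, not the achievable speedups.

```python
import time


def stencil_blocked(grid, block):
    """One sweep of a 2D 5-point averaging stencil, traversed in block x block tiles."""
    n = len(grid)
    out = [row[:] for row in grid]  # boundary cells are copied unchanged
    for ii in range(1, n - 1, block):
        for jj in range(1, n - 1, block):
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, n - 1)):
                    out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                                       + grid[i][j - 1] + grid[i][j + 1])
    return out


def autotune(kernel, grid, candidates, repeats=3):
    """Time the kernel for each candidate parameter and return the fastest one."""
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(grid, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings


# Hypothetical problem size and search space.
grid = [[float((i * j) % 7) for j in range(32)] for i in range(32)]
candidates = [4, 8, 16, 32]
best_block, timings = autotune(stencil_blocked, grid, candidates)
print("fastest blocking factor on this machine:", best_block)
```

A real auto-tuner would also generate structurally different code variants (unrolled, padded, prefetched) and search the combinations of strategies, as the abstract notes; this sketch explores only one parameter of one strategy.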


Article

Recent Progress on the Marylie/Impact Beam Dynamics Code

LBL Publications (2006)

Article

SciDAC Advances and Applications in Computational Beam Dynamics

Lawrence Berkeley National Laboratory (2005)

SciDAC has had a major impact on computational beam dynamics and the design of particle accelerators. Particle accelerators -- which account for half of the facilities in the DOE Office of Science Facilities for the Future of Science 20 Year Outlook -- are crucial for US scientific, industrial, and economic competitiveness. Thanks to SciDAC, accelerator design calculations that were once thought impossible are now carried out routinely, and new challenging and important calculations are within reach. SciDAC accelerator modeling codes are being used to get the most science out of existing facilities, to produce optimal designs for future facilities, and to explore advanced accelerator concepts that may hold the key to qualitatively new ways of accelerating charged particle beams. In this poster we present highlights from the SciDAC Accelerator Science and Technology (AST) project Beam Dynamics focus area in regard to algorithm development, software development, and applications.
