Scholarly Works (28 results)

Article

An overview of SuperLU: Algorithms, implementation, and user interface

Li, Xiaoye S.

Lawrence Berkeley National Laboratory (2003)

We give an overview of the algorithms, design philosophy, and implementation techniques in the software SuperLU, for solving sparse unsymmetric linear systems. In particular, we highlight the differences between the sequential SuperLU (including its multithreaded extension) and the parallel SuperLU_DIST. These include the numerical pivoting strategy, the ordering strategy for preserving sparsity, the order in which the updating tasks are performed, the numerical kernel, and the parallelization strategy. Because of scalability concerns, the parallel code is drastically different from the sequential one. We describe the user interfaces of the libraries, and illustrate how to use the libraries most efficiently depending on some matrix characteristics. Finally, we give some examples of how the solver has been used in large-scale scientific applications, along with its performance.

Article

Performance analysis of parallel supernodal sparse LU factorization

LBL Publications (2004)

We investigate performance characteristics for the LU factorization of large matrices with various sparsity patterns. We consider supernodal right-looking parallel factorization on a two-dimensional grid of processors, making use of static pivoting. We develop a performance model and validate it using the implementation in SuperLU_DIST, real matrices, and the IBM Power3 machine at NERSC. We use this model to obtain performance bounds on parallel computers, to perform scalability analysis, and to identify performance bottlenecks. We also discuss the role of load balance and data distribution in this approach.

Article

Performance evaluation and enhancement of SuperLU_DIST 2.0

LBL Publications (2003)

We present a runtime comparison of the two versions of SuperLU_DIST, using up to 128 processors of the IBM SP at NERSC. One version provides the global input interface, and the other provides the distributed input interface. The comparison includes the total runtime of the solver with both 32-bit and 64-bit addressing modes, and the time breakdown for the different phases of the solver. We also present an in-depth comparison of four sparse matrix-vector multiplication methods in the context of iterative refinement. Finally, we describe our Fortran 90 interface that enhances the usability of the software.
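
As a rough illustration of why sparse matrix-vector multiplication matters for iterative refinement, the sketch below computes the residual r = b - Ax with a matrix held in compressed sparse row (CSR) form. The abstract does not name the multiplication methods it compares, so this is only a generic sequential kernel, not one of the paper's variants; the function names and the 3x3 example data are hypothetical.

```python
from typing import List, Sequence


def csr_matvec(indptr: Sequence[int], indices: Sequence[int],
               data: Sequence[float], x: Sequence[float]) -> List[float]:
    """Return y = A @ x for a matrix A stored in CSR form."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i live in data[indptr[i]:indptr[i + 1]].
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y


def residual(indptr: Sequence[int], indices: Sequence[int],
             data: Sequence[float], x: Sequence[float],
             b: Sequence[float]) -> List[float]:
    """Return r = b - A @ x, the quantity each refinement step recomputes."""
    ax = csr_matvec(indptr, indices, data, x)
    return [bi - axi for bi, axi in zip(b, ax)]


# Hypothetical example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
INDPTR = [0, 2, 3, 5]
INDICES = [0, 2, 1, 0, 2]
DATA = [4.0, 1.0, 3.0, 2.0, 5.0]
```

Each refinement step performs one such product, so its cost is proportional to the number of stored nonzeros rather than to the square of the matrix dimension.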

Article

Towards an Accurate Performance Modeling of Parallel Sparse Factorization

Lawrence Berkeley National Laboratory (2006)

We present a performance model to analyze a parallel sparse LU factorization algorithm on modern cache-based, high-end parallel architectures. Our model characterizes the algorithmic behavior by taking into account the underlying processor speed, memory system performance, and interconnect speed. The model is validated using the SuperLU_DIST linear system solver, sparse matrices from real applications, and an IBM POWER3 parallel machine. Our modeling methodology can be easily adapted to study the performance of other types of sparse factorizations, such as Cholesky or QR.

Article

A new scheduling algorithm for parallel sparse LU factorization with static pivoting

Lawrence Berkeley National Laboratory (2002)

In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed to drive the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm to real-world application matrices on an IBM SP RS/6000 distributed-memory machine.

Article

A Comparison of three high-precision quadrature schemes

Lawrence Berkeley National Laboratory (2003)

Article

SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

Lawrence Berkeley National Laboratory (2002)

In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with focus on scalability issues, and demonstrate the parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication pattern for sparse Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we designed highly parallel and scalable algorithms for both LU decomposition and triangular solve and we show that they are suitable for large-scale distributed memory machines.
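
The contrast between static and partial pivoting can be sketched on a small dense matrix. The abstract does not spell out the strategy, so the version below rests on assumptions: elimination proceeds with no row interchanges, a pivot smaller than sqrt(eps) times the largest matrix entry is replaced by that threshold, and a few steps of iterative refinement recover accuracy. The function names, the threshold, and the step count are illustrative choices, and the sketch omits the pre-permutation and sparse data structures a real solver would use.

```python
import math
from typing import List

Matrix = List[List[float]]
EPS = 2.0 ** -52  # double-precision machine epsilon


def lu_static(a: Matrix) -> Matrix:
    """Return packed L and U factors of `a`, computed without row swaps.

    Assumed rule: a pivot smaller in magnitude than sqrt(EPS) * max|a_ij|
    is replaced by that threshold, so the elimination order never changes.
    """
    n = len(a)
    lu = [row[:] for row in a]
    tiny = math.sqrt(EPS) * max(abs(v) for row in lu for v in row)
    for k in range(n):
        if abs(lu[k][k]) < tiny:
            lu[k][k] = tiny if lu[k][k] >= 0 else -tiny
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu: Matrix, b: List[float]) -> List[float]:
    """Solve L U x = b by forward then backward substitution."""
    n = len(lu)
    x = b[:]
    for i in range(n):              # forward: L has a unit diagonal
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in range(n - 1, -1, -1):  # backward: U is upper triangular
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def solve_with_refinement(a: Matrix, b: List[float], steps: int = 3) -> List[float]:
    """Solve a x = b, correcting the perturbed factorization by refinement."""
    lu = lu_static(a)
    x = lu_solve(lu, b)
    for _ in range(steps):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(a, b)]
        dx = lu_solve(lu, r)
        x = [xi + di for xi, di in zip(x, dx)]
    return x
```

Because no rows move during elimination, the fill pattern and the order of operations are known before any numerical work starts, which is the property the abstract credits for scalability on distributed-memory machines.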

Article

Towards an Automatic and Application-Based Eigensolver Selection

LBL Publications (2005)

Article

An Implementation and Evaluation of the AMLS Method for Sparse Eigenvalue Problems

Lawrence Berkeley National Laboratory (2008)

We describe an efficient implementation and present a performance study of an algebraic multilevel sub-structuring (AMLS) method for sparse eigenvalue problems. We assess the time and memory requirements associated with the key steps of the algorithm, and compare it with the shift-and-invert Lanczos algorithm in computational cost. Our eigenvalue problems come from two very different application areas: accelerator cavity design and normal mode vibrational analysis of polyethylene particles. We show that the AMLS method, when implemented carefully, is very competitive with the traditional method in broad application areas, especially when large numbers of eigenvalues are sought.

Article

ARPREC: An arbitrary precision computation package

LBL Publications (2002)

This paper describes a new software package for performing arithmetic with an arbitrarily high level of numeric precision. It is based on the earlier MPFUN package, enhanced with special IEEE floating-point numerical techniques and several new functions. The package is written in C++ for high performance and broad portability and includes both C++ and Fortran-90 translation modules, so that conventional C++ and Fortran-90 programs can use the package with only very minor changes. This paper includes a survey of some of the interesting applications of this package and its predecessors.
