Scholarly Works (22 results)

Article

High-performance combinatorial algorithms

Pinar, Ali

LBL Publications (2003)

Article
Peer Reviewed

Finding nonoverlapping substructures of a sparse matrix

LBL Publications (2004)

Many applications of scientific computing rely on computations on sparse matrices, so the design of efficient implementations of sparse matrix kernels is crucial for the overall efficiency of these applications. Due to the high compute-to-memory ratio and irregular memory access patterns, the performance of sparse matrix kernels is often far below the peak performance of a modern processor. Alternative data structures have been proposed that split the original matrix $A$ into $A_d$ and $A_s$, so that $A_d$ contains all dense blocks of a specified size in the matrix and $A_s$ contains the remaining entries. This enables the use of dense matrix kernels on the entries of $A_d$, which improves memory performance. In this work, we study the problem of finding a maximum number of nonoverlapping rectangular dense blocks in a sparse matrix, which has not been studied in the sparse matrix community. We show that the maximum nonoverlapping dense blocks problem is NP-complete by using a reduction from the maximum independent set problem on cubic planar graphs. We also propose a 2/3-approximation algorithm for $2 \times 2$ blocks that runs in linear time in the number of nonzeros in the matrix. We discuss alternatives to rectangular blocks, such as diagonal blocks and cross blocks, and present complexity analyses and approximation algorithms for them.
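
To make the split of $A$ into $A_d$ and $A_s$ concrete, here is a minimal Python sketch that scans the nonzero pattern in row-major order and greedily keeps any fully dense $2 \times 2$ block that does not overlap a block already chosen. This greedy rule is an illustrative stand-in, not the 2/3-approximation algorithm described above; the coordinate-set input format and the name `greedy_2x2_split` are assumptions made for this example.

```python
from typing import Iterable, List, Set, Tuple

Coord = Tuple[int, int]  # (row, column) of a nonzero entry


def greedy_2x2_split(nonzeros: Iterable[Coord]) -> Tuple[List[Coord], Set[Coord], Set[Coord]]:
    """Greedily pick nonoverlapping, fully dense 2x2 blocks.

    Returns (block_anchors, A_d entries, A_s entries). A block anchored at
    (i, j) covers rows i..i+1 and columns j..j+1. Illustrative greedy
    heuristic only, not the paper's 2/3-approximation algorithm.
    """
    nz = set(nonzeros)
    used: Set[Coord] = set()
    anchors: List[Coord] = []
    for i, j in sorted(nz):  # row-major scan over the nonzeros
        cells = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
        # Keep the block only if all four entries are nonzero and none of
        # them is already covered by a previously chosen block.
        if all(c in nz and c not in used for c in cells):
            anchors.append((i, j))
            used.update(cells)
    return anchors, used, nz - used


# Usage: a fully dense 3x3 pattern yields one block plus 5 leftover entries.
dense3 = [(i, j) for i in range(3) for j in range(3)]
anchors, a_d, a_s = greedy_2x2_split(dense3)
print(anchors, len(a_d), len(a_s))  # [(0, 0)] 4 5
```

A greedy scan like this can return fewer blocks than the maximum, because an early choice may overlap two disjoint blocks that an optimal packing would keep; a guaranteed ratio is what an approximation algorithm adds.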

Article
Peer Reviewed

Interprocessor communication with limited memory

LBL Publications (2003)

Many parallel applications require periodic redistribution of workloads and associated data. In a distributed-memory computer, this redistribution can be difficult if limited memory is available for receiving messages. We propose a model for optimizing the exchange of messages under such circumstances, which we call the minimum phase remapping problem. We first show that the problem is NP-complete and then analyze several methodologies for addressing it. First, we show how the problem can be phrased as an instance of multi-commodity flow. Next, we study a continuous approximation to the problem. We show that this continuous approximation has a solution that requires at most two more phases than the optimal discrete solution, but the question of how to consistently obtain a good discrete solution from the continuous problem remains open. We also devise a simple and practical approximation algorithm for the problem with a bound of 1.5 times the optimal number of phases. Finally, we present an empirical study of variations of our algorithms, which indicates that our approaches are quite practical.
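
A toy version of the phase-scheduling question can be sketched in Python under a deliberately simplified, hypothetical model: each message is a `(source, destination, size)` triple, and in any one phase the messages arriving at a processor must fit in its receive buffer of `capacity` units. Under that model, first-fit decreasing packing yields a feasible schedule. This is neither the paper's model nor its 1.5-approximation algorithm, and the function name is an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Message = Tuple[int, int, int]  # (source, destination, size); hypothetical format


def schedule_phases(messages: List[Message], capacity: int) -> List[List[Message]]:
    """First-fit decreasing assignment of messages to communication phases.

    Simplified model: within one phase, the total size of messages received
    by a processor must not exceed `capacity`.
    """
    phases: List[List[Message]] = []
    loads: List[DefaultDict[int, int]] = []  # loads[p][dst]: units dst receives in phase p
    for msg in sorted(messages, key=lambda m: m[2], reverse=True):
        _, dst, size = msg
        if size > capacity:
            raise ValueError(f"message {msg} cannot fit in any phase")
        for phase, load in zip(phases, loads):
            if load[dst] + size <= capacity:  # earliest phase with room at dst
                phase.append(msg)
                load[dst] += size
                break
        else:  # no existing phase has room: open a new one
            phases.append([msg])
            loads.append(defaultdict(int, {dst: size}))
    return phases


msgs = [(0, 1, 6), (2, 1, 5), (3, 1, 4), (0, 1, 3), (1, 2, 7)]
plan = schedule_phases(msgs, capacity=10)
print(len(plan))  # 2
```

Processor 1 must receive 18 units through a 10-unit buffer, so at least ⌈18/10⌉ = 2 phases are needed; for this small instance the greedy schedule meets that lower bound.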

Article

Exploiting Flexibly Assignable Work to Improve Load Balance

LBL Publications (2002)

In many applications of parallel computing, distribution of the data unambiguously implies distribution of work among processors. But there are exceptions where some tasks can be assigned to one of several processors without altering the total volume of communication. In this paper, we study the problem of exploiting this flexibility in assignment of tasks to improve load balance. We first model the problem in terms of network flow and use combinatorial techniques for its solution. Our parametric search algorithms use maximum flow algorithms to probe a candidate optimal solution value. We describe two algorithms to solve the assignment problem with $\log W_T$ and $|P|$ probe calls, where $W_T$ and $|P|$, respectively, denote the total workload and the number of processors. We also define augmenting paths and cuts for this problem, and show that any algorithm based on augmenting paths can be used to find an optimal solution for the task assignment problem. We then consider a continuous version of the problem, and formulate it as a linearly constrained optimization problem, i.e., $\min \|Ax\|_\infty$ s.t. $Bx = d$. To avoid solving an intractable $\infty$-norm optimization problem, we show that in this case minimizing the 2-norm is sufficient to minimize the $\infty$-norm, which reduces the problem to the well-studied linearly constrained least squares problem. The continuous version of the problem has the advantage of being easily amenable to parallelization.
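
The probe-and-bisect structure can be sketched in Python as follows. Each probe asks whether every processor can finish with load at most a candidate bound: it builds a flow network source → task (capacity = task work) → allowed processor → sink (capacity = bound minus the processor's fixed load) and checks whether a maximum flow routes all flexible work. Assumptions made for the sketch: tasks are `(work, allowed_processors)` pairs with at least one allowed processor each, flexible work is divisible in integer units, max flow uses a plain Edmonds-Karp routine, and all names are hypothetical. Only the bisection variant (about $\log W_T$ probes) is shown, not the $|P|$-probe algorithm.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Tuple

Task = Tuple[int, List[int]]  # (work, processors the task may be assigned to)


def max_flow(cap: Dict[Hashable, Dict[Hashable, int]], s: Hashable, t: Hashable) -> int:
    """Edmonds-Karp; `cap` (a defaultdict(dict)) holds residual capacities, modified in place."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= push
            cap[w][u] = cap[w].get(u, 0) + push  # residual back edge
        flow += push


def probe(bound: int, fixed: List[int], tasks: List[Task]) -> bool:
    """Can all flexible work be placed so that no processor exceeds `bound`?
    Assumes work is divisible in integer units (integral max flow)."""
    if max(fixed) > bound:
        return False
    cap: Dict[Hashable, Dict[Hashable, int]] = defaultdict(dict)
    for k, (work, allowed) in enumerate(tasks):
        cap["s"][("task", k)] = work
        for p in allowed:
            cap[("task", k)][("proc", p)] = work
    for p, load in enumerate(fixed):
        cap[("proc", p)]["t"] = bound - load  # room left under the bound
    return max_flow(cap, "s", "t") == sum(work for work, _ in tasks)


def min_max_load(fixed: List[int], tasks: List[Task]) -> int:
    """Bisection on the bottleneck value; each step is one max-flow probe."""
    lo, hi = max(fixed), max(fixed) + sum(work for work, _ in tasks)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid, fixed, tasks):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_max_load([2, 0], [(4, [0, 1])]))                  # 3
print(min_max_load([5, 1, 1], [(4, [0, 1]), (3, [1, 2])]))  # 5
```

In the first call, one unit of the flexible task goes to processor 0 and three to processor 1, which is only possible because the sketch treats work as divisible; indivisible tasks would need a different probe.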

Article
Peer Reviewed

Fast Optimal Load Balancing Algorithms for 1D Partitioning

LBL Publications (2002)

One-dimensional decomposition of nonuniform workload arrays for optimal load balancing is investigated. The problem has been studied in the literature as the chains-on-chains partitioning problem. Despite extensive research efforts, heuristics are still used in the parallel computing community with the "hope" of good decompositions and the "myth" that exact algorithms are hard to implement and not runtime efficient. The main objective of this paper is to show that using exact algorithms instead of heuristics yields significant load balance improvements with a negligible increase in preprocessing time. We provide detailed pseudocodes of our algorithms so that our results can be easily reproduced. We start with a review of the literature on the chains-on-chains partitioning problem. We propose improvements on these algorithms as well as efficient implementation tips. We also introduce novel algorithms that are both asymptotically and runtime efficient. We experimented with data sets from two different applications: sparse matrix computations and direct volume rendering. Experiments showed that, on average, the proposed algorithms are 100 times faster than a single sparse matrix-vector multiplication for 64-way decompositions. Experiments also verify that load balance can be significantly improved by using exact algorithms instead of heuristics. These two findings show that exact algorithms with the efficient implementations discussed in this paper can effectively replace heuristics.
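
A common exact approach to chains-on-chains partitioning is bisection on the bottleneck value, with a greedy probe that extends each processor's piece as far as the candidate bound allows. The Python sketch below assumes nonnegative integer weights and uses prefix sums with binary search inside the probe; it illustrates the general probe-based technique and is not a reproduction of the specific algorithms or implementation tips proposed in the paper. Function names are hypothetical.

```python
import bisect
from itertools import accumulate
from typing import List


def probe(prefix: List[int], parts: int, bound: int) -> bool:
    """Greedy check: can the chain be cut into at most `parts` consecutive
    pieces, each of total weight <= bound?  prefix[i] = w[0] + ... + w[i-1]."""
    n = len(prefix) - 1
    start = 0
    for _ in range(parts):
        # Extend the current piece as far as the bound allows.
        end = bisect.bisect_right(prefix, prefix[start] + bound) - 1
        if end >= n:
            return True
        start = end
    return False


def chains_on_chains(weights: List[int], parts: int) -> int:
    """Smallest achievable bottleneck (maximum piece weight), by bisection."""
    prefix = [0] + list(accumulate(weights))
    total = prefix[-1]
    lo = max(max(weights), -(-total // parts))  # max(largest weight, ceil(W / P))
    hi = total
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(prefix, parts, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(chains_on_chains([2, 3, 1, 4, 2, 3], 3))  # 5: pieces [2, 3] [1, 4] [2, 3]
```

Because prefix sums are nondecreasing for nonnegative weights, each processor's cut point is found by binary search, so one probe costs $O(P \log n)$ after a single $O(n)$ prefix-sum pass.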

Article

Sequence Compaction to Preserve Transition Frequencies

LBL Publications (2002)

Simulation-based power estimation is commonly used for its high accuracy, despite its long computation times. Techniques have been proposed to speed it up by compacting an input sequence while preserving its power-consumption characteristics. We propose a novel method to compact a sequence that preserves transition frequencies. We prove that the problem is NP-complete and propose a graph model that reduces it to finding a heaviest weighted trail on a directed graph, along with a heuristic that uses this model. We also propose using multiple sequences for better accuracy with even shorter sequences. Experiments showed that power dissipation can be estimated with an error of only 2.3 percent while simulation times are reduced by a factor of 10. The proposed methods effectively preserve transition frequencies and generate solutions that are very close to optimal. Experiments also showed that multiple sequences yield more accurate results with even shorter sequences.
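
The quantity being preserved can be illustrated with a short Python sketch. It counts transitions between consecutive input vectors, which can be viewed as edge weights of a directed graph whose vertices are the distinct vectors, and compares two sequences by the total-variation distance between their normalized transition frequencies. Representing vectors as bit strings, this graph view, and total-variation distance as the error measure are assumptions for this example; the sketch implements neither the heaviest-trail heuristic nor the power-estimation step.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Transition = Tuple[str, str]  # (current input vector, next input vector)


def transition_counts(seq: Sequence[str]) -> Counter:
    """Weighted directed edges: how often vector a is immediately followed by b."""
    return Counter(zip(seq, seq[1:]))


def transition_frequencies(seq: Sequence[str]) -> Dict[Transition, float]:
    counts = transition_counts(seq)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def frequency_error(original: Sequence[str], compacted: Sequence[str]) -> float:
    """Total-variation distance between transition distributions (0 = identical)."""
    f, g = transition_frequencies(original), transition_frequencies(compacted)
    return 0.5 * sum(abs(f.get(k, 0.0) - g.get(k, 0.0)) for k in set(f) | set(g))


original = ["00", "01", "00", "01", "00", "01", "00"]
print(frequency_error(original, ["00", "01", "00"]))  # 0.0
print(frequency_error(original, ["00", "01"]))        # 0.5
```

The three-vector sequence keeps the original 50/50 mix of `00→01` and `01→00` transitions, while the two-vector sequence loses one transition type entirely.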

Article
Peer Reviewed

Finding Nonoverlapping Dense Blocks of a Sparse Matrix

LBL Publications (2004)

Many applications of scientific computing rely on computations on sparse matrices. The design of efficient implementations of sparse matrix kernels is crucial for the overall efficiency of these applications. Due to the high compute-to-memory ratio and irregular memory access patterns, the performance of sparse matrix kernels is often far below the peak performance of a modern processor. Alternative data structures have been proposed that split the original matrix $A$ into $A_d$ and $A_s$, so that $A_d$ contains all dense blocks of a specified size in the matrix and $A_s$ contains the remaining entries. This enables the use of dense matrix kernels on the entries of $A_d$, which improves memory performance. In this work, we study the problem of finding a maximum number of nonoverlapping dense blocks in a sparse matrix, which has not previously been studied in the sparse matrix community. We show that the maximum nonoverlapping dense blocks problem is NP-complete by using a reduction from the maximum independent set problem on cubic planar graphs. We also propose a 2/3-approximation algorithm that runs in linear time in the number of nonzeros in the matrix. This extended abstract focuses on our results for $2 \times 2$ dense blocks. However, we show that our results can be generalized to arbitrarily sized dense blocks and many other oriented substructures, which can be exploited to improve the memory performance of sparse matrix operations.

Article

Improving load balance with flexibly assignable tasks

LBL Publications (2003)

In many applications of parallel computing, distribution of the data unambiguously implies distribution of work among processors. But there are exceptions where some tasks can be assigned to one of several processors without altering the total volume of communication. In this paper, we study the problem of exploiting this flexibility in assignment of tasks to improve load balance. We first model the problem in terms of network flow and use combinatorial techniques for its solution. Our parametric search algorithms use maximum flow algorithms to probe a candidate optimal solution value. We describe two algorithms to solve the assignment problem with $\log W_T$ and $|P|$ probe calls, where $W_T$ and $|P|$, respectively, denote the total workload and the number of processors. We also define augmenting paths and cuts for this problem, and show that any algorithm based on augmenting paths can be used to find an optimal solution for the task assignment problem. We then consider a continuous version of the problem, and formulate it as a linearly constrained optimization problem, i.e., $\min \|Ax\|_\infty$ s.t. $Bx = d$. To avoid solving an intractable $\infty$-norm optimization problem, we show that in this case minimizing the 2-norm is sufficient to minimize the $\infty$-norm, which reduces the problem to the well-studied linearly constrained least squares problem. The continuous version of the problem has the advantage of being easily amenable to parallelization. Our experiments with molecular dynamics and overlapped domain decomposition applications demonstrated the effectiveness of our methods, with significant improvements in load balance. We also discuss how our techniques can be enhanced for heterogeneous systems.
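
The continuous version leads to an equality-constrained least squares problem, $\min \|Ax\|_2$ subject to $Bx = d$, whose solution satisfies the KKT system $A^\top A x + B^\top y = 0$, $Bx = d$. The dependency-free Python sketch below builds that system and solves it by Gaussian elimination with partial pivoting. The toy instance is an assumption for illustration: two processors and two flexible tasks, with $x = (x_{11}, x_{12}, x_{22})$ holding the work task 1 sends to processors 1 and 2 and the work task 2 sends to processor 2; $A$ maps $x$ to processor loads and $B$ requires task 1 to place 4 units and task 2 to place 2. The 2-norm/$\infty$-norm equivalence is the paper's result for its own formulation, not a property of arbitrary $A$ and $B$, and the sketch does not check it.

```python
from typing import List

Matrix = List[List[float]]


def solve(M: Matrix, b: List[float]) -> List[float]:
    """Dense Gaussian elimination with partial pivoting (fine for small systems)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def constrained_least_squares(A: Matrix, B: Matrix, d: List[float]) -> List[float]:
    """min ||Ax||_2 s.t. Bx = d via the KKT system
        [A^T A  B^T] [x]   [0]
        [B      0  ] [y] = [d]
    Assumes B has full row rank and A^T A is positive definite on null(B)."""
    n, m = len(A[0]), len(B)
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    K = [AtA[i] + [B[r][i] for r in range(m)] for i in range(n)]
    K += [list(B[r]) + [0.0] * m for r in range(m)]
    return solve(K, [0.0] * n + list(d))[:n]


# Toy instance (hypothetical): x = (x11, x12, x22), loads = A x.
A = [[1, 0, 0],   # processor 1 load = x11
     [0, 1, 1]]   # processor 2 load = x12 + x22
B = [[1, 1, 0],   # task 1 places 4 units: x11 + x12 = 4
     [0, 0, 1]]   # task 2 places 2 units: x22 = 2
x = constrained_least_squares(A, B, [4, 2])
loads = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print([round(v, 6) for v in x], [round(v, 6) for v in loads])
```

For this instance the solver returns $x \approx (3, 1, 2)$, i.e. loads $(3, 3)$; since the total load is 6 on two processors, this also minimizes the largest load.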

Article

Computing Criticality of Lines in Power Systems

LBL Publications (2006)

We propose a computationally efficient method based on nonlinear optimization to identify critical lines, whose failure can cause severe blackouts. Our method computes a criticality measure for all lines at once, as opposed to detecting a single vulnerability, providing a global view of the system. This information on the criticality of lines can be used to identify multiple contingencies by selectively exploring combinations of broken lines. The effectiveness of our method is demonstrated on the IEEE 30-bus and 118-bus systems, where we can very quickly detect the most critical lines in the system and identify severe multiple contingencies.

Article

The Inhibiting Bisection Problem

LBL Publications (2006)

Given a graph where each vertex is assigned a generation or consumption volume, we seek to bisect the graph so that each part has a significant generation/consumption mismatch and the cutsize of the bisection is small. Our motivation comes from the vulnerability analysis of distribution systems such as the electric power system. We show that the constrained version of the problem, where either the cutsize or the mismatch significance is imposed as a constraint and the other is optimized, is NP-complete, and we provide an integer programming formulation. We also propose an alternative relaxed formulation that can trade off between the two objectives, and show that it can be solved in polynomial time by a maximum flow solver. Our experiments with benchmark electric power systems validate the effectiveness of our methods.
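
The two competing objectives can be made concrete with a small Python helper that scores a candidate bisection: the cutsize is the total weight of edges crossing the partition, and each side's mismatch is its net volume. The sign convention (generation positive, consumption negative), the weighted edge list, and the function name are assumptions for illustration; the sketch only evaluates a given bisection and implements neither the integer programming nor the maximum flow formulation.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, weight); hypothetical format


def evaluate_bisection(edges: Iterable[Edge], volume: Dict[Hashable, float],
                       part: Set[Hashable]) -> Tuple[float, float, float]:
    """Return (cutsize, net volume inside `part`, net volume outside `part`).

    Assumed convention: generation > 0 and consumption < 0, so a side's net
    volume is its generation/consumption mismatch.
    """
    cut = sum(w for u, v, w in edges if (u in part) != (v in part))
    inside = sum(vol for x, vol in volume.items() if x in part)
    outside = sum(vol for x, vol in volume.items() if x not in part)
    return cut, inside, outside


volume = {1: 5, 2: -2, 3: -2, 4: -1}  # one generator, three loads, balanced overall
ring = [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 1, 1)]
print(evaluate_bisection(ring, volume, {1, 2}))  # (2, 3, -3)
```

On this four-vertex ring, grouping the generator with one load cuts two lines and leaves a surplus of 3 on one side and an equal deficit on the other.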
