Recent studies have identified disk storage systems as one of the major consumers of power in data centers. Many disk power management (DPM) schemes have been proposed in which the power consumed by disks is reduced by spinning them down during long idle periods. Spinning disks down and up incurs additional energy and response time costs. For that reason, DPM schemes are effective only if the disks experience relatively long idle periods and the scheme does not introduce a severe response time penalty. In this paper we introduce a dynamic block exchange algorithm that swaps data between disks based on the observed workload, so that frequently accessed blocks end up residing on a few "hot" disks, allowing the majority of disks to experience longer idle periods. We validate the effectiveness of the algorithm with trace-driven simulations showing power savings of up to 60% with very small response time penalties.
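To make the block-exchange idea concrete, the following sketch (in Python, with hypothetical block and disk identifiers) ranks blocks by observed access frequency and pairs hot blocks on spin-down candidate disks with cold blocks on the designated hot disks. It is only an illustration of the concept; the paper's algorithm, which works from observed workloads and handles the practical details, is not reproduced here.

```python
# A minimal sketch, not the paper's algorithm: concentrate frequently accessed
# blocks on a few designated "hot" disks by swapping them with rarely accessed
# blocks that currently occupy those disks. Capacity checks, migration cost,
# and the trigger for re-running the exchange are omitted.

def plan_swaps(access_counts, placement, hot_disks):
    """access_counts: block -> observed access count
    placement: block -> disk currently holding it
    hot_disks: set of disks meant to stay spinning
    Returns a list of (hot_block, cold_block) pairs whose locations to exchange."""
    on_cold = sorted((b for b in placement if placement[b] not in hot_disks),
                     key=lambda b: access_counts.get(b, 0), reverse=True)
    on_hot = sorted((b for b in placement if placement[b] in hot_disks),
                    key=lambda b: access_counts.get(b, 0))
    swaps = []
    for hot_block, cold_block in zip(on_cold, on_hot):
        if access_counts.get(hot_block, 0) > access_counts.get(cold_block, 0):
            swaps.append((hot_block, cold_block))
    return swaps

# Hypothetical example: four blocks over three disks, d0 designated hot.
counts = {"b1": 90, "b2": 2, "b3": 40, "b4": 1}
layout = {"b1": "d1", "b2": "d0", "b3": "d2", "b4": "d0"}
print(plan_swaps(counts, layout, hot_disks={"d0"}))  # -> [('b1', 'b4'), ('b3', 'b2')]
```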
Scientific data centers comprised of high-powered computing equipment and large-capacity disk storage systems consume a considerable amount of energy. Dynamic power management (DPM) techniques are commonly used for saving energy in disk systems. These involve powering down disks that exhibit long idle periods and placing them in standby mode. A file request to a disk in standby mode incurs both energy and performance penalties, as it takes energy (and time) to spin up the disk before it can serve the file. For this reason, DPM has to decide when to transition a disk into standby mode so that the energy saved is greater than the energy needed to spin it up again and the performance penalty is tolerable. The length of the idle period after which the DPM decides to power down a disk is called the idleness threshold. In this paper, we study, both analytically and experimentally, dynamic power management techniques that save energy subject to performance constraints on file access costs. Based on observed workloads of scientific applications and disk characteristics, we provide a methodology for determining file assignment to disks and computing idleness thresholds that significantly improve on the energy saved by existing DPM solutions while meeting response time constraints. We validate our methods with simulations that use traces taken from scientific applications.
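For intuition about idleness thresholds, the sketch below computes the classic break-even threshold, the idle time beyond which standby saves energy, from assumed disk power parameters. The numbers are made up for illustration; the thresholds derived in the paper are additionally constrained by response time requirements.

```python
# A minimal sketch of the break-even idleness threshold: spin a disk down only
# if the expected standby period saves more energy than the spin-down/spin-up
# transitions cost. Disk parameters below are hypothetical.

def break_even_threshold(p_idle, p_standby, e_down, e_up):
    """Idle time (seconds) beyond which standby saves energy."""
    # Energy saved per second in standby must amortize the transition energy.
    return (e_down + e_up) / (p_idle - p_standby)

# Hypothetical disk: 8 W idle, 1 W standby, 5 J to spin down, 25 J to spin up.
t = break_even_threshold(p_idle=8.0, p_standby=1.0, e_down=5.0, e_up=25.0)
print(f"Spin down after {t:.1f} s of idleness")  # ~4.3 s for these numbers
```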
Exponential data growth is a reality for most enterprise and scientific data centers. Improvements in the price/performance and storage densities of disks have made it both easy and affordable to maintain most of the data in large disk storage farms. The provisioning of disk storage farms, however, comes at the expense of high energy consumption due to the large number of spinning disks. The power for spinning the disks, together with the associated cooling costs, accounts for a significant fraction of the total power consumption of a typical data center. Given the trend of rising global fuel and energy prices and the high rate of data growth, the challenge is to implement configurations of large-scale disk storage systems that meet performance requirements for information retrieval across data centers. We present part of the solution to this challenge with an energy-efficient file allocation strategy on a large-scale disk storage system. Given the performance characteristics of the disks and a profile of the workload in terms of the frequencies of file requests and their sizes, the basic idea is to allocate files to disks such that the disks can be configured into two pools: active (constantly spinning) disks and passive (capable of being spun up or down) disks. The goal is to minimize the number of active disks subject to I/O performance constraints. We present an algorithm for solving this problem with guaranteed bounds from the optimal solution. Our algorithm runs in O(n) time, where n is the number of files allocated. It uses a mapping of our file allocation problem to a generalization of the bin packing problem known as 2-dimensional vector packing. Detailed simulation results are also provided.
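As a rough illustration of the vector-packing view, the sketch below treats each file as a pair of normalized demands (storage capacity and I/O load) and packs files first-fit onto unit-capacity disks. The first-fit heuristic and the example workload are assumptions for illustration; the paper's algorithm is more refined and comes with guaranteed bounds from the optimal solution.

```python
# A minimal sketch of allocation as 2-dimensional vector packing: each file has
# two normalized demands (capacity, I/O load) and each disk is a bin with unit
# capacity in both dimensions. First-fit is a simple stand-in heuristic.

def vector_pack(files):
    """files: list of (size_frac, load_frac) pairs, each in [0, 1].
    Returns a list of disks, each a list of file indices."""
    disks = []     # per-disk list of assigned file indices
    residual = []  # per-disk remaining (size, load) capacity
    for i, (size, load) in enumerate(files):
        for d, (rs, rl) in enumerate(residual):
            if size <= rs and load <= rl:          # fits in both dimensions
                disks[d].append(i)
                residual[d] = (rs - size, rl - load)
                break
        else:                                      # open a new (active) disk
            disks.append([i])
            residual.append((1.0 - size, 1.0 - load))
    return disks

# Hypothetical workload: (fraction of disk capacity, fraction of disk bandwidth).
print(vector_pack([(0.5, 0.2), (0.4, 0.7), (0.3, 0.3), (0.6, 0.1)]))  # -> [[0, 1], [2, 3]]
```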
It is anticipated that in the near future disk storage systems will surpass application servers as the primary consumer of power in data centers. Shutting down inactive disks is one of the most widespread approaches to reducing the power consumption of disk systems. This approach involves spinning down or completely shutting off disks that exhibit long periods of inactivity and placing them in standby mode. A file request to a disk in standby mode incurs an I/O cost penalty, as it takes time to spin up the disk before it can serve the file. In this paper, we address the problem of designing and implementing file allocation strategies on disk storage that save energy while meeting performance requirements for file retrievals. We present an algorithm for solving this problem with guaranteed bounds from the optimal solution. Our algorithm runs in O(n log n) time, where n is the number of files allocated. Detailed simulation results and experiments with real-life workloads are also presented.
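For illustration only, the sketch below shows one sort-based heuristic consistent with an O(n log n) bound: files are ordered by access rate and the hottest ones are placed on an always-on pool up to assumed capacity and utilization budgets, while the rest go to disks that may be spun down. The budgets and the single aggregate active pool are simplifications, not the paper's algorithm.

```python
# A minimal sketch of a heat-based split into active and passive storage.
# The sort over n files dominates the running time, giving O(n log n).
# disk_capacity and util_budget are hypothetical aggregate limits.

def allocate(files, disk_capacity, util_budget):
    """files: list of (name, size, access_rate). Returns (active, passive) lists."""
    active, passive = [], []
    cap_left, util_left = disk_capacity, util_budget
    for name, size, rate in sorted(files, key=lambda f: f[2], reverse=True):
        if size <= cap_left and rate <= util_left:
            active.append(name)           # hot file stays on always-on disks
            cap_left -= size
            util_left -= rate
        else:
            passive.append(name)          # cold file may be spun down with its disk
    return active, passive

files = [("a", 10, 200.0), ("b", 40, 5.0), ("c", 20, 120.0), ("d", 30, 0.5)]
print(allocate(files, disk_capacity=50, util_budget=400.0))  # -> (['a', 'c'], ['b', 'd'])
```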
Datasets used in scientific and engineering applications are often modeled as dense multi-dimensional arrays. For very large datasets, the corresponding array models are typically stored out-of-core as array files. The array elements are mapped onto linear consecutive locations that correspond to the linear ordering of the multi-dimensional indices. Two conventional mappings are the row-major order and the column-major order of multi-dimensional arrays. Such conventional mappings of dense array files severely limit the performance of applications and the extendibility of the dataset. First, an array file organized in, say, row-major order gives abysmal performance to applications that subsequently access the data in column-major order. Second, any subsequent expansion of the array file is limited to only one dimension; expansions of such out-of-core conventional arrays along arbitrary dimensions require storage reorganization that can be very expensive. We present a solution for storing out-of-core dense extendible arrays that resolves these two limitations. The method uses a mapping function F*(), together with information maintained in axial vectors, to compute the linear address of an extendible array element when passed its k-dimensional index. We also give the inverse function, F*^-1(), for deriving the k-dimensional index when given the linear address. We show how the mapping function, in combination with MPI-IO and a parallel file system, allows the extendible array to grow without reorganization and with no significant performance degradation for applications accessing elements in any desired order. We give methods for reading and writing sub-arrays into and out of parallel applications that run on a cluster of workstations. The axial vectors are replicated and maintained in each node that accesses sub-array elements.
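The following simplified 2-D sketch illustrates the general idea behind addressing with axial vectors: each expansion appends one contiguous segment to storage and records just enough information to compute a linear address later, so existing elements never move. The class, its record layout, and the tie-breaking rule are illustrative assumptions; the paper's k-dimensional F*() and its inverse, and their integration with MPI-IO, are not reproduced here.

```python
# A simplified, hypothetical 2-D illustration of extendible-array addressing
# with axial vectors. Each expansion along a dimension appends one contiguous
# segment and records (first index covered, base address, stride of the other
# dimension) in that dimension's axial vector.

import bisect

class ExtendibleArray2D:
    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.storage = [None] * (rows * cols)
        # One axial vector per dimension. The initial row-major block is owned
        # by dimension 0; dimension 1 gets a sentinel record (base -1) so the
        # initial region always resolves to dimension 0.
        self.axial = [[(0, 0, cols)], [(0, -1, rows)]]

    def extend_rows(self, n):
        self.axial[0].append((self.rows, len(self.storage), self.cols))
        self.storage += [None] * (n * self.cols)
        self.rows += n

    def extend_cols(self, n):
        self.axial[1].append((self.cols, len(self.storage), self.rows))
        self.storage += [None] * (n * self.rows)
        self.cols += n

    def _address(self, i, j):
        # Latest expansion record covering each index along its dimension.
        r0, b0, s0 = self.axial[0][bisect.bisect_right([r[0] for r in self.axial[0]], i) - 1]
        c0, b1, s1 = self.axial[1][bisect.bisect_right([r[0] for r in self.axial[1]], j) - 1]
        # The element lives in the more recently allocated of the two segments.
        if b0 >= b1:
            return b0 + (i - r0) * s0 + j
        return b1 + (j - c0) * s1 + i

    def __setitem__(self, idx, value):
        self.storage[self._address(*idx)] = value

    def __getitem__(self, idx):
        return self.storage[self._address(*idx)]

a = ExtendibleArray2D(2, 3)
a[1, 2] = "x"
a.extend_cols(2)   # grow to 2 x 5 without moving existing elements
a.extend_rows(1)   # grow to 3 x 5
a[2, 4] = "y"
print(a[1, 2], a[2, 4])   # -> x y
```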
Very large multidimensional arrays are commonly used in data-intensive scientific computations as well as in on-line analytical processing applications, referred to as MOLAP. The storage organization of such arrays on disks is done by partitioning the large global array into fixed-size sub-arrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of sub-arrays in a manner that accesses all chunks that overlap the query results. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" The problem of optimal chunking was first introduced by Sarawagi and Stonebraker, who gave an approximate solution. In this paper we develop exact mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic and real-life workloads, show that our solutions are consistently within 2.0% of the true number of chunks retrieved for any number of dimensions. In contrast, the approximate solution of Sarawagi and Stonebraker can deviate considerably from the true result as the number of dimensions increases and may also lead to suboptimal chunk shapes.
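As a concrete illustration of the optimization target, the sketch below uses one common probabilistic model, an expected overlap of 1 + (q_i - 1)/c_i chunks per dimension for a query of extent q_i against chunks of extent c_i under uniformly random placement, and finds a good 2-D chunk shape by exhaustive search. The workload and chunk-size budget are hypothetical, and the brute-force search merely stands in for the paper's steepest descent and geometric programming solutions.

```python
# A minimal sketch of chunk-shape selection under one common expected-overlap
# model. The exhaustive 2-D search is for illustration only.

from math import prod

def expected_chunks(query, chunk):
    # Expected chunks overlapped by a randomly placed query of the given shape.
    return prod(1 + (q - 1) / c for q, c in zip(query, chunk))

def best_chunk_shape(workload, chunk_elems, max_extent=512):
    """workload: list of (weight, query_shape). Exhaustive 2-D search over
    chunk shapes whose element count equals chunk_elems."""
    best = None
    for c1 in range(1, max_extent + 1):
        c2 = chunk_elems // c1
        if c2 < 1 or c1 * c2 != chunk_elems:
            continue
        cost = sum(w * expected_chunks(q, (c1, c2)) for w, q in workload)
        if best is None or cost < best[0]:
            best = (cost, (c1, c2))
    return best

# Hypothetical workload: 70% row-oriented and 30% column-oriented queries,
# with a chunk-size budget of 4096 elements.
print(best_chunk_shape([(0.7, (4, 1000)), (0.3, (600, 8))], chunk_elems=4096))
```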
Caching techniques have been used to bridge the performance gap between levels of the storage hierarchy in computing systems. In data-intensive applications that access large data files over wide-area network environments, such as a data grid, caching mechanisms can significantly improve data access performance under appropriate workloads. In a data grid, it is envisioned that local disk storage resources retain or cache the data files being used by local applications. Under a workload of shared access and high locality of reference, the performance of the caching techniques depends heavily on the replacement policies being used. A replacement policy determines which set of objects must be evicted when space is needed. Unlike cache replacement policies in virtual memory paging or database buffering, developing an optimal replacement policy for data grids is complicated by the fact that the file objects being cached have varying sizes and have transfer and processing costs that vary with time. We present an accurate model for evaluating various replacement policies and propose a new replacement algorithm referred to as "Least Cost Beneficial based on K backward references (LCB-K)." Using this modeling technique, we compare LCB-K with various replacement policies such as Least Frequently Used (LFU), Least Recently Used (LRU), Greedy Dual-Size (GDS), etc., using synthetic and actual workloads of accesses to and from tertiary storage systems. The results obtained show that LCB-K and GDS are the most cost-effective cache replacement policies for storage resource management in data grids.
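To illustrate the kind of cost/benefit reasoning involved, the sketch below implements a hypothetical cost-aware eviction policy that keeps the K most recent reference times per file and evicts the file with the lowest estimated benefit per byte. It is written in the spirit of LCB-K but the benefit metric shown here is an assumption, not the paper's exact formula.

```python
# A minimal, hypothetical sketch of cost-aware cache replacement: benefit
# combines a file's retrieval cost with a reference-rate estimate derived from
# its K-th most recent access (as in LRU-K), normalized by file size.

import time
from collections import deque

class CostAwareCache:
    def __init__(self, capacity_bytes, k=2):
        self.capacity, self.k = capacity_bytes, k
        self.files = {}   # name -> (size, retrieval_cost, deque of access times)

    def _benefit(self, name, now):
        size, cost, refs = self.files[name]
        # Files with fewer than K references get rate 0 and are evicted first.
        span = now - refs[0] if len(refs) == self.k else float("inf")
        rate = len(refs) / span if span > 0 else 0.0
        return cost * rate / size          # estimated benefit per byte retained

    def access(self, name, size, retrieval_cost):
        now = time.monotonic()
        if name in self.files:
            self.files[name][2].append(now)
            return "hit"
        if size > self.capacity:           # object can never fit; bypass cache
            return "bypass"
        used = sum(s for s, _, _ in self.files.values())
        while used + size > self.capacity:
            victim = min(self.files, key=lambda f: self._benefit(f, now))
            used -= self.files.pop(victim)[0]
        self.files[name] = (size, retrieval_cost, deque([now], maxlen=self.k))
        return "miss"
```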
Data-intensive scientific computations, as well as on-line analytical processing applications, are done on very large datasets that are modeled as k-dimensional arrays. The storage organization of such arrays on disks is done by partitioning the large global array into fixed-size hyper-rectangular sub-arrays called chunks or tiles that form the units of data transfer between disk and memory. Typical queries involve the retrieval of sub-arrays in a manner that accesses all chunks that overlap the query results. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" In this paper we develop two probabilistic mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic workloads on real-life data sets, show that our chunking is much more efficient than the existing approximate solutions.
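For a concrete sense of the quantity being modeled, the sketch below counts exactly how many chunks a single query touches at a given offset; averaging this count over uniformly random offsets yields the expected-overlap term 1 + (q_i - 1)/c_i per dimension commonly used in such models. The shapes and offsets are hypothetical.

```python
# A small sketch relating the probabilistic models to an exact per-query count:
# a query starting at offset s_i with extent q_i over chunks of extent c_i
# touches floor((s_i + q_i - 1)/c_i) - floor(s_i/c_i) + 1 chunks along
# dimension i, and the total is the product over dimensions.

from math import prod

def chunks_touched(start, query, chunk):
    return prod((s + q - 1) // c - s // c + 1
                for s, q, c in zip(start, query, chunk))

# A 100 x 80 query placed at offset (10, 5) on a grid of 64 x 64 chunks:
print(chunks_touched(start=(10, 5), query=(100, 80), chunk=(64, 64)))  # -> 4
```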