Lawrence Berkeley National Laboratory
Optimizing I/O Costs of Multi-Dimensional Queries using Bitmap Indices
Author(s): Rotem, Doron; Stockinger, Kurt; Wu, Kesheng; et al.
Bitmap indices are efficient data structures for processing complex, multi-dimensional queries in data warehouse applications and scientific data analysis. For high-cardinality attributes, a common approach is to build bitmap indices with binning. This technique partitions the attribute values into a number of ranges, called bins, and uses bitmap vectors to represent bins (attribute ranges) rather than distinct values. Because a bin records only that a value falls within its range, yielding exact query answers requires reading some of the original data values from disk and checking them against the query constraint. This step is referred to as the "candidate check" and usually dominates the total query processing time. In this paper we study several strategies for reducing the candidate check cost of multi-dimensional queries. We present an efficient candidate check algorithm that takes into account the attribute value distribution, the query distribution, and the query selectivity with respect to each dimension. We also show that re-ordering the dimensions during query evaluation can reduce I/O costs. We tested our algorithm on data with various attribute value distributions and query distributions. Our approach shows a significant improvement over traditional binning strategies for bitmap indices.
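To make the role of the candidate check concrete, the following is a minimal single-attribute sketch of a binned bitmap index and a range query over it. It is an illustration only, not the algorithm presented in the paper: the function names `build_binned_index` and `range_query`, the use of Python integers as bit vectors, and the half-open bin convention are all assumptions made for this example. Bins that lie entirely inside the query range are answered from the bitmaps alone; only rows in bins that straddle a query boundary need their raw values read.

```python
from bisect import bisect_right


def build_binned_index(values, boundaries):
    """Build one bitmap per bin (hypothetical helper, not from the paper).

    Bin i covers the half-open range [boundaries[i], boundaries[i + 1]).
    Each bitmap is a Python int whose bit `row` is set when values[row]
    falls into that bin. Assumes every value lies within the boundaries.
    """
    bitmaps = [0] * (len(boundaries) - 1)
    for row, v in enumerate(values):
        b = bisect_right(boundaries, v) - 1
        bitmaps[b] |= 1 << row
    return bitmaps


def range_query(values, boundaries, bitmaps, lo, hi):
    """Answer lo <= v < hi; return (matching rows, raw values read)."""
    hits = 0        # rows known to match from the bitmaps alone
    candidates = 0  # rows in edge bins that need a candidate check
    for i, bm in enumerate(bitmaps):
        b_lo, b_hi = boundaries[i], boundaries[i + 1]
        if b_hi <= lo or b_lo >= hi:
            continue            # bin does not overlap the query range
        if lo <= b_lo and b_hi <= hi:
            hits |= bm          # bin fully covered: no raw data needed
        else:
            candidates |= bm    # bin partially covered: must check values

    # Candidate check: read the raw value of every candidate row.
    checked = 0
    row = 0
    remaining = candidates
    while remaining:
        if remaining & 1:
            checked += 1
            if lo <= values[row] < hi:
                hits |= 1 << row
        remaining >>= 1
        row += 1

    rows = [r for r in range(len(values)) if (hits >> r) & 1]
    return rows, checked


values = [3, 17, 25, 42, 58, 61, 77, 94]
boundaries = [0, 25, 50, 75, 100]
bitmaps = build_binned_index(values, boundaries)
print(range_query(values, boundaries, bitmaps, 20, 75))
print(range_query(values, boundaries, bitmaps, 20, 60))
```

In this toy data set, a query whose upper bound coincides with a bin boundary reads fewer raw values than one whose bounds both fall inside bins, which illustrates why the placement of bin boundaries relative to the expected queries affects the candidate check cost.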