Lawrence Berkeley National Laboratory
Efficient binning for bitmap indices on high-cardinality attributes
- Author(s): Rotem, Doron
- Stockinger, Kurt
- Wu, Kesheng
- et al.
Bitmap indexing is a common technique for indexing high-dimensional data in data warehouses and scientific applications. Though efficient for low-cardinality attributes, query processing can be rather costly for high-cardinality attributes due to the large storage requirements for the bitmap indices. Binning is a common technique for reducing storage costs of bitmap indices. This technique partitions the attribute values into a number of ranges, called bins, and uses bitmap vectors to represent bins (attribute ranges) rather than distinct values. Although binning may reduce storage costs, it may increase the access costs of queries that do not fall on exact bin boundaries (edge bins). For this kind of queries the original data values associated with edge bins must be accessed, in order to check them against the query constraints.In this paper we study the problem of finding optimal locations for the bin boundaries in order to minimize these access costs subject to storage constraints. We propose a dynamic programming algorithm for optimal partitioning of attribute values into bins that takes into account query access patterns as well as data distribution statistics. Mathematical analysis and experiments on real life data sets show that the optimal partitioning achieved by this algorithm can lead to a significant improvement in the access costs of bitmap indexing systems for high-cardinality attributes.