On the performance of bitmap indices for high cardinality
It is well established that bitmap indices are efficient for read-only attributes with a small number of distinct values. For an attribute with a large number of distinct values, the size of the bitmap index can be very large. To over come this size problem, specialized compression schemes are used. Even though there is empirical evidence that some of these compression schemes work well, there has not been any systematic analysis of their effectiveness. In this paper, we analyze the time and space complexities of the two most efficient bitmap compression techniques known, the Byte-aligned Bitmap Code (BBC) and the Word-Aligned Hybrid (WAH) code, and study their performance on high cardinality attributes. Our analyses indicate that both compression schemes are optimal in time. The time and space required to operate on two compressed bitmaps are proportional to the total size of the two bitmaps. We demonstrate further that an in-place OR algorithm can operate on a large number of sparse bitmaps in time linear in their total size. Our analyses also show that the compressed indices are relatively small compared with commonly used indices such as B-trees. Given these facts, we conclude that bitmap index is efficient on attributes of low cardinalities as well as on those of high cardinalities. We also verify the analytical results with extensive tests, and identify an optimal way to combine different options to achieve the best performance. The test results confirm the linearity in the total size of the compressed bitmaps, and that WAH out performs BBC by about a factor of two.