eScholarship
Open Access Publications from the University of California

Using Bitmap Index for Joint Queries on Structured and Text Data

Abstract

The database and information retrieval communities have developed separate sets of techniques for querying structured data and text data, but there is a growing need to handle both types of data together. In this paper, we present a strategy to efficiently answer joint queries on both types of data. By using an efficient compression algorithm, our compressed bitmap indexes, called FastBit, are compact even when they contain millions of bitmaps. Therefore, FastBit can be applied effectively to hundreds of thousands of terms over millions of documents. Bitmap indexes are designed to take advantage of data that only grows but does not change over time (append-only data), and thus are just as effective with append-only text archives. In a performance comparison against a commonly used database system with a full-text index, MySQL, we demonstrate that our indexes answer queries 50 times faster on average. Furthermore, we demonstrate that integrating FastBit with an open source database system, called MonetDB, yields similar performance gains. Since the integrated MonetDB/FastBit system provides the full SQL functionality, the overhead of supporting SQL is not the main reason for the observed performance differences. Therefore, using FastBit in other database systems can offer similar performance advantages.
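
To illustrate the general idea of answering a joint query with bitmaps, here is a minimal Python sketch. It is not FastBit's API or implementation: the `BitmapIndex` class, the `rows_of` helper, and the sample documents are hypothetical, and the bitmaps are plain uncompressed Python integers, whereas the paper's indexes are compressed. One index covers a structured column, another covers the terms of a text column, and the joint condition becomes a bitwise AND of two bitmaps.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy uncompressed bitmap index (hypothetical, for illustration only).

    Each distinct value maps to a Python int used as a bitmap:
    bit i is set when row i contains that value.
    """

    def __init__(self):
        self.bitmaps = defaultdict(int)
        self.num_rows = 0

    def append(self, values):
        # Append-only: a new row only sets bits at a new position,
        # so existing bits are never modified.
        row = self.num_rows
        for value in set(values):
            self.bitmaps[value] |= 1 << row
        self.num_rows += 1

    def lookup(self, value):
        # An unknown value matches no rows, i.e. the empty bitmap.
        return self.bitmaps.get(value, 0)


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    position = 0
    while bitmap:
        if bitmap & 1:
            rows.append(position)
        bitmap >>= 1
        position += 1
    return rows


# Made-up sample data: one structured column (year), one text column (body).
docs = [
    {"year": 2001, "body": "bitmap index compression"},
    {"year": 2002, "body": "full text search"},
    {"year": 2001, "body": "text index for archives"},
]

year_index = BitmapIndex()  # index over the structured column
term_index = BitmapIndex()  # index over the terms of the text column
for doc in docs:
    year_index.append([doc["year"]])
    term_index.append(doc["body"].split())

# Joint query: year = 2001 AND body contains the term "text".
hits = year_index.lookup(2001) & term_index.lookup("text")
matching_rows = rows_of(hits)
print(matching_rows)
```

In this sketch both predicates are resolved to bitmaps over the same row numbers, so combining a structured condition with a text condition needs no join between separate index structures, only a bitwise operation.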
