- Author(s): Chen, Liang; et al.
A growing number of applications involve both structured and unstructured data. Academic citations are a simple example: the content of a citation is unstructured text, but the citation is associated with structured data such as its author list, categories, and publication time. A natural way to query such hybrid data is to combine structured queries with keyword search. Two fundamental problems arise from this combination: (1) how to evaluate hybrid queries efficiently, and (2) how to model relevance ranking. The second problem is especially difficult, because the foundations of relevance ranking in information retrieval are built on unstructured text and do not take structure into account. We present context-sensitive ranking, a ranking framework that integrates structured queries and relevance ranking. The key insight is that structured queries provide expressive search contexts. The ranking model collects keyword statistics within the context defined by the structured query and feeds them into conventional ranking formulas to compute ranking scores. The main challenge in query evaluation is computing these keyword statistics at runtime, which involves expensive online aggregations. At the core of our solution to this efficiency problem is a reduction from computing keyword statistics to answering aggregation queries. Many statistics, such as document frequency, require aggregations over the data space returned by the structured query. This is analogous to analytical queries in OLAP applications, which also involve a large number of aggregations. We leverage and extend research on materialized views in OLAP to deliver algorithms and data structures that evaluate context-sensitive ranking efficiently.
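
To make the idea concrete, the following minimal Python sketch shows one way keyword statistics could be collected inside a search context and fed into a conventional formula. Everything in it is an assumption made for illustration and is not taken from the paper: the toy corpus `DOCS`, the function names, the smoothed TF-IDF scoring, and the dictionary-based "view" that pre-aggregates document frequency per `(category, year)` cell. The paper's actual ranking formulas, algorithms, and data structures may differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus: structured fields plus unstructured text.
DOCS = [
    {"id": 1, "category": "databases", "year": 2009, "text": "query optimization for olap views"},
    {"id": 2, "category": "databases", "year": 2010, "text": "materialized views and query rewriting"},
    {"id": 3, "category": "databases", "year": 2010, "text": "keyword search over relational data"},
    {"id": 4, "category": "biology", "year": 2010, "text": "protein folding search heuristics"},
    {"id": 5, "category": "biology", "year": 2008, "text": "gene expression query tools"},
    {"id": 6, "category": "biology", "year": 2009, "text": "cell imaging pipelines"},
]


def context_stats(docs, predicate, keywords):
    """Scan the context selected by the structured predicate and
    return (context_docs, context_size, document_frequency)."""
    context = [d for d in docs if predicate(d)]
    df = Counter()
    for d in context:
        terms = set(d["text"].split())
        for k in keywords:
            if k in terms:
                df[k] += 1
    return context, len(context), df


def idf(n, df):
    """Smoothed inverse document frequency (an assumed, conventional formula)."""
    return math.log((n + 1) / (df + 1)) + 1


def rank(docs, predicate, keywords):
    """Score documents in the context with TF-IDF using context-local statistics."""
    context, n, df = context_stats(docs, predicate, keywords)
    scored = []
    for d in context:
        tf = Counter(d["text"].split())
        score = sum(tf[k] * idf(n, df[k]) for k in keywords)
        if score > 0:
            scored.append((d["id"], score))
    # Highest score first; ties broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def build_view(docs, dims):
    """Pre-aggregate (doc count, per-term document frequency) for each cell
    of the given dimensions, loosely in the spirit of an OLAP materialized view."""
    view = {}
    for d in docs:
        cell = tuple(d[dim] for dim in dims)
        count, df = view.get(cell, (0, Counter()))
        df.update(set(d["text"].split()))
        view[cell] = (count + 1, df)
    return view


def stats_from_view(view, cell_predicate, keywords):
    """Answer the same statistics by summing matching cells instead of scanning documents."""
    n, df = 0, Counter()
    for cell, (count, cell_df) in view.items():
        if cell_predicate(cell):
            n += count
            for k in keywords:
                df[k] += cell_df[k]
    return n, df


if __name__ == "__main__":
    in_databases = lambda d: d["category"] == "databases"
    print(rank(DOCS, in_databases, ["query", "search"]))
    view = build_view(DOCS, ("category", "year"))
    print(stats_from_view(view, lambda cell: cell[0] == "databases", ["query", "search"]))
```

In this sketch, the same keyword has different document frequencies in the `databases` context and in the whole corpus, so its weight changes with the structured query. The view answers the context statistics by summing a few pre-aggregated cells, which works here only because the predicate aligns with the view's dimensions; a real system would need to handle predicates that do not.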