eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Democratic community-based search with XML full-text queries

Abstract

As the web evolves, it is becoming easier to form online communities based on shared interests, and to create and publish data on a wide variety of topics. With this democratization of information creation, it is natural to query, in an ad-hoc and expressive fashion, the global collection that is the union of all local data collections held by others within the community. In order to publish and locate documents of interest while fully delivering on the promise of free data exchange, any community-supporting infrastructure needs to enforce one key requirement: preserving the privacy of the association between content providers and potentially sensitive information. This privacy-preserving publishing requirement prevents censorship, harassment, or discrimination of users by third parties. It also precludes some obvious approaches that reuse and build on existing centralized technologies, including search engines and hosted online communities.

This dissertation facilitates the democratization of data publishing and efficient search with powerful full-text queries over the community's global collection by means of a novel distributed framework that disseminates queries in online communities. We address two challenging issues that arise in this context: the design of distributed access methods to publishers, and the evaluation of expressive queries (i.e., XML full-text) locally at each publisher.

First, given the virtual nature of the global data collection, we study the problem of efficiently discovering publishers in the community that hold documents matching a user query. We call such peers relevant publishers. We propose a novel distributed infrastructure in which data resides only with the publishers owning it. The infrastructure disseminates user queries to publishers, who answer them at their own discretion, under data-location anonymity constraints. That is, the query forwarding infrastructure prevents leaking information about which publishers are capable of answering a certain query.

Second, once queries reach relevant publishers, we study how those publishers can efficiently process the incoming queries over their local repositories. Given that the commonly used data model for information exchange on the Web is semi-structured (e.g., XML), we propose algorithms for the evaluation and optimization of expressive XML queries that integrate structured and full-text search, including the W3C XQuery Full-Text standard.
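
To make concrete what "integrating structured and full-text search" means at a single publisher, the following is a minimal sketch in Python. It is not the dissertation's algorithm and not an XQuery Full-Text engine: the sample documents, the function names (matches, evaluate), and the simple word tokenization are all assumptions made for illustration. A query here is a path that selects elements (the structural part) plus a list of keywords that must all occur in the text of a selected element (the full-text part).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical local repository of one publisher (invented sample data).
LOCAL_DOCS = {
    "doc1.xml": "<article><title>Privacy in online communities</title>"
                "<body>Publishers answer queries at their own discretion.</body>"
                "</article>",
    "doc2.xml": "<article><title>Cooking notes</title>"
                "<body>Nothing about privacy here.</body></article>",
}


def matches(xml_text, path, keywords):
    """Return True if some element selected by `path` contains every keyword."""
    root = ET.fromstring(xml_text)
    wanted = {k.lower() for k in keywords}
    for elem in root.findall(path):          # structural condition
        text = "".join(elem.itertext()).lower()
        words = set(re.findall(r"\w+", text))
        if wanted <= words:                  # full-text condition (all keywords)
            return True
    return False


def evaluate(docs, path, keywords):
    """Return the sorted names of local documents that match the query."""
    return sorted(name for name, xml_text in docs.items()
                  if matches(xml_text, path, keywords))


if __name__ == "__main__":
    # Keyword restricted to the title element: only doc1.xml qualifies,
    # even though doc2.xml mentions the same word in its body.
    print(evaluate(LOCAL_DOCS, "./title", ["privacy"]))
    print(evaluate(LOCAL_DOCS, "./body", ["privacy"]))
```

The two calls differ only in the structural path, which is why they select different documents for the same keyword. A full XQuery Full-Text implementation supports far richer conditions (for example word distance, ordering, and scoring) than this all-keywords check.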
