UC San Diego
Community-oriented information integration
- Author(s): Katsis, Ioannis
- et al.
To allow their members to collaboratively maintain the community knowledge, modern online communities need to integrate their members' structured data into a single community database. Existing integration solutions employed by enterprises are not suited for communities: they rely on a central authority to carry out the integration tasks and are therefore too costly and do not scale to large numbers of sources. To solve this problem, we propose the community-oriented integration (CII) paradigm, which removes the need for a central authority by delegating the integration tasks to the individual community members. In this dissertation, we describe how to decentralize two main integration tasks: the registration of sources in the integration system, which becomes the responsibility of each source owner, and the resolution of inconsistencies in the collective data, which is delegated to the consumers of the integrated data. To facilitate this distribution, we introduce a novel architecture and two tools, RIDE and Ricolla, that assist the community members in carrying out the source registration and inconsistency resolution tasks, respectively, on their own. Such assistance is needed because community members typically lack both the sophistication of a central authority and an overview of the system. RIDE models source registrations as sets of Global-and-Local-As-View (GLAV) mappings and assists the source owner in creating a registration that balances two competing requirements: making her data visible to applications that run on top of the community database (by exporting more data) and minimizing the cleaning cost required for publication (by publishing less data). We model these trade-offs as different self-reliance levels, present decidability results and appropriate decision procedures (where they exist), and describe an algorithm for interactively guiding the user towards a registration with a particular self-reliance level.
On the inconsistency resolution front, Ricolla models inconsistent data as sets of possible worlds and displays them to the users through a novel data model that summarizes them in an easily understandable and compact form. It also offers a flexible architecture that supports different inconsistency resolution schemes, allowing users, among other options, to resolve inconsistencies individually, according to their own opinions, or in collaboration with their peers.
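The possible-worlds idea can be illustrated with a minimal sketch. It is not Ricolla's actual data model, which the abstract does not specify. It assumes, purely for illustration, that an inconsistency is two sources reporting different values for the same key, and that a compact summary simply lists the candidate values per key. The sample data and the names `claims`, `summarize`, `possible_worlds`, and `resolve` are all hypothetical.

```python
from itertools import product

# Hypothetical integrated data: (key, value, source) triples.
# Two sources disagree about "alice", so the collective data is inconsistent.
claims = [
    ("alice", "UCSD", "source_a"),
    ("alice", "UCLA", "source_b"),
    ("bob", "MIT", "source_a"),
]


def summarize(claims):
    """Compact form (assumed): map each key to its sorted candidate values."""
    candidates = {}
    for key, value, _source in claims:
        candidates.setdefault(key, set()).add(value)
    return {key: sorted(values) for key, values in candidates.items()}


def possible_worlds(summary):
    """Expand a summary into every consistent world (one value per key)."""
    keys = sorted(summary)
    choices = product(*(summary[key] for key in keys))
    return [dict(zip(keys, choice)) for choice in choices]


def resolve(summary, key, chosen):
    """Apply one user's resolution: keep only the chosen value for a key."""
    if chosen not in summary[key]:
        raise ValueError(f"{chosen!r} is not a candidate value for {key!r}")
    resolved = dict(summary)
    resolved[key] = [chosen]
    return resolved


summary = summarize(claims)
worlds = possible_worlds(summary)
my_view = resolve(summary, "alice", "UCSD")
```

In this sketch the summary grows with the number of conflicting values, while the set of worlds grows with the product of the candidates per key, which suggests why showing users a summary rather than an enumeration of worlds is attractive. Because `resolve` returns a new summary instead of mutating the shared one, each user could keep an individual resolution or share it with peers.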