| Category | Value |
| Available via | http://dbpubs.stanford.edu/pub/2008-8 |
| Submitted on | 17th of March 2008 |
| Author | Das Sarma, Anish; Dong, Luna; Halevy, Alon |
| Title | Bootstrapping Pay-As-You-Go Data Integration Systems |
| Date of publication | 2008 |
| Published in | SIGMOD, 2008 |
| Citation | Das Sarma, Anish; Dong, Luna; Halevy, Alon. Bootstrapping Pay-As-You-Go Data Integration Systems, SIGMOD, 2008 |
| Number of pages | 13 |
| Language | English |
| Project | Miscellaneous |
| Type | Conference or Journal Paper |
| Subject group | Miscellaneous |
| Abstract | Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant upfront effort to create a mediated schema and semantic mappings from the data sources to the mediated schema. Many application contexts involving multiple data sources (e.g., the web, personal information management, enterprise intranets) do not require full integration in order to provide useful services, motivating a pay-as-you-go approach to integration. With that approach, a system starts with very few (or inaccurate) semantic mappings, and these mappings are improved over time as deemed necessary. This paper describes the first completely self-configuring data integration system. The goal of our work is to investigate how advanced a starting point we can provide for a pay-as-you-go system. Our system is based on the new concept of a probabilistic mediated schema that is automatically created from the data sources. We automatically create probabilistic schema mappings between the sources and the mediated schema. We describe experiments in multiple domains, including 50-800 data sources, and show that our system is able to produce high-quality answers with no human intervention. |
| Keywords | pay-as-you-go data integration |
| Contact address | anish@cs.stanford.edu |
| Fulltext source | |
| Management of the document by | siroker@db.stanford.edu |