Category Value
Available via http://dbpubs.stanford.edu/pub/2001-5
Submitted on 29th of January 2001
Author Yingwei Cui, Jennifer Widom
Title Lineage Tracing for General Data Warehouse Transformations
Date of publication 2001
Published in Proc. of 27th International Conference on Very Large Data Bases (VLDB'01), Rome, Italy, September, 2001.
Citation Yingwei Cui, Jennifer Widom. Lineage Tracing for General Data Warehouse Transformations, Proc. of 27th International Conference on Very Large Data Bases (VLDB'01), Rome, Italy, September, 2001.
Number of pages 30
Language English
Project WHIPS
Type Conference or Journal Paper
Subject group Data Warehousing
Abstract Data warehousing systems integrate information from operational data sources into a central repository to enable analysis and mining of the integrated information. During the integration process, source data typically undergoes a series of transformations, which may vary from simple algebraic operations or aggregations to complex data cleansing procedures. In a warehousing environment, the data lineage problem is that of tracing warehouse data items back to the original source items from which they were derived. We formally define the lineage tracing problem in the presence of general data warehouse transformations, and we present algorithms for lineage tracing in this environment. Our tracing procedures take advantage of known structure or properties of transformations when present, but also work in the absence of such information. Our results can be used as the basis for a lineage tracing tool in a general warehousing setting, and also can guide the design of data warehouses that enable efficient lineage tracing.
Keywords lineage, data transformation, schema mapping
Contact address Computer Science Department, Stanford, CA 94305
{cyw, widom}@db.stanford.edu
Sponsored by This work was supported by the National Science Foundation under grants IIS-9811947 and IIS-9817799, and by Sagent Technology Inc.
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)
  • Plain text (text, text.gz, text.zip)
  • Management of the document by siroker@db.stanford.edu

    Stanford InfoLab Publication Server