Link to paper: [[http://portal.acm.org/citation.cfm?id=357777]] Environment: Data warehousing setting Set of source tables T1..Tn Warehouse is relational view V over source data, V=Q(T1..Tn) Problem: Given tuple t in warehouse view V, what set of source tuples comprise t's lineage? Significant differences from Trio lineage: (1) Query Q used to compute V is known. (2) Lineage is sets of source tuples, not boolean combinations. (Can probably turn into boolean formulas based on Q.) (3) Lineage is not computed when V is populated. Instead a "lineage tracing query" LQ is constructed to obtain t's lineage. LQ is based on Q and parameterized by t. LQ is mostly relational with a couple of easy special operators (e.g., Split). Main contributions: (1) Formal declarative definition of lineage independent of specific operators (2) Lineage for specific operators consistent with formal definition, then (inductively) lineage for any relational view (3) Theorems about correctness, transitivity, equivalent queries, etc. (4) Method for generating lineage tracing query from view definition SPJ: One query ASPJ: One query Arbitrary aggregation and set operators: sequence of queries with intermediate results (5) Bag semantics: "derivation sets" (all individual derivations) and "derivation pools" (all possibly contributing tuples) (6) Issues related to warehousing environment, e.g., avoid source queries Some details/examples for (1),(2),(4): (1) Tuple Derivation for an Operator: T = Op(T1..Tn) Lineage of t in T is: a) Maximal subsets T1*..Tn* of T1..Tn such that: b) Op(T1*..Tn*) = {t} c) For all t* in Ti*, Op(T1*..T(i-1)*,{t*},T(i+1)*..Tm*) != emptyset Examples: Select(A>5) from {5,6}. Lineage of 6 in result is {6}. (b) would allow {5,6} but (c) makes sure it's not included. Group-By(A) Sum(B) from {<1,2>,<1,3>,<1,0>}. Lineage of 5 in result is all three tuples. (b) and (c) would allow <1,0> not to be included, but (a) makes sure it's included. (2) Definitions for most individual operators are unsurprising: Selection, Projection, Join, Aggregation, Union Difference: Lineage of t in T1-T2 is <{t},T2>. Consistent with definition. Definition for arbitrary Q is straightforward induction. (4) Tracing Queries SPJ View: Split (Select_c and A=t (T1 join ... join Tn)) ASPJ View: Split (Select_c and G=t.G (T1 join ... join Tn)) General Views: Arbitrary aggregation, union, difference Split into "AUSPJ-segments" and "D-segments", need "intermediate views" (materialized or computed on demand) Example: Project(B) Select (B>5) GroupBy(A) Sum(B) from {<1,2>,<1,3>,<2,4>,<2,6>} Lineage of 10 in result needs intermediate result {<1,5>,<2,10>} Note: Similar intermediate results needed for incremental view maintenance. Note: Additional intermediate results needed for tracing entirely at warehouse.