Table of Contents

Paper Link

Provenance semirings, PODS 2007

Semirings

Semirings are algebraic structures: a set R with two abstract operators “+” and “.”, and two special elements “0” and “1”. Set of non-negative integers with the natural mapping to “+”, “.”, “0” and “1” is an example.

Basic Idea

The paper considers annotated relations : each tuple has a associated tag. This tag could be a binary value (regular relations), a confidence value (for probabilistic databases), a cardinality (for bag semantics), a variable (c-tables). The paper extends relational operators to operate over these annotated relations.

A regular relation can be thought of as a function R from the domain of all possible tuples to {0,1} : 1 for tuples in the relation, 0 for every other tuple. Any R with finitely many non-”0” represents a relation. In general for the non-binary case, “0” is reserved for absent tuples.

Hence, results of queries are also annotated relations. Now the domain for annotations, “+” and “.” can be defined to capture the cases of probabilistic databases, bag semantics, and c-tables. Example, Domain=set of natural numbers, and natural + and . capture bag semantics.

Now suppose that we gave a identifier to each tuple in the base relations, and for derived tuples we build up polynomials using the “+” and “.” operators using the extended relational algebra above. The annotation of each derived tuple is now a polynomial formula that represents its how-lineage. (how-lineage also includes where lineage. If we mapped both “+” and “.” to union, we would generate where-lineage.) By maintaining this polynomial formula, we have enough information to compute confidences for instance (as a Trio paper demonstrated).

Key Takeaway

Keeping formula lineage over relational operators (like Trio), is enough to accomplish any task that satisfies the semirings properties: confidence value computation, cardinality computation for bag semantics, c-table possible worlds are instances. The paper shows that three distinct scenarios essentially use similar query processing ideas, that can be thought of as computation over the provenance.