====== Paper Link ====== [[http://doi.acm.org/10.1145/1265530.1265535 | Provenance semirings]], PODS 2007 ====== Semirings ====== [[http://en.wikipedia.org/wiki/Semiring | Semirings]] are algebraic structures: a set R with two abstract operators "+" and ".", and two special elements "0" and "1". Set of non-negative integers with the natural mapping to "+", ".", "0" and "1" is an example. ====== Basic Idea ====== The paper considers annotated relations : each tuple has a associated tag. This tag could be a binary value (regular relations), a confidence value (for probabilistic databases), a cardinality (for bag semantics), a variable (c-tables). The paper extends relational operators to operate over these annotated relations. A regular relation can be thought of as a function R from the domain of all possible tuples to {0,1} : 1 for tuples in the relation, 0 for every other tuple. Any R with finitely many non-"0" represents a relation. In general for the non-binary case, "0" is reserved for absent tuples. * Union maps to "+" : R1 union R2 -> R1 + R2 * Projection maps to a summation. * Natural Join maps to "." : R1 join R2 -> R1.R2 * Selection maps to "." with a possibly non-finite relation P that contains all tuples that satisfy the predicate, almost like a join. Hence, results of queries are also annotated relations. Now the domain for annotations, "+" and "." can be defined to capture the cases of probabilistic databases, bag semantics, and c-tables. Example, Domain=set of natural numbers, and natural + and . capture bag semantics. Now suppose that we gave a identifier to each tuple in the base relations, and for derived tuples we build up polynomials using the "+" and "." operators using the extended relational algebra above. The annotation of each derived tuple is now a polynomial formula that represents its how-lineage. (how-lineage also includes where lineage. If we mapped both "+" and "." to union, we would generate where-lineage.) By maintaining this polynomial formula, we have enough information to compute confidences for instance (as a Trio paper demonstrated). ====== Key Takeaway ====== Keeping formula lineage over relational operators (like Trio), is enough to accomplish any task that satisfies the semirings properties: confidence value computation, cardinality computation for bag semantics, c-table possible worlds are instances. The paper shows that three distinct scenarios essentially use similar query processing ideas, that can be thought of as computation over the provenance.