Provenance Management in Curated Databases, SIGMOD 2006
Represent the database as a tree
DB/R/tid/F (Database, Relation, Tuple ID, Field)
Record the actions of the following type:
TID, Op, Loc, Src (Transaction ID, Operation, Location, Source)
100, D, T/a, ⊥ (Delete node T/a) 100, C, T/a, S/b (Copy S/b to T/a) 101, I, T/a/c, ⊥ (Insert c at T/a)
Compressions happen in two ways:
Hierarchical: Copying an entire sub-tree <=> Copying all individual nodes in the sub-tree, thus recording of each individual node is not required
Transactional: Skip intermediate steps in a transaction while recording in a log
The log gives us an EDB: Prov(t, x, p, q)
Here are some useful IDBs:
Unchanged: Unch(t, p) ← ¬(∃x,q. Prov(t, x, p, q)) Insert: Ins(t, p) ← Prov(t, I, p, ⊥) Delete: Del(t, p) ← Prov(t, D, p, ⊥) Copy: Copy(t, p, q) ← Prov(t, C, p, q)
Basically, these record whether “p” was changed, in a transaction “t”.
Additionally, we have Trace, which records where “p” came from.
Trace(p, t, p, t). Trace(p, t, q, u) ← Trace(p, t, r, s), Trace(r, s, q, u). Trace(p, t, q, t − 1) ← Copy(t, p, q) Trace(p, t, p, t − 1) ← Unch(t, p)
This is all we need to write fairly complex queries, such as
* Which transaction first created data at a location? (Note tnow is current transaction number)
Src(p) = {u | ∃q.Trace(p, tnow , q, u), Ins(u, q)}
* What is the sequence of all transactions that copied a node to the current position?
Hist(p) = {u | ∃q.Trace(p, tnow , q, u), Copy(u, q)}
* Which transactions were responsible for the creation or modification of the subtree under a node? (Note ≤ is the prefix operator)
Mod(p) = {u | ∃q.p ≤ q, Trace(q, tnow , r, u), ¬Unch(u, r)}
The authors have implemented a “CPDB” (Copy-Paste DB) which works as a wrapper around the traditional database and records user actions.