Relevant Paper
Motivation

An execution of a workflow that can be represented as a directed acyclic graph produces the lineage information as a dependency graph whose nodes are data sets. We want to efficiently store this graph in a relational database and find all predecessors of a given node using a SQL query. Other systems have encoded each dependency as a pair of parent and child nodes, but lineage retrieval using a recursive query does not scale as the number of nodes increases.

Idea

If the dependency graph is a tree, each node ni can be encoded as an one-dimensional interval (li, ri) over the natural numbers in such a way that:

  • n1 is a predecessor of n2 iff l1 < l2 and r1 > r2
  • n2 is a predecessor of n1 iff l2 < l1 and r2 > r1
  • n1 and n2 are not related iff otherwise

(ni, li, ri) can be stored in a table, and a SQL query finding all predecessors (or successors) of a given node is trivial.

Contribution

Interval tree encoding cannot be applied for an arbitrary DAG. This paper provides:

  • transformation algorithm that takes an arbitrary DAG as input and outputs an equivalent DAG that can be encoded with 1-D intervals
  • interval assignment algorithm
 
panda/reading/heinis.txt · Last modified: 2010/02/05 15:34 by hyunjung
 
Recent changes RSS feed Creative Commons License Donate Powered by PHP Valid XHTML 1.0 Valid CSS Driven by DokuWiki