Relevant Paper
Motivation

An execution of a workflow that can be represented as a directed acyclic graph produces the lineage information as a dependency graph whose nodes are data sets. We want to efficiently store this graph in a relational database and find all predecessors of a given node using a SQL query. Other systems have encoded each dependency as a pair of parent and child nodes, but lineage retrieval using a recursive query does not scale as the number of nodes increases.

Idea

If the dependency graph is a tree, each node ni can be encoded as an one-dimensional interval (li, ri) over the natural numbers in such a way that:

(ni, li, ri) can be stored in a table, and a SQL query finding all predecessors (or successors) of a given node is trivial.

Contribution

Interval tree encoding cannot be applied for an arbitrary DAG. This paper provides: