== Relevant Paper == * [[http://portal.acm.org/citation.cfm?id=1376716|Efficient Lineage Tracking For Scientific Workflows]] (SIGMOD 2008) – Thomas Heinis, Gustavo Alonso == Motivation == An execution of a workflow that can be represented as a directed acyclic graph produces the lineage information as a dependency graph whose nodes are data sets. We want to efficiently store this graph in a relational database and find all predecessors of a given node using a SQL query. Other systems have encoded each dependency as a pair of parent and child nodes, but lineage retrieval using a recursive query does not scale as the number of nodes increases. == Idea == If the dependency graph is a tree, each node ni can be encoded as an one-dimensional interval (li, ri) over the natural numbers in such a way that: * n1 is a predecessor of n2 iff l1 < l2 and r1 > r2 * n2 is a predecessor of n1 iff l2 < l1 and r2 > r1 * n1 and n2 are not related iff otherwise (ni, li, ri) can be stored in a table, and a SQL query finding all predecessors (or successors) of a given node is trivial. == Contribution == Interval tree encoding cannot be applied for an arbitrary DAG. This paper provides: * transformation algorithm that takes an arbitrary DAG as input and outputs an equivalent DAG that can be encoded with 1-D intervals * interval assignment algorithm