A directed graph G(N, E) in which N is a set of uniquely labeled modules and E is the set of edges between modules, indicating the order in which modules execute. Two special nodes, input and output, are the source and sink nodes respectively. Every node in G must lie on some path from input to output.
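The path condition above can be checked with two reachability passes. The following is a minimal sketch, not a definitive implementation: the function names, the edge-set representation, and the example graph are illustrative assumptions, and "path" is taken to allow repeated nodes, since G may contain cycles.

```python
from collections import defaultdict

def reachable(adj, start):
    """Return the set of nodes reachable from start (including start)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_valid_specification(nodes, edges, source="input", sink="output"):
    """True if every node is reachable from source and can reach sink."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for u, v in edges:
        forward[u].add(v)
        backward[v].add(u)
    from_source = reachable(forward, source)
    to_sink = reachable(backward, sink)  # search on reversed edges
    return set(nodes) <= (from_source & to_sink)

# Hypothetical example: a loop between A and B is allowed.
nodes = {"input", "A", "B", "output"}
edges = {("input", "A"), ("A", "B"), ("B", "A"), ("B", "output")}
print(is_valid_specification(nodes, edges))  # True

# Adding a dead-end module C violates the condition.
print(is_valid_specification(nodes | {"C"}, edges | {("A", "C")}))  # False
```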
A partition of the nodes of G into disjoint sets {M_1, ..., M_n}, created from a set of relevant modules specified by the user. Each M_i is a composite module containing one or more modules from G. The size of a user view is the number of composite modules it contains.
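As an illustration, a user view can be represented as a mapping from composite-module names to sets of modules. The sketch below checks the partition requirement and reports the size; the names `is_partition` and `view_size`, the dict representation, and the example data are assumptions made for this example only.

```python
def is_partition(view, nodes):
    """True if the composite modules are non-empty, disjoint, and cover nodes."""
    seen = set()
    for members in view.values():
        members = set(members)
        if not members or members & seen:
            return False  # empty composite module, or a module used twice
        seen |= members
    return seen == set(nodes)

def view_size(view):
    """Size of a user view: the number of composite modules."""
    return len(view)

# Hypothetical example.
nodes = {"input", "A", "B", "output"}
view = {"M1": {"input"}, "M2": {"A", "B"}, "M3": {"output"}}
print(is_partition(view, nodes))  # True
print(view_size(view))            # 3

overlapping = {"M1": {"input", "A"}, "M2": {"A", "B"}, "M3": {"output"}}
print(is_partition(overlapping, nodes))  # False
```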
Given a user view, a new workflow specification graph, U(G), is created in which the nodes are the M_i and there is an edge from M_i to M_j if G has an edge from a module in M_i to a module in M_j.
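The construction of U(G) can be sketched as follows. This is a hedged example rather than a reference implementation: `induced_view_graph` and the data layout are hypothetical, and the sketch assumes that edges between two modules of the same composite module are hidden, so no self-loops are produced.

```python
def induced_view_graph(edges, view):
    """Return the edge set of U(G) for G's edges and a user view.

    view maps a composite-module name to the set of modules it contains.
    """
    owner = {}
    for composite, members in view.items():
        for module in members:
            if module in owner:
                raise ValueError(f"{module} is in more than one composite module")
            owner[module] = composite

    induced = set()
    for u, v in edges:
        cu, cv = owner[u], owner[v]
        if cu != cv:  # assumption: internal edges are not shown in U(G)
            induced.add((cu, cv))
    return induced

# Hypothetical example: A and B are grouped into one composite module.
edges = {("input", "A"), ("A", "B"), ("B", "C"), ("C", "output")}
view = {"M_in": {"input"}, "M1": {"A", "B"}, "M2": {"C"}, "M_out": {"output"}}
print(sorted(induced_view_graph(edges, view)))
# [('M1', 'M2'), ('M2', 'M_out'), ('M_in', 'M1')]
```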
A directed acyclic graph, G_r, in which nodes are labeled with unique step-ids as well as with the modules (in G) of which they are executions. Module labels are not unique in G_r, since G may have cycles and these are unrolled in G_r.
The provenance of a data object is the sequence of modules and input data objects on which it depends. Its immediate provenance is the step that produced it together with that step's input set of data objects. Its deep provenance is defined recursively as all steps and input data objects that were transitively used to produce it.
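The difference between immediate and deep provenance can be shown with a small sketch. It assumes a run is recorded as two hypothetical mappings: `produced_by` (data object to the step that produced it) and `step_inputs` (step to the data objects it consumed). Neither name comes from the definitions above.

```python
def immediate_provenance(data, produced_by, step_inputs):
    """Return (step, input data set) for data, or None for an initial input."""
    step = produced_by.get(data)
    if step is None:
        return None
    return step, set(step_inputs[step])

def deep_provenance(data, produced_by, step_inputs):
    """Return (steps, data objects) transitively used to produce data."""
    steps, inputs = set(), set()
    seen = {data}
    stack = [data]
    while stack:
        current = stack.pop()
        step = produced_by.get(current)
        if step is None:
            continue  # initial input: nothing produced it within the run
        steps.add(step)
        for src in step_inputs[step]:
            inputs.add(src)
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return steps, inputs

# Hypothetical run: d1 -> s1 -> d2 -> s2 -> d3
produced_by = {"d2": "s1", "d3": "s2"}
step_inputs = {"s1": ["d1"], "s2": ["d2"]}

print(immediate_provenance("d3", produced_by, step_inputs))  # ('s2', {'d2'})
steps, inputs = deep_provenance("d3", produced_by, step_inputs)
print(sorted(steps), sorted(inputs))  # ['s1', 's2'] ['d1', 'd2']
```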
Executing steps within the same composite module causes a virtual execution of that composite module. Provenance is restricted by hiding the internal steps as well as the data passed between them.
A path in G (or U(G)) that contains no relevant intermediate module (or composite module). Such a path is called an nr-path.
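A direct check of this path condition might look like the sketch below. The function name `is_nr_path`, the list-of-nodes path representation, and the choice to mark input and output as relevant in the example are all assumptions for illustration. The endpoints of the path are allowed to be relevant; only intermediate nodes are tested.

```python
def is_nr_path(path, edges, relevant):
    """True if path follows edges and no intermediate node is relevant."""
    follows_edges = all((u, v) in edges for u, v in zip(path, path[1:]))
    no_relevant_inside = all(node not in relevant for node in path[1:-1])
    return follows_edges and no_relevant_inside

# Hypothetical example.
edges = {("input", "A"), ("A", "B"), ("B", "output")}
relevant = {"input", "B", "output"}

print(is_nr_path(["input", "A", "B"], edges, relevant))            # True
print(is_nr_path(["input", "A", "B", "output"], edges, relevant))  # False
print(is_nr_path(["input", "B"], edges, relevant))                 # False
```

The second call fails because B is a relevant intermediate node; the third fails because G has no edge from input to B.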
A user view is good if it satisfies the following properties:
1. Well-formed: every composite module of the view contains at most one relevant module.
2. Preserves dataflow: every edge e in G that induces an edge e' on an nr-path in U(G) itself lies on an nr-path in G.
3. Complete w.r.t. dataflow: for every edge e on an nr-path in G that induces an edge e' in U(G), e' lies on an nr-path in U(G).
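Of the three properties, well-formedness is the simplest to check mechanically. The sketch below covers Property 1 only and does not attempt Properties 2 and 3; `is_well_formed` and the dict-of-sets view representation are hypothetical names chosen for this example.

```python
def is_well_formed(view, relevant):
    """True if each composite module contains at most one relevant module."""
    relevant = set(relevant)
    return all(len(set(members) & relevant) <= 1 for members in view.values())

# Hypothetical example: A and C are the user's relevant modules.
relevant = {"A", "C"}
good_view = {"M1": {"A", "B"}, "M2": {"C"}}
bad_view = {"M1": {"A", "B", "C"}}

print(is_well_formed(good_view, relevant))  # True
print(is_well_formed(bad_view, relevant))   # False
```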
No two composite modules M_i, M_j can be removed from U and replaced by their union (M_i ∪ M_j) to yield a view that still satisfies Properties 1-3.