Related Papers

Applications

  • A common analysis in molecular biology: phylogenomic inference of protein biological function

Summary

  • Given a workflow (a graph of modules), allow users to specify the modules they are interested in. Create a ‘userview’ of the workflow consisting mainly of those relevant modules. Answers to provenance queries should then contain only information about those relevant modules.
  • Provides a notion of a ‘good’ userview
  • Presents an algorithm which takes as input a workflow specification and a set of relevant modules, and constructs a minimal ‘good’ userview
  • Built a system called Zoom*UserViews which provides a graphical interface to view workflow specifications, select relevant modules, construct userviews on-the-fly and pose provenance queries

Details

Workflow Specification - G

A directed graph G(N, E), where N is a set of uniquely labeled modules and E is the set of edges between modules, indicating the order in which the modules execute. Two special nodes, input and output, are the source and sink respectively. Every node in G must lie on some path from input to output.
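The path condition above can be checked with two reachability passes. The following is a minimal sketch, not taken from the paper: the function names and the example workflow are hypothetical, and "path" is assumed to allow repeated nodes so that modules inside loops still qualify.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """Return every node reachable from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_valid_specification(nodes, edges, source="input", sink="output"):
    """Check the source/sink roles and that every node is on an input-to-output path."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for u, v in edges:
        forward[u].add(v)
        backward[v].add(u)
    # The source must have no incoming edge and the sink no outgoing edge.
    if backward.get(source) or forward.get(sink):
        return False
    from_source = reachable(source, forward)
    to_sink = reachable(sink, backward)
    return all(n in from_source and n in to_sink for n in nodes)


# Hypothetical workflow with a loop between B and C.
nodes = {"input", "A", "B", "C", "output"}
edges = {("input", "A"), ("A", "B"), ("B", "C"), ("C", "B"), ("C", "output")}
valid = is_valid_specification(nodes, edges)

# Adding a dead-end module D breaks the condition: D never reaches output.
invalid = is_valid_specification(nodes | {"D"}, edges | {("A", "D")})
```

A node passes when it is reachable from input in the forward graph and can reach output, which is tested as reachability from output in the reversed graph.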

User view - U

A partition of the nodes of G into disjoint sets {M_1, ..., M_n}, built from a set of relevant modules specified by the user. Each M_i is a composite module containing one or more modules from G. The size of a user view is its number of composite modules.
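As an illustration, a user view can be represented as a list of sets, one per composite module. The sketch below uses hypothetical names and assumes that input and output each form their own composite module, which these notes do not state. It checks the partition condition and computes the size.

```python
def is_partition(view, nodes):
    """True if the composite modules are non-empty, disjoint, and cover all nodes."""
    seen = set()
    for composite in view:
        if not composite or seen & composite:
            return False  # empty composite, or overlap with an earlier one
        seen |= composite
    return seen == set(nodes)


def view_size(view):
    """Size of a user view: its number of composite modules."""
    return len(view)


nodes = {"input", "A", "B", "C", "output"}

# Hypothetical user view: A and B are grouped into one composite module.
view = [
    frozenset({"input"}),
    frozenset({"A", "B"}),
    frozenset({"C"}),
    frozenset({"output"}),
]

# Not a partition: B appears in two composite modules.
overlapping = [frozenset({"input", "A", "B"}), frozenset({"B", "C", "output"})]
```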

Induced workflow specification - U(G)

Given a user view U, the induced workflow specification U(G) is a new graph whose nodes are the composite modules M_i, with an edge from M_i to M_j whenever G has an edge from a module in M_i to a module in M_j.
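The construction of U(G) can be sketched as a single pass over the edges of G. The names below are hypothetical, and dropping edges whose endpoints fall inside the same composite module is an assumption of this sketch, since the definition above does not say how such edges are treated.

```python
def induced_specification(edges, view):
    """Build U(G) from the edges of G and a view mapping composite name -> modules."""
    module_to_composite = {}
    for name, members in view.items():
        for module in members:
            module_to_composite[module] = name

    induced_edges = set()
    for u, v in edges:
        cu = module_to_composite[u]
        cv = module_to_composite[v]
        # Assumption: an edge internal to one composite module induces no edge.
        if cu != cv:
            induced_edges.add((cu, cv))
    return set(view), induced_edges


# Hypothetical workflow and user view.
edges = {("input", "A"), ("A", "B"), ("B", "C"), ("C", "B"), ("C", "output")}
view = {
    "M1": {"input"},
    "M2": {"A", "B"},
    "M3": {"C"},
    "M4": {"output"},
}

induced_nodes, induced_edges = induced_specification(edges, view)
```

In this example the edge A -> B disappears because both modules sit in M2, while the loop between B and C survives as a loop between M2 and M3.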

Workflow run - G_r

A directed acyclic graph G_r in which each node carries a unique step-id together with the label of the module in G that it executes. Module labels are not unique in G_r, because G may contain cycles and these are unrolled in the run.

Immediate and Deep Provenance

The provenance of a data object is the sequence of modules and input data objects on which it depends. Its immediate provenance is the step that produced it together with that step's set of input data objects. Its deep provenance is defined recursively as all steps and input data objects that were transitively used to produce it.
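The two notions differ only in whether the lookup recurses. A minimal sketch follows, assuming a run is recorded as two hypothetical tables: produced_by maps a data object to the step that produced it, and step_inputs maps a step to the data objects it read. Neither table nor the identifiers in the example come from the paper.

```python
def immediate_provenance(data_id, produced_by, step_inputs):
    """Return (producing step, its input data), or None for a workflow input."""
    step = produced_by.get(data_id)
    if step is None:
        return None
    return step, set(step_inputs[step])


def deep_provenance(data_id, produced_by, step_inputs):
    """Return all steps and data objects transitively used to produce data_id."""
    steps, data = set(), set()
    stack = [data_id]
    while stack:
        current = stack.pop()
        step = produced_by.get(current)
        if step is None or step in steps:
            continue  # workflow input, or a step that was already expanded
        steps.add(step)
        for used in step_inputs[step]:
            if used not in data:
                data.add(used)
                stack.append(used)
    return steps, data


# Hypothetical run: s1 reads d1 and writes d2, s2 reads d2 and writes d3,
# s3 reads d2 and d3 and writes d4.
produced_by = {"d2": "s1", "d3": "s2", "d4": "s3"}
step_inputs = {"s1": ["d1"], "s2": ["d2"], "s3": ["d2", "d3"]}

immediate = immediate_provenance("d4", produced_by, step_inputs)
deep_steps, deep_data = deep_provenance("d4", produced_by, step_inputs)
```

Because G_r is acyclic the traversal terminates, and the visited-step check only avoids expanding a shared ancestor twice.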

Composite execution

Executing steps that belong to the same composite module produces a virtual execution of that composite module. Provenance is then restricted by hiding the internal steps and the data passed between them.

nr-path

A path in G (or U(G)) which contains no relevant intermediate module (or composite module).
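The definition constrains only the intermediate nodes, so the endpoints of an nr-path may themselves be relevant. The sketch below, with hypothetical function names and example graph, tests a given path and searches for an nr-path between two nodes. For U(G), the relevant set would hold the composite modules that contain a relevant module.

```python
from collections import deque


def is_nr_path(path, relevant):
    """True if no intermediate node of the path is relevant."""
    return all(node not in relevant for node in path[1:-1])


def has_nr_path(edges, start, end, relevant):
    """Breadth-first search for a path from start to end with no relevant intermediate node."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt == end:
                return True
            # Only non-relevant nodes may be used as intermediates.
            if nxt not in seen and nxt not in relevant:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical workflow with a shortcut edge A -> C that bypasses B.
edges = {("input", "A"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "output")}

through_b = is_nr_path(["A", "B", "C"], {"B"})
around_b = is_nr_path(["A", "C", "output"], {"B"})
```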

Good user view

A user view is good if it is:

1 Well-formed - every composite module of the view contains at most one relevant module.
2 Preserves dataflow - every edge e in G that induces an edge e' on an nr-path in U(G) itself lies on an nr-path in G.
3 Complete wrt dataflow - for every edge e on an nr-path in G that induces an edge e' in U(G), e' lies on an nr-path in U(G).

Minimal user view

A user view U is minimal if no two composite modules M_i, M_j can be removed from U and replaced by their union (M_i ∪ M_j) to yield a view that still satisfies Properties 1-3.
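The merge step in this definition can be sketched for Property 1 alone. The code below is a partial illustration with hypothetical names, not the paper's algorithm: it lists the pairs of composite modules whose union keeps the view well-formed, and it does not check Properties 2 and 3. A view would be minimal only if every pair listed here failed one of those remaining properties.

```python
from itertools import combinations


def is_well_formed(view, relevant):
    """Property 1: each composite module holds at most one relevant module."""
    return all(len(composite & relevant) <= 1 for composite in view)


def merge(view, i, j):
    """Return a new view in which composites i and j are replaced by their union."""
    merged = view[i] | view[j]
    rest = [m for k, m in enumerate(view) if k not in (i, j)]
    return rest + [merged]


def well_formed_merges(view, relevant):
    """Index pairs whose merge keeps Property 1 (Properties 2 and 3 are NOT checked)."""
    return [
        (i, j)
        for i, j in combinations(range(len(view)), 2)
        if is_well_formed(merge(view, i, j), relevant)
    ]


# Hypothetical view over modules A, B, C, where A and C are relevant.
view = [frozenset({"A"}), frozenset({"B"}), frozenset({"C"})]
relevant = {"A", "C"}

candidates = well_formed_merges(view, relevant)
```

Merging A with C is excluded because the union would contain two relevant modules, while B can join either neighbour as far as Property 1 is concerned.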

 
panda/reading/userview.txt · Last modified: 2009/12/01 11:14 by ragho
 