Data lineage model for Taverna workflows with lightweight annotation requirements

Link: Data lineage model for Taverna workflows with lightweight annotation requirements

The focus of this paper is on annotating workflows so that their lineage is more precise and less noisy.

We can always reconstruct the provenance of a workflow data product from the workflow execution trace. However, the paper highlights two problems with this default provenance:

1) Lack of precision: We assume that most of the transformations in our workflow are ‘black-box’, meaning that the transformations do not automatically store fine-grained provenance. So by default, all we can state for an output item is that its lineage consists of all input items from the input data set.

2) Noise: Some of the transformations in our workflow may not be important from a provenance standpoint. An example of a transformation that we may want to ignore is a ‘string-to-int type-conversion’ transformation.
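
As a rough illustration of both problems, the sketch below runs an opaque step and records the only lineage the execution trace can justify. The names `run_black_box` and `string_to_int` are hypothetical and are not taken from the paper or from Taverna.

```python
from typing import Callable, Dict, List, Set, Tuple


def run_black_box(
    step: Callable[[List[str]], List[int]], inputs: List[str]
) -> Tuple[List[int], Dict[int, Set[int]]]:
    """Run an opaque step and record the coarse default lineage."""
    outputs = step(inputs)
    # With no annotation, each output index is linked to every input index.
    all_inputs = set(range(len(inputs)))
    lineage = {out_idx: set(all_inputs) for out_idx in range(len(outputs))}
    return outputs, lineage


def string_to_int(items: List[str]) -> List[int]:
    # A type conversion: uninteresting for provenance, yet it appears in the trace.
    return [int(s) for s in items]


outputs, lineage = run_black_box(string_to_int, ["3", "5", "8"])
print(outputs)
print(lineage[1])  # indices of the inputs the second output is linked to
```

Here the second output really depends only on the second input, but the default lineage links it to all three, and the conversion step itself is recorded even though it carries little provenance value.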

This paper gives two ways to annotate workflows to make the lineage more useful.

1) Instance-level lineage: One way is to annotate output items with more specific lineage. One-to-one transformations and aggregations are two classes of transformations for which the user can specify more specific lineage.

2) Ignore transformations: The other way is to ignore transformations that are marked as ‘insignificant’.
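
The two kinds of annotation can be sketched together as follows. This is only an illustration under simplifying assumptions: the workflow is a linear pipeline over lists, and the `Step` class, the `kind` labels, and the `significant` flag are hypothetical names, not the paper's or Taverna's annotation syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Step:
    name: str
    func: Callable[[list], list]
    kind: str = "black_box"   # "one_to_one", "aggregation", or "black_box"
    significant: bool = True  # False marks the step as ignorable in lineage


def step_lineage(kind: str, n_in: int, n_out: int) -> Dict[int, Set[int]]:
    """Map each output index of one step to the input indices it depends on."""
    if kind == "one_to_one":
        return {i: {i} for i in range(n_out)}
    # Aggregations and unannotated steps: every output depends on all inputs.
    return {i: set(range(n_in)) for i in range(n_out)}


def run(steps: List[Step], data: list):
    """Execute the pipeline, keeping each step's index-level lineage."""
    trace = []
    for step in steps:
        out = step.func(data)
        trace.append((step, step_lineage(step.kind, len(data), len(out))))
        data = out
    return data, trace


def lineage_of(trace, out_idx: int) -> Tuple[List[Tuple[str, List[int]]], List[int]]:
    """Walk backward from one final output, reporting only significant steps."""
    current = {out_idx}
    reported = []
    for step, mapping in reversed(trace):
        current = set().union(*(mapping[i] for i in current))
        if step.significant:
            reported.append((step.name, sorted(current)))
    return reported, sorted(current)


steps = [
    Step("parse", lambda xs: [int(x) for x in xs], "one_to_one", significant=False),
    Step("square", lambda xs: [x * x for x in xs], "one_to_one"),
]
result, trace = run(steps, ["3", "5", "8"])
reported, sources = lineage_of(trace, 1)
print(result, reported, sources)
```

With the one-to-one annotation, the second output traces back to the second input alone, and the `parse` step is traced through but left out of the reported lineage because it is marked insignificant.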

Finally, the paper notes that if a transformation is deterministic, we can always trace the lineage lazily, that is, compute it on demand at query time rather than recording it during execution.
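
One possible reading of lazy tracing is sketched below. It assumes the transformation is a deterministic per-item function and that its inputs are still available at query time. The helper `trace_lazily` is a hypothetical name, and this is not necessarily the mechanism the paper describes.

```python
from typing import Callable, List


def trace_lazily(func: Callable[[int], int], inputs: List[int], target: int) -> List[int]:
    """Re-apply a deterministic function at query time to find the inputs yielding target."""
    # Determinism guarantees the re-run reproduces the original outputs.
    return [i for i, item in enumerate(inputs) if func(item) == target]


def square(x: int) -> int:
    return x * x


inputs = [2, 3, 4]
outputs = [square(x) for x in inputs]  # executed with no lineage recorded

# Later, when someone asks where the output value 9 came from:
source_indices = trace_lazily(square, inputs, 9)
print(source_indices)
```

Nothing is stored during execution. The cost is paid only when a lineage query arrives, and a non-deterministic transformation would not qualify because a re-run might not reproduce the original output.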

 
panda/reading/taverna_annotation.txt · Last modified: 2009/12/01 11:21 by ragho
 