Link to paper: http://www.eecs.harvard.edu/~syrah/pass/pubs/ipaw08.pdf
This is where PASSv2 starts
Provenance Data (at least... from the PASS perspective)
⁃ e.g. "siblings" -> what does this mean? it's unclear. two provenance objects could have the same parent, but different grandparents ⁃ could also have annotations ⁃ deciding what kinds of relationships to detect and label depends on goals an implementation ⁃ most people care only about ancestry, but we should support direct label capturing
⁃ need to follow paths whose structure is not known beforehand (e.g. "find me all the descendants of X) ⁃ need to support matching for regular expressions on path names (especially if we're allowing multi-layer provenance, where attribute naming conventions are not standardized) ⁃ need to support aggregation (i.e. find me all objects with at least 10 ancestors) ⁃ subqueries (find me the objects in graph A that mach those in graph B that are subject to these constraints) * semi-structured data is a natural fit
Why Other Models Don’t Work
⁃ nodes and edges; get paths from crossing edge-list with itself multiple times ⁃ unwieldy, although possible ⁃ no way of comparing if two paths are the same, or if two paths lead to the same object
⁃ XML is hierarchical, but follows a tree model (i.e. no diamonds); vanilla implementation is unsuitable ⁃ XQuery path always holds the value of the object at the rightmost end; have to truncate to get intermediate results ⁃ XQuery paths are lists of nodes rather than lists of edges ⁃ possible to hack around until it works, but...
⁃ Resource Description Framework -> directed, labeled graphs ⁃ In theory, a perfect fit ⁃ but SPARQL has several shortcomings ⁃ doesn't support sub-queries ⁃ doesn't support constraints on path expressions ⁃ doesn't support path expressions of arbitrary length ⁃ ... and more ⁃ several extensions for SPARQL (almost anything you can imagine, if I recall correctly) ⁃ but it's a pain to get all of them to work at once ⁃ can workaround, and may one day be suitable; current workarounds: ⁃ explicitly code the end-node in the query so you don't have to do a path search ⁃ build a full path by repetition (i.e. to find great grandparents, find all parents of x, all parents of parents of x, all parents of parents of parents of x).
PQL