| Category | Value | ||
| Available via | http://dbpubs.stanford.edu/pub/2003-11 | ||
| Submitted on | 17th of February 2003 | ||
| Author | Raghavan, Sriram; Garcia-Molina, Hector | ||
| Title | Complex Queries over Web Repositories | ||
| Date of publication | 2003 | ||
| Citation | Raghavan, Sriram; Garcia-Molina, Hector. Complex Queries over Web Repositories, | ||
| Number of pages | 23 | ||
| Language | English | ||
| Project | Digital Libraries | ||
| Type | Technical Report | ||
| Subject group | Databases and the Web | ||
| Abstract | Web repositories, such as the Stanford WebBase repository, manage large heterogeneous collections of Web pages and associated indexes. For effective analysis and mining, these repositories must provide a declarative query interface that supports "complex expressive Web queries". Such queries have two key characteristics: (i) they view a Web repository simultaneously as a collection of text documents, as a navigable directed graph, and as a set of relational tables storing properties of Web pages (length, URL, title, etc.); (ii) they employ application-specific ranking and ordering relationships over pages and links to filter the results and retrieve only the "best" ones. In this paper, we model a Web repository in terms of "Web relations" and describe an algebra for expressing complex Web queries. Our algebra extends traditional relational operators as well as graph navigation operators to uniformly handle plain, ranked, and ordered Web relations. In addition, we present an overview of the cost-based optimizer and execution engine that we have developed to efficiently execute Web queries over large repositories. | ||
| Keywords | Complex queries, Web repositories, Ordered relations, Web graph navigation | ||
| Management of the document by | pubs@db.stanford.edu