[DigLib] Stanford Digital Library Technologies [Working Papers]

SIDL-WP-1999-0113

Taxonomy of Crawlers

Junghoo Cho, Hector Garcia-Molina

paepcke@cs.stanford.edu

Abstract: Crawlers vary dramatically because the underlying crawling applications have different requirements. For instance, personal crawlers expect less bandwidth than AltaVista's crawler, while image crawlers open far fewer socket connections per second than text crawlers because image objects are significantly larger than a typical HTML page. These different crawlers can be classified into a taxonomy.


Note: Papers in this series are in development and are not in a final form for publication or general dissemination. They are subject to change. Please do not quote or further distribute them without explicit permission from the authors.
This paper was created on 08/01/99 and last revised on 9/22/1999.

Author's Comments: Rough thoughts on different kinds of crawlers.

Status: PRIVATE


Revision History

Version    Format    Date         Comments
1          PS        9/22/1999    Rough thoughts on different kinds of crawlers.
