Category Value
Available via http://dbpubs.stanford.edu/pub/2002-7
Previous version 2001-8
Submitted on 14th of February 2002
Author Haveliwala, Taher H.; Gionis, Aristides; Klein, Dan; Indyk, Piotr
Title Evaluating Strategies for Similarity Search on the Web
Date of publication 2002
Published in WWW-2002
Citation Taher H. Haveliwala, Aristides Gionis, Dan Klein, and Piotr Indyk. Evaluating Strategies for Similarity Search on the Web. To appear in Proceedings of the Eleventh International World Wide Web Conference, 2002.
Number of pages 10
Language English
Project Stanford InfoLab; Database Group; Natural Language Processing Group
Type Conference or Journal Paper
Subject group Data Mining; Databases and the Web
Abstract Finding pages on the Web that are similar to a query page (Related Pages) is an important component of modern search engines. A variety of strategies have been proposed for answering Related Pages queries, but comparative evaluation by user studies is expensive, especially when large strategy spaces must be searched (e.g., when tuning parameters). We present a technique for automatically evaluating strategies using Web hierarchies, such as Open Directory, in place of user feedback. We apply this evaluation methodology to a mix of document representation strategies, including the use of text, anchor-text, and links. We discuss the relative advantages and disadvantages of the various approaches examined. Finally, we describe how to efficiently construct a similarity index out of our chosen strategies, and provide sample results from our index.
Keywords related pages, similarity search, search, evaluation, Open Directory Project
Sponsored by NSF Grant IIS-0085896; NSF Grant IIS-0118173; NSF Graduate Research Fellowship; Microsoft Research Graduate Fellowship
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)


    Stanford InfoLab Publication Server