Category Value
Available via http://dbpubs.stanford.edu/pub/2001-27
Submitted on 29th of June 2001
Author Ganesan, Prasanna; Garcia-Molina, Hector; Widom, Jennifer
Title Exploiting Hierarchical Domain Structure to Compute Similarity
Date of publication 2001
Citation Ganesan, Prasanna; Garcia-Molina, Hector; Widom, Jennifer. Exploiting Hierarchical Domain Structure to Compute Similarity, Extended Technical Report, 2001
Number of pages 34
Language English
Project Database Group
Type Technical Report
Subject group Data Mining; E-Commerce
Abstract The notion of similarity between objects finds use in many contexts, e.g., in search engines, collaborative filtering, and clustering. Objects being compared often are modeled as sets, with their similarity traditionally determined based on set intersection. Intersection-based measures do not accurately capture similarity in certain domains, such as when the data is sparse or when there are known relationships between items within sets. We propose new measures that exploit a hierarchical domain structure in order to produce more intuitive similarity scores. We also extend our similarity measures to provide appropriate results in the presence of multisets (also handled unsatisfactorily by traditional measures), e.g., to correctly compute the similarity between customers who buy several instances of the same product (say milk), or who buy several products in the same category (say dairy products). We also provide an experimental comparison of our measures against traditional similarity measures, and describe an informal user study that evaluated how well our measures match human intuition.
Contact address prasannag@cs.stanford.edu
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)
  • Plain text (text, text.gz, text.zip)
Management of the document by siroker@db.stanford.edu

    Stanford InfoLab Publication Server