Category Value
Available via http://dbpubs.stanford.edu/pub/2002-11
Submitted on 20th of February 2002
Author Kamvar, Sepandar D.; Klein, Dan; Manning, Christopher D.
Title Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based Approach
Date of publication February 2002
Citation Kamvar, Sepandar D.; Klein, Dan; Manning, Christopher D. Interpreting and Extending Classical Agglomerative Clustering Algorithms using a Model-Based Approach, Stanford Technical Report, 2002.
Number of pages 8
Language English
Project Natural Language Processing Group
Type Technical Report
Subject group Computer Science; Data Mining; Miscellaneous
Abstract We present two results which arise from a model-based approach to hierarchical agglomerative clustering. First, we show formally that the common heuristic agglomerative clustering algorithms -- single-link, complete-link, group-average, and Ward's method -- are each equivalent to a hierarchical model-based method. This interpretation gives a theoretical explanation of the empirical behavior of these algorithms, as well as a principled approach to resolving practical issues, such as number of clusters or the choice of method. Second, we show how a model-based approach can be used to extend these basic agglomerative algorithms. We introduce adjusted complete-link, Mahalanobis-link, and line-link as variants of the classical agglomerative methods, and demonstrate their utility.
Keywords clustering, probabilistic models, model-based clustering, hierarchical clustering
Contact address klein@cs.stanford.edu
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)
  • Plain text (text, text.gz, text.zip)
  • Management of the document by siroker@db.stanford.edu



    Stanford InfoLab Publication Server