
Category Value
Available via http://dbpubs.stanford.edu/pub/1997-75
Submitted on 29th of October 2001
Author Koller, Daphne; Sahami, Mehran
Title Hierarchically classifying documents using very few words
Date of publication 25th of February 1997
Citation Koller, Daphne; Sahami, Mehran. Hierarchically classifying documents using very few words,
Number of pages 17
Language English
Project Digital Libraries
Type Other
Subject group Digital Libraries
Abstract The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. One can use existing classifiers by ignoring the hierarchical structure, treating the topics as separate classes. Unfortunately, in the context of text categorization, we are faced with a large number of classes and a huge number of relevant features needed to distinguish between them. Consequently, we are restricted to using only very simple classifiers, both because of computational cost and the tendency of complex models to overfit. We propose an approach that utilizes the hierarchical topic structure to decompose the classification task into a set of simpler problems, one at each node in the classification tree. As we show, each of these smaller problems can be solved accurately by focusing only on a very small set of features, those relevant to the task at hand. This set of relevant features varies widely throughout the hierarchy, so that, while the overall relevant feature set may be large, each classifier only examines a small subset. The use of reduced feature sets allows us to utilize more complex (probabilistic) models, without encountering the computational and robustness difficulties described above.
Notes Previous number = SIDL-WP-1997-0059
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)
  • Plain text (text, text.gz, text.zip)
  • Management of the document by pubs@db.stanford.edu
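
The decomposition described in the abstract (one small classifier per node of the topic tree, each looking only at a few features relevant to that node) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy hierarchy, the training documents, the Bernoulli naive Bayes model, and the document-frequency "spread" score used to pick features are all assumptions made for illustration, and the names `HIERARCHY`, `DATA`, `NodeClassifier`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy: each internal node maps to its children.
HIERARCHY = {"root": ["science", "sports"],
             "science": ["physics", "biology"],
             "sports": ["soccer", "tennis"]}

# Hypothetical training data: (words, path from below the root to a leaf).
DATA = [
    ("quantum particle energy lab".split(), ["science", "physics"]),
    ("particle collider energy lab".split(), ["science", "physics"]),
    ("cell gene protein lab".split(), ["science", "biology"]),
    ("gene species cell lab".split(), ["science", "biology"]),
    ("goal striker league match".split(), ["sports", "soccer"]),
    ("league goal keeper match".split(), ["sports", "soccer"]),
    ("serve racket set match".split(), ["sports", "tennis"]),
    ("racket serve court match".split(), ["sports", "tennis"]),
]


class NodeClassifier:
    """Bernoulli naive Bayes over one node's children, using only k features."""

    def __init__(self, docs, labels, k):
        self.classes = sorted(set(labels))
        n_docs = Counter(labels)
        # Document frequency of each word within each child class.
        df = {c: Counter() for c in self.classes}
        for words, c in zip(docs, labels):
            df[c].update(set(words))
        vocab = set().union(*df.values())

        # Assumed selection score: spread of per-class document frequency.
        def spread(w):
            rates = [df[c][w] / n_docs[c] for c in self.classes]
            return max(rates) - min(rates)

        # Keep the k highest-scoring words; ties are broken alphabetically.
        self.features = sorted(vocab, key=lambda w: (-spread(w), w))[:k]
        self.prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        # Laplace-smoothed probability that a feature word occurs in a class.
        self.p = {c: {w: (df[c][w] + 1) / (n_docs[c] + 2) for w in self.features}
                  for c in self.classes}

    def predict(self, words):
        present = set(words)

        def score(c):
            s = self.prior[c]
            for w in self.features:
                p = self.p[c][w]
                s += math.log(p if w in present else 1 - p)
            return s

        return max(self.classes, key=score)


def train(hierarchy, data, k=2):
    """Fit one small classifier per internal node of the hierarchy."""
    models = {}
    for node in hierarchy:
        docs, labels = [], []
        for words, path in data:
            full = ["root"] + path
            if node in full[:-1]:
                # Only documents passing through this node train it;
                # the label is the child the document descends into.
                docs.append(words)
                labels.append(full[full.index(node) + 1])
        if docs:
            models[node] = NodeClassifier(docs, labels, k)
    return models


def classify(models, words, node="root"):
    """Descend greedily from the root until a leaf topic is reached."""
    while node in models:
        node = models[node].predict(words)
    return node


models = train(HIERARCHY, DATA, k=2)
print(classify(models, "the lab measured particle energy".split()))
print({node: m.features for node, m in models.items()})
```

In this toy setup each node ends up with a different two-word feature set, which mirrors the abstract's point that the relevant features vary across the hierarchy even though each individual classifier examines only a small subset. A greedy top-down descent is one simple way to combine the per-node decisions; an error made near the root cannot be corrected further down.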


    Stanford InfoLab Publication Server