
Category Value
Available via http://dbpubs.stanford.edu/pub/1999-68
Submitted on 31st of October 2001
Author Brin, Sergey; Page, Lawrence
Title Dynamic Data Mining: Exploring Large Rule Spaces by Sampling.
Date of publication 11th of November 1999
Citation Brin, Sergey; Page, Lawrence. Dynamic Data Mining: Exploring Large Rule Spaces by Sampling.
Number of pages 21
Language English
Project Digital Libraries
Type Other
Subject group Digital Libraries
Abstract A great challenge for data mining techniques is the huge space of potential rules which can be generated. If there are tens of thousands of items, then potential rules involving three items number in the trillions. Traditional data mining techniques rely on downward-closed measures such as support to prune the space of rules. However, in many applications, such pruning techniques either do not sufficiently reduce the space of rules, or they are overly restrictive. We propose a new solution to this problem, called Dynamic Data Mining (DDM). DDM forgoes the completeness offered by traditional techniques based on downward-closed measures in favor of the ability to drill deep into the space of rules and provide the user with a better view of the structure present in a data set. Instead of a single deterministic run, DDM runs continuously, exploring more and more of the rule space. Instead of using a downward-closed measure such as support to guide its exploration, DDM uses a user-defined measure called weight, which is not restricted to be downward closed. The exploration is guided by a heuristic called the Heavy Edge Property. The system incorporates user feedback by allowing weight to be redefined dynamically. We test the system on a particularly difficult data set - the word usage in a large subset of the World Wide Web. We find that Dynamic Data Mining is an effective tool for mining such difficult data sets.
Notes Previous number = SIDL-WP-1999-0122
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)
  • Plain text (text, text.gz, text.zip)
  • Management of the document by pubs@db.stanford.edu



    Stanford InfoLab Publication Server