title: Web Content Categorization Using Link Information
creator: Gyongyi, Zoltan
creator: Garcia-Molina, Hector
creator: Pedersen, Jan
subject: Data Mining
subject: Databases and the Web
description: Document categorization is one of the foundational problems in (web) information retrieval. Even though web documents are hyperlinked, most proposed classification techniques take little advantage of the link structure and rely primarily on text features, as it is not immediately clear how to make link information intelligible to supervised machine learning algorithms. This paper introduces a link-based approach to classification, which can be used in isolation or in conjunction with text-based classification. Various large-scale experimental results indicate that link-based classification is on par with text-based classification, and the combination of the two offers the best of both worlds.
publisher: Stanford
date: 2006-06
type: Techreport
type: NonPeerReviewed
format: application/pdf
identifier: http://ilpubs.stanford.edu:8090/782/1/2006-17.pdf
identifier: Gyongyi, Zoltan and Garcia-Molina, Hector and Pedersen, Jan (2006) Web Content Categorization Using Link Information. Technical Report. Stanford.
relation: http://ilpubs.stanford.edu:8090/782/