| Available via | http://dbpubs.stanford.edu/pub/2007-30 |
|
Submitted on |
23rd of October 2007 |
|
Author |
Koutrika, Georgia; Effendi, Frans; Gyongyi, Zoltan; Heymann, Paul; Garcia-Molina, Hector |
|
Title |
Combating Spam in Tagging Systems: An Evaluation |
|
Date of publication |
2007 |
|
Citation |
Koutrika, Georgia; Effendi, Frans; Gyongyi, Zoltan; Heymann, Paul; Garcia-Molina, Hector. Combating Spam in Tagging Systems: An Evaluation |
|
Number of pages |
35 |
|
Language |
English |
|
Project |
Stanford InfoLab |
|
Type |
Other |
|
Subject group |
Miscellaneous |
|
Abstract |
Tagging systems allow users to interactively annotate a pool of shared resources using descriptive strings, which are called tags. Tags are used to guide users to interesting resources and help them build communities that share their expertise and resources. As tagging systems are gaining in popularity, they become more susceptible to tag spam: misleading tags that are generated in order to increase the visibility of some resources or simply to confuse users. Our goal is to understand this problem better. In particular, we are interested in answers to questions such as: How many malicious users can a tagging system tolerate before results significantly degrade? What types of tagging systems are more vulnerable to malicious attacks? What would be the effort and the impact of employing a trusted moderator to find bad postings? Can a system automatically protect itself from spam, for instance, by exploiting user tag patterns? In a quest for answers to these questions, we introduce a framework for modeling tagging systems and user tagging behavior. We also describe a method for ranking documents matching a tag based on taggers' reliability. Using our framework, we study the behavior of existing approaches under malicious attacks and the impact of a moderator and our ranking method. We use two complementary techniques to generate scenarios:
(a) Data Driven. We use a real data set of documents and tags, and inject spam tags based on a bad user model.
(b) Synthetic. We generate documents and their tags based on data distributions, and then again inject spam tags. |
|
Keywords |
tagging systems, tag spam, evaluation |
|
Contact address |
koutrika@stanford.edu |
| Fulltext source |
PDF (pdf, pdf.gz, pdf.zip)
| Management of the document by | siroker@db.stanford.edu
| |