Category Value
Available via http://dbpubs.stanford.edu/pub/2005-35
Submitted on 29th of November 2005
Author Menestrina, David; Benjelloun, Omar; Garcia-Molina, Hector
Title Generic Entity Resolution with Data Confidences
Date of publication 2005
Citation Menestrina, David; Benjelloun, Omar; Garcia-Molina, Hector. Generic Entity Resolution with Data Confidences,
Number of pages 17
Language English
Project Stanford InfoLab
Type Technical Report
Subject group Data Integration and Mediation
Abstract We consider the Entity Resolution (ER) problem (also known as deduplication, or merge-purge), in which records determined to represent the same real-world entity are successively located and merged. Our approach to the ER problem is generic, in the sense that the functions for comparing and merging records are viewed as black boxes. In this context, managing numerical confidences along with the data makes the ER problem more challenging to define (e.g., how should confidences of merged records be combined?) and more expensive to compute. In this paper, we propose a sound and flexible model for the ER problem with confidences, and propose efficient algorithms to solve it. We validate our algorithms through experiments that show significant performance improvements over naive schemes.
Keywords Entity Resolution, Deduplication, Record Linkage
Fulltext source
  • Postscript (ps, ps.gz, ps.zip)
  • PDF (pdf, pdf.gz, pdf.zip)
  • Plain text (text, text.gz, text.zip)
Management of the document by siroker@db.stanford.edu

    Stanford InfoLab Publication Server