Talk: Duplicate Detection and Elimination in XML Databases
6 December 2011, 14:45 • Helena Galhardas
Next Friday, 9/12, at 16:00, in room 1.4 at Tagus Park, there will be a presentation (CAT/PhD defense) whose topic is highly relevant to this course.
Everyone is invited to attend!
Duplicate Detection and Elimination in XML Databases
Luís Leitão, PhD student, DEI@IST and DMIR@INESC-ID
We present a proposal to address the problem of duplicate detection and
elimination in XML databases. We propose a Bayesian Network model for duplicate
detection and describe the problems that need to be considered to further improve it in terms
of effectiveness and efficiency. To this end, we present strategies both to accelerate the
process and to improve its final outcome. Some of the solutions proposed here have already
been implemented and tested with highly positive results. The remaining ones have either been
submitted to preliminary tests, in order to obtain some feedback about their viability, or
have simply been conceived and still need further development and testing. Results obtained
so far are promising and leave room for further study.