Talk: Duplicate Detection and Elimination in XML Databases

6 December 2011, 14:45, Helena Galhardas

Next Friday, 9 December, at 16:00, in room 1.4 at Tagus Park, there will be a presentation (CAT/PhD defense) on a topic that is highly relevant to this course.

Everyone is invited to attend!

Duplicate Detection and Elimination in XML Databases

Luís Leitão, PhD student, DEI@IST and DMIR@INESC-ID

We present a proposal to address the problem of duplicate detection and elimination in XML databases. We propose a Bayesian Network model for duplicate detection and describe the problems that must be addressed to further improve its effectiveness and efficiency. To this end, we present strategies both to accelerate the process and to improve its final outcome. Some of the solutions proposed here have already been implemented and tested, with highly positive results. The remaining ones have either undergone preliminary tests, to obtain feedback about their viability, or have only been devised and still need further development and testing. The results obtained so far are promising and leave room for further study.