Summaries
There was no class
22 October 2018, 15:30 • Helena Galhardas
The instructor went to a conference abroad.
Duplicate detection and elimination with PDI
22 October 2018, 15:30 • Diogo Ribeiro Ferreira
Pairwise comparison by join. Pairwise similarity and thresholds. Calculating similarity/distance measures in PDI. Weighted similarity on multiple attributes. Reducing the number of comparisons. Clustering and transitive closure. Merging of duplicates.
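As a rough illustration of the topics above (weighted similarity on multiple attributes, a threshold, and clustering by transitive closure), here is a minimal Python sketch. It is not a PDI transformation: `difflib.SequenceMatcher` is used only as a stand-in similarity measure, and the record fields, weights and threshold are hypothetical values chosen for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def sim(a, b):
    """Stand-in string similarity in [0, 1] (difflib ratio, not a PDI measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_sim(r1, r2, weights):
    """Weighted average of the per-attribute similarities."""
    total = sum(weights.values())
    return sum(w * sim(r1[attr], r2[attr]) for attr, w in weights.items()) / total


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_duplicates(records, weights, threshold):
    """Compare all pairs, link those at or above the threshold, and return
    the connected components (the transitive closure of the links)."""
    parent = {rid: rid for rid in records}
    for a, b in combinations(records, 2):
        if weighted_sim(records[a], records[b], weights) >= threshold:
            parent[find(parent, a)] = find(parent, b)
    clusters = {}
    for rid in records:
        clusters.setdefault(find(parent, rid), []).append(rid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: records 1 and 4 fall just below the threshold when
# compared directly, but both match record 2, so they share a cluster.
records = {
    1: {"name": "John Smith", "city": "Lisbon"},
    2: {"name": "Jon Smith", "city": "Lisbon"},
    3: {"name": "Maria Costa", "city": "Porto"},
    4: {"name": "Jon Smyth", "city": "Lisboa"},
}
weights = {"name": 0.7, "city": 0.3}
clusters = cluster_duplicates(records, weights, threshold=0.85)
```

Comparing all pairs is quadratic in the number of records, which is why reducing the number of comparisons matters in practice.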
Duplicate detection and elimination with PDI and Lab Guide 5
22 October 2018, 09:30 • João Pedro Lebre Magalhães Pereira
- How to specify a duplicate detection and elimination process with PDI transformations
- Resolution of Lab Guide 5
String Matching
18 October 2018, 15:30 • Diogo Ribeiro Ferreira
The Damerau-Levenshtein distance. Converting a distance measure into a similarity measure. The Needleman-Wunsch measure. The Jaro and Jaro-Winkler measures. The Jaccard measure. Phonetic measures: Soundex and Refined Soundex.
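To make some of these measures concrete, the following Python sketch shows one possible reading of three of them. It assumes the restricted form of the Damerau-Levenshtein distance (optimal string alignment, where each substring is edited at most once), a normalisation by the longer string's length for turning a distance into a similarity, and the Jaccard measure over character bigrams. The function names are illustrative, and the lecture may have used different variants.

```python
def osa_distance(s, t):
    """Optimal string alignment distance: insertions, deletions,
    substitutions and transpositions of adjacent characters."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]


def distance_to_similarity(dist, s, t):
    """Map an edit distance to [0, 1] by dividing by the longer length."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - dist / longest


def jaccard(s, t, n=2):
    """Jaccard measure over the sets of character n-grams of s and t."""
    a = {s[i:i + n] for i in range(len(s) - n + 1)}
    b = {t[i:i + n] for i in range(len(t) - n + 1)}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For example, `osa_distance("smith", "smiht")` is 1 because the swap of `t` and `h` counts as a single transposition, which gives a similarity of 0.8 under this normalisation.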
Data matching and fusion
18 October 2018, 14:30 • Helena Galhardas
Data matching (detection of approximate duplicates):
- Two challenges: accuracy and efficiency
- Record-oriented matching techniques: rule-based matching
- Scaling-up record-set oriented matching: sorted neighbourhood method
- Measures and data sets
Data fusion:
- Types of data conflicts
- Data conflict resolution strategies and functions
- Relational operators and extensions
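The sorted neighbourhood method listed above can be sketched as follows. This is a minimal Python illustration, not the lecture's own formulation: records are sorted by a blocking key and only records that fall inside the same sliding window are compared. The function name, the key and the window size are assumptions made for the example.

```python
def sorted_neighbourhood_pairs(records, key, window):
    """Sort records by `key`, then pair each record only with the
    `window - 1` records that follow it in sorted order."""
    ordered = sorted(records, key=key)
    pairs = []
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1:i + window]:
            pairs.append((rec, other))
    return pairs


# Hypothetical surnames; the sorting key here is simply the string itself.
names = ["smith", "jones", "smyth", "adams", "johns"]
candidates = sorted_neighbourhood_pairs(names, key=lambda s: s, window=2)
```

With five records and a window of 2, this yields 4 candidate pairs instead of the 10 produced by comparing every pair. The trade-off is that true duplicates whose keys sort far apart are never compared.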