Summaries
String matching (cont.), schema matching and mapping
5 October 2015, 15:30 • Helena Galhardas
String matching algorithms
- token-based: TF/IDF measure
- phonetic-based: soundex
- hybrid methods: overview
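A minimal sketch of the two measures named above, for illustration only. It assumes American Soundex (first letter kept, then three digits, with h/w skipped and vowels separating repeated digits). For TF/IDF it assumes whitespace tokenization, the common weighting tf * log(N / df), and cosine similarity. Other TF/IDF variants exist, and the function names (`soundex`, `tfidf_vectors`, `cosine`) are hypothetical.

```python
import math
from collections import Counter

# American Soundex digit classes; vowels, y, h and w have no digit.
_SOUNDEX = {ch: digit
            for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                   ("l", "4"), ("mn", "5"), ("r", "6"))
            for ch in letters}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w are skipped and do not separate equal digits
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so equal digits around it are both kept
        if len(code) == 4:
            break
    return code.ljust(4, "0")  # pad short codes with zeros


def tfidf_vectors(strings):
    """Weight each whitespace token by tf * log(N / df) (one common variant)."""
    bags = [Counter(s.lower().split()) for s in strings]
    n = len(bags)
    df = Counter(tok for bag in bags for tok in bag)  # document frequency
    return [{tok: tf * math.log(n / df[tok]) for tok, tf in bag.items()}
            for bag in bags]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `"R163"`, so the two names are treated as phonetically equal. In `tfidf_vectors(["apple corp", "apple corporation", "microsoft corp"])`, the rarer token "corporation" gets a higher weight than the more frequent "corp". A token that appears in every string gets weight 0.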
String matching: improving scalability with blocking algorithms:
- inverted indexes over strings
- size filtering
- prefix filtering
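The three blocking techniques above can be combined in a Jaccard similarity join. The sketch below assumes whitespace token sets, a threshold 0 < t <= 1, and a global token order of rarest first. It builds an inverted index over the prefix tokens of one side and probes it with the prefixes of the other side. Before the exact Jaccard check, it discards pairs whose sizes cannot reach the threshold. The function names are hypothetical.

```python
import math
from collections import Counter


def tokens(s):
    """Token set of a string (lowercased, whitespace tokenization)."""
    return set(s.lower().split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


def prefix_len(n, t):
    # A match needs overlap >= ceil(t * n); the epsilon guards against float error.
    return n - math.ceil(t * n - 1e-9) + 1


def jaccard_join(xs, ys, t):
    """Pairs (i, j) with Jaccard(xs[i], ys[j]) >= t, found via blocking."""
    x_sets = [tokens(s) for s in xs]
    y_sets = [tokens(s) for s in ys]
    # Global token order: rarest first, so prefixes hold the most selective tokens.
    freq = Counter(tok for s in x_sets + y_sets for tok in s)

    def ordered(s):
        return sorted(s, key=lambda tok: (freq[tok], tok))

    # Inverted index over the prefix tokens of each y set only.
    index = {}
    for j, y in enumerate(y_sets):
        for tok in ordered(y)[:prefix_len(len(y), t)]:
            index.setdefault(tok, []).append(j)

    pairs = []
    for i, x in enumerate(x_sets):
        # Prefix filtering: a true match must share at least one prefix token.
        candidates = {j for tok in ordered(x)[:prefix_len(len(x), t)]
                      for j in index.get(tok, [])}
        for j in sorted(candidates):
            y = y_sets[j]
            # Size filtering: Jaccard >= t requires t*|x| <= |y| <= |x|/t.
            if not (t * len(x) - 1e-9 <= len(y) <= len(x) / t + 1e-9):
                continue
            if jaccard(x, y) >= t:  # verify only the surviving candidates
                pairs.append((i, j))
    return pairs
```

With xs = ["data integration course", "string matching"], ys = ["data integration", "approximate string matching", "schema mapping"] and t = 0.5, the call returns [(0, 0), (1, 1)]. This is the same answer as comparing all six pairs, but only the candidates that survive the filters are verified.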
Creating schema mappings:
- components: schema matching and schema mapping
- challenges
- components of a schema matching system
Lab 3
5 October 2015, 09:30 • Diogo Ribeiro Ferreira
Sorting, grouping, and filtering data with Pentaho. Schema mapping with Global-as-View (GAV).
Lab 2
1 October 2015, 16:00 • Diogo Ribeiro Ferreira
Working with CSV data: calculations, number ranges, field selection, and de-duping. Query expressions with Datalog.
String matching
1 October 2015, 14:30 • Helena Galhardas
String matching algorithms:
- challenges: accuracy and scalability
- sequence-based: edit distance, Needleman-Wunsch measure, Jaro measure, Jaro-Winkler measure
- token-based: overlap measure, Jaccard measure, TF/IDF measure
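A minimal sketch of one sequence-based and two token-based measures from the list above, for illustration only. Edit distance is the Levenshtein distance (unit-cost insert, delete and substitute), computed with the standard dynamic program. It is turned into a similarity with one common normalization, 1 - d/max(|s|, |t|). The token-based measures assume whitespace tokens. The function names are hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: min number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, 1):
        cur = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute, or free match
        prev = cur
    return prev[-1]


def edit_similarity(s: str, t: str) -> float:
    """One common normalization: 1 - d(s, t) / max(|s|, |t|)."""
    longest = max(len(s), len(t))
    return 1.0 - edit_distance(s, t) / longest if longest else 1.0


def _token_set(s: str) -> set:
    return set(s.lower().split())


def overlap(a: str, b: str) -> int:
    """Overlap measure: number of shared tokens."""
    return len(_token_set(a) & _token_set(b))


def jaccard(a: str, b: str) -> float:
    """Jaccard measure: shared tokens divided by all distinct tokens."""
    union = _token_set(a) | _token_set(b)
    return len(_token_set(a) & _token_set(b)) / len(union) if union else 1.0
```

For instance, `edit_distance("kitten", "sitting")` is 3. "Dave Smith" and "David Smith" share one token out of three distinct ones, so their overlap is 1 and their Jaccard score is 1/3. The token-based scores ignore word order, while edit distance does not.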