Summaries

String matching (cont.), schema matching and mapping

5 October 2015, 15:30 Helena Galhardas

String matching algorithms

  • token-based: TF/IDF measure
  • phonetic-based: Soundex
  • hybrid methods: overview
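As a rough illustration of the two measures above, here is a minimal sketch. It assumes the American Soundex rules (keep the first letter, map the remaining consonants to digits, collapse repeated digits unless a vowel separates them, pad or truncate to four characters). It also assumes a TF/IDF variant with word tokens, tf as the raw count and idf = N / df, compared by cosine similarity. Many systems use log(N / df) instead. All function names are hypothetical.

```python
import math
from collections import Counter

# American Soundex digit table; vowels, y, h and w get no digit (assumed rules).
SOUNDEX_CODES = {ch: digit
                 for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                        ("l", "4"), ("mn", "5"), ("r", "6"))
                 for ch in letters}

def soundex(name: str) -> str:
    """Keep the first letter, encode the rest as digits, pad/truncate to 4 chars."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = [letters[0].upper()]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
        if ch not in "hw":  # h and w do not separate letters with the same digit
            prev = digit
    return ("".join(code) + "000")[:4]

def tfidf_vectors(strings):
    """Weight each word token by tf * idf, with idf = N / df (assumed variant)."""
    bags = [Counter(s.lower().split()) for s in strings]
    n = len(bags)
    df = Counter(tok for bag in bags for tok in bag)  # strings containing each token
    return [{tok: tf * n / df[tok] for tok, tf in bag.items()} for bag in bags]

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`. With `tfidf_vectors(["Apple Corporation", "IBM Corporation", "Apple Corp", "Microsoft Corporation"])`, "Apple Corp" scores closer to "Apple Corporation" than "IBM Corporation" does. This happens because "corporation" occurs in three of the four strings and is down-weighted relative to the rarer "apple".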

Improving the scalability of string matching with blocking algorithms:

  • inverted indexes over strings
  • size filtering
  • prefix filtering
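The three blocking techniques above can be combined in a single Jaccard similarity join. Below is a minimal sketch, assuming word tokens, a threshold 0 < t ≤ 1, and a global token order of rarest token first. An inverted index is built over the prefixes of one collection. Probing it gives the prefix filter, candidates are then pruned with the size filter, and only the survivors are verified with the exact Jaccard score. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def prefix_len(n, t):
    """Sets with Jaccard >= t share at least ceil(t*|x|) tokens, so their prefixes of
    length |x| - ceil(t*|x|) + 1 (under one global token order) must intersect."""
    return n - math.ceil(t * n - 1e-9) + 1

def jaccard_join(xs, ys, t):
    """Return (i, j, score) for every pair with Jaccard(xs[i], ys[j]) >= t."""
    tok_x = [set(s.lower().split()) for s in xs]
    tok_y = [set(s.lower().split()) for s in ys]
    freq = Counter(tok for toks in tok_x + tok_y for tok in toks)

    def prefix(toks):
        # Global order: rarest tokens first, so prefixes hold selective tokens.
        ordered = sorted(toks, key=lambda tok: (freq[tok], tok))
        return ordered[:prefix_len(len(toks), t)]

    # Inverted index: token -> ids of strings in ys whose prefix contains it.
    index = defaultdict(set)
    for j, toks in enumerate(tok_y):
        for tok in prefix(toks):
            index[tok].add(j)

    matches = []
    for i, toks in enumerate(tok_x):
        # Prefix filter: only strings sharing a prefix token become candidates.
        candidates = set()
        for tok in prefix(toks):
            candidates |= index.get(tok, set())
        for j in sorted(candidates):
            # Size filter: Jaccard >= t requires t*|x| <= |y| <= |x|/t.
            if not t * len(toks) - 1e-9 <= len(tok_y[j]) <= len(toks) / t + 1e-9:
                continue
            score = jaccard(toks, tok_y[j])  # verify the surviving candidates
            if score >= t:
                matches.append((i, j, score))
    return matches
```

Here is a usage example with t = 0.6, joining `["dave smith", "john lee"]` against `["smith dave", "david smith", "john w lee"]`. Only the pairs (0, 0) and (1, 2) are returned. "david smith" is never verified against "dave smith" because their prefixes share no token.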

Creating schema mappings:

  • components: schema matching and schema mapping
  • challenges
  • components of a schema matching system
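To make the idea of a schema matching system's components concrete, here is a minimal sketch under an assumed simplified decomposition. Several matchers each score a pair of attributes, a combiner averages their scores, and a match selector keeps the best target attribute above a threshold. Constraint enforcement is omitted. The attribute names, the 3-gram name matcher, the value-overlap matcher and the 0.3 threshold are all hypothetical choices for illustration.

```python
def qgrams(s, q=3, pad="#"):
    padded = pad * (q - 1) + s.lower() + pad * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Matchers: each scores one (source attribute, target attribute) pair in [0, 1].
def name_matcher(s_name, s_vals, t_name, t_vals):
    return jaccard(qgrams(s_name), qgrams(t_name))  # similarity of attribute names

def value_matcher(s_name, s_vals, t_name, t_vals):
    return jaccard(set(s_vals), set(t_vals))  # overlap of sample data values

def match_schemas(source, target, matchers, threshold=0.3):
    """source/target map attribute name -> sample values.
    Combiner: average of matcher scores. Selector: best target above threshold."""
    result = {}
    for s_name, s_vals in source.items():
        best, best_score = None, threshold
        for t_name, t_vals in target.items():
            scores = [m(s_name, s_vals, t_name, t_vals) for m in matchers]
            combined = sum(scores) / len(scores)
            if combined >= best_score:
                best, best_score = t_name, combined
        if best is not None:
            result[s_name] = (best, best_score)
    return result
```

For instance, consider a source `{"custName": ["Ann", "Bob"], "phone": ["555-1234", "555-9876"]}` and a target `{"customer_name": ["Bob", "Carl"], "phoneNumber": ["555-9876"], "city": ["Lisbon"]}`. This sketch pairs `custName` with `customer_name` and `phone` with `phoneNumber`. Neither matcher alone is decisive, which is why a combiner is used.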


Lab 3

5 October 2015, 09:30 Diogo Ribeiro Ferreira

Sorting, grouping, and filtering data with Pentaho. Schema mapping with Global-as-View (GAV).
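In Global-as-View, each relation of the mediated schema is defined as a query over the sources. A query over the mediated schema is then answered by unfolding each mediated relation into that definition. Below is a minimal sketch of this idea. The two source relations, the mediated relation `Movie` and its Datalog-style rule in the docstring are hypothetical examples, and sources are modelled as in-memory lists of tuples.

```python
# Hypothetical source relations, standing in for two autonomous data sources.
S1 = [("Alien", 1979), ("Heat", 1995)]                         # S1(title, year)
S2 = [("Alien", "Scott"), ("Heat", "Mann"), ("Up", "Docter")]  # S2(title, director)

def movie():
    """GAV mapping for the mediated relation:
    Movie(t, y, d) :- S1(t, y), S2(t, d)."""
    directors = {}
    for t, d in S2:
        directors.setdefault(t, []).append(d)
    # Join S1 and S2 on title.
    return [(t, y, d) for t, y in S1 for d in directors.get(t, [])]

def movies_after(year):
    """Query over the mediated schema, answered by unfolding Movie:
    Q(t, d) :- Movie(t, y, d), y > year."""
    return [(t, d) for t, y, d in movie() if y > year]
```

Calling `movies_after(1990)` evaluates the join of `S1` and `S2` behind `Movie` and keeps only the films released after 1990.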


Lab 2

1 October 2015, 16:00 Diogo Ribeiro Ferreira

Working with CSV data: calculations, number ranges, field selection, and de-duping. Query expressions with Datalog.


String matching

1 October 2015, 14:30 Helena Galhardas

String matching algorithms:

  • challenges: accuracy and scalability
  • sequence-based: edit distance, Needleman-Wunsch measure, Jaro measure, Jaro-Winkler measure
  • token-based: overlap measure, Jaccard measure, TF/IDF measure
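Several of the measures listed above fit in a few lines each. This minimal sketch covers edit distance (unit-cost Levenshtein, computed row by row), the Jaro measure (standard match window of max(|s|, |t|) / 2 - 1), and Jaro-Winkler. For Jaro-Winkler it assumes a prefix scale p = 0.1 over at most 4 characters and no boost threshold. The sketch also covers the overlap and Jaccard measures over padded 2-grams. Needleman-Wunsch and TF/IDF are not shown. Function names and default parameters are assumptions.

```python
def edit_distance(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # substitute (or match)
        prev = cur
    return prev[-1]

def jaro(s, t):
    if not s and not t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_used, t_used = [False] * len(s), [False] * len(t)
    m = 0
    for i, c in enumerate(s):  # matching characters within the window
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_used[j] and t[j] == c:
                s_used[i] = t_used[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    s_seq = [c for c, ok in zip(s, s_used) if ok]
    t_seq = [c for c, ok in zip(t, t_used) if ok]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) / 2
    return (m / len(s) + m / len(t) + (m - transpositions) / m) / 3

def jaro_winkler(s, t, p=0.1, max_prefix=4):
    """Boost the Jaro score by the length of the common prefix (up to max_prefix)."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def qgrams(s, q=2, pad="#"):
    padded = pad * (q - 1) + s + pad * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def overlap(s, t, q=2):
    return len(qgrams(s, q) & qgrams(t, q))

def jaccard(s, t, q=2):
    a, b = qgrams(s, q), qgrams(t, q)
    return len(a & b) / len(a | b)
```

For example, `edit_distance("dva", "dave")` is 2. `jaro("MARTHA", "MARHTA")` is 17/18 (about 0.944), and Jaro-Winkler raises it to about 0.961 thanks to the shared prefix "MAR". For "dave" and "dav", the padded 2-gram sets share 3 grams (`overlap`) out of 6 distinct ones (`jaccard` = 0.5).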

