
Mistakes in MP3

18 November 2011, 15:47 Helena Galhardas

There was a mistake in exercise 4.2, concerning the attributes projected by the query.

There were two mistakes in exercise 5.2: one concerning the schema mapping that defines the mediator, and another concerning the schema of the mediator G.

All three mistakes have already been corrected in the version available on-line.

Mini-Project 3

7 November 2011, 12:25 Bruno Emanuel Da Graça Martins

The exercises for mini-project 3 have been published online, on the IST Tagus course website.

Invited talk by SAPO Labs: lecture, next Tuesday, 8/11, 9:30

6 November 2011, 19:26 Helena Galhardas

Next Tuesday, 8/11, an invited speaker from SAPO Labs will give the following talk:

Semantic APIs

Luís Sarmento, SAPO Labs

In this talk, we will present a set of APIs developed by SAPO Labs to support the processing of natural-language content. These APIs are open source and can be used to build Information Extraction and Visualization applications.

These APIs give access to three types of resources: (i) lexical-semantic databases; (ii) basic operations for text processing; and (iii) distilled data coming from on-line journals.

For the lexical-semantic databases, there are two APIs. SAPO Semantic Lists publishes lists of semantically categorized words (e.g., lists of occupations). Verbetes supplies information about how important people are mentioned, along with a historical view of their activities. For example, the Verbetes API can tell us that "Paulo Bento" is the "current national football coach" but was previously the "Sporting coach", or that "Villas Boas" is a valid alternative name for "Villas-Boas" and that, in the football context, it probably refers to "André Villas-Boas", who is the "Chelsea FC coach".

For the basic text-processing operations, we will talk about the API for processing user-generated content (e.g., on-line comments or Twitter messages). It supports low-level tasks such as word delimitation (tokenization), the identification of "smileys", and vocabulary normalization. We will also present the API for identifying entity names, which currently annotates person names in text, as well as other elements (e.g., occupations).
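To make these low-level tasks concrete, here is a minimal Python sketch of word delimitation, smiley identification, and vocabulary normalization on a short message. It does not use or reproduce the SAPO Labs API: the function name `process`, the smiley pattern, and the normalization table are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical smiley pattern (eyes, optional nose, mouth); illustration only.
SMILEY = re.compile(r"[:;=][-o^]?[)(DPp]")

# Hypothetical normalization table mapping informal spellings to standard words.
NORMALIZE = {"u": "you", "gr8": "great", "thx": "thanks"}


def process(text):
    """Return (tokens, smileys) for a short user-generated message."""
    # Smiley identification: collect every match of the pattern.
    smileys = SMILEY.findall(text)
    # Remove smileys first so their punctuation is not mistaken for word boundaries.
    stripped = SMILEY.sub(" ", text)
    # Word delimitation: runs of word characters and apostrophes, lowercased.
    words = re.findall(r"[\w']+", stripped.lower())
    # Vocabulary normalization: replace known informal forms, keep the rest.
    tokens = [NORMALIZE.get(w, w) for w in words]
    return tokens, smileys


if __name__ == "__main__":
    print(process("thx u were gr8 :-) ;P"))
```

A real service would likely rely on much larger lexicons and language-specific rules; the sketch only shows how the three tasks fit together.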

Finally, we will present the APIs that produce distilled data coming from on-line journals. SAPO News Trends supplies information about which topics and important people are the "hottest" in current on-line journals, as well as their history over the last few years. SAPO Voxx gives access to quotations from important people that were published in on-line journals, and also allows searching their history.

All these APIs are available as Web Services or as software modules that can be installed locally. We will finish the talk with examples of their use.



Document token-list.xml updated for Mini-Project 2

3 November 2011, 12:50 Helena Galhardas

The XML document token-list.xml, which is used in the last exercise of mini-project 2, was updated on the course webpage.

The file previously listed on the course website contained only a very small number of examples for the classification of word tokens. Students should use the new token-list.xml document, which has exactly the same format.

Office Hours (B. Martins) and Labs: week 31/10 - 4/11

28 October 2011, 22:29 Helena Galhardas

During the week of 31/10-4/11, Prof. Bruno Martins will be away from IST at a conference. For this reason, he will not hold his Office Hours.

Labs this week will be taught by Prof. Pável Calado and will be used exclusively to answer questions about the 2nd mini-project.