Papers and Projects

Information Management and Retrieval

(2007/2008)

Number  Name                                   Paper  Project
48372   Ricardo Cruz                           A.19   B.14
49616   Bruno José Saraiva Barreiros           A.4    B.5
50000   Humberto Miguel Guerreiro da Glória    A.2    B.5
50991   Francisco Miguel Falcão dos Reis       A.5    B.2
51110   Vanda Sofia Torres Ribeiro             A.22   B.16
52304   Diogo José da Fonseca Simões           ???    ???
52308   Alexandre Santos Frazão                A.7    B.1
52316   Hugo Jorge de Bento Alberto            A.6    B.6
52317   Edgar Ferreira de Oliveira             A.3    B.1
52387   Tiago Alexandrino Janela               A.20   B.4
52411   Miguel da Silva Ferreira Neiva Vieira  A.12   B.6
52412   Marcelo Serrano Ferreira               A.17   B.12
52421   Miguel Jorge de Brito Vilhena          A.11   B.4
52473   João Miguel Martins Gonçalves          A.8    B.11
52475   Tiago Jorge Cabrita dos Santos         A.9    B.11
53808   Daniel Enrique Zacarias Silva          A.10   B.7
53823   Bruno José Gonçalves Oliveira          A.14   B.9
53829   Daniel António Quaresma Costa          A.1    B.7
53831   André Moura Garcia Pereira             A.21   B.13
53837   Diogo Bagulho Galvão                   A.18   B.13
53939   Ricardo Daniel Figueiredo Freire       A.16   B.9

A. Papers (individual works)

A.1. MPEG-21 (Daniel Costa - 53829)

  • Survey the current state of the definition and usage of MPEG-21. The work must also identify other standards that cover the same or similar requirements.

A.2. MPEG-7 (Humberto Glória - 50000)

  • Survey the current state of the definition and usage of MPEG-7. The work must also identify other standards that cover the same or similar requirements.

A.3. Search Engines (Edgar Oliveira - 52317)

  • Identify several relevant search engines and classify them according to their common characteristics (features, targeted problem/area, etc.).

A.4. Network Access (Bruno Barreiros)

  • Survey the current state of the usage of identifiers and resolution services for accessing information. Focus on the Handle System, DOI, OpenURL, etc.

A.5. Syndication (Francisco Falcão Reis - 50991)

  • Survey the current state of usage and propose a comparative analysis of representative services based on syndication protocols (RSS, Atom, …). A special focus on news aggregator services is recommended.

A.6. Ranking of web pages (Hugo Alberto - 52316)

  • State of the art in web page ranking: the idea is to find out which different approaches are being studied in this area. Examples given in class included link analysis, machine learning, relevance feedback, logs (possibly with the geographic location of the access), etc.

A.7. XML retrieval (Alexandre Frazão - 52308)

  • Survey the current state of the definition and usage of XML retrieval tools. Use the INEX proceedings as a main reference.

A.8. Image retrieval (João Gonçalves - 52473)

  • Survey the current state of the definition and usage of image retrieval techniques and tools. Include content-based tools and metadata-based tools (Flickr, Google Images, etc.).

A.9. Music retrieval (Tiago Santos - 52475)

  • Survey the current state of the definition and usage of music retrieval techniques and tools.

A.10. Calendar Information (Daniel Silva - 53808)

  • Survey the current state of the definition and usage of data structures and protocols for calendars to be used in shared environments.

A.11. Electronic Resource Management Systems (Miguel Vilhena - 52421)

  • Survey the current state of the definition and usage of Electronic Resource Management Systems (ERMS). An ERMS is a system intended to manage the descriptions and support the administrative metadata of digital resources in libraries and organisations in general. The term is common, but there is no consensus on its definition, so the work must also focus on identifying and analysing the possible definitions. Note: this is not the same as "Enterprise Resource Management System".

A.12. OAI-PMH (Miguel Vieira)

  • Survey the current state of the usage of OAI-PMH (identify related open-source tools, services and projects, from two perspectives: common cases and innovative or unusual but relevant cases). Paper related to project B.7.

A.13. Z39.50 and SRU

  • Survey the current state of the definition and usage of the Z39.50 and SRU protocols (identify related open-source tools, services and projects, from two perspectives: common cases and innovative or unusual but relevant cases). Paper related to project B.8.

A.14. Names in metadata (Bruno Oliveira - 53823)

  • State of the art in techniques to detect occurrences of the name of the same person or organisation, usually as an author, in multiple metadata records. Paper related to project B.9.

A.15. The same citation

  • State of the art in techniques to detect multiple occurrences of the same bibliographic citation of a scientific work within a set of citations, considering the most common metadata attributes (authors, title, date and place of publication, etc.). Paper related to project B.10.

A.16. Meta-search engines (Ricardo Freire - 53939)

  • Survey the current state of meta-search engines (existing services and their main characteristics). Paper related to project B.11.

A.17. Clustering of web results (Marcelo Ferreira - 52412)

  • Survey the current state of the art in clustering techniques for web results. Paper related to project B.12.

A.18. Relevance feedback (Diogo Galvão)

  • Survey the current state of the art in relevance feedback techniques. Paper related to project B.13.

A.19. Document Server (Ricardo Cruz - 48372)

  • Survey the state of the art in techniques for managing documents in projects, and characterise existing project management tools according to their support for storing, managing, preserving, searching and retrieving documents. Paper related to project B.14.

A.20. Evaluating Web Search Engines (Tiago Janela - 52387)

  • Survey the current state of the art in techniques for evaluating web search engines. This paper can be relevant for project B.15.

A.21. Geoparsers (André Pereira - 53831)

  • Survey the current state of the art in the definition and usage of techniques for geoparsers. The work must also identify cases of geoparser usage, including services available on the Web.

A.22. Suggesting tags (Vanda Ribeiro - 51110)

  • Survey the state of the art in the characterisation of solutions and systems for suggesting tags.

B. Projects (individual or groups of two students)

B.1. Search Trends (Edgar Oliveira - 52317 / Alexandre Frazão - 52308)

  • Project: The basic version of this project comprises the development of a solution to process the logs of search services, detect trends, and store those trends in structured XML files. An advanced version will publish the results in indexes, similar to Google Trends ( http://www.google.com/trends ). The logs from PORBASE will be available.
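As a rough illustration of the basic version described above, the following minimal sketch counts queries per period, picks the queries whose frequency grew, and serialises them as XML. The log layout (`period<TAB>query`), the sample queries, and the XML element names are all assumptions made for this example; the actual PORBASE log format is not described here.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_queries(log_lines):
    """Count normalised queries per period from 'period<TAB>query' lines (assumed format)."""
    counts = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        period, query = line.split("\t", 1)
        counts.setdefault(period, Counter())[query.strip().lower()] += 1
    return counts


def rising_queries(counts, previous, current, top=3):
    """Return (query, growth) pairs for queries that became more frequent."""
    before = counts.get(previous, Counter())
    now = counts.get(current, Counter())
    growth = {q: now[q] - before[q] for q in now if now[q] > before[q]}
    # Largest growth first; ties broken alphabetically for a stable result.
    return sorted(growth.items(), key=lambda item: (-item[1], item[0]))[:top]


def trends_to_xml(period, trends):
    """Serialise the trends with hypothetical <trends>/<trend> elements."""
    root = ET.Element("trends", period=period)
    for query, growth in trends:
        ET.SubElement(root, "trend", growth=str(growth)).text = query
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample log, invented for illustration only.
SAMPLE_LOG = [
    "2008-01\tcamões",
    "2008-01\tmapas antigos",
    "2008-02\tCamões",
    "2008-02\tcamões",
    "2008-02\tcamões",
    "2008-02\tmapas antigos",
    "2008-02\tpessoa",
]

counts = count_queries(SAMPLE_LOG)
trends = rising_queries(counts, "2008-01", "2008-02")
xml_text = trends_to_xml("2008-02", trends)
print(xml_text)
```

A real solution would likely need a more careful definition of "trend" (for example, growth relative to the overall query volume) and a parser for the actual log format.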

B.2. Metadata Converter Service (Francisco Falcão Reis - 50991)

  • Project: Develop a service for bidirectional bibliographic metadata translation (in practical terms, contribute to a new version of the service http://urn.porbase.org, enriching it with more formats and adding a service that takes uploaded records in any of the supported formats and converts them to any other format). NOTE: Support provided by Nuno Freire (sample records, existing code in Java and practical advice).

B.3. Alphabetic Indexes

  • Develop a solution to create alphabetic browsing indexes from descriptive metadata (taking at least Dublin Core as a reference format). Work can use the JSP Tag Library Alphabetical Navigation Bar ( http://www.aikiinc.com/alphanavbar/ ) with Java APIs to process XML and generate the Web interfaces (dynamic browsing indexes, where each entry links to the full metadata record). Other references can be explored, such as http://simile.mit.edu/wiki/Longwell

B.4. Timelines (Miguel Vilhena - 52421 / Tiago Janela - 52387)

  • Develop a solution to create time browsing indexes from descriptive metadata (taking at least Dublin Core as a reference format). The work can use the JavaScript SIMILE Timeline ( http://simile.mit.edu/timeline/ ) with Java APIs to process XML and generate the Web interfaces (timelines, where each reference links to a full metadata record). The reference metadata will be sets of records describing old maps (metadata and thumbnails of the maps will be available).

B.5. SmarterPhone (Humberto Glória - 50000 / Bruno Barreiros)

  • My master's thesis (supervised by Prof. Daniel Gonçalves) is entitled SmarterPhone (http://cgm.dei.ist.utl.pt/propostas/mestrados0708/smarterphone/). The goal is to make a mobile phone/PDA/smartphone more intelligent. To that end, context-sensitive information is collected from the device so that it can later take intelligent actions without direct interaction from the user. Information is to be collected not only from the phone but also from the user's personal computer, widening the information sources into a solid basis for the intelligent actions. One of the features will be document retrieval. For the application on the computer side there are already platforms on which I will build, such as the Scroll database (developed at INESC-ID). However, nothing exists for the phone itself, so the proposed project will consist, in its basic version, of the design and development of an indexing solution for the SMS and MMS messages stored on the phone (in principle using the TFxIDF algorithm). The advanced version of the project will consist of developing a ranking solution for the same messages.
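To make the TFxIDF indexing and ranking idea concrete, here is a minimal sketch that indexes a handful of text messages and ranks them for a query. The message texts, the function names, and the particular weighting (raw term frequency times `log(N/df)`) are assumptions chosen for illustration; the project may well settle on a different TFxIDF variant, and a phone implementation would face memory constraints that this sketch ignores.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(messages):
    """Build an inverted index: term -> {message_id: tf * idf weight}."""
    term_counts = {mid: Counter(tokenize(text)) for mid, text in messages.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    total = len(messages)
    index = defaultdict(dict)
    for mid, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(total / doc_freq[term])
            index[term][mid] = tf * idf
    return index


def search(index, query):
    """Rank message ids by the sum of the weights of the matching query terms."""
    scores = Counter()
    for term in tokenize(query):
        for mid, weight in index.get(term, {}).items():
            scores[mid] += weight
    return [mid for mid, _ in scores.most_common()]


# Hypothetical messages, invented for illustration only.
messages = {
    1: "Meeting at the library tomorrow",
    2: "Library closes at six",
    3: "Dinner tomorrow at eight",
}

index = build_index(messages)
results = search(index, "library tomorrow")
print(results)
```

Note that a term occurring in every message (here "at") receives a weight of zero, which is the usual effect of the IDF factor.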

B.6. Document indexing (Miguel Vieira - 52411 / Hugo Alberto - 52316)

  • Implementation of a generic document sort-based indexer, using the algorithms presented in the book "Managing Gigabytes". Compression and in-place algorithms are optional. The implementation should be done under the IR-BASE project framework: http://www.bcs.org/server.php?show=ConWebDoc.8762
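The general idea of sort-based inversion can be sketched as follows: emit one (term, document, frequency) triple per distinct term in each document, sort the triples, and group them into postings lists. This is only an in-memory illustration under simplifying assumptions; the algorithms in the book are designed for collections that do not fit in memory, and the function name and sample documents below are hypothetical.

```python
import re
from collections import Counter
from itertools import groupby


def sort_based_index(documents):
    """Build term -> [(doc_id, frequency), ...] by sorting (term, doc, freq) triples."""
    # Step 1: emit one triple per distinct term in each document.
    triples = []
    for doc_id, text in documents.items():
        term_freqs = Counter(re.findall(r"\w+", text.lower()))
        for term, freq in term_freqs.items():
            triples.append((term, doc_id, freq))

    # Step 2: sort by term, then by document id (tuple ordering does both).
    triples.sort()

    # Step 3: a single pass over the sorted triples yields the postings lists.
    index = {}
    for term, group in groupby(triples, key=lambda triple: triple[0]):
        index[term] = [(doc_id, freq) for _, doc_id, freq in group]
    return index


# Hypothetical documents, invented for illustration only.
documents = {
    1: "to be or not to be",
    2: "to do is to be",
}

index = sort_based_index(documents)
for term, postings in index.items():
    print(term, postings)
```

In a disk-based version, step 2 would typically be replaced by sorting fixed-size runs and merging them, which is where the optional compression and in-place variants come into play.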

B.7. OAI-PMH (Daniel Costa - 53829 / Daniel Silva - 53808)

  • Develop a service to act as a "watchdog" over the quality of service of OAI-PMH servers. This is important, for example, in scenarios where service providers harvest metadata from data providers and use it only to build indexes, without keeping a copy of the records. In these scenarios, if a user of the service later wants to see the full record, the service provider might want to request just that record. Because this is a real-time scenario, being able to predict the behaviour of a data provider is very important.

B.8. Z39.50 and SRU

  • Harvest Z39.50 and SRU: Develop a solution to harvest bibliographic databases that are available only through Z39.50 and SRU. A simple version should focus on local cases where the harvester and the Z39.50 or SRU server are in the same network and only one server is targeted. A more advanced version can address cases where the harvester targets multiple servers on the Internet. Support provided by Nuno Freire (existing code in Java and practical advice).

B.9. Names in metadata (Bruno Oliveira - 53823 / Ricardo Freire - 53939)

  • Implementation of techniques to detect occurrences of the name of the same person or organisation, usually as an author, in multiple metadata records. Benchmark study with data from PORBASE.

B.10. The same citation

  • Implementation of techniques to detect the occurrences of the same bibliographic citation in a set of citations. Benchmark study with data from FENIX and INESC-ID. NOTE: in case of success, the results of this project will be integrated in the FENIX system.

B.11. Meta-search engines (João Gonçalves - 52473 / Tiago Santos - 52475)

  • Implementation of a meta-search engine. A meta-search engine is a search engine that, given a user query, uses other existing search engines to obtain the results. It then combines the multiple result lists retrieved from those search engines into a single ranking. A description of a meta-search engine can be found in the book "Modern Information Retrieval", chap. 13.6. The implementation should be done under the IR-BASE project framework: http://www.bcs.org/server.php?show=ConWebDoc.8762
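One simple way to combine several result lists into a single ranking is a Borda count, sketched below. This is just one possible fusion strategy, chosen here for its simplicity, and not necessarily the one described in the book; the engine result lists and URLs are hypothetical, and fetching results from real search engines is left out entirely.

```python
from collections import defaultdict


def borda_fuse(rankings):
    """Merge ranked lists: each engine gives more points to higher-ranked results."""
    # Every engine awards at most `pool_size` points (to its top result).
    pool_size = len({url for ranking in rankings for url in ranking})
    scores = defaultdict(int)
    for ranking in rankings:
        for position, url in enumerate(ranking):
            scores[url] += pool_size - position
    # A result missing from an engine's list simply gets no points from it.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists from three engines, invented for illustration only.
engine_a = ["http://example.org/a", "http://example.org/b", "http://example.org/c"]
engine_b = ["http://example.org/b", "http://example.org/d", "http://example.org/a"]
engine_c = ["http://example.org/b", "http://example.org/a"]

fused = borda_fuse([engine_a, engine_b, engine_c])
for rank, url in enumerate(fused, start=1):
    print(rank, url)
```

Rank-based fusion of this kind avoids comparing the raw scores of different engines, which are usually not on the same scale.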

B.12. Clustering of web results (Marcelo Ferreira - 52412)

  • Project: Implement a system that, given a query, uses an existing search engine to obtain the results and clusters the retrieved Web pages according to topic. The result should be similar to what is shown in http://clusty.com/ . Existing clustering applications can be used, such as Cluto ( http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview ). The implementation should be done under the IR-BASE project framework: http://www.bcs.org/server.php?show=ConWebDoc.8762

B.13. Relevance feedback (André Pereira / Diogo Galvão)

  • Project: Implement a system that, given a query, uses an existing search engine to obtain the results. The results are then presented to the user, who can select which ones are relevant. The system will then use the user's selection to improve the initial query and re-submit it. Algorithms to be used are proposed in http://inex.is.informatik.uni-duisburg.de:2004/pdf/ker_ruthven_lalmas.pdf . The implementation should be done under the IR-BASE project framework: http://www.bcs.org/server.php?show=ConWebDoc.8762
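The feedback loop can be illustrated with a simplified Rocchio-style query expansion that uses positive feedback only. This is a sketch under stated assumptions, not necessarily one of the algorithms the project is expected to use: the `alpha` and `beta` values, the function names, and the sample texts are hypothetical, and stop-word removal and the negative-feedback term are omitted.

```python
import re
from collections import Counter


def term_vector(text):
    """Represent a text as a bag of lower-cased word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def expand_query(query, relevant_docs, alpha=1.0, beta=0.75, extra_terms=2):
    """Return the query terms plus the strongest terms from the relevant results."""
    original = term_vector(query)
    # Start from the original query, weighted by alpha.
    weights = {term: alpha * count for term, count in original.items()}
    # Add the centroid of the documents marked as relevant, weighted by beta.
    for doc in relevant_docs:
        for term, tf in term_vector(doc).items():
            weights[term] = weights.get(term, 0.0) + beta * tf / len(relevant_docs)
    # Keep the original terms and append the best-scoring new ones.
    candidates = [term for term in weights if term not in original]
    candidates.sort(key=lambda term: (-weights[term], term))
    return list(original) + candidates[:extra_terms]


# Hypothetical results that a user marked as relevant for the query "jaguar".
relevant = [
    "jaguar cat habitat rainforest",
    "jaguar cat diet",
]

new_query = expand_query("jaguar", relevant)
print(" ".join(new_query))
```

The expanded term list would then be sent back to the underlying search engine as the revised query.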

B.14. Document Server (Ricardo Cruz)

  • Project: Develop a document server for projects, assuming a generic project management technique in which results are associated with milestones and the work is split into work packages, which can in turn be split successively into sub-work packages and finally into tasks. Issues such as descriptive metadata, document versioning and status (draft documents, approved documents, etc.) must be considered. A fundamental feature of the system must be a search service that searches the content of the documents and presents the references to the documents according to the project structure, the document status, and all the relevant properties. Open-source wiki software can be used, especially to support access control and the management user interface.

B.15. Evaluating B-ON

  • Develop and carry out an evaluation process for b-on. This work must comprise identifying the relevant metrics, performing the evaluation tasks, and analysing the results.

B.16. Detecting advertising (Vanda Ribeiro)

  • Tag suggestion for the Digg service. The goal is to improve the system built by a final-year student last year, by finding a way to detect which parts of a page are advertisements and excluding them from the tag suggestion process, so as to improve the suggested tags.
