Seminars RGI (Information Retrieval and Management)

10 December 2010, 20:05 José Borbinha


Wednesday, December 15
Program


Time/Place: 10:00 / FA2 (building "Informática I")

  • The Portuguese Web Archive: past, present and future
  • Daniel Gomes (FCCN - Fundação para o Cálculo Científico Nacional)

Time/Place: 11:00 / FA2 (building "Informática I")

  • Component selection in digital preservation: Preservation planning as decision problem
  • Christoph Becker (Technical University of Vienna - Austria / INESC-ID - Information Systems Group)

Time/Place: 14:30 / F4 (building "Informática I")

  • Using model representations to achieve Semantic Interoperability in Information Systems
  • Hugo Manguinhas (IST / INESC-ID - Information Systems Group)



Abstracts


  • Title: The Portuguese Web Archive: past, present and future
  • Author: Daniel Gomes (FCCN)

Abstract: The web was invented so that scientists could quickly exchange data, but it has become a crucial resource across the world. Students at all levels of education use the web as a primary source of information, from kindergarten children who play online games to learn colours to graduate students who analyze cutting-edge scientific articles.
However, the web is extremely ephemeral. Most of the information published on it quickly becomes unavailable and is lost forever: 80% of pages are updated or disappear within one year, and 50% of the URLs available today will become invalid within two months. Even printed scientific publications suffer from web ephemerality, because they cite online resources that have since become unavailable.
Besides the loss of important scientific and historical information, web ephemerality causes ordinary people to lose their personal memories. Every second, photos are taken and published on the web, yet the most elementary preservation measures, such as creating backup copies, are rarely taken. As a consequence, in the future many people will be unable to show the photographs that record their memories. Broken links also degrade the performance of popular web applications and services such as shared bookmarks, search engines and social networks, leaving their users (and customers) dissatisfied.
The web needs preservation mechanisms to fight ephemerality. We must ensure that information, besides being accessible worldwide, persists over time so that knowledge is transmitted to future generations. Web archives are innovative systems that acquire, store and preserve information published on the web.
The Portuguese Web Archive project began in 2007 and is run by the Foundation for National Scientific Computing (FCCN). Its main aim is to preserve content of interest to the Portuguese community. In September 2010, the Portuguese Web Archive held 889 million items of content published since 1996. FCCN provides an experimental full-text search over 130 million items archived between 1996 and 2007 (available at http://www.archive.pt). The distributed computing infrastructure created for the Portuguese Web Archive, combined with the stored historical data, is a powerful resource for supporting web data mining research. It has already enabled research on web accessibility for people with disabilities and on characterizations of the Portuguese web.



  • Title: Component selection in digital preservation: Preservation planning as decision problem
  • Author: Christoph Becker

Abstract: The mission of digital preservation is to overcome the obsolescence threats that digital material faces at the bitstream, logical, and semantic levels, and to provide continued, authentic long-term storage of and access to digital objects in a form usable by a specific user community. This requires preservation actions to be carried out when the original environment of digital objects is unavailable, either to recreate that environment (emulation) or to transform the objects' representation into a form usable in a new environment (migration). The mission of preservation planning is to ensure authentic future access to a specific set of objects by defining the actions needed to preserve them. The core problem of preservation planning is a domain-specific instance of component selection and can be reformulated and modelled accordingly. Most approaches to this multi-criteria decision-making problem use goal-oriented requirements modelling and focus on the selection step itself. Evaluation of the suitability of components is largely manual and partly relies on subjective judgment. However, in dynamic, distributed environments with high demands for transparent selection processes that lead to trustworthy, auditable decisions, subjective judgments and vendor claims are not considered sufficient. Furthermore, continuous monitoring and re-evaluation of components after integration are sometimes needed.
In this talk, I will describe how an evidence-based approach to component evaluation can improve the repeatability and reproducibility of component selection under two conditions: (1) functional homogeneity of candidate components, and (2) a high number of components and selection problem instances. I will outline tool support for preservation planning and a framework for automated measurements. I will further present a taxonomy of decision criteria for the described scenario and discuss the means of data collection needed for each category of criteria. An analysis of real-world case studies reveals the current coverage of automation and points to a number of challenging problems still to be solved.
www.ifs.tuwien.ac.at/dp/plato
www.ifs.tuwien.ac.at/~becker



  • Title: Using model representations to achieve Semantic Interoperability in Information Systems
  • Author: Hugo Manguinhas (IST / INESC-ID - Information Systems Group)

Abstract: Interoperability in Information Systems is required whenever two processes, running in the same or in different information systems, need to exchange information.
Semantic interoperability is reached when both processes share the expected information model and the information entities being interchanged are unambiguously defined. This presentation explains how this level of interoperability can be achieved by taking advantage of the rules that model the information models being interchanged, as well as the relationships and equivalences that can be found between them.
Metadata Registries also address this problem. A Metadata Registry is a central location in an organization where all meta-information is stored and maintained in a controlled environment, in order to promote its common understanding within and across organizations.