FenixEdu™

CAT Exam

21 November 2017, 12:14 - Fátima Sampaio

Candidate: Sérgio Ricardo de Oliveira Esteves N.º 54564/D

Title: Techniques for Enhancing the Performance of Data-intensive Management Systems

Date: 27/11/2017

Time: 14h30

Location: Room 0.20, Pavilhão Informática II, IST, Alameda

Advisors: Professor Luís Manuel Antunes Veiga / Professor João Nuno De Oliveira e Silva

Abstract: The demand for storing and analyzing vast volumes of data is rising as web-based enterprises introduce innovative and interactive applications that attract more and more users on a global scale. To cope with such data volumes, data management systems have been evolving to deliver increasingly better performance and efficiency at lower cost in large-scale scenarios. A fundamental property of these systems is data consistency. In storage systems, consistency refers to how accurate, fresh and synchronized the state of data replicas is across different machines and locations. Most of these systems, namely NoSQL data stores, sacrifice consistency in favor of availability and performance for cross-cluster synchronization, while others provide strong consistency and sacrifice availability. In data processing systems, namely in dataflow processing, consistency refers to how completely the input is reflected in the output within a time frame. Most dataflow management systems are strongly consistent, enforcing strict temporal synchronization across processing steps. The main goal of our research is to study, design, implement and evaluate performance optimizations for data-intensive management systems. At the heart of these optimizations lies the tuning of data consistency. In particular, we take into account the semantics of data in order to trade off consistency for performance in storage and data processing systems. We are able to achieve substantial performance gains, namely in terms of latency, throughput, bandwidth, and resource utilization, while keeping application outputs within acceptable levels, as defined by applications. As contributions we propose: (i) VFC3, a consistency model, equipping a framework for NoSQL data stores, that enables multiple consistency levels over groups of data with different replication urgencies; (ii) Fluχ, a dataflow model, empowering a framework for dataflow managers, that enables deferred triggering of computation stages based on the assessed impact of the input on changing the output; and (iii) WaaS, a scheduling algorithm, inspired by the Fluχ model, to allocate machines to dataflow tasks based on time, budget and consistency constraints.

Departamento de Engenharia Informática (site discontinued; new site at https://dei.tecnico.ulisboa.pt)
