
CAT Exam

21 November 2017, 12:14 - Fátima Sampaio


Candidate: Sérgio Ricardo de Oliveira Esteves, No. 54564/D

Title: Techniques for Enhancing the Performance of Data-intensive Management Systems

Date: 27/11/2017

Time: 14:30

Location: Room 0.20, Pavilhão Informática II, IST, Alameda

Advisors: Professor Luís Manuel Antunes Veiga / Professor João Nuno De Oliveira e Silva

Abstract: The demand for storing and analyzing vast volumes of data is rising as web-based enterprises introduce innovative, interactive applications that attract more and more users on a global scale. To cope with such data volumes, data management systems have been evolving to deliver increasingly better performance and efficiency at lower cost in large-scale scenarios. A fundamental property of these systems is data consistency. In storage systems, consistency refers to how accurate, fresh, and synchronized the state of data replicas residing on different machines and in different locations is. Most of these systems, namely NoSQL data stores, sacrifice consistency in favor of availability and performance in cross-cluster synchronization, while others provide strong consistency and sacrifice availability. In data processing systems, namely in dataflow processing, consistency refers to how completely the input is reflected in the output within a given time frame. Most dataflow management systems are strongly consistent, enforcing strict temporal synchronization across processing steps. The main goal of our research is to study, design, implement, and evaluate performance optimizations for data-intensive management systems. At the heart of these optimizations lies the tuning of data consistency. In particular, we take the semantics of data into account in order to trade off consistency for performance in storage and data processing systems.
We achieve substantial performance gains, namely in terms of latency, throughput, bandwidth, and resource utilization, while keeping application outputs within acceptable levels, as defined by the applications themselves. As contributions, we propose: (i) VFC3, a consistency model, equipping a framework for NoSQL data stores, that enables multiple consistency levels over groups of data with different replication urgencies; (ii) Fluχ, a dataflow model, empowering a framework for dataflow managers, that enables deferred triggering of computation stages based on the assessed impact of the input on changing the output; and (iii) WaaS, a scheduling algorithm, inspired by the Fluχ model, that allocates machines to dataflow tasks based on time, budget, and consistency constraints.