CAT Examination of Daniel Rosa Ramos

8 May 2024, 15:23 Sandra Espírito Santo


Daniel Rosa Ramos 

Thesis Title: Towards Automated API Refactoring for Evolving Codebases

Location: Room 336 at INESC-ID, or via Zoom at the following link:

Date: 10 May 2024
Time: 16:00


Modern software development heavily relies on third-party libraries and frameworks, which yield significant productivity gains. Libraries expose their functionality through Application Programming Interfaces (APIs). Although stable API selection is desirable, it is often not possible, as software must adapt to new technical requirements or to shifts in stakeholder or market demands. Therefore, as libraries evolve, clients may need to migrate APIs to adapt to these changes. The task of adapting APIs to accommodate non-functional changes is a form of software refactoring, a crucial practice in software engineering. Refactoring entails modifying code to improve its quality and reduce its complexity. However, refactoring is typically labor-intensive and error-prone. The complexity of API refactoring has spurred numerous research efforts towards automating this task. A widely used method for automating API refactoring is to generate match-replace rules by mining vast amounts of data from client projects of the libraries, sourced from collaborative coding platforms such as GitHub. However, a significant challenge with mining approaches is their limited effectiveness, as they rely on data from clients that have already undergone the refactoring, which is often scarce. In this thesis proposal, we explore novel methods for automated API refactoring that do not rely on extensive training data or on specific refactoring examples from client projects. In particular, we explore three alternative data sources. First, we use API documentation to discover API mappings, which serve both to generate migration rules and as a heuristic to guide a program synthesis approach that migrates client code effectively and reliably. Second, we use the API development process, particularly library pull requests, to learn API migration rules for addressing breaking changes. Our core idea is that if a library changes its functionality, its tests and internal usages will likely change as well, providing a rich data source for generating migration rules. Third, we exploit natural language, as software is enriched with an abundance of natural language data, including commit messages, issue reports, and comments. We use this unstructured data to test equivalence between API usages by synthesizing pairs of code examples in the source and target libraries. Our goal is then to abstract these code examples to generate broadly applicable migration scripts. So far, we have implemented our ideas as proofs of concept in two automated refactoring tools, as well as in a language and toolset for expressing API refactorings. Our proof-of-concept tools leverage state-of-the-art program synthesis and machine learning techniques, which are crucial for establishing API mappings, synthesizing migration scripts, and migrating client code directly. We evaluated the two tools on real datasets by migrating client programs found on collaborative coding platforms. Our ongoing research aims to automatically generate training examples from natural language and documentation, which we will then use to generate migration scripts for libraries where migration data is scarce.
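The match-replace rules mentioned in the abstract can be illustrated with a minimal sketch. The rule below is hypothetical (the `client.fetch` to `client.get` rename and the `timeout` parameter are invented for illustration), and real tools operate over ASTs rather than raw text; a regex is used here only for brevity.

```python
import re

# One hypothetical migration rule: the breaking change renames
# `client.fetch(args)` to `client.get(args, timeout=30)`.
MIGRATION_RULES = [
    (re.compile(r"\bclient\.fetch\((?P<args>[^)]*)\)"),
     r"client.get(\g<args>, timeout=30)"),
]

def migrate(source: str) -> str:
    """Apply each match-replace migration rule to client source code."""
    for pattern, replacement in MIGRATION_RULES:
        source = pattern.sub(replacement, source)
    return source

print(migrate("data = client.fetch(url)"))  # data = client.get(url, timeout=30)
```

Mining-based approaches learn such rule pairs from already-migrated client code; the thesis proposal instead derives them from documentation, pull requests, and natural language.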

CAT Examination of Student Tiago Luís de Oliveira Brito

8 July 2021, 14:49 Sandra Espírito Santo

CAT Examination

Candidate: Tiago Luís de Oliveira Brito, no. 72647

Thesis Title

Study and Mitigation of Security Vulnerabilities in Server-Side JavaScript Web Applications

Zoom link:

Date: 27 July 2021

Time: 11:00

Advisor: Professor Nuno Miguel Carvalho dos Santos

Thesis Abstract

With the emergence of the NodeJS ecosystem, JavaScript has become the most popular programming language for developing web applications. NodeJS is a cross-platform JavaScript engine for running server-side code; it features a package management system named Node Package Manager (npm), which currently stores thousands of third-party packages that web developers can readily import into their code. However, it is difficult to guarantee the absence of security vulnerabilities in server-side JavaScript code. Single packages may export correctly implemented functions (e.g., string manipulation) but be misused by other packages or application code. Conversely, correct code may depend on buggy packages that propagate vulnerabilities up the dependency chain. Despite the large body of work on client-side JavaScript code vulnerabilities, limited attention has been paid to the server side. Compared to the browser, compromised JavaScript code faces fewer security barriers on NodeJS (e.g., no sandbox) and has access to a larger and more privileged API that facilitates access to critical system resources (e.g., the file system). By exploiting such code, an attacker may potentially compromise the server and/or affect many users by launching attacks such as SQL injection or remote code execution. This work aims to study the extent to which vulnerability detection for JavaScript code can be automated and to develop effective vulnerability detection tools that enable web developers to identify potential security flaws in their application code before deployment. First, through an empirical study, we aim to evaluate the availability and effectiveness of existing vulnerability detection tools for JavaScript code. This study will allow us to survey state-of-the-art techniques, analyze their weaknesses, and identify the general challenges to automated vulnerability detection. Then, we aim to develop new code analysis techniques that tackle the challenges identified in the empirical study, thus improving current vulnerability detection capabilities. We will validate these improvements by enhancing existing tools (or building our own) and testing them in the wild against real-world vulnerable JavaScript code.
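The kind of flaw the abstract targets can be sketched with a toy sink-based detector. This is not the thesis's technique: it is a hypothetical illustration that flags calls to dangerous NodeJS sinks (`eval`, `exec`) whose argument is not a plain string literal and may therefore carry attacker-controlled data. Real tools use taint tracking over ASTs; a regex is used here only for brevity.

```python
import re

# Match a call to a dangerous sink and capture its argument text.
SINK_CALL = re.compile(r"\b(?:eval|exec)\s*\(\s*(?P<arg>[^)]*)\)")

def is_string_literal(arg: str) -> bool:
    """True if the argument is a single quoted string with nothing appended."""
    return re.fullmatch(r'"[^"]*"|\'[^\']*\'', arg.strip()) is not None

def scan(js_source: str) -> list:
    """Return one warning per sink call with a potentially tainted argument."""
    return [
        f"possible injection: {m.group(0)}"
        for m in SINK_CALL.finditer(js_source)
        if not is_string_literal(m.group("arg"))
    ]

vulnerable = 'cp.exec("rm -rf " + req.query.dir);'  # user input reaches a shell
print(scan(vulnerable))
```

A constant call such as `exec("ls")` is not flagged, while concatenating request data into the command is, which is the essence of the injection attacks the abstract mentions.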

CAT Examination of Student Miguel Serras Vasco

30 June 2021, 09:52 Sandra Espírito Santo

CAT Examination

Candidate: Miguel Serras Vasco, no. 70413

Thesis Title: "Multimodal Representation Learning for Agent Perception and Action"

Zoom link:

Date: 16 July 2021

Time: 10:00 to 11:30

Advisor: Professor Ana Maria Severino de Almeida e Paiva

Co-advisor: Professor Francisco António Chaves Saraiva de Melo

Thesis Abstract:

In this thesis, we aim to endow agents with mechanisms to learn multimodal representations from sensory data and to allow them to execute tasks considering different subsets of the available perceptions. We address the learning of these representations in supervised and unsupervised learning frameworks, as well as how to leverage such representations in reinforcement learning scenarios under changing perceptual conditions. In the context of supervised multimodal representation learning, we contribute a novel action representation and learning algorithm that allows agents to consider contextual information provided by action demonstrations, enabling sample-efficient recognition of human actions. Moreover, in the context of unsupervised multimodal representation learning, we explore the cross-modality inference problem -- the ability to infer missing perceptual data from the available perceptions -- and contribute a novel hierarchical multimodal generative model that addresses the requirements of computational cross-modality inference. Furthermore, we introduce cross-modality policy transfer in reinforcement learning, where an agent must learn and exploit policies over different subsets of input modalities, and we instantiate this problem in the context of Atari games. In future work, we propose to address the effect of the nature of the perceptual information provided to the agent, in order to provide robustness to deteriorating perceptual conditions and to attacks on compromised sensors.
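Cross-modality inference, as described in the abstract, can be sketched numerically. The sketch below assumes a shared latent space with hypothetical linear encoders and decoders per modality; real models, such as the hierarchical multimodal generative model the thesis contributes, learn these maps from data.

```python
import numpy as np

rng = np.random.default_rng(0)

d_a, d_b, d_z = 4, 3, 2               # dims of modality A, modality B, latent z
enc_a = rng.normal(size=(d_z, d_a))   # encoder for the available modality A
dec_b = rng.normal(size=(d_b, d_z))   # decoder for the missing modality B

def infer_b_from_a(x_a: np.ndarray) -> np.ndarray:
    """Infer missing modality B from available modality A via the shared latent."""
    z = enc_a @ x_a    # encode the available perception into the latent space
    return dec_b @ z   # decode the latent into the missing modality

x_b_hat = infer_b_from_a(rng.normal(size=d_a))
print(x_b_hat.shape)  # (3,)
```

The point of the sketch is the information flow: any subset of modalities can be encoded into the shared latent, from which any other modality can be decoded, which is what makes policies robust to missing perceptions.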

PhD Examination of Student Alexandre Duarte de Almeida Lemos

25 June 2021, 12:13 Sandra Espírito Santo


Candidate: Alexandre Duarte de Almeida Lemos

Thesis Title: "Solving Scheduling Problems under Disruptions"

Examination venue:

Date: 08/07/2021
Time: 14:30

Advisor: Professor Maria Inês Camarate de Campos Lynce de Faria
Co-advisor: Professor Pedro Tiago Gonçalves Monteiro

Thesis Abstract:
Scheduling problems are common in many applications, ranging from factories and transportation to universities. Most of the time, these problems are optimization problems in which we want to find the best way to manage scarce resources. Reality is dynamic, and unexpected disruptions can render the original solution invalid. Many methods to deal with disruptions are well described in the literature. They can be divided into two main approaches: (i) create robust solutions for the most common disruptions, and (ii) solve the problem again from scratch, extended with new constraints. The goal of creating robust solutions is to ensure their validity even after the most common disruptions occur; for this reason, it requires a detailed study of the most likely disruptive scenarios. The main disadvantage of creating a robust solution is a possible reduction in overall quality (e.g., financial cost, customer satisfaction) to support the most likely disruptive scenarios, which may never occur. Regardless of the robustness of the solution, we may need to solve the problem again. Most of the methods developed to recover solutions after disruptions occur consist of re-solving the problem from scratch with an additional cost function. This cost function ensures that the new solution is close to the original; in other words, these methods solve the Minimal Perturbation Problem (MPP). However, all of these methods require more execution time than the original problem to find a new solution, which can be explained by the fact that they repeat the search process. Moreover, they use generic cost functions (e.g., Hamming distance) that may have little significance in practice. In this work, we propose novel algorithms to solve the MPP applied to two domains: university course timetabling and train scheduling. We tested our algorithms on university timetabling problems with datasets obtained from Instituto Superior Técnico and from the 2019 International Timetabling Competition; one of these algorithms ranked in the top 5 of the competition. For the train scheduling case study, we tested our algorithms with data from the Swiss Federal Railways and from PESPLib. The evaluation shows that the new algorithms are more efficient than those described in the literature. In summary, the proposed algorithms show a significant improvement over the state of the art in re-solving scheduling problems under disruptions.
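The Minimal Perturbation Problem described above can be made concrete with a toy sketch. This is not one of the thesis's algorithms: it is a hypothetical three-course timetabling instance solved by brute force, where a disruption forbids one (course, slot) pair and we re-solve while minimizing the Hamming distance to the original timetable.

```python
from itertools import product

COURSES = ["A", "B", "C"]
SLOTS = [0, 1, 2]
original = {"A": 0, "B": 1, "C": 2}   # the timetable before the disruption

def valid(assignment, forbidden):
    """No two courses in the same slot, and no forbidden (course, slot) pair."""
    slots = list(assignment.values())
    if len(set(slots)) != len(slots):
        return False
    return all(assignment[c] != s for c, s in forbidden)

def hamming(a, b):
    """The generic MPP cost: how many courses changed slot."""
    return sum(a[c] != b[c] for c in COURSES)

def solve_mpp(forbidden):
    """Brute-force search for the valid timetable closest to the original."""
    best = None
    for slots in product(SLOTS, repeat=len(COURSES)):
        cand = dict(zip(COURSES, slots))
        if valid(cand, forbidden) and (
            best is None or hamming(cand, original) < hamming(best, original)
        ):
            best = cand
    return best

new = solve_mpp(forbidden=[("A", 0)])   # disruption: course A loses slot 0
print(new, "distance:", hamming(new, original))
```

Note that re-solving enumerates the whole search space again even though only one course was disrupted, which illustrates the abstract's point that naive MPP methods repeat the search process and cost more time than the original problem.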

CAT Examination of Student Guilherme Sant'Anna Varela

24 June 2021, 11:01 Sandra Espírito Santo


Coordination Mechanisms for Large Scale Reinforcement Learning based Adaptive Traffic Signal Control

             Guilherme Sant'Anna Varela 

Examination on 30 June 2021 at 10:00

Zoom link:

Advisor: Professor José Alberto Rodrigues Pereira Sardinha
Co-advisor: Professor Francisco António Chaves Saraiva de Melo

Thesis Abstract: Adaptive traffic signal control (ATSC) is at the core of intelligent transportation systems. Properly calibrated signal plans can alleviate bottlenecks and prevent mounting congestion, while dysfunctional ones waste valuable public and private resources. Reinforcement Learning (RL) based controllers excel in efficiency and cost savings; they react online and are able to adapt those reactions as more data becomes available. In spite of these advantages, incumbent systems control hundreds of intersections, while RL-based systems face challenges in scaling up. This limitation is known as the curse of dimensionality and is intrinsic to the theoretical framework that underpins RL-based controllers: Markov decision processes. The explosion of the state space renders single-agent RL-ATSC systems unfeasible at large scales. This state of affairs requires that the computation be distributed across the network through a multi-agent reinforcement learning system; furthermore, the collective must learn fast. While function approximation is the established strategy to speed up learning at the (intra-)agent level, unless the agents coordinate, the system will not benefit from learning similar patterns that appear at distant nodes of the network. Coordination mechanisms provide the means to learn at the inter-agent level, or across the network, saving computation time and accelerating learning. This thesis approaches coordination mechanisms from the perspective of generalization, that is, from the advancement of algorithms with the potential to generate coordination mechanisms. The thesis proceeds in three distinct stages: the first two are independent and dedicated to the generalization of distinct classes of coordination mechanisms, while the third aims at combining the results of both. The first class of coordination mechanisms found in RL-ATSC comprises model-based coordination mechanisms, stemming from fields as diverse as game theory and graph theory. We propose that such algorithms are in fact specializations of the consensus and sharing problems found in distributed reinforcement learning (DiRL). In particular, the alternating direction method of multipliers (ADMM) is a meta-algorithm, or an algorithm for generating algorithms. ADMM-RL has the benefit of providing a unifying framework for coordination mechanisms. It also allows the global problem to be partitioned into multiple subproblems, each of which can be solved many times faster than the original problem. The second class of coordination mechanisms found in RL-ATSC comprises data-driven coordination mechanisms from deep reinforcement learning (DeRL). In this case, both communication and joint actions are automatically generated by a neural network architecture. Two architectures in particular have shown promising results on problems that scale: graph convolutional neural networks and graph attention mechanisms. We propose that such architectures are sub-cases of graph convolutional reinforcement learning, and this part of the thesis aims at combining both into a single DeRL structure. Finally, benefiting from the newly gained knowledge, the third and final stage of the thesis aims to integrate the advantages of both approaches into a coordination mechanism that can leverage network-wide knowledge and is data-efficient; hence, it can learn fast and scale.
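The consensus problem that ADMM solves can be illustrated with a minimal sketch. The setup is hypothetical: each agent i holds a local quadratic cost f_i(x) = (x - a_i)^2 (e.g., a local estimate of a shared traffic parameter), and the consensus solution is the mean of the a_i. The three ADMM steps below (local minimization, averaging, dual update) are the standard scaled form.

```python
import numpy as np

a = np.array([1.0, 4.0, 7.0])   # each agent's local target a_i
rho = 1.0                        # ADMM penalty parameter
x = np.zeros(3)                  # local variables, one per agent
u = np.zeros(3)                  # scaled dual variables
z = 0.0                          # global consensus variable

for _ in range(100):
    # Local step: each agent minimizes (x - a_i)^2 + (rho/2)(x - z + u_i)^2,
    # which has the closed-form solution below.
    x = (2 * a + rho * (z - u)) / (2 + rho)
    # Consensus step: the network agrees on the average.
    z = np.mean(x + u)
    # Dual step: each agent accumulates its disagreement with the consensus.
    u = u + x - z

print(round(z, 4))  # converges to mean(a) = 4.0
```

The local steps are independent, so each agent (intersection) can run its update in parallel, which is what makes ADMM attractive as a unifying coordination mechanism for distributed RL controllers.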