

Talk by Prof. Dennis Shasha, Courant Institute of New York University

20 June 2017, 15:34 - Maria Lucilia Gonçalves Abreu

Title: VersionClimber: an algorithm and system for package evolution in data science

Speaker: Prof. Dennis Shasha, Courant Institute of New York University

When: 30 June 2017


Where: Anfiteatro VA1, Pav. de Civil, Instituto Superior Técnico


Imagine you are a data scientist (as many of us are or have become). The systems you build typically require many data sources and many packages (machine learning/data mining, data management, and visualization) to run. Your working configuration consists of a set of packages, each at a particular version. You want to update some packages (software or data) to the most recent versions possible, but you also want your system to run after the upgrades, which may entail changing the versions of other packages. One approach is to hope that the latest versions of all packages work together. If that fails, the fallback is manual trial and error, which quickly ends in frustration. We advocate a provenance-style approach in which tools like ptrace enable us to identify version combinations of different packages. Tools such as pip, GitHub, and VirtualEnv then enable us to fetch particular versions of packages and try them in a sandbox-like environment. Because the space of versions to explore grows exponentially with the number of packages, we have developed a memoizing algorithm that avoids exponential search while still finding an optimum version combination. Heuristics combined with certain empirical facts about packages (e.g. local upward compatibility) improve performance further still. We present experimental results on well-known packages used in data science to illustrate the effectiveness of our approach.


Dennis Shasha is a professor of computer science at the Courant Institute of New York University and an Associate Director of NYU Wireless. He works with biologists on pattern discovery for network inference; with computational chemists on algorithms for protein design; with physicists and financial people on algorithms for time series; on clocked computation for DNA computing; and on computational reproducibility. Other areas of interest include database tuning as well as tree and graph matching. Because he likes to type, he has written six books of puzzles about a mathematical detective named Dr. Ecco, a biography about great computer scientists, and a book about the future of computing. He has also written five technical books about database tuning, biological pattern recognition, time series, DNA computing, resampling statistics, and causal inference in molecular networks. He has co-authored over eighty journal papers, seventy conference papers, and twenty-five patents. He has written the puzzle column for various publications including Scientific American, Dr. Dobb's Journal, and the Communications of the ACM. He is a fellow of the ACM and an INRIA International Chair.



MISE Dissertation Defenses

23 June 2017, 09:38 - Maria Lucilia Gonçalves Abreu

Week of 26 to 30 June 2017

Candidate: Rui Fernando dos Santos Pereira Antunes Fernandes, No. 85043

Dissertation title: Gestão do conhecimento na área da segurança da informação (Knowledge management in the field of information security)

Date: 28/06/2017


Location: Sala F8, Pav. Informática III, Alameda

Supervisor: Professor José Manuel Costa Dias de Figueiredo


Candidate: Tiago Miguel da Silva Fernandes, No. 67091

Dissertation title: Valor de Projectos de BI com o Método Delphi (Value of BI Projects with the Delphi Method)


Time: 11:00

Location: Sala Polivalente 0.17, Informática II, Alameda

Supervisor: Prof. José Manuel Costa Dias de Figueiredo
