Planning

Lectures (AT)

AT1 - Data Science

Data science: context and goals.
Course organization and planning.

AT2 - KDD Process

Basic concepts: data, records, attributes and information.
KDD process: from data to information.

AT3 - Deloitte presentation

Case study presentation by Deloitte.

AT4 - Data profiling

Data profiling: granularity, distribution, dimensionality and sparsity.

AT5 - Modelling

Modelling: mining tasks; supervised, semi-supervised and unsupervised learning.

Evaluation principles and Occam's razor.

AT6 - Classification

Classification: the notion of concept.

Evaluation: measures, training strategies and statistical significance.

AT7 - Classification: analogizers

Analogizers: KNN algorithm and the idea behind Support Vector Machines.

Data preparation: scaling, distance measures and dummification.

AT8 - Classification: Bayesians

Bayesians: MAP and Naive Bayes. Other Bayesian approaches.

Data preparation: missing value imputation.

AT9 - Classification: symbolists

Symbolists: decision trees - algorithms, measures.  
Overfitting and pruning.

AT10 - Classification: ensembles - bagging

Ensembles: bagging and random forests. 
Feature engineering: selection and generation. 

AT11 - Classification: evolutionists

Evolutionists: genetic algorithms - introduction and motivation. 
Data preparation: balancing - resampling and SMOTE.   
Evaluation: other measures and ROC charts.   

AT12 - Classification: connectionists

Neural networks: main idea and gradient descent algorithm.
Deep learning - a brief summary.

AT13 - Regression

Regression: linear and logistic.

AT14 - Classification: ensembles - boosting

Boosting and the gradient boosting algorithm.

AT15 - Clustering

Introduction to clustering data analysis.
Clustering approaches: partitioning, hierarchical, density-based and model-based.
Evaluation criteria: cohesion and compactness.

AT16 - Feature extraction

Feature extraction: Principal component analysis (PCA) and linear discriminant analysis (LDA). 

AT17 - Pattern mining

Pattern mining: the Apriori algorithm and other approaches.

Evaluation: support, confidence and lift.  

AT18 - Sequential pattern mining

Sequential Pattern Mining. 
Discretization methods.  

AT19 - Anomaly detection

Anomaly detection: main approaches and applications. 

AT20 - Time series

Time series: representations - SAX, discrete Fourier and wavelet transforms.

Measuring similarity among time series - dynamic time warping.

AT21 - Mining time series

Forecasting: time series decomposition, description and methods.
Other mining tasks: matrix-profile-based approaches.

AT22 - Network data analysis

Network data analysis: description and properties.

AT23 - Social and web data analysis

Social network analysis (SNA): HITS and PageRank algorithms.

AT24 - Big data

Big data: challenges.
Data streams and large scale data mining. 
Parallelisation.

AT26 - Privacy and ethical concerns

Privacy issues, ethical concerns and the GDPR.
Anonymization.

AT27 - Closing remarks

Data science and main challenges.

Closing remarks. 

Lab Sessions

Enrolment

Group enrolment - no lab session.

Lab 0 (Week 2)

Project schedule and resource allocation - non-graded.

Lab 1 (Week 3)

Data profiling - dimensionality, distribution and granularity

Lab 2 (Week 4)

Data profiling - sparsity and correlation

Lab 3 (Week 5)

Naive Bayes and KNN

Lab 4 (Week 6)

Decision trees and overfitting

Lab 5 (Week 7)

Random forests and feature selection

Lab 6 (Week 8)

Regression, gradient boosting and balancing

Lab 7 (Week 9)

Clustering and feature extraction

Lab 8 (Week 10)

Pattern Mining

Lab (Week 11)

Project support - non-graded.

Lab 9 (Week 12)

Time series and forecasting

Lab 10 (Week 13)

Anomaly detection and SNA