Planning

Lectures

AT1 - Data Science

Data science: context and goals.
Course organization and planning.

KDD process.
Basic concepts: data, records, variables and information.
Evaluation.

AT2 - Data profiling

Data profiling: granularity, distribution, dimensionality and sparsity.

Visualization principles.
Data preparation: missing value imputation and dummification.

AT3 - Classification and Analogizers

Modelling: mining tasks; supervised, semi-supervised and unsupervised learning.  

Classification and the notion of concept. 
Analogizers: KNN and SVMs. Data scaling and distance measures.
Evaluation: measures, training strategies and statistical significance.    

AT4 - Deloitte presentation

Deloitte case study: presentation.
The difficulty of labelling. Data profiling.

AT5 - Classification: Bayesians

Bayesians: MAP, Naive Bayes and Bayesian networks.

Data preparation: balancing.
Deloitte case study: comparing Naive Bayes and KNN with and without balancing.

AT6 - Classification: symbolists

Symbolists: decision trees - algorithms, measures and pruning.

Feature engineering: selection and generation. 
Deloitte case study: comparing Naive Bayes, KNN and decision trees. Identifying overfitting; discussing the pros and cons of applying feature selection, possible new variables to generate, and false predictors.

AT7 - Classification: ensembles - bagging

Ensembles: bagging and random forests.  

Evaluation: generalization error, bias and variance.
Case study - comparing accuracy and recall, with simple new variables.

AT8 - Classification: connectionists and boosting

Classification: neural networks and gradient descent; MLP and backpropagation. Brief summary of deep learning.

Ensembles: boosting and XGBoost.
Deloitte case study: wrapping up classification.
Case study: comparison of KNN, decision trees, random forests and multilayer perceptrons with new features.
Exercises: data profiling, feature engineering and modelling.  

AT9 - Clustering

Introduction to clustering data analysis.  

Clustering approaches: partitioning, hierarchical, density-based and model-based. The k-means algorithm.
Evaluation criteria: cohesion and compactness. 
Feature extraction and PCA. 
Deloitte case study: clustering and PCA results.
Exercises: data preparation and clustering.

AT10 - Pattern mining and Anomaly detection

Pattern Mining and Sequential Pattern Mining. The Apriori algorithm.  

Evaluation: support, confidence and lift.   

Anomaly detection: main approaches and applications. LOF algorithm.
Deloitte case study: pattern mining results and the role of discretization. Exercises.

AT11 - Time series

Time series: representations - SAX, discrete Fourier and wavelet transforms.
Measuring similarity among time series - dynamic time warping.
Matrix profile.
Deloitte case study: discussion of the possibility of approaching the case with SNA and time series techniques.

AT12 - Forecasting

Time series forecasting: regression and LSTMs.
Case study.

AT13 - Social Network Analysis

Social network analysis: properties and description. 

The PageRank and HITS algorithms.
Case study.

AT14 - Privacy and ethical concerns

Technical challenges.
Ethical concerns, privacy issues and the GDPR. 

Closing remarks.  


Lab Sessions

Team registration (Week 1)

Team registration on Fénix after the first class, at 20:00; no lab session.

Project support (Week 1)

Project support. Configuration of the development environment. Project goals.

Project schedule and resource allocation (non-graded).

Lab 1 (Week 2)

Data profiling - dimensionality, distribution, granularity, sparsity and correlation.

Lab 2 (Week 2)

Data preparation - missing value and outlier imputation, dummification.

Lab 3 (Week 3)

Training strategies.
KNN and scaling.

Lab 4 (Week 3)

Naive Bayes and KNN.

Balancing.

Lab 5 (Week 4)

Decision trees.

Identifying overfitting across the different models.

Lab 6 (Week 4)

Random forests.

Feature selection: identifying its impact on the different models.

Lab 7 (Week 5)

MLP and Gradient Boosting.

Lab 8 (Week 5)

Clustering and feature extraction.

Lab 9 (Week 6)

Pattern Mining and anomaly detection.

Lab 10 (Week 6)

Time series analysis and Matrix Profile.

Lab 11 (Week 7)

Forecasting.

Project support (Week 7)

Project support.