Planning

Lab Classes

L1: Team registration

Team registration takes place after the 1st class, on 21st November at 20:00; any previous enrolments are ignored. No lab session.

L2: Project support

Project support: development environment configuration.

Eval 1 L3: Data profiling

Data profiling: dimensionality, distribution, granularity, sparsity and correlation.

L4: Project support

Data preparation: missing value imputation and variable encoding.

Eval 2 L5: Data preparation

Data preparation: missing value imputation and scaling.
Identification of the best approaches.

L6: Data balancing

Data balancing.


Eval 3 L7: Performance evaluation

Performance evaluation of KNN, Naive Bayes and decision trees after the balancing study.


L8: Project support

Feature selection and generation.

Eval 4 L9: Random forests

Random forests.

Eval 5 L10: Overfitting

Overfitting study for all classifiers, except Naive Bayes.

L11: MLPs and GB

MLP and gradient boosting performance.


Eval 6 L12: Time Series

Time series: profiling and transformation.

Identification of the best transformations.

Eval 7 L13: Forecasting

Forecasting: basic regressors, ARIMA and LSTMs.

L14: Project support

Wrap-up.

Theoretical Classes

AT1 - Data Science

Data science: context and goals.
Course organisation and planning.

KDD Process. 
Basic concepts: data, records, variables and information.
Modeling.
Evaluation principles.

AT2 - Classification

Classification: notion of concept.  
Evaluation: measures, training strategies and statistical significance.    
Labelling and type transformation.


AT3 - Data profiling and Classification: Analogizers

Data profiling: granularity, distribution, dimensionality and sparsity.

Classification: analogizers and the KNN algorithm. Data scaling and similarity measures. Missing value imputation.
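A minimal Python sketch, not taken from the course materials, of KNN with min-max scaling and Euclidean distance as the similarity measure. The function names (`fit_min_max`, `apply_min_max`, `knn_predict`) and the toy data are hypothetical. It shows why scaling matters: without it, the variable with the larger range dominates the distance.

```python
import math
from collections import Counter

def fit_min_max(rows):
    """Per-variable (min, max) bounds, computed on the training records only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(row, bounds):
    """Map each value to [0, 1] with the training bounds; constant variables map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]

def knn_predict(train_x, train_y, query, k=3, scale=True):
    """Majority vote among the k training records closest to query (Euclidean distance)."""
    if scale:
        bounds = fit_min_max(train_x)
        train_x = [apply_min_max(r, bounds) for r in train_x]
        query = apply_min_max(query, bounds)
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical toy data: the second variable has a much larger range than the first.
X = [[1.0, 1000.0], [2.0, 1100.0], [8.0, 1050.0], [9.0, 980.0]]
y = ["a", "a", "b", "b"]
print(knn_predict(X, y, [8.5, 1000.0], k=1, scale=False))  # prints a
print(knn_predict(X, y, [8.5, 1000.0], k=1, scale=True))   # prints b
```

The scaling bounds are fitted on the training records and reused for the query, so no information from the record being classified leaks into the transformation.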

AT4 - Deloitte presentation + Bayesians

Deloitte case study: presentation, the difficulty of labelling, data profiling.
Classification: MAP and Naive Bayes.  

ROC charts. Other metrics.

AT5 - Classification: symbolists

Symbolists: decision trees (algorithms, measures and pruning). Overfitting. Data balancing.

AT6 - Classification: ensembles

Ensembles: bagging and boosting.

Random forests.    
Feature Engineering: selection, extraction and generation.

AT7 - Classification: neural networks

Classification: Neural Networks - gradient descent, MLP and backpropagation.  

Brief summary of Deep Learning.
Support vector machines (SVMs).

AT8 - Prediction and Forecasting

Prediction and Forecasting. Evaluation. Basic predictors: KNN, regression trees, MLP and RF. Gradient boosting.

Time series: representations and properties.

AT9 - Time series

Time series: representations and properties.
Profiling and transformation.
Measuring similarity between time series: dynamic time warping.
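A minimal Python sketch, not taken from the course materials, of the classic dynamic-programming formulation of dynamic time warping, assuming absolute difference as the local cost and no warping-window constraint. The name `dtw_distance` and the two series are hypothetical. Two series with the same shape but shifted by one step differ under a lock-step comparison, yet have zero DTW distance.

```python
import math

def dtw_distance(s, t):
    """Dynamic time warping distance with |a - b| as local cost and no warping window."""
    n, m = len(s), len(t)
    # d[i][j] = cost of the best alignment of s[:i] with t[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # extend the cheapest of: repeat t[j-1], repeat s[i-1], or advance both
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

a = [0, 0, 1, 2]
b = [0, 1, 2, 2]  # same shape as a, shifted by one step
print(sum(abs(x - y) for x, y in zip(a, b)))  # lock-step distance: 2
print(dtw_distance(a, b))                     # 0.0
```

This version runs in O(n·m) time and memory; it also accepts series of different lengths, which a lock-step distance cannot compare directly.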

AT10 - Forecasting

Time series forecasting: basic models, regression and LSTMs.

AT11 - Clustering and PCA

Clustering.

Clustering approaches: partitioning, hierarchical, density-based and model-based. 
The k-means algorithm.
Evaluation criteria: cohesion and compactness. 
Feature extraction and PCA. 
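A minimal Python sketch, not taken from the course materials, of Lloyd's k-means algorithm on 2-D points, with the within-cluster sum of squared errors (SSE) as one common cohesion measure (lower means tighter clusters). The function names, the random initialisation from input points and the toy data are assumptions for illustration; PCA is not shown.

```python
import math
import random

def assign(points, centroids):
    """Assignment step: put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate assignment and mean update until centroids stop moving."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iters):
        clusters = assign(points, centroids)
        # update step: mean of each cluster; an empty cluster keeps its old centroid
        new = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)

def sse(centroids, clusters):
    """Within-cluster sum of squared distances to the cluster centroid."""
    return sum(math.dist(p, c) ** 2 for c, cl in zip(centroids, clusters) for p in cl)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(centroids))
print(sse(centroids, clusters))
```

Because the result depends on the initial centroids, k-means is usually run with several seeds and the solution with the lowest SSE is kept.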

AT12 - Anomaly detection and Pattern mining

Anomaly detection: main approaches and applications. LOF algorithm. 
Pattern mining and sequential pattern mining. The Apriori algorithm. Evaluation: support, confidence and lift.
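A minimal Python sketch, not taken from the course materials, of the three rule-evaluation measures for a rule A → C over a list of transactions. The Apriori search itself is not shown, and the function names and basket data are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(C | A): support(A and C together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(A -> C) / support(C); above 1 suggests positive association, below 1 negative."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(round(support({"bread", "milk"}, baskets), 4))       # 0.6
print(round(confidence({"bread"}, {"milk"}, baskets), 4))  # 0.75
print(round(lift({"bread"}, {"milk"}, baskets), 4))        # 0.9375
```

Here the rule bread → milk has high confidence (0.75) but a lift slightly below 1, because milk is already present in 80% of the baskets: buying bread does not make milk more likely.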

AT13 - Privacy and ethical concerns

Technical challenges.
Ethical concerns, privacy issues and the GDPR. 

Closing remarks.