Planning
Laboratory Classes
L1: Teams registration
Team registration, ignoring any previous enrolments, takes place after the 1st class on 21st November at 20:00. No lab session.
L2: Project support
Project support: development environment configuration.
Eval 1 L3: Data profiling
Data profiling - dimensionality, distribution, granularity, sparsity and correlation.
L4 - Project support
Data preparation: missing value imputation and variable encoding.
Eval 2 L5: Data preparation
Data preparation - missing value imputation and scaling.
Best approaches identification.
L6: Data balancing
Data balancing.
Eval 3 L7 - Performance evaluation
Performance evaluation of KNN, Naive Bayes and decision trees, after the balancing study.
L8 - Project support
Feature selection and generation.
Eval 4 L9 - Random forests
Random forests.
Eval 5 L10 - Overfitting
Overfitting study for all classifiers, except Naive Bayes.
L11: MLPs and GB
MLP and Gradient boosting performance.
Eval 6 L12: Time Series
Time series: profiling and transformation.
Eval 7 L13: Forecasting
Forecasting: basic regressors, ARIMA and LSTMs.
L14: Project support
Wrap-up.
Lectures
AT1 - Data Science
Data science: context and goals.
Course organisation and planning.
AT2 - Classification
Classification: notion of concept.
Evaluation: measures, training strategies and statistical significance.
Labelling and type transformation.
AT3 - Data profiling and Classification: Analogizers
Data profiling: granularity, distribution, dimensionality and sparsity.
AT4 - Deloitte presentation + Bayesians
Deloitte case study: presentation, the difficulty of labelling, data profiling.
Classification: MAP and Naive Bayes.
AT5 - Classification: symbolists
Symbolists: decision trees - algorithms, measures and pruning. Overfitting. Data balancing.
AT6 - Classification: ensembles
Ensembles: bagging and boosting.
AT7 - Classification: neural networks
Classification: Neural Networks - gradient descent, MLP and backpropagation.
AT8 - Prediction and Forecasting
Prediction and Forecasting. Evaluation. Basic predictors: KNN, regression trees, MLP and RF. Gradient boosting.
AT9 - Time series
AT10 - Forecasting
Time series forecasting: basic models, regression and LSTMs.
AT11 - Clustering and PCA
Clustering.
Feature extraction and PCA.
AT12 - Anomaly detection and Pattern mining
AT13 - Privacy and ethical concerns
Technical challenges.
Ethical concerns, privacy issues and the GDPR.