Planning
Theory Lectures (Aulas Teóricas, AT)
AT1 - Data Science
Data science: context and goals.
Course organization and planning.
AT2 - KDD Process
Basic concepts: data, records, attributes and information.
KDD process: from data to information.
AT3 - Deloitte presentation
Case study presentation by Deloitte.
AT4 - Data profiling
Data profiling: granularity, distribution, dimensionality and sparsity.
AT5 - Modelling
Modelling: mining tasks; supervised, semi-supervised and unsupervised learning.
AT6 - Classification
Classification. Notion of Concept.
AT7 - Classification: analogizers
Analogizers: KNN algorithm and the idea behind Support Vector Machines.
AT8 - Classification: Bayesians
Bayesians: MAP and Naive Bayes. Other Bayesian approaches.
AT9 - Classification: symbolists
Symbolists: decision trees - algorithms, measures.
Overfitting and pruning.
AT10 - Classification: ensembles - bagging
Ensembles: bagging and random forests.
Feature engineering: selection and generation.
AT11 - Classification: evolutionists
Evolutionists: genetic algorithms - introduction and motivation.
Data preparation: balancing - resampling and SMOTE.
Evaluation: other measures and ROC charts.
AT12 - Classification: connectionists
AT13 - Regression
Regression: linear and logistic.
AT14 - Classification: ensembles - boosting
Boosting and the gradient boosting algorithm.
AT15 - Clustering
AT16 - Feature extraction
AT17 - Pattern mining
Pattern Mining. Apriori algorithm and other approaches.
AT18 - Sequential pattern mining
Sequential Pattern Mining.
Discretization methods.
AT19 - Anomaly detection
Anomaly detection: main approaches and applications.
AT20 - Time series
Time series: representations - SAX, discrete Fourier and wavelet transforms.
AT21 - Mining time series
AT22 - Network data analysis
Network data analysis: description and properties.
AT23 - Social and web data analysis
SNA: HITS and PageRank algorithms.
AT24 - Big data
AT26 - Privacy and ethical concerns
Privacy issues, ethical concerns and the GDPR.
Anonymization.
AT27 - Closing remarks
Data science and main challenges.
Lab Sessions (Aulas Laboratoriais)
Enrollment
Group enrollment - no lab session.
Lab 0 (Week 2)
Project schedule and resource allocation - non-graded.
Lab 1 (Week 3)
Data profiling - dimensionality, distribution and granularity
Lab 2 (Week 4)
Data profiling - sparsity and correlation
Lab 3 (Week 5)
Naive Bayes and KNN
Lab 4 (Week 6)
Decision trees and Overfitting
Lab 5 (Week 7)
Random forests and Feature selection
Lab 6 (Week 8)
Regression, Gradient Boosting and Balancing
Lab 7 (Week 9)
Clustering and Feature extraction
Lab 8 (Week 10)
Pattern Mining
Lab (Week 11)
Project support - non-graded.
Lab 9 (Week 12)
Time series and Forecasting
Lab 10 (Week 13)
Anomaly detection and SNA