Planning

Theoretical Lectures

AT1 - Data Science

Data science: context and goals.
Data science process.
Course organization and planning.

AT2 - Data Exploration

Basic concepts: data, records, attributes and information.
KDD process: from data to information.

AT3 - Data Visualization

Data description: granularity, distribution, dimensionality and sparsity.
Statistical analysis.
Outliers.

AT4 - Classification

Classification. Notion of concept. KNN. Accuracy. Data normalization and distance measures.

AT5 - Bayesian learning

Classification: MAP and Naive Bayes.  
Evaluation: training strategies.
Statistical significance.

AT6 - Data Balancing and Evaluation

Data balancing: resampling and SMOTE. 
Evaluation: measures and ROC curves.
Basics of missing-value imputation.

AT7 - Decision tree learning

Classification: decision trees (algorithms and measures).

AT8 - Overfitting

Overfitting and Occam's razor.
Pruning.

AT9 - Ensemble classification

Classification ensembles: random forests and XGBoost.

AT10 - Connectionist and evolutionary learning

Classification: neural networks and genetic algorithms (introduction and motivation).

AT11 - Pattern mining

Pattern Mining. Apriori algorithm. Evaluation. Discretization methods.

AT12 - Sequential pattern mining

Pattern mining: other approaches. Sequential pattern mining.

AT13 - Clustering

Introduction to clustering analysis.
Clustering approaches: partitioning, hierarchical, density-based and model-based.

AT14 - High-dimensional data analysis

Clustering measures and evaluation.
High-dimensional data analysis: revisiting overfitting and underfitting risks.

AT15 - Feature selection and extraction

Feature selection.
Introduction to feature extraction.
Principal component analysis (PCA) and linear discriminant analysis (LDA).

AT16 - Biclustering

Biclustering analysis: approaches and applications.
Pattern coherence, quality and statistical significance.

AT17 - Regression

Multiple linear regression.
Lazy and tree-based regression.
Evaluation of regression models.

AT18 - Anomaly detection

Outlier analysis: approaches and applications.

AT19 - Network data analysis

Network data analysis.

AT20 - Social and web data analysis

Social network analysis (SNA): HITS and PageRank algorithms.

AT21 - Time series representations

Symbolic time series representations: SAX and codebooks.
Discrete Fourier and wavelet transforms.

AT22 - Temporal data mining

Mining (multivariate) time series. 
Mining event and interval data. 
Temporal pattern mining, classification and regression.

AT23 - Time series forecasting

Time series decomposition, description and forecasting: classical approaches.

AT24 - Big data

Brief notes on distributed data mining and stream data mining.
Exercises on mining time series data.

AT25 - Privacy and ethical concerns

Privacy issues.
Anonymization.
AutoML and ethical concerns.

AT26 - Closing

Closing remarks.