Course Plan

Lectures

Introduction to data science - CA

Introduction. Data Science, AI and ML. KDD process.

Data description - CA

Data exploration and statistical analysis.

Classification - CA

Classification. Notion of concept. kNN. Accuracy. Data normalization and distance measures.

Bayesian learning - CA

Classification: MAP and Naive Bayes. Training strategies. Other evaluation metrics and ROC charts.

Data preprocessing - CA

Data balancing: resampling and SMOTE. Missing value imputation. Outlier detection.

Decision tree learning - CA

Classification: decision trees - algorithms, measures.

Overfitting - CA

Overfitting. Occam's razor. Pruning.

Ensemble classification - CA

Classification ensembles: random forests and AdaBoost.

Connectionist and evolutionary learning - CA

Classification: other approaches (neural networks, SVMs and genetic algorithms).

Pattern mining - CA

Pattern Mining. Apriori algorithm. Evaluation. Discretization methods.

Sequential pattern mining - CA

Pattern Mining: other approaches. Sequential Pattern Mining.

Clustering - RH

Clustering: k-means, EM and hierarchical. Evaluation.

Biclustering - RH

Subspace clustering. High-dimensional data analysis. Evaluation.

Dimensionality reduction - RH

Feature selection. PCA and SVD.

Network data analysis - CA

Network data analysis.

Social and web data analysis - CA

Social network analysis (SNA): HITS and PageRank algorithms.

Regression - RH

Multivariate regression analysis.

Time series analysis and forecasting - RH

Time series description. Forecasting.

Time series representations - RH

Time series preprocessing and decomposition (SAX, DFT, wavelets).

Temporal data mining - RH

Pattern analysis, clustering and classification of (multivariate) time series data and event data.

Text and opinion mining - RH

Text and opinion mining. Recommendation systems.

Biomedical data analysis - RH

Biomedical data analysis (Computational biology).
Comprehensive review of data science concepts.

Complex data mining - RH

Analysis of relational and multi-dimensional data (indexing: LSH, multidimensional indexing).
Analysis of spatial data.

Distributed and stream data mining - RH

Big data. Distributed data mining. Stream data mining.

Data visualization - CA

Data visualization.

Closing - CA, RH

Closing remarks.

Laboratory Classes

Data description - CA

Statistical data analysis. Process documentation: notebooks.

Classification - CA

Classification: kNN. Data normalization. Training strategy. Evaluation.

Data preprocessing and Bayesian learning - CA

Classification: naive Bayes. Data balancing. Outlier detection.

Overfitting and decision tree learning - CA

Classification: decision trees. Overfitting.

Ensemble models - CA

Classification: ensembles. Comparing classifiers' behavior.

Pattern mining - RH

Association rule mining. Sequential pattern mining.

Clustering and biclustering - RH

Clustering. Biclustering.

Project support - CA

Project support.

Dimensionality reduction and regression - RH

Feature selection and PCA. Regression.

Time series data analysis - RH

Time series decomposition (components, DFT, wavelets). Time series forecasting.

Complex data mining - RH

Temporal, spatial and relational data mining.

Biomedical data analysis - RH

Comprehensive data mining exercises to answer biomedical questions.

Social network analysis - CA

Social network analysis.