Schedule
Lectures
Introduction to data science - CA
Introduction. Data Science, AI and ML. KDD process.
Data description - CA
Data exploration and statistical analysis.
Classification - CA
Classification. Notion of concept. kNN. Accuracy. Data normalization and distance measures.
Bayesian learning - CA
Classification: MAP and naive Bayes. Training strategies. Other evaluation metrics and ROC curves.
Data preprocessing - CA
Data balancing: resampling and SMOTE. Missing value imputation. Outlier detection.
Decision tree learning - CA
Classification: decision trees (induction algorithms and split measures).
Overfitting - CA
Overfitting. Occam's razor. Pruning.
Ensemble classification - CA
Classification ensembles: random forests and AdaBoost.
Connectionist and evolutionary learning - CA
Classification: other approaches (neural networks, SVMs and genetic algorithms).
Pattern mining - CA
Pattern mining. Apriori algorithm. Evaluation. Discretization methods.
Sequential pattern mining - CA
Pattern mining: other approaches. Sequential pattern mining.
Clustering - RH
Clustering: k-means, EM and hierarchical. Evaluation.
Biclustering - RH
Subspace clustering. High-dimensional data analysis. Evaluation.
Dimensionality reduction - RH
Feature selection. PCA and SVD.
Network data analysis - CA
Network data analysis.
Social and web data analysis - CA
Social network analysis (SNA): HITS and PageRank algorithms.
Regression - RH
Multivariate regression analysis.
Time series analysis and forecasting - RH
Time series description. Forecasting.
Time series representations - RH
Time series pre-processing and decomposition (SAX, DFT, wavelets).
Temporal data mining - RH
Pattern analysis, clustering and classification of (multivariate) time series data and event data.
Text and opinion mining - RH
Text and opinion mining. Recommendation systems.
Biomedical data analysis - RH
Biomedical data analysis (Computational biology).
Comprehensive review of data science concepts.
Complex data mining - RH
Analysis of relational and multidimensional data (indexing: LSH and multidimensional indexes).
Analysis of spatial data.
Distributed and stream data mining - RH
Big data. Distributed data mining. Stream data mining.
Data visualization - CA
Data visualization.
Closing - CA, RH
Closing remarks.
Lab Sessions
Data description - CA
Data statistical analysis. Process documentation: notebooks.
Classification - CA
Classification: kNN. Data normalization. Training strategy. Evaluation.
Data preprocessing and Bayesian learning - CA
Classification: naive Bayes. Data balance. Outlier detection.
Overfitting and decision tree learning - CA
Classification: decision trees. Overfitting.
Ensemble models - CA
Classification: ensembles. Comparing classifiers' behavior.
Pattern mining - RH
Association rule mining. Sequential pattern mining.
Clustering and biclustering - RH
Clustering. Biclustering.
Project support - CA
Project support.
Dimensionality reduction and regression - RH
Feature selection and PCA. Regression.
Time series data analysis - RH
Time series decomposition (components, DFT, wavelets). Time series forecasting.
Complex data mining - RH
Temporal, spatial and relational data mining.
Biomedical data analysis - RH
Comprehensive data mining exercises to answer biomedical questions.
Social network analysis - CA
Social network analysis.