Course Plan
Lectures
Motivation and course overview
Motivation and course overview
Supervised learning
Supervised learning. Regression and classification. Basic concepts: feature vectors, outcomes, training set. Problem formulation. Architecture of a supervised learning system.
Linear regression
Linear model in 2D. Sum of squared errors criterion. Optimization. Normal equations.
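For the 2D model y ≈ w0 + w1·x, the normal equations have a closed-form solution in terms of the sample means. The sketch below is a minimal plain-Python illustration. The function names `fit_line` and `sse` are hypothetical and are not taken from the course materials.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w0 + w1 * x (closed-form solution of the 2x2 normal equations)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered cross-product and centered sum of squares
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = y_mean - w1 * x_mean
    return w0, w1


def sse(xs, ys, w0, w1):
    """Sum of squared errors criterion for the line w0 + w1 * x."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
w0, w1 = fit_line(xs, ys)
```

Because the example points lie exactly on y = 1 + 2x, the fit recovers w0 = 1 and w1 = 2, and the SSE is zero.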
Linear regression
Linear regression model (general case). Matrix notation. Sum of squared errors (SSE) criterion. Normal equations in matrix notation. Gauss-Markov theorem. Polynomial model. Radial basis function model.
Regularization
The role of regularization. Ridge regression. Lasso regression. Geometric interpretation. Exercises.
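To show the qualitative difference between the two penalties, this sketch makes a simplifying assumption: a single feature and no intercept. Under that assumption both ridge and lasso have closed-form solutions. Ridge shrinks the weight toward zero, while lasso (through soft-thresholding) can set it to exactly zero. The function names are illustrative only.

```python
def ridge_1d(xs, ys, lam):
    """Minimizer of sum (y - w x)^2 + lam * w^2 (single feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_1d(xs, ys, lam):
    """Minimizer of sum (y - w x)^2 + lam * |w| (single feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam / 2.0) / sxx


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # least-squares weight is 2
```

With λ = 60, ridge still returns a small positive weight (28/74), whereas lasso returns exactly 0.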
Optimization
Machine learning and optimization. Global and local minima. The gradient method in 1D. The gradient method in the general case. Interpretation using the Taylor series. Choice of the learning step. Acceleration techniques (momentum term, Nesterov method, adaptive gains). Newton method. Interpretation using the Taylor series. Comparison of different techniques in selected examples.
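The following sketch compares the plain gradient method, the gradient method with a momentum term, and the Newton method on the 1D quadratic f(w) = (w − 3)², which is an assumed example objective. Each gradient step multiplies the error w − 3 by (1 − 2η), so the plain method converges only for 0 < η < 1. Newton reaches the minimum of a quadratic in one step. Nesterov and adaptive-gain methods are not shown.

```python
def gradient_method(grad, w0, eta, beta=0.0, steps=100):
    """Gradient method with an optional momentum term (beta = 0 gives the plain method)."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v - eta * grad(w)  # decayed previous step plus new gradient step
        w = w + v
    return w


def newton_method(grad, hess, w0, steps=1):
    """Newton method in 1D: w <- w - f'(w) / f''(w)."""
    w = w0
    for _ in range(steps):
        w = w - grad(w) / hess(w)
    return w


# Example objective f(w) = (w - 3)^2, minimum at w = 3
def grad(w):
    return 2.0 * (w - 3.0)


def hess(w):
    return 2.0


w_plain = gradient_method(grad, w0=0.0, eta=0.1)
w_momentum = gradient_method(grad, w0=0.0, eta=0.1, beta=0.5)
w_newton = newton_method(grad, hess, w0=0.0)
w_diverged = gradient_method(grad, w0=0.0, eta=1.1, steps=20)  # learning step too large
```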
Evaluation and Generalization
Evaluation of classifiers and regressors. Loss function. Risk and empirical risk. Choice of the model complexity. Generalization and overfitting. Training set, validation set and test set. Cross-validation.
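Below is a minimal sketch of k-fold cross-validation. Each fold is held out once for validation, and the remaining folds are used for training. The squared loss and the trivial "predict the training mean" model are assumptions chosen to keep the example short. The function names are hypothetical.

```python
def kfold_splits(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over indices 0..n-1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, val
        start += size


def cross_val_risk(xs, ys, k, fit, predict):
    """Mean squared error on the held-out samples, pooled over all k folds."""
    total, count = 0.0, 0
    for train, val in kfold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in val:
            total += (ys[i] - predict(model, xs[i])) ** 2
            count += 1
    return total / count


# Hypothetical simplest model: predict the mean of the training outcomes
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model
```

With outcomes [0, 0, 0, 1, 1, 1] and two contiguous folds, each validation fold contains only one class, so the estimated risk is 1.0. This is one reason data are usually shuffled before being split into folds.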
Neural networks
Motivation. McCulloch & Pitts model. Binary classification with the McCulloch & Pitts model. Rosenblatt training algorithm. Limitations of the Rosenblatt algorithm. The multilayer perceptron. Continuous and differentiable activation functions (logistic function, arctan, linear, ReLU).
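This sketch shows a McCulloch & Pitts unit with a step activation, trained by the Rosenblatt rule. On the linearly separable AND function the algorithm stops with every sample correctly classified. On XOR, which is not linearly separable, no weights can classify all four points, which illustrates the limitation that motivates the multilayer perceptron. The learning rate and epoch limit are arbitrary choices.

```python
def step(z):
    return 1 if z > 0 else 0


def predict(w, b, x):
    """McCulloch & Pitts unit: weighted sum followed by a hard threshold."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(samples, targets, eta=1.0, max_epochs=100):
    """Rosenblatt rule: on each misclassified sample, w <- w + eta * (t - y) * x."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            y = predict(w, b, x)
            if y != t:
                w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
                b += eta * (t - y)
                mistakes += 1
        if mistakes == 0:  # all samples correctly classified: stop
            break
    return w, b


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
w_and, b_and = train_perceptron(X, [0, 0, 0, 1])
w_xor, b_xor = train_perceptron(X, [0, 1, 1, 0])
```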
Neural networks
Multilayer perceptron. Networks with linear activation functions. Network specification. Choice of the number of layers. Cybenko theorem. Choice of the number of units per layer. Choice of the activation functions. Network training as an optimization problem. Gradient method. Training modes (batch, online and mini-batch).
Neural networks
Training of multilayer perceptron. Backpropagation algorithm. Backpropagation network.
Neural networks
Convolutional neural networks. Image recognition. End-to-end approach. Limitations of the multilayer perceptron. Receptive fields. Shared weights. AlexNet.
Data classification
Key concepts: data classifier, discriminant functions, decision regions, decision boundary. Loss function (binary loss and general case). Risk and empirical risk. Ideal case: known data distribution p(x,y). Data generation. Bayes classifier for the binary loss and for the general loss. Exercise.
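When p(x, y) is known, the Bayes classifier for the binary (0-1) loss selects the class with the largest P(y = k)·p(x | y = k). This sketch assumes two classes with 1D normal class-conditional densities. That setup is an illustrative assumption, not one taken from the course.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1D normal distribution N(mean, std^2)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def bayes_classifier(x, priors, means, stds):
    """Bayes classifier for the 0-1 loss: choose the class k that maximizes
    P(y = k) * p(x | y = k), i.e. the maximum a posteriori class."""
    scores = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, means, stds)]
    return max(range(len(scores)), key=lambda k: scores[k])
```

With equal priors and equal variances, the decision boundary between means 0 and 2 lies at the midpoint x = 1. Raising the prior of class 0 to 0.9 moves the boundary to x = 1 + ln(9)/2 ≈ 2.10.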
Data classification
Data classification with known probability distributions. Bayes classifier. Exercise. Confusion matrix. Probability of error. Exercise.
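The following sketch builds a confusion matrix (rows are true classes, columns are predicted classes) and estimates the probability of error as the fraction of off-diagonal counts. The function names and data are illustrative.

```python
def confusion_matrix(true_labels, predicted, classes):
    """Entry [i][j] counts samples of true class classes[i] predicted as classes[j]."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted):
        m[index[t]][index[p]] += 1
    return m


def error_rate(m):
    """Estimated probability of error: off-diagonal count divided by total count."""
    total = sum(sum(row) for row in m)
    errors = total - sum(m[i][i] for i in range(len(m)))
    return errors / total


cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], classes=[0, 1])
```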
Linear Classifiers
Linear classifiers. Linear discriminant functions. Class coding. Indicator variables. Linear regression of indicator variables. Probabilistic model for the a posteriori distribution of the classes. Logistic regression.
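Logistic regression models the a posteriori probability as P(y = 1 | x) = σ(w·x + b), where σ is the logistic function. This sketch fits a single-feature model by gradient descent on the mean cross-entropy. The learning rate, iteration count, and data set are assumptions. The data are deliberately not linearly separable so that the weights stay finite.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, eta=0.5, iters=2000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean negative log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iters):
        # Gradient of the mean cross-entropy: (prediction - target) times the input
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= eta * gw
        b -= eta * gb
    return w, b


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]  # overlapping classes (not linearly separable)
w, b = fit_logistic(xs, ys)
```

Because the example data are symmetric (y(−x) = 1 − y(x)), the fitted bias is essentially zero and the decision boundary σ(w·x + b) = 0.5 lies at x ≈ 0.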
Linear classifiers
Multi-class classification. Softmax model. Training. Linear discriminant analysis (LDA). Multivariate normal distribution. Decision boundaries.
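The softmax model converts one linear score per class into class probabilities. This sketch shows only the forward computation, not training. The weights in the usage example are hypothetical. Subtracting the maximum score before exponentiating leaves the result unchanged and prevents overflow.

```python
import math


def softmax(scores):
    """Convert class scores into probabilities that sum to 1 (numerically stable form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(W, b, x):
    """Linear multi-class model: score_k = w_k . x + b_k; predict the most probable class."""
    scores = [sum(wi * xi for wi, xi in zip(wk, x)) + bk for wk, bk in zip(W, b)]
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__), probs


# Hypothetical weights for 3 classes and 2 features
k, probs = classify([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0], [2.0, 1.0])
```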
Support Vector Machines
Classification of linearly separable data. Choice of the decision hyperplane. Hyperplane margin. Support vectors. Constraints on the support vectors and training data. Estimation of the decision boundary. Optimization problem.
Support Vector Machines
Estimation of the maximum-margin hyperplane. Optimization problem with constraints. Lagrangian function. Dual problem. Classifier discriminant function.
Support Vector Machines
Classification of non-separable data. Soft margin and slack variables. Modified optimization problem.
Support vector machines with curved decision surfaces. Non-linear transformation of the input data into a high-dimensional feature space. Estimation of the classifier. The naive approach. Computation of inner products of transformed vectors in the low-dimensional input space. Estimation of the classifier using kernels (the kernel trick).
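The kernel trick can be checked numerically. For 2D inputs, the degree-2 polynomial kernel k(x, z) = (x·z)² equals the inner product of the explicit feature vectors φ(x) = (x1², √2·x1·x2, x2²), yet it is computed entirely in the input space. The discriminant function below assumes that the multipliers α_i and the bias b have already been obtained by solving the dual problem. The values used in the example are hypothetical.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for 2D inputs (homogeneous, no linear terms)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def svm_discriminant(x, support_vectors, alphas, labels, b, kernel=poly_kernel):
    """Kernel SVM discriminant f(x) = sum_i alpha_i y_i k(x_i, x) + b.
    The alphas and b are assumed to come from solving the dual problem."""
    return sum(a * y * kernel(sv, x) for sv, a, y in zip(support_vectors, alphas, labels)) + b


x, z = (1.0, 2.0), (3.0, -1.0)
```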
Decision trees
Decision trees. Types of nodes. Classification of data using a decision tree. Node impurity. Examples.
Decision trees
Training of a decision tree. Objective function: tree impurity. Recursive optimization based on the node impurities. Examples.
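Two common node-impurity measures are the Gini index and the entropy. The syllabus does not specify which one the course uses. This sketch computes both and then performs one greedy training step: it searches for the threshold on a single feature that minimizes the weighted impurity of the two child nodes. Repeating this search recursively on each child grows the tree. The function names are illustrative.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy impurity in bits: -sum_k p_k log2 p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(xs, ys, impurity=gini):
    """Return (weighted child impurity, threshold) of the best split x <= t."""
    best = (float("inf"), None)
    values = sorted(set(xs))
    n = len(ys)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between consecutive values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if score < best[0]:
            best = (score, t)
    return best
```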
Decision trees
Overfitting. Early stopping. Tree pruning using a validation set. Random forests.
Unsupervised learning
Unsupervised learning problems. Clustering of data using centroids. K-means algorithm.
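The K-means algorithm alternates between two steps: it assigns each point to its nearest centroid, and then it moves each centroid to the mean of the points assigned to it. It stops when the centroids no longer change. In this sketch the caller supplies the initial centroids (initialization strategies are not covered). The handling of empty clusters is an assumption.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iters=100):
    """K-means with given initial centroids; returns (centroids, clusters)."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            clusters[k].append(p)
        # Update step: each centroid moves to the mean of its cluster
        new = []
        for c, members in zip(centroids, clusters):
            if members:
                new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new.append(c)  # assumption: an empty cluster keeps its centroid
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


cents, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
```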
Unsupervised learning and exercises
Hierarchical clustering. Distance between clusters. Proximity matrix. Single-link algorithm. Exercises.
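The single-link algorithm first computes the proximity matrix of all pairwise distances. It then repeatedly merges the two clusters whose closest members are nearest to each other. This simple sketch recomputes cluster distances from the proximity matrix at every step rather than updating the matrix in place. It returns the sequence of merges, which defines the dendrogram.

```python
def single_link(points, dist):
    """Agglomerative single-link clustering; returns a list of (cluster_a, cluster_b, distance)."""
    D = [[dist(p, q) for q in points] for p in points]  # proximity matrix
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest pair of members
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


merges = single_link([0.0, 1.0, 5.0, 6.5], lambda p, q: abs(p - q))
```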
Exercises
Exercises on the topics covered in the lectures.