Planning

Theoretical Lectures

Motivation and course overview

Motivation and course overview

Supervised learning

Supervised learning. Regression and classification. Basic concepts: feature vectors, outcomes, training set. Problem formulation. Architecture of a supervised learning system. 


Datasets. The Iris dataset. The ImageNet dataset.

Nearest-neighbor and k-nearest-neighbor methods for classification and regression.

Linear regression

Linear model in 2D. Sum of squared errors criterion. Optimization. Normal equations.
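
As an illustration of the topics in this lecture, the following is a minimal sketch of fitting the 2D linear model y = w0 + w1 x by solving the normal equations of the sum-of-squared-errors criterion. The function names `fit_line` and `sse` and the toy data are assumptions made for this example, not course material.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1 * x via the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Setting the gradient of the SSE to zero gives the normal equations:
    #   n  * w0 + sx  * w1 = sy
    #   sx * w0 + sxx * w1 = sxy
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the fit is not unique")
    w1 = (n * sxy - sx * sy) / det
    w0 = (sy - w1 * sx) / n
    return w0, w1


def sse(xs, ys, w0, w1):
    """Sum of squared errors of the line on the data."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_line(xs, ys)
print(w0, w1, sse(xs, ys, w0, w1))
```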

Linear regression

Linear regression model (general case). Matrix notation. Sum of squared errors (SSE) criterion. Normal equations in matrix notation. Gauss-Markov theorem. Polynomial model. Radial basis function model.

Regularization

The role of regularization. Ridge regression. Lasso regression. Geometric interpretation. Exercises. 

Optimization

Machine learning and optimization. Global and local minima. The gradient method in 1D. The gradient method in the general case. Interpretation using the Taylor series. Choice of the step size (learning rate). Acceleration techniques (momentum term, Nesterov method, adaptive gains). Newton's method. Interpretation using the Taylor series. Comparison of the different techniques on selected examples.
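
As an illustration, the following is a minimal sketch of the gradient method in 1D with an optional momentum term. The function `gradient_descent`, the test function f(x) = (x - 3)^2, and the step and momentum values are assumptions chosen for this example.

```python
def gradient_descent(grad, x0, step=0.1, momentum=0.0, iters=200):
    """Gradient method in 1D; momentum=0.0 gives the plain gradient method."""
    x = x0
    v = 0.0  # accumulated update (the momentum term)
    for _ in range(iters):
        v = momentum * v - step * grad(x)
        x = x + v
    return x


def grad_f(x):
    """Derivative of f(x) = (x - 3)^2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)


x_plain = gradient_descent(grad_f, x0=0.0)
x_momentum = gradient_descent(grad_f, x0=0.0, momentum=0.5)
print(x_plain, x_momentum)
```

For this quadratic, both runs approach the minimizer x = 3. A step that is too large (here, above 1.0 for the plain method) would make the iteration diverge, which is why the choice of the step matters.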

Evaluation and Generalization

Evaluation of classifiers and regressors. Loss function. Risk and empirical risk. Choice of the model complexity. Generalization and overfitting. Training set, validation set and test set. Cross-validation.

Neural networks

Motivation. McCulloch & Pitts model. Binary classification with the McCulloch & Pitts model. Rosenblatt training algorithm. Limitations of the Rosenblatt algorithm. The multilayer perceptron. Continuous and differentiable activation functions (logistic function, arctan, linear, ReLU).

Neural networks

Multilayer perceptron. Networks with linear activation functions. Network specification. Choice of the number of layers. Cybenko's theorem. Choice of the number of units per layer. Choice of the activation functions. Network training as an optimization problem. Gradient method. Training modes (batch, online and mini-batch).

Neural networks

Training of the multilayer perceptron. Backpropagation algorithm. Backpropagation network.

Neural networks

Convolutional neural networks. Image recognition. End-to-end approach. Limitations of the multilayer perceptron. Receptive fields. Shared weights. AlexNet.

Data classification

Key concepts: data classifier, discriminant functions, decision regions, decision boundary. Loss function (binary loss and general case). Risk and empirical risk. Ideal case: known data distribution p(x,y). Data generation. Bayes classifier for the binary loss and for the general loss. Exercise.

Data classification

Data classification with known probability distributions. Bayes classifier. Exercise. Confusion matrix. Probability of error. Exercise.

Linear classifiers

Linear classifiers. Linear discriminant functions. Class coding. Indicator variables. Linear regression of indicator variables. Probabilistic model for the posterior distribution of the classes. Logistic regression.

Linear classifiers

Multi-class classification. Softmax model. Training. Linear discriminant analysis (LDA). Multivariate normal distribution. Decision boundaries.

Support Vector Machines

Classification of linearly separable data. Choice of the decision hyperplane. Hyperplane margin. Support vectors. Constraints on the support vectors and training data. Estimation of the decision boundary. Optimization problem.

Support Vector Machines

Estimation of the maximum-margin hyperplane. Optimization problem with constraints. Lagrangian function. Dual problem. Classifier discriminant function.

Support Vector Machines

Classification of non-separable data. Soft margin and slack variables. Modified optimization problem.
Support vector machines with curved decision surfaces. Non-linear transformation of the input data into a high-dimensional feature space. Estimation of the classifier. The naive approach. Computation of inner products of transformed vectors in the low-dimensional space. Estimation of the classifier using kernels (the kernel trick).

Decision trees

Decision trees. Types of nodes. Classification of data using a decision tree. Node impurity. Examples.

Decision trees

Training of a decision tree. Objective function: tree impurity. Recursive optimization based on the node impurities. Examples.

Decision trees

Overfitting. Early stopping. Tree pruning using a validation set. Random forests.

Unsupervised learning

Unsupervised learning problems. Clustering of data using centroids. K-means algorithm.
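
As an illustration, the following is a minimal sketch of the K-means algorithm for 2D points, alternating an assignment step and a centroid update step. The function name `kmeans`, the toy points and the initial centroids are assumptions made for this example.

```python
def kmeans(points, centroids, iters=100):
    """K-means for 2D points, starting from the given initial centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid becomes the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                mx = sum(m[0] for m in members) / len(members)
                my = sum(m[1] for m in members) / len(members)
                new_centroids.append((mx, my))
            else:
                new_centroids.append(c)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The result depends on the initial centroids; in practice the algorithm is often run from several initializations and the solution with the lowest within-cluster squared distance is kept.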

Unsupervised learning and exercises

Hierarchical clustering. Distance between clusters. Proximity matrix. Single-link algorithm. Exercises.

Exercises

Exercises on the topics covered in the course.