FenixEdu™

Motivation and course overview

Lecture 1

Motivation and course overview

Supervised learning

Lecture 2

Supervised learning. Regression and classification. Basic concepts: feature vectors, outcomes, training set. Problem formulation. Architecture of a supervised learning system.

Datasets. Iris dataset. ImageNet dataset.

Nearest neighbor and k-nearest neighbor methods for classification and regression.
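A minimal sketch of k-nearest neighbor prediction, assuming Euclidean distance, majority voting for classification and averaging for regression. The function names and the toy data are illustrative and not taken from the course material.

```python
import math
from collections import Counter

def k_nearest(train_x, query, k):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_x))
    return [i for _, i in dists[:k]]

def knn_classify(train_x, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    votes = Counter(train_y[i] for i in k_nearest(train_x, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_x, train_y, query, k=3):
    """Regression: average outcome of the k nearest neighbors."""
    return sum(train_y[i] for i in k_nearest(train_x, query, k)) / k

# Hypothetical 2D feature vectors with class labels 0 and 1.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_classify(X, labels, (0.5, 0.5)))  # 0
print(knn_classify(X, labels, (5.5, 5.5)))  # 1

# Hypothetical 1D regression data.
Xr = [(1.0,), (2.0,), (3.0,), (10.0,)]
yr = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(Xr, yr, (2.1,)))  # 2.0
```

With k = 1 this reduces to the plain nearest neighbor rule; larger k smooths the decision at the cost of blurring class boundaries.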

Linear regression

Lecture 3

Linear model in 2D. Sum of squared errors criterion. Optimization. Normal equations.
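For the 2D model y ≈ w0 + w1·x, setting the partial derivatives of the SSE with respect to w0 and w1 to zero gives two linear equations (the normal equations): Σy = n·w0 + w1·Σx and Σxy = w0·Σx + w1·Σx². The sketch below solves them in closed form; the names `fit_line` and `sse` are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w0 + w1*x by solving the 2x2 normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # zero only if all x values are equal
    w1 = (n * sxy - sx * sy) / det
    w0 = (sy - w1 * sx) / n
    return w0, w1

def sse(xs, ys, w0, w1):
    """Sum of squared errors of the line w0 + w1*x on the data."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))

xs, ys = [0, 1, 2, 3], [1, 3, 5, 7]   # points on y = 1 + 2x
print(fit_line(xs, ys))               # (1.0, 2.0)
```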

Linear regression

Lecture 4

Linear regression model (general case). Matrix notation. Sum of squared errors (SSE) criterion. Normal equations in matrix notation. Gauss-Markov theorem. Polynomial model. Radial basis function model.
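In matrix notation the SSE criterion ‖y − Xw‖² is minimized by solving the normal equations (XᵀX)w = Xᵀy. A minimal dependency-free sketch, assuming a polynomial model whose design matrix has rows [1, x, x², …]; the helper names are hypothetical, and in practice a library routine such as NumPy's `numpy.linalg.lstsq` would normally be used instead of forming XᵀX explicitly.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]      # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def poly_design(xs, degree):
    """Design matrix X with rows [1, x, x^2, ..., x^degree]."""
    return [[x ** p for p in range(degree + 1)] for x in xs]

def least_squares(X, y):
    """Normal equations in matrix form: (X^T X) w = X^T y."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(XtX, Xty)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x - 0.5 * x * x for x in xs]     # noiseless quadratic
print(least_squares(poly_design(xs, 2), ys))       # approximately [1.0, 2.0, -0.5]
```

The model stays linear in the parameters, so replacing `poly_design` with a matrix of radial basis functions exp(−(x − c_j)²/(2σ²)) for chosen centers c_j gives the RBF model with the same `least_squares` call.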

Regularization

Lecture 5

The role of regularization. Ridge regression. Lasso regression. Geometric interpretation. Exercises.
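With a single weight w and no intercept (a simplifying assumption), both regularized criteria have closed-form minimizers. For ridge, Σ(y − wx)² + λw² gives w = Σxy / (Σx² + λ). For lasso, Σ(y − wx)² + λ|w| gives a soft threshold: w = sign(Σxy)·max(|Σxy| − λ/2, 0) / Σx². The sketch shows the qualitative difference: ridge only shrinks the weight, while lasso can set it exactly to zero.

```python
def ridge_1d(xs, ys, lam):
    """Minimize sum (y - w*x)^2 + lam * w^2 over a single weight w."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)

def lasso_1d(xs, ys, lam):
    """Minimize sum (y - w*x)^2 + lam * |w|: soft-thresholding of sum(x*y)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx

xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]   # sum(x*x) = sum(x*y) = 14
for lam in (0.0, 14.0, 28.0):
    print(lam, ridge_1d(xs, ys, lam), lasso_1d(xs, ys, lam))
```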

Optimization

Lecture 6

Machine learning and optimization. Global and local minima. The gradient method in 1D. The gradient method in the general case. Interpretation using the Taylor series. Choice of the learning step. Acceleration techniques (momentum term, Nesterov method, adaptive gains). Newton method. Interpretation using the Taylor series. Comparison of different techniques in selected examples.
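A minimal 1D comparison of three of the methods listed above, assuming the test function f(x) = (x − 3)², with f′(x) = 2(x − 3) and f″(x) = 2. The gradient method uses the first-order Taylor approximation; Newton's method minimizes the second-order approximation, so on a quadratic it reaches the minimum in one step.

```python
def grad_descent(df, x0, lr, steps):
    """Gradient method: x <- x - lr * f'(x)."""
    x = x0
    for _ in range(steps):
        x = x - lr * df(x)
    return x

def momentum(df, x0, lr, beta, steps):
    """Gradient method with a momentum term: v <- beta*v - lr*f'(x); x <- x + v."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * df(x)
        x = x + v
    return x

def newton(df, d2f, x0, steps):
    """Newton's method: x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(steps):
        x = x - df(x) / d2f(x)
    return x

def f_prime(x):
    return 2.0 * (x - 3.0)

def f_second(x):
    return 2.0

print(grad_descent(f_prime, 0.0, 0.1, 100))
print(momentum(f_prime, 0.0, 0.1, 0.5, 100))
print(newton(f_prime, f_second, 0.0, 1))   # 3.0
```

For this function the gradient step multiplies the error x − 3 by 1 − 2·lr, so lr = 0.1 gives a factor of 0.8 per iteration while lr = 1.1 gives −1.2 and the iterates diverge; this is the learning-step trade-off in its simplest form. On this particular quadratic, momentum with β = 0.5 contracts the error asymptotically by about √0.5 ≈ 0.71 per step.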

Evaluation and Generalization

Lecture 7

Evaluation of classifiers and regressors. Loss function. Risk and empirical risk. Choice of the model complexity. Generalization and overfitting. Training set, validation set and test set. Cross-validation.
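A minimal sketch of k-fold cross-validation for a regressor, assuming the squared error as loss and contiguous folds (in practice the data are usually shuffled first). The `fit`/`predict` callback interface and the constant "training mean" model are hypothetical, chosen only to keep the example self-contained.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of nearly equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def cross_val_mse(xs, ys, k, fit, predict):
    """Average validation mean squared error over the k folds."""
    errors = []
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(sum((ys[i] - predict(model, xs[i])) ** 2 for i in fold) / len(fold))
    return sum(errors) / k

def fit_mean(xs, ys):
    """Hypothetical model: always predict the training mean."""
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

print(cross_val_mse([0, 1, 2, 3], [0, 0, 2, 2], 2, fit_mean, predict_mean))  # 4.0
```

Each sample is used for validation exactly once, so the averaged error estimates the risk on unseen data rather than the empirical risk on the training set.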

Neural networks

Lecture 8

Motivation. McCulloch & Pitts model. Binary classification with the McCulloch & Pitts model. Rosenblatt training algorithm. Limitations of the Rosenblatt algorithm. The multilayer perceptron. Continuous and differentiable activation functions (logistic function, arctan, linear, ReLU).
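A minimal sketch of a McCulloch & Pitts threshold unit trained with the Rosenblatt rule w ← w + η(d − y)x, b ← b + η(d − y), assuming targets d ∈ {0, 1}, zero initial weights and η = 1. It also illustrates the main limitation: on XOR, which is not linearly separable, the algorithm never reaches an epoch without errors.

```python
def unit_output(w, b, x):
    """McCulloch & Pitts unit: output 1 if w.x + b >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

def train_perceptron(X, d, lr=1.0, max_epochs=100):
    """Rosenblatt training; returns (w, b, converged)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):
            err = target - unit_output(w, b, x)
            if err != 0:
                errors += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if errors == 0:               # a full pass without mistakes
            return w, b, True
    return w, b, False

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]
print(train_perceptron(X, AND))     # converges
print(train_perceptron(X, XOR)[2])  # False
```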

Neural networks

Lecture 9

Multilayer perceptron. Networks with linear activation functions. Network specification. Choice of the number of layers. Cybenko's theorem. Choice of the number of units per layer. Choice of the activation functions. Network training as an optimization problem. Gradient method. Training modes (batch, online and mini-batch).
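A minimal sketch of a network specification and forward pass, assuming each layer is described by a hypothetical tuple (W, b, activation) with W stored as a list of rows (units × inputs). The example also checks why networks with only linear activation functions add no expressive power: two linear layers collapse into one with W = W2·W1 and b = W2·b1 + b2, whereas a ReLU hidden layer breaks this equivalence.

```python
import math

def matvec(W, x):
    """Matrix-vector product; W is a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product A B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def relu(z):
    return max(0.0, z)

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    return z

def mlp_forward(x, layers):
    """Forward pass: a <- activation(W a + b) for every layer in turn."""
    a = list(x)
    for W, b, act in layers:
        a = [act(z + bi) for z, bi in zip(matvec(W, a), b)]
    return a

W1, b1 = [[1.0, 2.0], [3.0, 4.0]], [0.0, 1.0]
W2, b2 = [[1.0, -1.0]], [0.5]
two_linear = [(W1, b1, identity), (W2, b2, identity)]
one_linear = [(matmul(W2, W1), [z + bi for z, bi in zip(matvec(W2, b1), b2)], identity)]
relu_net = [(W1, b1, relu), (W2, b2, identity)]

print(mlp_forward([1.0, 1.0], two_linear), mlp_forward([1.0, 1.0], one_linear))
print(mlp_forward([1.0, -1.0], relu_net), mlp_forward([1.0, -1.0], one_linear))
```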

Neural networks

Lecture 10

Training of multilayer perceptron. Backpropagation algorithm. Backpropagation network.
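A minimal sketch of backpropagation for one training pair, assuming one hidden layer of logistic units, a single linear output unit and the squared error E = ½(y − d)². The output error δ = y − d is propagated back through W2 and multiplied by the logistic derivative h(1 − h). The analytic gradient is checked against a central finite difference; all parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Hidden logistic layer h, then linear output y."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y

def loss(x, d, W1, b1, W2, b2):
    """Squared error E = 0.5 * (y - d)^2."""
    _, y = forward(x, W1, b1, W2, b2)
    return 0.5 * (y - d) ** 2

def backprop(x, d, W1, b1, W2, b2):
    """Gradients of E with respect to all parameters."""
    h, y = forward(x, W1, b1, W2, b2)
    delta_out = y - d                              # dE/dy for the linear output unit
    gW2 = [delta_out * hj for hj in h]
    gb2 = delta_out
    # dE/dz_j for hidden unit j, using sigmoid'(z) = h * (1 - h)
    delta_h = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    gW1 = [[dj * xi for xi in x] for dj in delta_h]
    return gW1, delta_h, gW2, gb2                  # delta_h is also dE/db1

def numeric_grad_W1(x, d, W1, b1, W2, b2, j, k, eps=1e-5):
    """Central finite difference of E with respect to W1[j][k]."""
    Wp, Wm = [row[:] for row in W1], [row[:] for row in W1]
    Wp[j][k] += eps
    Wm[j][k] -= eps
    return (loss(x, d, Wp, b1, W2, b2) - loss(x, d, Wm, b1, W2, b2)) / (2 * eps)

W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]
W2, b2 = [0.5, -0.3], 0.2
x, d = [1.0, 2.0], 1.0
gW1, gb1, gW2, gb2 = backprop(x, d, W1, b1, W2, b2)
print(max(abs(gW1[j][k] - numeric_grad_W1(x, d, W1, b1, W2, b2, j, k))
          for j in range(2) for k in range(2)))
```

A gradient method step then updates every parameter against its gradient, e.g. W1[j][k] -= lr * gW1[j][k].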

Neural networks

Lecture 11

Convolutional neural networks. Image recognition. End-to-end approach. Limitations of the multilayer perceptron. Receptive fields. Shared weights. AlexNet.
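A minimal sketch of the operation computed by a convolutional layer, assuming a single channel, stride 1 and no padding ("valid" output). As in most CNN libraries the kernel is not flipped, so strictly this is a cross-correlation. Each output unit sees only a small receptive field, and all output units share the same kernel weights.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (no kernel flip) of a single-channel image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[u][v] * image[i + u][j + v]      # kh x kw receptive field
                 for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

edge = conv2d([[0, 0, 1, 1], [0, 0, 1, 1]], [[-1, 1]])  # horizontal-gradient kernel
print(edge)   # [[0, 1, 0], [0, 1, 0]]
```

The weight sharing is what keeps the parameter count small: mapping a 3×3 input to a 2×2 output with a fully connected layer needs 9 × 4 = 36 weights, while a 2×2 convolution kernel uses 4.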

Data classification

Lecture 12

Key concepts: data classifier, discriminant functions, decision regions, decision boundary. Loss function (binary loss and general case). Risk and empirical risk. Ideal case: known data distribution p(x,y). Data generation. Bayes classifier for the binary loss and for the general loss. Exercise.
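When p(x, y) is known, the Bayes classifier chooses, for each x, the class k that minimizes the conditional risk Σ_j L(k, j)·P(y = j | x). A minimal sketch, assuming the posterior probabilities for one input x have already been computed; the loss-matrix layout loss[k][j] (cost of deciding k when the true class is j) is a convention chosen here.

```python
def bayes_decision(posterior, loss):
    """Class k minimizing sum_j loss[k][j] * P(y = j | x)."""
    risks = [sum(l_kj * p_j for l_kj, p_j in zip(row, posterior)) for row in loss]
    return min(range(len(risks)), key=risks.__getitem__)

posterior = [0.3, 0.7]              # P(y=0|x), P(y=1|x) for one hypothetical x
zero_one = [[0, 1], [1, 0]]         # binary (0-1) loss
asymmetric = [[0, 1], [5, 0]]       # deciding 1 when the truth is 0 costs 5

print(bayes_decision(posterior, zero_one))    # 1
print(bayes_decision(posterior, asymmetric))  # 0
```

With the binary loss the rule reduces to picking the class with the largest posterior probability; a general loss can change the decision even though the posterior is the same.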

Data classification

Lecture 13

Data classification with known probability distributions. Bayes classifier. Exercise. Confusion matrix. Probability of error. Exercise.
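A minimal sketch of a confusion matrix and of the empirical estimate of the probability of error, assuming classes coded 0, …, K − 1 and the convention (which varies between textbooks) that rows are true classes and columns are predicted classes.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def error_rate(cm):
    """Fraction of samples off the diagonal: estimate of the probability of error."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return (total - correct) / total

cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], 2)
print(cm)              # [[1, 1], [1, 2]]
print(error_rate(cm))  # 0.4
```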

Linear Classifiers

Lecture 14

Linear classifiers. Linear discriminant functions. Class coding. Indicator variables. Linear regression of indicator variables. Probabilistic model for the a posteriori distribution of the classes. Logistic regression.
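A minimal sketch of logistic regression for a single feature, assuming the model P(y = 1 | x) = σ(wx + b) and plain gradient descent on the mean cross-entropy, whose gradient is (1/n)Σ(σ(wx_i + b) − y_i)·[x_i, 1]. The learning rate, number of steps and toy data are arbitrary choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, steps=1000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * sum(errs) / n
    return w, b

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print([1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs])  # [0, 0, 0, 1, 1, 1]
```

The decision boundary is the point where wx + b = 0; with separable data like this, w keeps growing slowly as training continues, which is one motivation for regularization.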

Linear classifiers

Lecture 15

Multi-class classification. Softmax model. Training. Linear discriminant analysis (LDA). Multivariate normal distribution. Decision boundaries.
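A minimal sketch of the softmax model, assuming one linear discriminant s_k = w_k·x + b_k per class. Subtracting the largest score before exponentiating leaves the probabilities unchanged but avoids overflow. The weights below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear_scores(W, b, x):
    """One linear discriminant per class: s_k = w_k . x + b_k."""
    return [sum(wi * xi for wi, xi in zip(wk, x)) + bk for wk, bk in zip(W, b)]

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
p = softmax(linear_scores(W, b, [2.0, 1.0]))
print(p, max(range(len(p)), key=p.__getitem__))   # predicted class 0
```

For training with the cross-entropy loss, the gradient with respect to the scores is simply p − e_y, where e_y is the one-hot (indicator) coding of the true class.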

Support Vector Machines

Lecture 16

Classification of linearly separable data. Choice of the decision hyperplane. Hyperplane margin. Support vectors. Constraints on the support vectors and training data. Estimation of the decision boundary. Optimization problem.
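One common way to write this optimization problem, assuming labels y_i ∈ {−1, +1}, a decision hyperplane wᵀx + b = 0, and the usual scaling in which the support vectors satisfy y_i(wᵀx_i + b) = 1 (conventions for what is called "the margin" vary; here it is the full width 2/‖w‖ of the band between the classes):

```latex
\begin{aligned}
&\text{distance of } x_i \text{ to } w^\top x + b = 0: && \frac{|w^\top x_i + b|}{\lVert w\rVert}\\
&\text{margin width, with } \min_i y_i (w^\top x_i + b) = 1: && \frac{2}{\lVert w\rVert}\\
&\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^2 \quad \text{subject to} && y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n
\end{aligned}
```

Maximizing the margin 2/‖w‖ is equivalent to minimizing ½‖w‖², which turns the problem into a quadratic program with linear constraints.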

Support Vector Machines

Lecture 17

Estimation of the maximum-margin hyperplane. Optimization problem with constraints. Lagrangian function. Dual problem. Classifier discriminant function.
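A sketch of the standard derivation for the hard-margin problem, assuming the same notation as the primal problem (labels y_i ∈ {−1, +1}, n training pairs) and Lagrange multipliers α_i ≥ 0, one per constraint. Setting the derivatives of the Lagrangian with respect to w and b to zero and substituting back gives the dual problem, which depends on the data only through inner products x_iᵀx_j:

```latex
\begin{aligned}
L(w, b, \alpha) &= \tfrac{1}{2}\lVert w\rVert^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right], \qquad \alpha_i \ge 0\\
\frac{\partial L}{\partial w} = 0 &\;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0\\
\max_{\alpha}\ &\sum_{i=1}^{n} \alpha_i - \tfrac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{subject to } \alpha_i \ge 0,\ \sum_{i=1}^{n} \alpha_i y_i = 0\\
f(x) &= \sum_{i=1}^{n} \alpha_i y_i \, x_i^\top x + b, \qquad b = y_s - w^\top x_s \ \text{for any support vector } x_s
\end{aligned}
```

Only the support vectors have α_i > 0, so the discriminant function f(x) is a sum over the support vectors alone; the predicted class is the sign of f(x).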

Support Vector Machines

Lecture 18

Classification of non-separable data. Soft margin and slack variables. Modified optimization problem.
Support vector machines with curved decision surfaces. Non-linear transformation of the input data into a high-dimensional feature space. Estimation of the classifier. The naive approach. Computation of inner products of transformed vectors in the low-dimensional input space. Estimation of the classifier using kernels (the kernel trick).
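A minimal check of the kernel trick, assuming the polynomial kernel k(x, z) = (xᵀz)² in 2D, a standard textbook example. Its explicit feature map is φ(x) = (x1², √2·x1·x2, x2²), and the inner product φ(x)ᵀφ(z) equals (xᵀz)², which is computed without ever forming φ. The discriminant function then only needs kernel evaluations; the support vectors, multipliers and bias below are hypothetical values, not the result of training.

```python
import math

def phi(x):
    """Explicit 3D feature map for k(x, z) = (x . z)^2 with 2D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Same inner product, computed in the 2D input space."""
    return dot(x, z) ** 2

def discriminant(x, support, alphas, labels, b, kernel=poly_kernel):
    """f(x) = sum_i alpha_i y_i k(x_i, x) + b."""
    return sum(a * y * kernel(s, x) for s, a, y in zip(support, alphas, labels)) + b

x, z = (2.0, 1.0), (1.0, 3.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))   # both equal 25 (up to rounding)
```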

Decision trees

Lecture 19

Decision trees. Types of nodes. Classification of data using a decision tree. Node impurity. Examples.
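A minimal sketch of three common node impurity measures and of classification with a tree. The nested-dictionary tree (internal nodes with a feature index and threshold, leaf nodes with a class label) is a hypothetical representation chosen for the example.

```python
import math
from collections import Counter

def proportions(labels):
    n = len(labels)
    return [c / n for c in Counter(labels).values()]

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 (0 for a pure node)."""
    return 1.0 - sum(p * p for p in proportions(labels))

def entropy(labels):
    """Entropy in bits, -sum_k p_k log2 p_k (0 for a pure node)."""
    return -sum(p * math.log2(p) for p in proportions(labels))

def misclassification(labels):
    """1 - max_k p_k."""
    return 1.0 - max(proportions(labels))

def classify(node, x):
    """Follow internal nodes (x[feature] <= threshold goes left) down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

tree = {"feature": 0, "threshold": 2.5,
        "left": {"leaf": 0},
        "right": {"feature": 1, "threshold": 1.0,
                  "left": {"leaf": 1}, "right": {"leaf": 2}}}
print(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]))   # 0.5 1.0
print(classify(tree, (3.0, 2.0)))                  # 2
```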

Decision trees

Lecture 20

Training of a decision tree. Objective function: tree impurity. Recursive optimization based on the node impurities. Examples.
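A minimal sketch of greedy recursive training on a single numeric feature, assuming the Gini impurity and candidate thresholds at midpoints between consecutive sorted values. Each split minimizes the weighted impurity of the two children; recursion stops when a node is pure, no split reduces the impurity, or a maximum depth is reached. The function names and dictionary format are hypothetical.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """(threshold, impurity) minimizing the weighted Gini impurity of the children."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))                 # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                        # no threshold separates equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, imp)
    return best

def grow(xs, ys, depth=0, max_depth=3):
    """Recursively split until pure, no improvement, or max_depth."""
    threshold, _ = best_split(xs, ys)
    if gini(ys) == 0.0 or threshold is None or depth == max_depth:
        return {"leaf": Counter(ys).most_common(1)[0][0]}
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {"threshold": threshold,
            "left": grow(*zip(*left), depth + 1, max_depth),
            "right": grow(*zip(*right), depth + 1, max_depth)}

print(grow([1, 2, 3, 4], [0, 0, 1, 1]))
```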

Decision trees

Lecture 21

Overfitting. Early stopping. Tree pruning using a validation set. Random forests.
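A deliberately simplified random forest sketch: each tree is a depth-1 stump trained on a bootstrap sample of the data using one randomly chosen feature, and predictions are combined by majority vote. Real random forests grow deeper trees and draw a random feature subset at every split; the stump simplification, the seed and the toy data are assumptions made to keep the example short. Pruning with a validation set is not shown.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature):
    """Depth-1 tree on one feature: threshold with lowest weighted Gini impurity."""
    best = (gini(y), None, majority(y), majority(y))
    values = sorted(set(x[feature] for x in X))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(X, y) if xi[feature] <= t]
        right = [yi for xi, yi in zip(X, y) if xi[feature] > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best[0]:
            best = (imp, t, majority(left), majority(right))
    _, t, left_label, right_label = best
    return feature, t, left_label, right_label

def stump_predict(stump, x):
    feature, t, left_label, right_label = stump
    if t is None:                  # no useful split: constant prediction
        return left_label
    return left_label if x[feature] <= t else right_label

def fit_forest(X, y, n_trees=15, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], rng.randrange(d)))
    return forest

def forest_predict(forest, x):
    """Majority vote over the trees."""
    return majority([stump_predict(s, x) for s in forest])

X = [(0.0, 0.2), (0.5, 1.0), (1.0, 0.0), (0.2, 0.8),
     (4.0, 4.5), (4.5, 5.0), (5.0, 4.0), (4.2, 4.8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(forest_predict(forest, (0.5, 0.5)), forest_predict(forest, (4.5, 4.5)))
```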

Unsupervised learning

Lecture 22

Unsupervised learning problems. Clustering of data using centroids. The k-means algorithm.
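A minimal sketch of the k-means (Lloyd) iteration, assuming Euclidean distance and user-supplied initial centroids: assign every point to its nearest centroid, move each centroid to the mean of its points, and stop when the assignments no longer change. Keeping the old centroid when a cluster becomes empty is one of several possible conventions.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Returns (labels, centroids) after convergence or max_iter iterations."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break                          # assignments unchanged: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                    # keep the old centroid if the cluster is empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(pts, [(0, 0), (10, 10)]))
```

The result depends on the initialization, so in practice the algorithm is often run from several starting points and the solution with the smallest sum of squared distances to the centroids is kept.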

Unsupervised learning and exercises

Lecture 23

Hierarchical clustering. Distance between clusters. Proximity matrix. Single-link algorithm. Exercises.
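A naive sketch of agglomerative clustering with the single-link distance (the distance between two clusters is that of their closest pair of members), assuming a proximity matrix computed once between individual points. It repeatedly merges the two closest clusters and records each merge with its distance, which is the information shown in a dendrogram.

```python
def single_link(points, dist):
    """Returns the merges as (cluster_a, cluster_b, distance), in merge order."""
    clusters = [frozenset([i]) for i in range(len(points))]
    D = [[dist(p, q) for q in points] for p in points]   # proximity matrix
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: closest pair of members across the two clusters
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

merges = single_link([0, 1, 5, 6, 20], lambda p, q: abs(p - q))
print([d for _, _, d in merges])   # [1, 1, 4, 14]
```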

Exercises

Lecture 24

Exercises on the topics covered in the course.

Aprendizagem Automática (Machine Learning)

Planning

Lectures
