{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multilayer Perceptron and Backpropagation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 1 \n", "\n", "Consider the neural network considered in the first question of the theoretical component of the practical class, with number of units: 4,4,3,3.\n", "\n", "![](https://drive.google.com/uc?id=1SHUgdosKp6AX8rRAACCZ5nb4kUXreI3g)\n", "\n", "Assume all units, except the ones in the output layer, use the hyperbolic tangent activation function. \n", "\n", "Consider the following training example:\n", "\n", "$\\mathbf{x} =\\begin{bmatrix} 1, 0, 1, 0 \\end{bmatrix}^\\intercal $, $\\mathbf{y} =\\begin{bmatrix} 0\\\\ 1\\\\ 0 \\end{bmatrix}$\n", "\n", "❓ Using the squared error loss do a stochastic gradient descent update, initializing all connection weights and biases to 0.1 and a learning rate η = 0.1:\n", "\n", "1. Perform the forward pass\n", "2. Compute the loss\n", "3. Compute gradients with backpropagation\n", "4. Update weights" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "inputs = np.array([[1, 0, 1, 0]])\n", "labels = np.array([[0, 1, 0]])\n", "\n", "# First is input size, last is output size.\n", "units = [4, 4, 3, 3]\n", "\n", "# Initialize weights with correct shapes \n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Forward Pass\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Loss\n", "\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Backpropagation\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Update Gradients\n", "\n", "# Gradient updates.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "❓ Let's say we were using the same training example but with the following changes:\n", "- The output units have a softmax activation function\n", "- The error function is cross-entropy\n", "\n", "Keeping the same initializations and learning rate, adjust your computations to the new changes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Solution:** We need only to change: \n", "- the output, *i.e.*, $\\hat{y} = softmax(z_3)$ instead of $\\hat{y} = z_3$\n", "- the loss computation to $L = -y.log(\\hat{y})$\n", "- the gradient of the loss with respect to $z_3$: $\\frac{dL}{dz_3}$\n", "\n", "All other steps remain unchanged." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Your code here\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "❓ Complete functions `forward`, `compute_loss`, `backpropagation` and `update_weights` generalized to perform the same computations as before, but for any MLP architecture." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "'''\n", "x: single observation of shape (n,)\n", "weights: list of weight matrices [W1, W2, ...]\n", "biases: list of biases matrices [b1, b2, ...]\n", "\n", "y: final output\n", "hiddens: list of computed hidden layers [h1, h2, ...]\n", "'''\n", "\n", "def forward(x, weights, biases):\n", " num_layers = len(weights)\n", " g = np.tanh\n", " hiddens = []\n", " \n", " # compute hidden layers\n", " \n", " #compute output\n", " \n", " return output, hiddens\n", "\n", "def compute_loss(output, y):\n", " # compute loss\n", " \n", " return loss \n", "\n", "def backward(x, y, output, hiddens, weights):\n", " num_layers = len(weights)\n", " g = np.tanh\n", " z = output\n", "\n", " probs = np.exp(output) / np.sum(np.exp(output))\n", " grad_z = probs - y \n", " \n", " grad_weights = []\n", " grad_biases = []\n", " \n", " # Backpropagate gradient computations \n", " for i in range(num_layers-1, -1, -1):\n", " \n", " # Gradient of hidden parameters.\n", " pass\n", " \n", " # Making gradient vectors have the correct order\n", " grad_weights.reverse()\n", " grad_biases.reverse()\n", " return grad_weights, grad_biases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Question 2\n", "\n", "Now we will use the MLP on real data to classify handwritten digits.\n", "\n", "Data is loaded, split into train and test sets and target is one-hot encoded below:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from sklearn.datasets import load_digits\n", "from sklearn.model_selection import train_test_split\n", "\n", "data = load_digits()\n", "\n", "inputs = data.data \n", "labels = data.target \n", "n, p = np.shape(inputs)\n", "n_classes = len(np.unique(labels))\n", "\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(inputs, labels, test_size=0.2, random_state=42)\n", "\n", "# Encode labels as one-hot vectors.\n", "one_hot = np.zeros((np.size(y_train, 0), n_classes))\n", "for i in range(np.size(y_train, 0)):\n", " one_hot[i, y_train[i]] = 1\n", "y_train_ohe = one_hot\n", "one_hot = np.zeros((np.size(y_test, 0), n_classes))\n", "for i in range(np.size(y_test, 0)):\n", " one_hot[i, y_test[i]] = 1\n", "y_test_ohe = one_hot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "❓ Complete function `MLP_train_epoch` using your previously defined functions to compute one epoch of training using SGD:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "'''\n", "Outputs:\n", " - weights: list of updated weights\n", " - biases: list of updated \n", " - loss: scalar of total loss (sum for all observations)\n", "\n", "'''\n", "\n", "def MLP_train_epoch(inputs, labels, weights, biases):\n", " num_layers = len(weights)\n", " total_loss = 0\n", " \n", " # For each observation and target\n", " # Compute forward pass\n", " \n", " # Compute Loss and update total loss\n", " \n", " # Compute backpropagation\n", " \n", " # Update weights\n", " \n", " \n", " return weights, biases, total_loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use a MLP with a single hidden layer of 50 units and a learning rate of $0.001$. \n", "\n", "❓ Run 100 epochs of your MLP. Save the loss at each epoch in a list and plot the loss evolution after training." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Initialize weights\n", "\n", "# Empty loss list\n", "\n", "# Learning rate.\n", " \n", "# Run epochs and append loss to list\n", "\n", "# Plot loss evolution\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "❓ Complete function `MLP_predict` to get array of predictions from your trained MLP:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def MLP_predict(inputs, weights, biases):\n", " predicted_labels = []\n", " for x in inputs:\n", " # Compute forward pass and get the class with the highest probability\n", " pass\n", " predicted_labels = np.array(predicted_labels)\n", " return predicted_labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "❓ Compute the accuracy on the train and test sets." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can compare our results with Sklearn's implementation of the MLP. Compare their accuracies:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9993041057759221\n", "0.9722222222222222\n" ] } ], "source": [ "from sklearn.neural_network import MLPClassifier\n", "clf = MLPClassifier(hidden_layer_sizes=(50),\n", " activation='tanh',\n", " solver='sgd',\n", " learning_rate='constant',\n", " learning_rate_init=0.001,\n", " nesterovs_momentum=False,\n", " random_state=1,\n", " max_iter=1000)\n", "clf.fit(X_train, y_train)\n", "print(clf.score(X_train, y_train))\n", "print(clf.score(X_test, y_test))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }