{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "-jlGXqM_t452" }, "source": [ "# P8 - Convolutional Neural Networks (CNNs)\n", "We have now learned about the Perceptron, Linear and logistic regression, Multi-layer perceptron and backpropagation, Auto-encoders. \n", "\n", "In this pratical session about Convolutional Neural Networks (CNNs) we will use the MNIST datasets.\n", "\n", "First, we will obtain baselines using a Logistic Regression and a Feed-forward Neural Network." ] }, { "cell_type": "markdown", "metadata": { "id": "ITJR4snhxdT0" }, "source": [ "## 0.0 - Imports\n", "We will need to import some libraries to be used in this session. Libraries include data visualizers ([matplotlib](https://matplotlib.org/)), neural network package ([torch](https://pytorch.org/)), and other helper packages for data handling ([sklearn](https://scikit-learn.org/), [numpy](https://numpy.org/))." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "MWGjU3tDw4bD" }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from sklearn.base import BaseEstimator\n", "from sklearn.datasets import load_digits\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.utils import check_random_state\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torch.optim as optim\n", "from torchvision import datasets, transforms\n", "from torch.autograd import Variable\n", "from torch.utils.data import Dataset, DataLoader\n", "from torch.utils.data.sampler import SubsetRandomSampler\n", "import time\n", "import copy" ] }, { "cell_type": "markdown", "metadata": { "id": "W-od7M6WMN0N" }, "source": [ "Then, other variable definitions are needed to be set. 
This includes selecting the device: the GPU if one is available, otherwise the CPU:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ECqewHJ0MM62", "outputId": "e5377940-a224-4e98-b427-bad0a9579863" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cpu\n" ] } ], "source": [ "# Configure Device\n", "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')\n", "print(device)" ] }, { "cell_type": "markdown", "metadata": { "id": "odY0Ng9yycgr" }, "source": [ "### 0.1 - Create Dataloaders\n", "#### MNIST dataset \n", "Using torchvision we can easily download and use the MNIST dataset to create our train and validation dataloaders." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "snFv-Hu-zRnW" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\n", "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5ef88637fe884886acab052dc627132e", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/9912422 [00:00 creates an iterator of the dataloader and gets the next batch\n", "batch_idx, (example_imgs, example_targets) = next(enumerate(mnist_train_dataloader))\n", "# info about the dataset\n", "D_in = np.prod(example_imgs.shape[1:])\n", "D_out = len(mnist_train_dataloader.dataset.targets.unique())\n", "print(\"Datasets shapes:\", {x: dataloaders[x].dataset.data.shape for x in ['train', 'val']})\n", "print(\"N input features:\", D_in, \"Output classes:\", D_out)\n", "print(\"Train batch:\", example_imgs.shape, example_targets.shape)\n", "batch_idx, (example_imgs, example_targets) = next(enumerate(mnist_val_dataloader))\n", "print(\"Val batch:\", example_imgs.shape, 
example_targets.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "JnFAmoinjY1T" }, "source": [ "We can plot some examples with corresponding labels using the following function. This function can also receive the predicted labels." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 284 }, "id": "5ZWvjQOvC2ep", "outputId": "c77ced2a-931a-4fb1-db71-5354316f0e6d" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def plot_img_label_prediction(imgs, y_true, y_pred=None, shape=(2, 3)):\n", " y_pred = [None] * len(y_true) if y_pred is None else y_pred\n", " fig = plt.figure()\n", " for i in range(np.prod(shape)):\n", " plt.subplot(*shape, i+1)\n", " plt.tight_layout()\n", " plt.imshow(imgs[i][0], cmap='gray', interpolation='none')\n", " plt.title(\"True: {} Pred: {}\".format(y_true[i], y_pred[i]))\n", " plt.xticks([])\n", " plt.yticks([])\n", "\n", "plot_img_label_prediction(imgs=example_imgs, y_true=example_targets, y_pred=None, shape=(2, 3))\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Mj3utDDuzDCj" }, "source": [ "### 1.1 Logistic Regression\n", "\n", "We can use a very simple Logistic Regression that receives our input images as a vector and predicts the digit. This will be our first baseline to compare with the CNNs." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "TniyY4bQzBMS", "outputId": "54a7e07e-3078-4a71-95f6-5670a051b4b2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Test score with penalty: 0.9030\n" ] } ], "source": [ "scaler = StandardScaler()\n", "X_train = scaler.fit_transform(np.reshape(X_train, (X_train.shape[0], -1)))\n", "X_val = scaler.transform(np.reshape(X_val, (X_val.shape[0], -1)))\n", "\n", "clf = LogisticRegression(C=50., multi_class='multinomial', solver='sag', tol=0.1)\n", "clf.fit(X_train, y_train)\n", "score = clf.score(X_val, y_val)\n", "\n", "print(\"Test score with penalty: %.4f\" % score)" ] }, { "cell_type": "markdown", "metadata": { "id": "A8rylkCnrwIy" }, "source": [ "We can select the coefficients for each class and reshape them into the image shape to plot them. This allows us to visualize what are the pixels that are contributing more to the classification for each of the digits. \n", "\n", "But what happens if the digits are not centered? 
Will we still get such good performance? Let's test that out later!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 339 }, "id": "2pucfjpaDF9_", "outputId": "3d370f1b-27ce-4a5a-a05a-62e25f560876" }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "coef = clf.coef_.copy()\n", "plt.figure(figsize=(10, 5))\n", "scale = np.abs(coef).max()\n", "for i in range(10):\n", " l1_plot = plt.subplot(2, 5, i + 1)\n", " l1_plot.imshow(coef[i].reshape(28, 28), interpolation='nearest',\n", " cmap=plt.cm.RdBu, vmin=-scale, vmax=scale)\n", " l1_plot.set_xticks(())\n", " l1_plot.set_yticks(())\n", " l1_plot.set_xlabel('Class %i' % i)\n", "plt.suptitle('Classification coefficient vectors for...')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "kK8v34AK6xXJ" }, "source": [ "### 1.2 Feed-Forward Neural Network\n", "\n", "The first step is to create the functions that will allow us to implement a feed-forward neural network and manage the training and validation process.\n", "\n", "The MLP class will define the architecture of a feed-forward neural network, with a set of hidden layers (fully connected layers [nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)), with a activation function in between them ([relu](https://pytorch.org/docs/stable/generated/torch.nn.functional.relu.html#torch.nn.functional.relu)), and a [softmax](https://pytorch.org/docs/stable/generated/torch.nn.functional.log_softmax.html#torch.nn.functional.log_softmax) in the last layer. Since the dataset poses a multiclass classification problem, the last layer should have a number of neurons equal to the number of classes." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "In9r_o8vvNaz" }, "outputs": [], "source": [ "class MLP(nn.Module):\n", " def __init__(self, dim_layers):\n", " super(MLP, self).__init__()\n", " self.dim_layers = dim_layers\n", " layer_list = [nn.Linear(dim_layers[l], dim_layers[l+1]) for l in range(len(dim_layers) - 1)]\n", " self.lin_layers = nn.ModuleList(layer_list)\n", "\n", " def forward(self, X):\n", " X = X.view(-1, self.dim_layers[0])\n", " # apply relu\n", " for layer in self.lin_layers[:-1]:\n", " X = F.relu(layer(X))\n", " # use softmax for output layer\n", " return F.log_softmax(self.lin_layers[-1](X), dim=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "h6OVD_1xUwWH" }, "source": [ "##### training validation function for the MLP and CNN" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "B1eUu01N8wIR" }, "outputs": [], "source": [ "def train_val_model(model, criterion, optimizer, dataloaders, num_epochs=25,\n", " scheduler=None, log_interval=None):\n", " since = time.time()\n", "\n", " best_model_wts = copy.deepcopy(model.state_dict())\n", " best_acc = 0.0\n", "\n", " # init dictionaries to save losses and accuracies of training and validation\n", " losses, accuracies = dict(train=[], val=[]), dict(train=[], val=[])\n", "\n", " for epoch in range(num_epochs):\n", " if log_interval is not None and epoch % log_interval == 0:\n", " print('Epoch {}/{}'.format(epoch, num_epochs - 1))\n", " print('-' * 10)\n", "\n", " # execute a training and validation phase for each epoch\n", " for phase in ['train', 'val']:\n", " if phase == 'train':\n", " model.train() # set model to train mode\n", " else:\n", " model.eval() # Set model to eval mode\n", "\n", " running_loss = 0.0\n", " running_corrects = 0\n", "\n", " # iterate over the data\n", " nsamples = 0\n", " for inputs, labels in dataloaders[phase]:\n", " inputs = inputs.to(device)\n", " labels = labels.to(device)\n", " nsamples += inputs.shape[0]\n", "\n", " # 
set the parameter gradients to zero\n", " optimizer.zero_grad()\n", "\n", " with torch.set_grad_enabled(phase == 'train'):\n", " outputs = model(inputs)\n", " _, preds = torch.max(outputs, 1)\n", " loss = criterion(outputs, labels)\n", "\n", " # if in training phase, perform backward prop and optimize\n", " if phase == 'train':\n", " loss.backward()\n", " optimizer.step()\n", "\n", " # increment loss and correct counts\n", " running_loss += loss.item() * inputs.size(0)\n", " running_corrects += torch.sum(preds == labels.data)\n", "\n", " if scheduler is not None and phase == 'train':\n", " scheduler.step()\n", "\n", " epoch_loss = running_loss / nsamples\n", " epoch_acc = running_corrects.double() / nsamples\n", "\n", " losses[phase].append(epoch_loss)\n", " accuracies[phase].append(epoch_acc)\n", " if log_interval is not None and epoch % log_interval == 0:\n", " print('{} Loss: {:.4f} Acc: {:.2f}%'.format(\n", " phase, epoch_loss, 100 * epoch_acc))\n", "\n", " # deep copy the best model\n", " if phase == 'val' and epoch_acc > best_acc:\n", " best_acc = epoch_acc\n", " best_model_wts = copy.deepcopy(model.state_dict())\n", " if log_interval is not None and epoch % log_interval == 0:\n", " print()\n", "\n", " time_elapsed = time.time() - since\n", " print('Training complete in {:.0f}m {:.0f}s'.format(\n", " time_elapsed // 60, time_elapsed % 60))\n", " print('Best val Acc: {:.2f}%'.format(100 * best_acc))\n", "\n", " # load best model weights to return\n", " model.load_state_dict(best_model_wts)\n", "\n", " return model, losses, accuracies" ] }, { "cell_type": "markdown", "metadata": { "id": "0CBE5tRMZEfr" }, "source": [ "We will start by creating a simple network with some hidden layers. Thus, in addition to the input, it will have 3 fully connected layer which, in this implemetation, is assigned to the input of the MLP Class. 
We will use the Stochastic Gradient Descent optimizer ([optim.SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html)) with 0.01 learning rate and 0.5 momentum. The loss function to be optimized will be negative log likelihood ([nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html)). Training and validation will be managed by the previously defined function \"train_val_model\"." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 981 }, "id": "200WI3xND6_M", "outputId": "79913a00-abf0-4e48-8177-64b49bbd6fac" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 0/14\n", "----------\n", "train Loss: 0.7707 Acc: 77.78%\n", "val Loss: 0.2943 Acc: 91.41%\n", "\n", "Epoch 2/14\n", "----------\n", "train Loss: 0.1756 Acc: 94.85%\n", "val Loss: 0.1514 Acc: 95.56%\n", "\n", "Epoch 4/14\n", "----------\n", "train Loss: 0.1097 Acc: 96.83%\n", "val Loss: 0.1115 Acc: 96.57%\n", "\n", "Epoch 6/14\n", "----------\n", "train Loss: 0.0750 Acc: 97.82%\n", "val Loss: 0.0881 Acc: 97.24%\n", "\n", "Epoch 8/14\n", "----------\n", "train Loss: 0.0546 Acc: 98.44%\n", "val Loss: 0.0816 Acc: 97.46%\n", "\n", "Epoch 10/14\n", "----------\n", "train Loss: 0.0406 Acc: 98.84%\n", "val Loss: 0.0704 Acc: 97.84%\n", "\n", "Epoch 12/14\n", "----------\n", "train Loss: 0.0296 Acc: 99.20%\n", "val Loss: 0.0738 Acc: 97.92%\n", "\n", "Epoch 14/14\n", "----------\n", "train Loss: 0.0218 Acc: 99.43%\n", "val Loss: 0.0755 Acc: 97.80%\n", "\n", "Training complete in 2m 28s\n", "Best val Acc: 97.92%\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "model_mlp = MLP([D_in, 256, 128, 64, D_out]).to(device) # [D_in, 512, 256, 128, 64, D_out]\n", "\n", "optimizer = optim.SGD(model_mlp.parameters(), lr=0.01, momentum=0.5)\n", "criterion = nn.NLLLoss()\n", "\n", "model_mlp, losses, accuracies = train_val_model(model_mlp, criterion, optimizer, dataloaders,\n", " num_epochs=15, log_interval=2)\n", "\n", "_ = plt.plot(losses['train'], '-b', losses['val'], '--r')" ] }, { "cell_type": "markdown", "metadata": { "id": "HXhHtX1TkUba" }, "source": [ "### 1.3 Convolutional Neural Network\n", "\n", "Convolutional layers capture patterns corresponding to relevant features independently of where they occur in the input. To do so, they slide a window over the input and apply the convolution operation with a set of kernels or filters that represent the features. Although it is not their only field of application, convolutional neural networks are mainly praised for their performance on image processing tasks.\n", "\n", "The training and validation management for the CNN implementation will be performed as the feed-forward network, however we will have to define the network's architecture.\n", "\n", "For that we will implement a CNN class to define how many layers it comprises and how the layers will be connected.\n", "\n", "The initialization (`__init__`) function will define the architecture and the `forward` function will implement how the different layers are connected. This architecture will be a sequece of 2 convolutional layers ([nn.Conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)) (1st: output channels 10, kernel size 5; 2nd: output channels 20, kernel size 5), then 2 fully connected layers ([nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)) (1st: output features 50; 2nd: output features 10 (the number of classes)). 
Once again, the final layer will be a [log-softmax](https://pytorch.org/docs/stable/generated/torch.nn.functional.log_softmax.html#torch.nn.functional.log_softmax) function, which outputs a log-probability for each of the 10 classes; the predicted class is the one with the highest value.\n", "\n", "Between the second convolutional layer and the first fully connected layer, we place a dropout layer ([nn.Dropout2d](https://pytorch.org/docs/stable/generated/torch.nn.Dropout2d.html)). The idea behind dropout is to randomly disable a fraction of the units (entire feature maps, in the case of `nn.Dropout2d`) at each step of the training phase, in order to reduce overfitting." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "PZ0mCl24EoaM" }, "outputs": [], "source": [ "class CNN(nn.Module):\n", "    \"\"\"Basic Pytorch CNN for MNIST-like data.\"\"\"\n", "\n", "    def __init__(self):\n", "        super(CNN, self).__init__()\n", "        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)\n", "        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)\n", "        self.conv2_drop = nn.Dropout2d()\n", "        self.fc1 = nn.Linear(320, 50)\n", "        self.fc2 = nn.Linear(50, 10)\n", "\n", "    def forward(self, x, T=1.0):\n", "        # Batch size = 64, images 28x28 =>\n", "        #     x.shape = [64, 1, 28, 28]\n", "        x = F.relu(F.max_pool2d(self.conv1(x), 2))\n", "        # Convolution with 5x5 filter without padding and 10 channels =>\n", "        #     x.shape = [64, 10, 24, 24] since 24 = 28 - 5 + 1\n", "        # Max pooling with stride of 2 =>\n", "        #     x.shape = [64, 10, 12, 12]\n", "        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))\n", "        # Convolution with 5x5 filter without padding and 20 channels =>\n", "        #     x.shape = [64, 20, 8, 8] since 8 = 12 - 5 + 1\n", "        # Max pooling with stride of 2 =>\n", "        #     x.shape = [64, 20, 4, 4]\n", "        x = x.view(-1, 320)\n", "        # Reshape =>\n", "        #     x.shape = [64, 320]\n", "        x = F.relu(self.fc1(x))\n", "        x = F.dropout(x, training=self.training)\n", "        x = self.fc2(x)\n", "        x = F.log_softmax(x, dim=1)\n", "        return x" ] }, { "cell_type": "markdown", "metadata": { "id": "mv9vdZZ7OlSh" }, "source": [ "As 
before, let's define the model to be trained. We will use the Adam optimizer ([optim.Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam)), with learning rate 0.001, and the same negative log likelihood loss ([nn.NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html))." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "ImTlr5JeEsb6", "outputId": "a8d8e0a6-e3cc-4b37-d022-adc0241e8b88" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 0/24\n", "----------\n", "train Loss: 0.5321 Acc: 83.19%\n", "val Loss: 0.0969 Acc: 97.07%\n", "\n", "Epoch 2/24\n", "----------\n", "train Loss: 0.1932 Acc: 94.42%\n", "val Loss: 0.0587 Acc: 98.02%\n", "\n", "Epoch 4/24\n", "----------\n", "train Loss: 0.1598 Acc: 95.36%\n", "val Loss: 0.0422 Acc: 98.65%\n", "\n", "Epoch 6/24\n", "----------\n", "train Loss: 0.1401 Acc: 95.99%\n", "val Loss: 0.0364 Acc: 98.78%\n", "\n", "Epoch 8/24\n", "----------\n", "train Loss: 0.1305 Acc: 96.24%\n", "val Loss: 0.0331 Acc: 98.90%\n", "\n", "Epoch 10/24\n", "----------\n", "train Loss: 0.1236 Acc: 96.32%\n", "val Loss: 0.0316 Acc: 99.02%\n", "\n", "Epoch 12/24\n", "----------\n", "train Loss: 0.1202 Acc: 96.44%\n", "val Loss: 0.0351 Acc: 98.82%\n", "\n", "Epoch 14/24\n", "----------\n", "train Loss: 0.1156 Acc: 96.50%\n", "val Loss: 0.0295 Acc: 99.01%\n", "\n", "Epoch 16/24\n", "----------\n", "train Loss: 0.1146 Acc: 96.56%\n", "val Loss: 0.0295 Acc: 99.01%\n", "\n", "Epoch 18/24\n", "----------\n", "train Loss: 0.1086 Acc: 96.82%\n", "val Loss: 0.0298 Acc: 98.98%\n", "\n", "Epoch 20/24\n", "----------\n", "train Loss: 0.1062 Acc: 96.81%\n", "val Loss: 0.0300 Acc: 98.97%\n", "\n", "Epoch 22/24\n", "----------\n", "train Loss: 0.1053 Acc: 96.80%\n", "val Loss: 0.0300 Acc: 98.93%\n", "\n", "Epoch 24/24\n", "----------\n", "train Loss: 0.1000 Acc: 97.01%\n", "val Loss: 0.0288 Acc: 
99.08%\n", "\n", "Training complete in 6m 42s\n", "Best val Acc: 99.08%\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "model = CNN().to(device)\n", "optimizer = torch.optim.Adam(model.parameters(), lr=0.001)\n", "criterion = nn.NLLLoss()\n", "\n", "model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,\n", " num_epochs=25, log_interval=2)\n", "\n", "_ = plt.plot(losses['train'], '-b', losses['val'], '--r')" ] }, { "cell_type": "markdown", "metadata": { "id": "ULZ91b0cPhy5" }, "source": [ "We have now completed training and validation with 3 different models: Logistic Regression, Feed-Forward Network, and Convolutional Neural Network. \n", "\n", "We have seen that with the CNN, the performance of the model in the validation set, outperforms the other models (~99% accuracy against ~90% and ~98%). " ] }, { "cell_type": "markdown", "metadata": { "id": "PHyGUuZbTvhr" }, "source": [ "The difference in performance between CNNs and MLP is small but how many learnable parameters are we using in the MLP and in CNN models?\n", "\n", "We can find it out using the following lines of code:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "acy0l3-YQjT2", "outputId": "ceb6251b-4ea1-4168-f23a-0c42feea37ce" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of parameters in the MLP model: 242762\n", "Number of parameters in the CNN model: 21840\n" ] } ], "source": [ "#model_mlp = MLP([D_in, 256, 128, 64, D_out]).to(device)\n", "model_parameters_mlp = filter(lambda p: p.requires_grad, model_mlp.parameters())\n", "params_mlp = sum([np.prod(p.size()) for p in model_parameters_mlp])\n", "print('Number of parameters in the MLP model: {}'.format(params_mlp))\n", "\n", "model_parameters_cnn = filter(lambda p: p.requires_grad, model.parameters())\n", "params_cnn = sum([np.prod(p.size()) for p in model_parameters_cnn])\n", "print('Number of parameters in the CNN model: 
{}'.format(params_cnn))" ] }, { "cell_type": "markdown", "metadata": { "id": "Sj28CWvrMbOw" }, "source": [ "You can see that the MLP uses ~11x more learnable parameters than the CNN, yet its validation accuracy is slightly lower.\n", "\n", "We can experiment with the number of layers and their sizes to build an MLP whose parameter count is closer to that of the CNN." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2RmgJhIPMECw", "outputId": "3c677343-5c57-4a1d-a047-9bca2d8fe52f" }, "outputs": [], "source": [ "model_mlp_test = MLP([D_in, 32, D_out]).to(device)\n", "model_parameters_mlp_test = filter(lambda p: p.requires_grad, model_mlp_test.parameters())\n", "params_mlp_test = sum([np.prod(p.size()) for p in model_parameters_mlp_test])\n", "print('Number of parameters in the MLP model: {}'.format(params_mlp_test))" ] }, { "cell_type": "markdown", "metadata": { "id": "B_oq9682QWCF" }, "source": [ "And how does that model perform? Let's find out." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 979 }, "id": "w6oa0TeBQU9E", "outputId": "e8c478bf-27c6-4ade-f0bd-59302eaf0e49" }, "outputs": [], "source": [ "optimizer = optim.SGD(model_mlp_test.parameters(), lr=0.01, momentum=0.5)\n", "criterion = nn.NLLLoss()\n", "\n", "model_mlp_test, losses, accuracies = train_val_model(model_mlp_test, criterion,\n", "                                                     optimizer, dataloaders,\n", "                                                     num_epochs=15,\n", "                                                     log_interval=5)\n", "\n", "_ = plt.plot(losses['train'], '-b', losses['val'], '--r')" ] }, { "cell_type": "markdown", "metadata": { "id": "rpmgachOUCnX" }, "source": [ "We can see a drop in performance compared with the previous MLP model. 
So, although the CNN has fewer learnable parameters, it performs better thanks to properties of CNNs such as translation invariance and parameter sharing; the latter lets the same filter weights be reused across the whole image, which reduces the number of weights.\n", "\n", "CNNs are expected to be invariant to the location where important features occur in the input. In practice, it is not unusual for a dataset shift to occur when the data acquisition process is modified. We will simulate such a shift by applying random horizontal translations to our validation dataset and see how robust each model is to these shifts.\n", "\n", "We can do this by going back to the **0.1 - Create Dataloaders -\n", "MNIST dataset** cell and defining the test transform with the following code:\n", "\n", "```\n", "mnist_transform_test = transforms.Compose(\n", "    [transforms.ToTensor(),\n", "     transforms.RandomAffine(0, translate=[0.1, 0]),\n", "     transforms.Normalize((0.1307,), (0.3081,))])\n", "```\n", "\n", "and replace\n", "\n", "`mnist_val_dataset = datasets.MNIST('../data', download=True, train=False, transform=mnist_transform)`\n", "\n", "with\n", "\n", "`mnist_val_dataset = datasets.MNIST('../data', download=True, train=False, transform=mnist_transform_test)`" ] }, { "cell_type": "markdown", "metadata": { "id": "-5gcf_gMlcqI" }, "source": [ "After rerunning the different models we can see that the accuracy of the Logistic Regression drops from ~90% to ~72%, the MLP drops from ~98% to ~87%, and the CNN drops from ~99% to ~97%. This shows that the features learned by the CNN are more robust to changes in location, as expected." ] }, { "cell_type": "markdown", "metadata": { "id": "nU3NwQ7Nuvhv" }, "source": [ "# Bonus Case - Attention with small images and CNNs. (And how to create a dataset that takes numpy arrays)\n", "\n", "In this case we will use scikit-learn's digits dataset.\n", "\n", "## Scikit-Learn Digits\n", "\n", "This dataset is provided by scikit-learn and the digit images are returned as numpy ndarrays. 
We will use PIL (Python Imaging Library) to convert each numpy ndarray to an image, resize it, and transform it to a tensor.\n", "\n", "In this case we don't have a predefined Digits Dataset provided by torchvision, so we will need to write a custom Dataset class and implement three functions: \n", "\n", "`__init__`, `__len__`, and `__getitem__`.\n", "\n", "Scikit-Learn returns the digit images and labels as ndarrays. Each digit image is an 8x8 array.\n", "\n", "To use the previous CNN, we will use a transform to resize the images to the MNIST image size." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "A4v-XFzcv9If" }, "outputs": [], "source": [ "SKLEARN_DIGITS_TRAIN_SIZE = 1247\n", "SKLEARN_DIGITS_VAL_SIZE = 550\n", "\n", "class NumpyDataset(Dataset):\n", "\n", "    def __init__(self, data, targets, transform=None):\n", "        self.data = torch.from_numpy(data).float()\n", "        self.targets = torch.from_numpy(targets).long()\n", "        self.transform = transform\n", "\n", "    def __getitem__(self, index):\n", "        x = np.expand_dims(self.data[index], axis=2)\n", "        y = self.targets[index]\n", "        if self.transform:\n", "            x = self.transform(x)\n", "        return x, y\n", "\n", "    def __len__(self):\n", "        return len(self.data)\n", "\n", "digits_transform = transforms.Compose([\n", "    transforms.ToPILImage(),\n", "    transforms.Resize(28),\n", "    transforms.ToTensor(),\n", "    ])\n", "\n", "# Get sklearn digits dataset\n", "X, y = load_digits(return_X_y=True)\n", "X = X.reshape((len(X), 8, 8))\n", "y_train = y[:-SKLEARN_DIGITS_VAL_SIZE]\n", "y_val = y[-SKLEARN_DIGITS_VAL_SIZE:]\n", "X_train = X[:-SKLEARN_DIGITS_VAL_SIZE]\n", "X_val = X[-SKLEARN_DIGITS_VAL_SIZE:]\n", "\n", "digits_train_dataset = NumpyDataset(X_train, y_train, transform=digits_transform)\n", "digits_val_dataset = NumpyDataset(X_val, y_val, transform=digits_transform)\n", "digits_train_dataloader = torch.utils.data.DataLoader(digits_train_dataset, batch_size=64, shuffle=True)\n", "digits_val_dataloader = 
torch.utils.data.DataLoader(digits_val_dataset, batch_size=64, shuffle=True)\n", "\n", "dataloaders = dict(train=digits_train_dataloader, val=digits_val_dataloader)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dhQU3v7Zv9Ih", "outputId": "030f1fa0-62a0-4dc0-d61f-013e09e2457d" }, "outputs": [], "source": [ "# Get some examples of images and targets\n", "_, (example_train_imgs, example_train_targets) = next(enumerate(digits_train_dataloader))\n", "_, (example_val_imgs, example_val_targets) = next(enumerate(digits_val_dataloader))\n", "\n", "# Info about the dataset (computed from the digits batch, not the earlier MNIST batch)\n", "D_in = np.prod(example_train_imgs.shape[1:])\n", "D_out = len(digits_train_dataloader.dataset.targets.unique())\n", "\n", "# Output information\n", "print(\"Datasets shapes (before transformations):\", {x: dataloaders[x].dataset.data.shape for x in ['train', 'val']})\n", "print(\"N input features:\", D_in, \"Output classes:\", D_out)\n", "print(\"Train batch:\", example_train_imgs.shape, example_train_targets.shape)\n", "print(\"Val batch:\", example_val_imgs.shape, example_val_targets.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 284 }, "id": "Vx78pb7Ov9Ih", "outputId": "207e26b2-c2f7-41e7-ae13-9421b398027e" }, "outputs": [], "source": [ "plot_img_label_prediction(imgs=example_train_imgs, y_true=example_train_targets, y_pred=None, shape=(2, 3))\n" ] }, { "cell_type": "markdown", "metadata": { "id": "xbBAH9OTv9Ii" }, "source": [ "### Logistic Regression" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "W46ofUE6v9Ii", "outputId": "5e698e3a-c465-45c3-9f72-07f566ef8055" }, "outputs": [], "source": [ "scaler = StandardScaler()\n", "print(X_train.squeeze().shape)\n", "X_train = scaler.fit_transform(np.reshape(X_train, (X_train.shape[0], -1)))\n", "X_val = 
scaler.transform(np.reshape(X_val, (X_val.shape[0], -1)))\n", "\n", "# Turn up tolerance for faster convergence\n", "clf = LogisticRegression(C=50., multi_class='multinomial', solver='sag', tol=0.1)\n", "clf.fit(X_train, y_train)\n", "#sparsity = np.mean(clf.coef_ == 0) * 100\n", "score = clf.score(X_val, y_val)\n", "\n", "print(\"Test score with penalty: %.4f\" % score)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 339 }, "id": "E_G2Rt0Gv9Ij", "outputId": "59791f8a-e8ac-41f9-81e9-8cdc92cadd42" }, "outputs": [], "source": [ "coef = clf.coef_.copy()\n", "plt.figure(figsize=(10, 5))\n", "scale = np.abs(coef).max()\n", "for i in range(10):\n", "    l1_plot = plt.subplot(2, 5, i + 1)\n", "    l1_plot.imshow(coef[i].reshape(8, 8), interpolation='nearest',\n", "                   cmap=plt.cm.RdBu, vmin=-scale, vmax=scale)\n", "    l1_plot.set_xticks(())\n", "    l1_plot.set_yticks(())\n", "    l1_plot.set_xlabel('Class %i' % i)\n", "plt.suptitle('Classification coefficient vectors for...')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "PfYdXpde4bg0" }, "source": [ "### Feed-forward using digits dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "UDKn4WS636Bg", "outputId": "754a52a5-d33e-4a7f-9431-df84fb38b3d8" }, "outputs": [], "source": [ "model = MLP([D_in, 512, 256, 128, 64, D_out]).to(device)\n", "\n", "optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)\n", "criterion = nn.NLLLoss()\n", "\n", "model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,\n", "                                            num_epochs=20, log_interval=2)\n", "\n", "_ = plt.plot(losses['train'], '-b', losses['val'], '--r')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 724 }, "id": "R9ImEBeo4MW6", "outputId": 
"c402c8da-dd25-4212-86f6-d3768e1619ed" }, "outputs": [], "source": [ "model = CNN().to(device)\n", "optimizer = torch.optim.Adam(model.parameters(), lr=0.001)\n", "criterion = nn.NLLLoss()\n", "\n", "model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,\n", " num_epochs=50, log_interval=10)\n", "\n", "_ = plt.plot(losses['train'], '-b', losses['val'], '--r')" ] }, { "cell_type": "markdown", "metadata": { "id": "uIhei09Ruvf-" }, "source": [ "# Bonus Information - Visualizing CNN filters\n", "\n", "Some work have been done to demonstrate the type of features learned by different filters in different layers. \n", "\n", "For instance, considering a known CNN called **VGG16** which has the following architecture\n", "\n", "![image](https://media.geeksforgeeks.org/wp-content/uploads/20200219152327/conv-layers-vgg16.jpg)\\[taken from: https://www.geeksforgeeks.org/vgg-16-cnn-model/ \\]\n", "\n", "these would be some of the filters from some of the layers: \n", "\n", "\t \n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
Layer 2
(Conv 1-2)
Layer 10
(Conv 2-1)
Layer 17
(Conv 3-1)
Layer 24
(Conv 4-1)
\n", "\n", "or obtain the class activations:\n", "\n", "\t \n", " \t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
Input Image Layer Vis. (Filter=0) Filter Vis. (Layer=29)
\n", "\n", "\\[examples taken from: http://www.github.com/utkuozbulak/pytorch-cnn-visualizations \\]\n" ] }, { "cell_type": "markdown", "metadata": { "id": "VU21L86BvAXK" }, "source": [ "# Bonus Information - Predefined architectures, pre-trained models and transfer learning\n", "\n", "Packages like [torchvision](https://pytorch.org/vision/stable/index.html) and [timm](https://rwightman.github.io/pytorch-image-models/) offer you the possibility of using predefined architectures or even use pre-trained models that can be used to fine tune the models for that same task or used for transfer learning.\n", "\n", "Besides datasets, transforms and others, **Torchvision** has a large number of predefined architecture with the possibility of loading the pre-trained weights." ] }, { "cell_type": "markdown", "metadata": { "id": "w_1YpmkV-PbU" }, "source": [ "#### Torchvision classification models examples\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "l5yOWA9l4lrF" }, "outputs": [], "source": [ "import torchvision.models as models\n", "\n", "# construct a model with random weights to be trained\n", "resnet18 = models.resnet18()\n", "\n", "# load a pre-trained model\n", "resnet18 = models.resnet18(pretrained=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "nAsfqUIyDBD7" }, "source": [ "For examples of different models and how to use pre-trained weights please visit https://pytorch.org/vision/stable/models.html#\n", "\n", "\n", "\n", "Another possibility is **timm** which contains models for classification only.\n", "In **timm** you are not restricted to have inputs only with 1/3-channels, allowing you to use architectures or pre-trained models using images that have 2 or > 3-channels." 
] }, { "cell_type": "markdown", "metadata": { "id": "4zS8Ykbo-ZUg" }, "source": [ "#### timm classification models examples" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "w0FQYnKQ-lr_", "outputId": "0d3ef180-ef41-4653-99dd-6721c41c2475" }, "outputs": [], "source": [ "if 'google.colab' in str(get_ipython()):\n", "    !pip install -q timm\n", "import timm\n", "\n", "# list all available model architectures\n", "print(timm.list_models())\n", "\n", "# list models that have pre-trained weights\n", "print(timm.list_models(pretrained=True))\n", "\n", "# list model architectures matching a wildcard pattern\n", "print(timm.list_models('*resne*t*'))\n", "\n", "# construct a model with random weights to be trained\n", "model = timm.create_model('resnet18')\n", "\n", "# load a pre-trained model\n", "model = timm.create_model('resnet18', pretrained=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "OSEwqB8ADIao" }, "source": [ "For more details on how to use this package, visit https://rwightman.github.io/pytorch-image-models/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kaank-9kDI72" }, "outputs": [], "source": [] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "name": "P8-CNNs.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 1 }