{ "cells": [ { "cell_type": "markdown", "id": "9784c439", "metadata": {}, "source": [ "# Transformers" ] }, { "cell_type": "markdown", "id": "027916b4", "metadata": {}, "source": [ "In this part we will cover Transformer-based large pretrained models. \n", "\n", "This notebook focuses on showing how you can use the widely known Hugging Face library to apply different types of transformer models to a range of tasks, from NLP to CV.\n", "\n", "We hope you learn how you can leverage pretrained transformer-based models and how to fine-tune them for a specific downstream task. " ] }, { "cell_type": "markdown", "id": "5da8f539", "metadata": {}, "source": [ "To dive deep into transformers, we recommend starting by reading\n", " \n", "- http://jalammar.github.io/illustrated-transformer/\n", "\n", "This gives a step-by-step explanation of the original paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762)" ] }, { "cell_type": "markdown", "id": "57f4076c", "metadata": {}, "source": [ "#### Code implementation\n", "\n", "It's also good practice to try implementing the original model yourself before using a Transformers library:\n", "\n", "- http://nlp.seas.harvard.edu/2018/04/03/attention.html" ] }, { "cell_type": "markdown", "id": "184e2623", "metadata": {}, "source": [ "# Hugging Face 🤗" ] }, { "cell_type": "markdown", "id": "87e97321", "metadata": {}, "source": [ "\"Training a transformer model and deploying these models can be quite challenging. In general these models have millions to tens of billions of parameters and require large amounts of data. \n", "\n", "This becomes very costly in terms of time and compute resources. It even translates to environmental impact. Imagine if each time a research team, a student organization, or a company wanted to train a model, it did so from scratch. This would lead to huge, unnecessary global costs!\n", "\n", "\n", "The 🤗 Transformers library was created to solve this problem. 
Its goal is to provide a single API through which any Transformer model can be loaded, trained, and saved. \n", "\n", "The Hugging Face 🤗 Transformers library provides the functionality to create and use those shared models.\"\n", "\n", "For more details, see the official [Hugging Face course](https://huggingface.co/course/chapter1/1)" ] }, { "cell_type": "markdown", "id": "75759e71", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": 1, "id": "9ac93857", "metadata": {}, "outputs": [], "source": [ "!pip3 install transformers\n", "!pip3 install ipywidgets --user\n", "!pip3 install torchtext # we will also use this later on for the data, so install it if you don't have it" ] }, { "cell_type": "markdown", "id": "550bc2f1", "metadata": {}, "source": [ "### Important:\n", "\n", "After this installation, don't forget to restart the Jupyter kernel." ] }, { "cell_type": "markdown", "id": "8e15dc4c", "metadata": {}, "source": [ "## Examples of transformer architectures available\n", "\n", "As mentioned above, the Transformer architecture was introduced in the 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762), in which the focus of the original research was on translation tasks using encoder-decoder blocks. \n", "\n", "This was then followed by the introduction of several influential models, including encoder-only models (e.g., BERT) and decoder-only models (e.g., GPT), as well as a variety of encoder-decoder transformer-based models (e.g., BART and T5). " ] }, { "cell_type": "markdown", "id": "7d609d06", "metadata": {}, "source": [ "### Encoder" ] }, { "cell_type": "markdown", "id": "259a6701", "metadata": {}, "source": [ "Encoder models use only the encoder block of a Transformer model. 
\n", "\n", "These models are often characterized as having “bi-directional” attention, and are often called auto-encoding models.\n", "\n", "Encoder models are best suited for tasks requiring an understanding of the full sentence, such as:\n", " - sentence classification\n", " - named entity recognition (word classification in general)\n", " - extractive question answering\n", " - sentence representation (contextual embeddings)\n", "\n", " \n", "The pretraining of these models usually revolves around somehow corrupting a given sentence (for instance, by masking random words in it) and tasking the model with finding or reconstructing the initial sentence.\n", "\n", "\n", "There is a variety of encoder models available on Hugging Face. Some examples include:\n", "\n", "- [BERT](https://huggingface.co/docs/transformers/model_doc/bert)\n", "- [DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)\n", "- [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)\n", "\n", "Hugging Face also provides recent vision and multi-modal encoder models (see the end of this notebook):\n", "- [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)\n", "- [LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)\n", "- [VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)\n", "- [CLIP](https://huggingface.co/docs/transformers/model_doc/clip)" ] }, { "cell_type": "markdown", "id": "cdcbd0be", "metadata": {}, "source": [ "### Decoder\n", "\n", "Decoder models use only the decoder of a Transformer model. At each stage, for a given word, the self-attention layers can only access the words positioned before it in the sentence. 
These models are often called auto-regressive models.\n", "\n", "The pretraining of decoder models usually revolves around predicting the next word in the sentence.\n", "\n", "These models are best suited for tasks involving text generation.\n", "\n", "Some examples of decoder models available in Hugging Face include:\n", "\n", "- [GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)\n", "- [GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)\n", "- [Transformer XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)" ] }, { "cell_type": "markdown", "id": "5499b5e1", "metadata": {}, "source": [ "### Encoder-decoder models\n", "\n", "Encoder-decoder models (also called sequence-to-sequence models) use both parts of the Transformer architecture. \n", "\n", "Sequence-to-sequence models are best suited for tasks revolving around generating new sentences depending on a given input (conditional text generation), such as:\n", "- summarization\n", "- translation\n", "- generative question answering\n", "\n", "Representatives of this family of models include:\n", "\n", "- [BART](https://huggingface.co/docs/transformers/model_doc/bart)\n", "- [Marian](https://huggingface.co/docs/transformers/model_doc/marian)\n", "- [T5](https://huggingface.co/docs/transformers/model_doc/t5)" ] }, { "cell_type": "markdown", "id": "4413d4ed", "metadata": {}, "source": [ "# Pipeline" ] }, { "cell_type": "markdown", "id": "4ac0324d", "metadata": {}, "source": [ "The most basic object in the 🤗 Transformers library is the pipeline() function. 
It connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.\n", "\n", "See all the details of pipeline() here: https://huggingface.co/docs/transformers/v4.16.0/en/main_classes/pipelines#transformers.TranslationPipeline" ] }, { "cell_type": "code", "execution_count": 1, "id": "f4809123", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1af0f58487864927a94884d311838610", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading: 0%| | 0.00/629 [00:00 Preprocessing with a tokenizer: The text is preprocessed into a format the model can understand.\n", "2. Going through the model: The preprocessed inputs are passed to the model.\n", "3. Postprocessing the output: The predictions of the model are post-processed, so you can make sense of them." ] }, { "cell_type": "markdown", "id": "585fd991", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "673759c2", "metadata": {}, "source": [ "### 1. Preprocessing with a tokenizer:" ] }, { "cell_type": "markdown", "id": "e504c406", "metadata": {}, "source": [ "Like other neural networks, Transformer models can’t process raw text directly.\n", "\n", "So the first step of our pipeline is to convert the text inputs into numbers that the model can make sense of. \n", "\n", "To do this we use a tokenizer. 
Tokenizers translate text into data that can be processed by the model.\n", "\n", "The tokenizer is responsible for:\n", "\n", "- Splitting the input into words, subwords, or symbols (like punctuation) that are called tokens\n", "- Mapping each token to an integer\n", "- Adding additional inputs that may be useful to the model (e.g., attention mask)" ] }, { "cell_type": "markdown", "id": "84fad112", "metadata": {}, "source": [ "#### Define tokenizer " ] }, { "cell_type": "markdown", "id": "dd978608", "metadata": {}, "source": [ "The tokenizer and the model should always be from the same checkpoint. Therefore, we need to define the tokenizer with the checkpoint name of the corresponding model. \n", "\n", "\n", "To do this, we use the AutoTokenizer class and its from_pretrained() method, passing it the checkpoint name of the corresponding model.\n", "\n", "This will automatically fetch the data associated with the model’s tokenizer (such as the vocabulary) and cache it (so it’s only downloaded the first time you run the code below)." ] }, { "cell_type": "code", "execution_count": 11, "id": "c67a0e36", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "29749fbf06f6489b8d7c31e2c61fb7bd", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading: 0%| | 0.00/28.0 [00:00), hidden_states=None, attentions=None)\n" ] } ], "source": [ "outputs = model(**inputs)  # you can pass further arguments: output_attentions=True and output_hidden_states=True\n", "print(outputs)" ] }, { "cell_type": "markdown", "id": "1cb6905f", "metadata": {}, "source": [ "Note that the outputs of 🤗 Transformers models behave like named tuples or dictionaries. 
\n", "\n", "You can access the elements by:\n", "- attributes (like we did) \n", "- or by key (outputs[\"last_hidden_state\"])\n", "- or even by index if you know exactly where the thing you are looking for is (outputs[0]).\n" ] }, { "cell_type": "code", "execution_count": 23, "id": "7dc44cc2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "last hidden state tensor([[[-0.1798, 0.2333, 0.6321, ..., -0.3017, 0.5008, 0.1481],\n", " [ 0.2758, 0.6497, 0.3200, ..., -0.0760, 0.5136, 0.1329],\n", " [ 0.9046, 0.0985, 0.2950, ..., 0.3352, -0.1407, -0.6464],\n", " ...,\n", " [ 0.1466, 0.5661, 0.3235, ..., -0.3376, 0.5100, -0.0561],\n", " [ 0.7500, 0.0487, 0.1738, ..., 0.4684, 0.0030, -0.6084],\n", " [ 0.0519, 0.3729, 0.5223, ..., 0.3584, 0.6500, -0.3883]],\n", "\n", " [[-0.2937, 0.7283, -0.1497, ..., -0.1187, -1.0227, -0.0422],\n", " [-0.2206, 0.9384, -0.0951, ..., -0.3643, -0.6605, 0.2407],\n", " [-0.1536, 0.8987, -0.0728, ..., -0.2189, -0.8528, 0.0710],\n", " ...,\n", " [-0.3017, 0.9002, -0.0200, ..., -0.1082, -0.8412, -0.0861],\n", " [-0.3338, 0.9674, -0.0729, ..., -0.1952, -0.8181, -0.0634],\n", " [-0.3454, 0.8824, -0.0426, ..., -0.0993, -0.8329, -0.1065]],\n", "\n", " [[-0.3841, -0.1072, 0.3243, ..., 0.2156, 0.2593, 0.0866],\n", " [ 0.3034, 0.3026, 0.2005, ..., 0.4373, 0.5169, -0.0845],\n", " [-0.4797, 0.1210, 0.5126, ..., 0.1014, 0.4086, 0.2696],\n", " ...,\n", " [-0.3598, -0.0655, 0.4618, ..., 0.3586, 0.2877, 0.0428],\n", " [-0.3949, -0.0937, 0.4477, ..., 0.3372, 0.2124, 0.0469],\n", " [-0.4066, -0.0487, 0.4370, ..., 0.3481, 0.2206, 0.1476]]],\n", " grad_fn=)\n" ] } ], "source": [ "print(\"last hidden state\", outputs.last_hidden_state)\n" ] }, { "cell_type": "markdown", "id": "8c240be2", "metadata": {}, "source": [ "Note also that given the input to the model, it will output a high-dimensional vector representing the contextual understanding of that input by the Transformer model.\n", "\n", "The vector output by the Transformer 
module generally has three dimensions:\n", "\n", "- Batch size: The number of sequences processed at a time (3 in our example).\n", "- Sequence length: The length of the numerical representation of the sequence (16 in our example).\n", "- Hidden size: The vector dimension of each model input (768 in our example).\n", "\n" ] }, { "cell_type": "code", "execution_count": 24, "id": "68774b73", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "size torch.Size([3, 16, 768])\n" ] } ], "source": [ "print(\"size\", outputs.last_hidden_state.size())" ] }, { "cell_type": "markdown", "id": "9eb55929", "metadata": {}, "source": [ "### 3. Postprocessing the output\n", " \n", "The output hidden states can be useful on their own. For instance: \n", "\n", "- You can obtain a contextual representation of the entire sentence by averaging the outputs into a fixed-size vector (instead of the static embeddings used in part 1).\n", "\n", "- You can also feed these encoder hidden states to an auto-regressive decoder equipped with cross-attention layers to perform conditional text generation (such as machine translation). \n", "\n", "\n", "In this case, for sentiment analysis, we are more interested in using the corresponding classification head. \n", "\n", "For our example, we will use the DistilBERT model with a sequence classification head (to be able to classify the sentences as positive or negative). So, we won’t actually use the DistilBertModel class but the DistilBertForSequenceClassification class (or the AutoModelForSequenceClassification class). 
" ] }, { "cell_type": "code", "execution_count": 25, "id": "8fe22dcc", "metadata": {}, "outputs": [], "source": [ "from transformers import DistilBertForSequenceClassification, DistilBertTokenizer\n", "\n", "tokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased-finetuned-sst-2-english\")\n", "raw_inputs = [\n", "    \"I've been waiting for a HuggingFace course my whole life.\",\n", "    \"I hate this so much!\",\n", "    \"I find this very easy!\"\n", "]\n", "inputs = tokenizer(\n", "    raw_inputs,\n", "    padding=True,  # the sentences may differ in length, so pad them to a common length\n", "    return_tensors=\"pt\"  # return PyTorch tensors\n", ")\n", "\n", "model = DistilBertForSequenceClassification.from_pretrained(\"distilbert-base-uncased-finetuned-sst-2-english\")\n", "outputs = model(**inputs)\n", "logits = outputs.logits.detach()" ] }, { "cell_type": "code", "execution_count": 26, "id": "cf4f2a94", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[-1.5607,  1.6123],\n", "        [ 4.1692, -3.3464],\n", "        [-3.0649,  3.0434]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "logits" ] }, { "cell_type": "markdown", "id": "be44f7f8", "metadata": {}, "source": [ "These are not probabilities but logits: the raw, unnormalized scores output by the last layer of the model. 
\n", "\n", "To convert them to probabilities, they need to go through a softmax layer." ] }, { "cell_type": "code", "execution_count": 35, "id": "560642cd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[4.0195e-02, 9.5981e-01],\n", "        [9.9946e-01, 5.4419e-04],\n", "        [2.2193e-03, 9.9778e-01]])\n" ] } ], "source": [ "import torch\n", "\n", "predictions = torch.nn.functional.softmax(logits, dim=-1)\n", "print(predictions)" ] }, { "cell_type": "code", "execution_count": 36, "id": "08542ca7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I've been waiting for a HuggingFace course my whole life. POSITIVE\n", "I hate this so much! NEGATIVE\n", "I find this very easy! POSITIVE\n" ] }, { "data": { "text/plain": [ "{0: 'NEGATIVE', 1: 'POSITIVE'}" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# label each sentence POSITIVE when the probability of class 1 is at least 0.5\n", "for i in range(len(predictions)):\n", "    if predictions[i][1].item() >= 0.5:\n", "        print(raw_inputs[i], model.config.id2label[1])\n", "    else:\n", "        print(raw_inputs[i], model.config.id2label[0])\n", "\n", "model.config.id2label" ] }, { "cell_type": "markdown", "id": "cd7c0c1e", "metadata": {}, "source": [ "We have successfully reproduced the three steps of the pipeline: preprocessing with tokenizers, passing the inputs through the model, and postprocessing! 🤗\n" ] }, { "cell_type": "markdown", "id": "6517b868", "metadata": {}, "source": [ "# Fine-tuning" ] }, { "cell_type": "markdown", "id": "3586063b", "metadata": {}, "source": [ "To fine-tune a transformer model on our specific dataset (using the corresponding labels), you can just use the core of the training loop that we've been using since the PyTorch practical:\n", "\n", "1. Put the model and the corresponding data (batches) on the corresponding device (\"cpu\" or \"cuda\")\n", "2. 
When iterating over the batches, each update to the model follows the same general pattern:\n", "    - Clearing the gradients from the previous step (to stop gradient accumulation).\n", "    - Running a forward pass of the input through the model.\n", "    - Calculating the loss for the model output.\n", "    - Backpropagating the error through the model.\n", "    - Updating the model parameters to reduce the loss.\n" ] }, { "cell_type": "markdown", "id": "92669e3a", "metadata": {}, "source": [ "There is just one slight difference: transformer models are usually trained with a learning rate scheduler.\n", "\n", "For instance, a common default is a linear decay from the maximum learning rate (5e-5) to 0. To define it properly, we need to know the number of training steps we will take, which is the number of epochs we want to run multiplied by the number of training batches (the length of our training dataloader)." ] }, { "cell_type": "markdown", "id": "b92a15f7", "metadata": {}, "source": [ "\n", "    from torch.optim import AdamW\n", "    from transformers import get_scheduler\n", "\n", "    # the optimizer's learning rate is the maximum value the scheduler decays from\n", "    optimizer = AdamW(model.parameters(), lr=5e-5)\n", "\n", "    num_epochs = 3\n", "    num_training_steps = num_epochs * len(train_dataloader)\n", "    lr_scheduler = get_scheduler(\n", "        \"linear\",\n", "        optimizer=optimizer,\n", "        num_warmup_steps=0,\n", "        num_training_steps=num_training_steps,\n", "    )" ] }, { "cell_type": "markdown", "id": "abcf3024", "metadata": {}, "source": [ "Summing up, to fine-tune a model or train one from scratch, we can simply use the core of the training loop that we have been using:" ] }, { "cell_type": "markdown", "id": "cd21efe4", "metadata": {}, "source": [ "    model.train()\n", "    for epoch in range(num_epochs):\n", "        for batch in train_dataloader:\n", "\n", "            # forward pass\n", "            outputs = model(**batch)\n", "\n", "            # you can use a given PyTorch loss (e.g. torch.nn.CrossEntropyLoss) on the logits,\n", "            # or you can use outputs.loss if you give \"labels\" directly to your model\n", "            loss = criterion(outputs.logits, batch[\"labels\"])\n", "\n", "            # backward pass\n", "            loss.backward()\n", "\n", "            # update the parameters, then the learning rate\n", "            optimizer.step()\n", "            lr_scheduler.step()\n", "\n", "            # clear the gradients before the next batch\n", "            optimizer.zero_grad()\n" ] }, { "cell_type": "markdown", "id": "889cdcd3", "metadata": {}, "source": [ "Alternatively, if you would prefer not to define this training loop manually, Hugging Face Transformers also provides a [Trainer API](https://huggingface.co/course/chapter3/3?fw=pt) that does it for you." ] }, { "cell_type": "markdown", "id": "73427319", "metadata": {}, "source": [ "# Vision and Multi-modal Transformers" ] }, { "cell_type": "markdown", "id": "c4b0f3a7", "metadata": {}, "source": [ "Transformer architectures are currently achieving state-of-the-art results not only on a variety of language processing tasks, but also in computer vision.\n", "\n", "Hugging Face includes the recent [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit) model from Google.\n", "\n", "Let's try it out for image classification with our own images:" ] }, { "cell_type": "code", "execution_count": 37, "id": "2d104bca", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "shape torch.Size([480, 640, 3])\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import torchvision.transforms.functional as TF\n", "from PIL import Image\n", "import requests\n", "import matplotlib.pyplot as plt\n", "\n", "\n", "# Choose your image here\n", "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", "image = Image.open(requests.get(url, stream=True).raw)\n", "\n", "# to_tensor returns a (channels, height, width) tensor\n", "image_tensor = TF.to_tensor(image)\n", "# move the color dimension to the last position, as plt.imshow expects\n", "image_tensor = image_tensor.permute(1, 2, 0)\n", "print(\"shape\", image_tensor.shape)\n", "plt.imshow(image_tensor)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "38b1f225", "metadata": {}, "source": [ "We can predict the class of the chosen image with the same logic that we saw before:\n", "\n", "1. Define the \"tokenizer\".\n", "    - This is actually a feature extractor, since we are processing images.\n", "2. Define the model.\n", "3. Postprocess the output." ] }, { "cell_type": "code", "execution_count": 38, "id": "f73c56f4", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "101b3ef70975461c92934aae0244a66e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading: 0%| | 0.00/160 [00:00)\n" ] } ], "source": [ "print(\"last_hidden_states\", last_hidden_states) " ] }, { "cell_type": "markdown", "id": "b1fb5cab", "metadata": {}, "source": [ "Hugging Face has also been incorporating multi-modal encoders (vision and text), including VisualBERT, LXMERT, and the CLIP model from OpenAI.\n", "\n", "With CLIP we can, for instance, predict image-text similarity:" ] }, { "cell_type": "code", "execution_count": null, "id": "a239ca36", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "af91cdc333934fba9134b79818c6ea28", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading: 0%| | 0.00/4.03k [00:00