From 52fbdb01800723ff12f5498673ccf8ed6acbc7e4 Mon Sep 17 00:00:00 2001 From: andrewrgarcia Date: Sun, 22 Jan 2023 15:21:52 -0500 Subject: [PATCH 1/6] adapt notebook to 2023 (Google Colab) --- ...ral_Language_Processing_with_Pytorch.ipynb | 2422 +++++++++++++++++ 1 file changed, 2422 insertions(+) create mode 100644 Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb diff --git a/Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb b/Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb new file mode 100644 index 0000000..4148e6d --- /dev/null +++ b/Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb @@ -0,0 +1,2422 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "fL5F_iyFvH9s" + }, + "source": [ + "# Deep Learning for Natural Language Processing with Pytorch\n", + "This tutorial will walk you through the key ideas of deep learning programming using Pytorch.\n", + "Many of the concepts (such as the computation graph abstraction and autograd) are not unique to Pytorch and are relevant to any deep learning toolkit out there.\n", + "\n", + "I am writing this tutorial to focus specifically on NLP for people who have never written code in any deep learning framework (e.g., TensorFlow, Theano, Keras, DyNet). It assumes working knowledge of core NLP problems: part-of-speech tagging, language modeling, etc. It also assumes familiarity with neural networks at the level of an intro AI class (such as one based on the Russell and Norvig book). Usually, these courses cover the basic backpropagation algorithm on feed-forward neural networks, and make the point that such networks are chains of compositions of linearities and non-linearities. This tutorial aims to get you started writing deep learning code, given you have this prerequisite knowledge.\n", + "\n", + "Note that this tutorial is about *models*, not data. 
For all of the models, I just create a few test examples with small dimensionality so you can see how the weights change as they train. If you have some real data you want to try, you should be able to rip out any of the models from this notebook and use them on it." + ] + }, + { + "cell_type": "code", + "source": [ + "# install needed modules\n", + "# import sys\n", + "# if 'google.colab' in sys.modules:\n", + "# %pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html\n", + "# %pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html" + ], + "metadata": { + "id": "SPomhYdxyS90" + }, + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "GjlsQX1nvH9x", + "outputId": "bffd4fde-ec86-428b-bd20-dba28b0279a1" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "" + ] + }, + "metadata": {}, + "execution_count": 2 + } + ], + "source": [ + "import torch\n", + "import torch.autograd as autograd\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "\n", + "torch.manual_seed(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lGZep2pIvH90" + }, + "source": [ + "# 1. Introduction to Torch's tensor library" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c2UJIM7pvH92" + }, + "source": [ + "All of deep learning consists of computations on tensors, which are generalizations of a matrix that can be indexed in more than 2 dimensions. We will see exactly what this means in depth later. First, let's look at what we can do with tensors." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iZMWTR-KvH93" + }, + "source": [ + "### Creating Tensors\n", + "Tensors can be created from Python lists with the torch.Tensor() constructor."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0jx6vh43vH95", + "outputId": "5db25a53-a6af-401c-d77e-f270aca23de7" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([1., 2., 3.])\n", + "tensor([[1., 2., 3.],\n", + " [4., 5., 6.]])\n", + "tensor([[[1., 2.],\n", + " [3., 4.]],\n", + "\n", + " [[5., 6.],\n", + " [7., 8.]]])\n" + ] + } + ], + "source": [ + "# Create a torch.Tensor object with the given data. It is a 1D vector\n", + "V_data = [1., 2., 3.]\n", + "V = torch.Tensor(V_data)\n", + "print(V)\n", + "\n", + "# Creates a matrix\n", + "M_data = [[1., 2., 3.], [4., 5., 6]]\n", + "M = torch.Tensor(M_data)\n", + "print(M)\n", + "\n", + "# Create a 3D tensor of size 2x2x2.\n", + "T_data = [[[1.,2.], [3.,4.]],\n", + " [[5.,6.], [7.,8.]]]\n", + "T = torch.Tensor(T_data)\n", + "print(T) " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GK3K3Qs6vH96" + }, + "source": [ + "What is a 3D tensor anyway?\n", + "Think about it like this.\n", + "If you have a vector, indexing into the vector gives you a scalar. If you have a matrix, indexing into the matrix gives you a vector. If you have a 3D tensor, then indexing into the tensor gives you a matrix!\n", + "\n", + "A note on terminology: when I say \"tensor\" in this tutorial, it refers to any torch.Tensor object. Vectors and matrices are special cases of torch.Tensors, where their dimension is 1 and 2 respectively. When I am talking about 3D tensors, I will explicitly use the term \"3D tensor\"." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "d_lmP8pkvH97", + "outputId": "151df143-e9b9-4e4e-82dc-48da9addd058" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor(1.)\n", + "tensor([1., 2., 3.])\n", + "tensor([[1., 2.],\n", + " [3., 4.]])\n" + ] + } + ], + "source": [ + "# Index into V and get a scalar\n", + "print(V[0])\n", + "\n", + "# Index into M and get a vector\n", + "print(M[0])\n", + "\n", + "# Index into T and get a matrix\n", + "print(T[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Y0UM61QMvH99" + }, + "source": [ + "You can also create tensors of other datatypes. The default, as you can see, is Float.\n", + "To create a tensor of integer types, try torch.LongTensor(). Check the documentation for more data types, but Float and Long will be the most common." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "E7vRM0IPvH9-" + }, + "source": [ + "You can create a tensor with random data and the supplied dimensionality with torch.randn()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "wZK_FfwtvH9-", + "outputId": "41106698-554b-4a19-80dc-1325284c784e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[[-1.5256, -0.7502, -0.6540, -1.6095, -0.1002],\n", + " [-0.6092, -0.9798, -1.6091, -0.7121, 0.3037],\n", + " [-0.7773, -0.2515, -0.2223, 1.6871, 0.2284],\n", + " [ 0.4676, -0.6970, -1.1608, 0.6995, 0.1991]],\n", + "\n", + " [[ 0.8657, 0.2444, -0.6629, 0.8073, 1.1017],\n", + " [-0.1759, -2.2456, -1.4465, 0.0612, -0.6177],\n", + " [-0.7981, -0.1316, 1.8793, -0.0721, 0.1578],\n", + " [-0.7735, 0.1991, 0.0457, 0.1530, -0.4757]],\n", + "\n", + " [[-0.1110, 0.2927, -0.1578, -0.0288, 0.4533],\n", + " [ 1.1422, 0.2486, -1.7754, -0.0255, -1.0233],\n", + " 
[-0.5962, -1.0055, 0.4285, 1.4761, -1.7869],\n", + " [ 1.6103, -0.7040, -0.1853, -0.9962, -0.8313]]])\n" + ] + } + ], + "source": [ + "x = torch.randn((3, 4, 5))\n", + "print(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qKIIteJdvH9_" + }, + "source": [ + "### Operations with Tensors\n", + "You can operate on tensors in the ways you would expect." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Pw3S0abWvH-A", + "outputId": "b9c3fa41-49e3-4f1c-e654-4b941d552023" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([5., 7., 9.])\n" + ] + } + ], + "source": [ + "x = torch.Tensor([ 1., 2., 3. ])\n", + "y = torch.Tensor([ 4., 5., 6. ])\n", + "z = x + y\n", + "print(z)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MCji-O6bvH-A" + }, + "source": [ + "See [the documentation](http://pytorch.org/docs/torch.html) for the complete list of the many operations available to you. They go well beyond mathematical operations.\n", + "\n", + "One helpful operation that we will make use of later is concatenation."
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "xQXgGgqjvH-B", + "outputId": "a2b49a6d-be3f-47d9-e7fa-5df4de7ba637" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.8029, 0.2366, 0.2857, 0.6898, -0.6331],\n", + " [ 0.8795, -0.6842, 0.4533, 0.2912, -0.8317],\n", + " [-0.5525, 0.6355, -0.3968, -0.6571, -1.6428],\n", + " [ 0.9803, -0.0421, -0.8206, 0.3133, -1.1352],\n", + " [ 0.3773, -0.2824, -2.5667, -1.4303, 0.5009]])\n", + "tensor([[ 0.5438, -0.4057, 1.1341, -0.1473, 0.6272, 1.0935, 0.0939, 1.2381],\n", + " [-1.1115, 0.3501, -0.7703, -1.3459, 0.5119, -0.6933, -0.1668, -0.9999]])\n" + ] + } + ], + "source": [ + "# By default, it concatenates along the first axis (concatenates rows)\n", + "x_1 = torch.randn(2, 5)\n", + "y_1 = torch.randn(3, 5)\n", + "z_1 = torch.cat([x_1, y_1])\n", + "print(z_1)\n", + "\n", + "# Concatenate columns:\n", + "x_2 = torch.randn(2, 3)\n", + "y_2 = torch.randn(2, 5)\n", + "z_2 = torch.cat([x_2, y_2], 1) # second arg specifies which axis to concat along\n", + "print(z_2)\n", + "\n", + "# If your tensors are not compatible, torch will complain. Uncomment to see the error\n", + "# torch.cat([x_1, x_2])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HGBFOW5bvH-C" + }, + "source": [ + "### Reshaping Tensors\n", + "Use the .view() method to reshape a tensor.\n", + "This method receives heavy use, because many neural network components expect their inputs to have a certain shape.\n", + "Often you will need to reshape before passing your data to the component."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fFpq2JnZvH-D", + "outputId": "9e59b975-70a4-4bdd-dc1e-5a75aa3c866e" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[[ 0.4175, -0.2127, -0.8400, -0.4200],\n", + " [-0.6240, -0.9773, 0.8748, 0.9873],\n", + " [-0.0594, -2.4919, 0.2423, 0.2883]],\n", + "\n", + " [[-0.1095, 0.3126, 1.5038, 0.5038],\n", + " [ 0.6223, -0.4481, -0.2856, 0.3880],\n", + " [-1.1435, -0.6512, -0.1032, 0.6937]]])\n", + "tensor([[ 0.4175, -0.2127, -0.8400, -0.4200, -0.6240, -0.9773, 0.8748, 0.9873,\n", + " -0.0594, -2.4919, 0.2423, 0.2883],\n", + " [-0.1095, 0.3126, 1.5038, 0.5038, 0.6223, -0.4481, -0.2856, 0.3880,\n", + " -1.1435, -0.6512, -0.1032, 0.6937]])\n", + "tensor([[ 0.4175, -0.2127, -0.8400, -0.4200, -0.6240, -0.9773, 0.8748, 0.9873,\n", + " -0.0594, -2.4919, 0.2423, 0.2883],\n", + " [-0.1095, 0.3126, 1.5038, 0.5038, 0.6223, -0.4481, -0.2856, 0.3880,\n", + " -1.1435, -0.6512, -0.1032, 0.6937]])\n" + ] + } + ], + "source": [ + "x = torch.randn(2, 3, 4)\n", + "print(x)\n", + "print(x.view(2, 12)) # Reshape to 2 rows, 12 columns\n", + "print(x.view(2, -1)) # Same as above. If one of the dimensions is -1, its size can be inferred" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YSPIFD1_vH-D" + }, + "source": [ + "\n", + "# 2. Computation Graphs and Automatic Differentiation" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tqdoFVA0vH-E" + }, + "source": [ + "The concept of a computation graph is essential to efficient deep learning programming, because it frees you from having to write the backpropagation gradients yourself. A computation graph is simply a specification of how your data is combined to give you the output. Since the graph fully specifies which parameters were involved in which operations, it contains enough information to compute derivatives. 
This probably sounds vague, so let's see what is going on using autograd.Variable, the class at the heart of Pytorch's automatic differentiation. (Since Pytorch 0.4, Variable has been merged into torch.Tensor: a tensor created with requires_grad=True records its history directly, and autograd.Variable(...) simply returns a tensor. This tutorial keeps the Variable wrapper for continuity, and the code still runs.)\n", + "\n", + "First, think from a programmer's perspective. What is stored in the torch.Tensor objects we were creating above?\n", + "Obviously the data and the shape, and maybe a few other things. But when we added two tensors together, we got an output tensor. All this output tensor knows is its data and shape. It has no idea that it was the sum of two other tensors (it could have been read in from a file, it could be the result of some other operation, etc.).\n", + "\n", + "A Variable keeps track of how it was created. Let's see it in action." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "1wm4TJtEvH-E", + "outputId": "37aa0b1c-ab36-4073-c4cc-3a0a3f0bcfc0" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([1., 2., 3.])\n", + "tensor([5., 7., 9.])\n", + "\n" + ] + } + ], + "source": [ + "# Variables wrap tensor objects\n", + "x = autograd.Variable( torch.Tensor([1., 2., 3]), requires_grad=True )\n", + "# You can access the data with the .data attribute\n", + "print(x.data)\n", + "\n", + "# You can also do all the same operations you did with tensors with Variables.\n", + "y = autograd.Variable( torch.Tensor([4., 5., 6]), requires_grad=True )\n", + "z = x + y\n", + "print(z.data)\n", + "\n", + "# BUT z knows something extra.\n", + "print(z.grad_fn)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "e7D5BmX_vH-F" + }, + "source": [ + "So Variables know what created them. z knows that it wasn't read in from a file, it wasn't the result of a multiplication or exponential or whatever. And if you keep following z.grad_fn, you will find yourself at x and y.\n", + "\n", + "But how does that help us compute a gradient?"
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "0kDriTUVvH-F", + "outputId": "4624ce21-6097-493f-9e41-84e9d216c5e6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor(21., grad_fn=<SumBackward0>)\n", + "\n" + ] + } + ], + "source": [ + "# Let's sum up all the entries in z\n", + "s = z.sum()\n", + "print(s)\n", + "print(s.grad_fn)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bRVl6VzBvH-G" + }, + "source": [ + "So now, what is the derivative of this sum with respect to the first component of x? In math, we want\n", + "$$ \\frac{\\partial s}{\\partial x_0} $$\n", + "Well, s knows that it was created as the sum of the entries of the tensor z. z knows that it was the sum x + y.\n", + "So \n", + "$$ s = \\overbrace{x_0 + y_0}^\\text{$z_0$} + \\overbrace{x_1 + y_1}^\\text{$z_1$} + \\overbrace{x_2 + y_2}^\\text{$z_2$} $$\n", + "And so s contains enough information to determine that the derivative we want is 1!\n", + "\n", + "Of course this glosses over the challenge of how to actually compute that derivative. The point here is that s is carrying along enough information that it is possible to compute it. In reality, the developers of Pytorch program the sum() and + operations to know how to compute their gradients, and run the backpropagation algorithm. An in-depth discussion of that algorithm is beyond the scope of this tutorial." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iL0s-p9YvH-G" + }, + "source": [ + "Let's have Pytorch compute the gradient, and see that we were right (note: if you run this block multiple times, the gradient will increment. 
That is because Pytorch *accumulates* the gradient into the .grad attribute, since for many models this is very convenient.)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "xDoeZQ1qvH-H", + "outputId": "390b74d4-c536-4895-a709-0e8ef72877cb" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([1., 1., 1.])\n" + ] + } + ], + "source": [ + "s.backward() # calling .backward() on any variable will run backprop, starting from it.\n", + "print(x.grad)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DgvG01lavH-H" + }, + "source": [ + "Understanding what is going on in the block below is crucial for being a successful programmer in deep learning." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "KHJ2-qbavH-I", + "outputId": "9bc6d25b-2c85-41eb-ae38-737cfeb1c348" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "<AddBackward0 object at 0x...>\n", + "None\n" + ] + } + ], + "source": [ + "x = torch.randn((2,2))\n", + "y = torch.randn((2,2))\n", + "z = x + y # These are plain tensors (requires_grad=False), so no graph is recorded and backprop is not possible\n", + "\n", + "var_x = autograd.Variable( x, requires_grad=True ) # requires_grad=True tells autograd to record operations on var_x\n", + "var_y = autograd.Variable( y, requires_grad=True )\n", + "var_z = var_x + var_y # var_z contains enough information to compute gradients, as we saw above\n", + "print(var_z.grad_fn)\n", + "\n", + "var_z_data = var_z.data # Get the wrapped Tensor object out of var_z...\n", + "new_var_z = autograd.Variable( var_z_data ) # Re-wrap the tensor in a new variable\n", + "\n", + "# ... does new_var_z have information to backprop to x and y?\n", + "# NO!\n", + "print(new_var_z.grad_fn)\n", + "# And how could it? We yanked the tensor out of var_z (that is what var_z.data is). This tensor\n", + "# doesn't know anything about how it was computed. 
We pass it into new_var_z, and this is all the information\n", + "# new_var_z gets. If var_z_data doesn't know how it was computed, there's no way new_var_z will.\n", + "# In essence, we have broken the variable away from its past history" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2uakgaC8vH-I" + }, + "source": [ + "Here is the basic, extremely important rule for computing with autograd.Variables (note that this is more general than Pytorch: there is an equivalent object in every major deep learning toolkit):\n", + "\n", + "**If you want the error from your loss function to backpropagate to a component of your network, you MUST NOT break the Variable chain from that component to your loss Variable. If you do, the loss will have no idea your component exists, and its parameters can't be updated.**\n", + "\n", + "I say this in bold because this error can creep up on you in very subtle ways (I will show some such ways below), and it will not cause your code to crash or complain, so you must be careful." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wkISCzyhvH-I" + }, + "source": [ + "# 3. Deep Learning Building Blocks: Affine maps, non-linearities and objectives" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WTusZIDZvH-J" + }, + "source": [ + "Deep learning consists of composing linearities with non-linearities in clever ways. The introduction of non-linearities allows for powerful models. In this section, we will play with these core components, make up an objective function, and see how the model is trained." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZiI2Z_WtvH-J" + }, + "source": [ + "### Affine Maps\n", + "One of the core workhorses of deep learning is the affine map, which is a function $f(x)$ where\n", + "$$ f(x) = Ax + b $$ for a matrix $A$ and vectors $x, b$. The parameters to be learned here are $A$ and $b$. Often, $b$ is referred to as the *bias* term."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EeSzZicJvH-J" + }, + "source": [ + "Pytorch and most other deep learning frameworks do things a little differently from traditional linear algebra. They map the rows of the input instead of the columns. That is, the $i$'th row of the output below is the mapping of the $i$'th row of the input under $A$, plus the bias term (concretely, nn.Linear computes $xA^T + b$ for each row $x$). Look at the example below." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JJGlaziPvH-K", + "outputId": "d53ba1ac-c587-41f5-800b-65bb5d0aba3a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.6831, 0.3639, -0.7709],\n", + " [ 0.6161, 1.2096, -0.3063]], grad_fn=<AddmmBackward0>)\n" + ] + } + ], + "source": [ + "lin = nn.Linear(5, 3) # maps from R^5 to R^3, parameters A, b\n", + "data = autograd.Variable( torch.randn(2, 5) ) # data is 2x5. A maps from 5 to 3... can we map \"data\" under A?\n", + "print(lin(data)) # yes" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FIgvvGjQvH-K" + }, + "source": [ + "### Non-Linearities\n", + "First, note the following fact, which will explain why we need non-linearities in the first place.\n", + "Suppose we have two affine maps $f(x) = Ax + b$ and $g(x) = Cx + d$. What is $f(g(x))$?\n", + "$$ f(g(x)) = A(Cx + d) + b = ACx + (Ad + b) $$\n", + "$AC$ is a matrix and $Ad + b$ is a vector, so we see that composing affine maps gives you an affine map.\n", + "\n", + "From this, you can see that a neural network built as a long chain of affine maps has no more expressive power than a single affine map.\n", + "\n", + "If we introduce non-linearities between the affine layers, this is no longer the case, and we can build much more powerful models.\n", + "\n", + "There are a few core non-linearities. 
$\\tanh(x), \\sigma(x), \\text{ReLU}(x)$ are the most common.\n", + "You are probably wondering: \"why these functions? I can think of plenty of other non-linearities.\"\n", + "The reason is that they have gradients that are easy to compute, and computing gradients is essential for learning. For example,\n", + "$$ \\frac{d\\sigma}{dx} = \\sigma(x)(1 - \\sigma(x)) $$\n", + "\n", + "A quick note: although your intro to AI class may have used $\\sigma(x)$ as the default non-linearity, people typically shy away from it in practice. This is because its gradient *vanishes* very quickly as the absolute value of the argument grows, and small gradients make learning hard. Most people default to tanh or ReLU." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "qS5597VEvH-L", + "outputId": "2b13a682-348e-44dd-e194-038a4c8fa0bb" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.1024, -0.8491],\n", + " [ 0.1112, 0.1618]])\n", + "tensor([[0.0000, 0.0000],\n", + " [0.1112, 0.1618]])\n" + ] + } + ], + "source": [ + "# In Pytorch, most non-linearities are in torch.nn.functional (we have it imported as F)\n", + "# Note that non-linearities typically don't have parameters like affine maps do.\n", + "# That is, they don't have weights that are updated during training.\n", + "data = autograd.Variable( torch.randn(2, 2) )\n", + "print(data)\n", + "print(F.relu(data))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cFHLjiNevH-L" + }, + "source": [ + "### Softmax and Probabilities\n", + "The function $\\text{Softmax}(x)$ is also just a non-linearity, but it is special in that it is usually the last operation done in a network. This is because it takes in a vector of real numbers and returns a probability distribution. Its definition is as follows. 
Let $x$ be a vector of real numbers (positive, negative, whatever, there are no constraints). Then the i'th component of $\\text{Softmax}(x)$ is\n", + "$$ \\frac{\\exp(x_i)}{\\sum_j \\exp(x_j)} $$\n", + "It should be clear that the output is a probability distribution: each element is non-negative and the sum over all components is 1.\n", + "\n", + "You could also think of it as just applying an element-wise exponentiation operator to the input to make everything non-negative and then dividing by the normalization constant." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "pJ-XchHvvH-M", + "outputId": "3b7db16e-c41e-46fc-aa92-3b38f7e307b6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([-1.4105, -0.3404, -3.0121, 0.5710, 1.4330])\n", + "tensor([0.0350, 0.1021, 0.0071, 0.2541, 0.6017])\n", + "tensor(1.)\n", + "tensor([-3.3515, -2.2815, -4.9531, -1.3700, -0.5080])\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + ":4: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.\n", + " print(F.softmax(data))\n", + ":5: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.\n", + " print(F.softmax(data).sum()) # Sums to 1 because it is a distribution!\n", + ":6: UserWarning: Implicit dimension choice for log_softmax has been deprecated. 
Change the call to include dim=X as an argument.\n", + " print(F.log_softmax(data)) # theres also log_softmax\n" + ] + } + ], + "source": [ + "# Softmax is also in torch.nn.functional\n", + "data = autograd.Variable( torch.randn(5) )\n", + "print(data)\n", + "print(F.softmax(data))\n", + "print(F.softmax(data).sum()) # Sums to 1 because it is a distribution!\n", + "print(F.log_softmax(data)) # there's also log_softmax" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VbEd74bgvH-M" + }, + "source": [ + "### Objective Functions\n", + "The objective function is the function that your network is being trained to minimize (in which case it is often called a *loss function* or *cost function*).\n", + "Training proceeds by first choosing a training instance, running it through your neural network, and then computing the loss of the output. The parameters of the model are then updated using the derivative of the loss function with respect to them. Intuitively, if your model is completely confident in its answer, and its answer is wrong, your loss will be high. If it is very confident in its answer, and its answer is correct, the loss will be low.\n", + "\n", + "The idea behind minimizing the loss function on your training examples is that your network will hopefully generalize well and have small loss on unseen examples in your dev set, test set, or in production.\n", + "An example loss function is the *negative log likelihood loss*, which is a very common objective for multi-class classification. For supervised multi-class classification, this means training the network to minimize the negative log probability of the correct output (or equivalently, maximize the log probability of the correct output)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ULdd3wbXvH-M" + }, + "source": [ + "# 4. 
Optimization and Training" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vxo9SC0nvH-M" + }, + "source": [ + "So we can compute a loss function for an instance. What do we do with that?\n", + "We saw earlier that autograd.Variables know how to compute gradients with respect to the things that were used to compute them. Well, since our loss is an autograd.Variable, we can compute gradients with respect to all of the parameters used to compute it! Then we can perform standard gradient updates. Let $\\theta$ be our parameters, $L(\\theta)$ the loss function, and $\\eta$ a positive learning rate. Then:\n", + "\n", + "$$ \\theta^{(t+1)} = \\theta^{(t)} - \\eta \\nabla_\\theta L(\\theta^{(t)}) $$\n", + "\n", + "There is a huge collection of algorithms, and much active research, attempting to do something more than just this vanilla gradient update. Many attempt to vary the learning rate based on what is happening at train time. You don't need to worry about what specifically these algorithms are doing unless you are really interested. Pytorch provides many in the torch.optim package, and they all share the same interface: using the simplest gradient update looks the same as using the more complicated algorithms. Trying different update algorithms and different parameters for the update algorithms (like different initial learning rates) is important in optimizing your network's performance. Often, just replacing vanilla SGD with an optimizer like Adam or RMSProp will boost performance noticeably." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TQfhQL_ivH-N" + }, + "source": [ + "# 5. Creating Network Components in Pytorch\n", + "Before we move on to our focus on NLP, let's do an annotated example of building a network in Pytorch using only affine maps and non-linearities. 
We will also see how to compute a loss function, using Pytorch's built-in negative log likelihood, and update parameters by backpropagation.\n", + "\n", + "All network components should inherit from nn.Module and override the forward() method. That is about it, as far as the boilerplate is concerned. Inheriting from nn.Module gives your component useful functionality. For example, it keeps track of its trainable parameters, and you can move it between CPU and GPU with the .cuda() or .cpu() methods.\n", + "\n", + "Let's write an annotated example of a network that takes in a sparse bag-of-words representation and outputs a probability distribution over two labels: \"English\" and \"Spanish\". This model is just logistic regression." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lC3SA7TnvH-N" + }, + "source": [ + "### Example: Logistic Regression Bag-of-Words classifier\n", + "Our model will map a sparse BoW representation to log probabilities over labels. We assign each word in the vocab an index. For example, say our entire vocab is two words \"hello\" and \"world\", with indices 0 and 1 respectively.\n", + "The BoW vector for the sentence \"hello hello hello hello\" is\n", + "$$ \\left[ 4, 0 \\right] $$\n", + "For \"hello world world hello\", it is \n", + "$$ \\left[ 2, 2 \\right] $$\n", + "etc.\n", + "In general, it is\n", + "$$ \\left[ \\text{Count}(\\text{hello}), \\text{Count}(\\text{world}) \\right] $$\n", + "\n", + "Denote this BoW vector as $x$.\n", + "The output of our network is:\n", + "$$ \\log \\text{Softmax}(Ax + b) $$\n", + "That is, we pass the input through an affine map and then do log softmax."
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "14KQExwRvH-N", + "outputId": "4660da0c-1a3a-4430-952b-aaf5b0882bb7" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{'me': 0, 'gusta': 1, 'comer': 2, 'en': 3, 'la': 4, 'cafeteria': 5, 'Give': 6, 'it': 7, 'to': 8, 'No': 9, 'creo': 10, 'que': 11, 'sea': 12, 'una': 13, 'buena': 14, 'idea': 15, 'is': 16, 'not': 17, 'a': 18, 'good': 19, 'get': 20, 'lost': 21, 'at': 22, 'Yo': 23, 'si': 24, 'on': 25}\n" + ] + } + ], + "source": [ + "data = [ (\"me gusta comer en la cafeteria\".split(), \"SPANISH\"),\n", + " (\"Give it to me\".split(), \"ENGLISH\"),\n", + " (\"No creo que sea una buena idea\".split(), \"SPANISH\"),\n", + " (\"No it is not a good idea to get lost at sea\".split(), \"ENGLISH\") ]\n", + "\n", + "test_data = [ (\"Yo creo que si\".split(), \"SPANISH\"),\n", + " (\"it is lost on me\".split(), \"ENGLISH\")]\n", + "\n", + "# word_to_ix maps each word in the vocab to a unique integer, which will be its\n", + "# index into the bag-of-words vector\n", + "word_to_ix = {}\n", + "for sent, _ in data + test_data:\n", + " for word in sent:\n", + " if word not in word_to_ix:\n", + " word_to_ix[word] = len(word_to_ix)\n", + "print(word_to_ix) \n", + "\n", + "VOCAB_SIZE = len(word_to_ix)\n", + "NUM_LABELS = 2" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "collapsed": true, + "id": "uhu6a7KovH-O" + }, + "outputs": [], + "source": [ + "class BoWClassifier(nn.Module): # inheriting from nn.Module!\n", + " \n", + " def __init__(self, num_labels, vocab_size):\n", + " # calls the init function of nn.Module. Don't get confused by syntax,\n", + " # just always do it in an nn.Module\n", + " super(BoWClassifier, self).__init__()\n", + " \n", + " # Define the parameters that you will need. 
In this case, we need A and b,\n", + " # the parameters of the affine mapping.\n", + " # Torch defines nn.Linear(), which provides the affine map.\n", + " # Make sure you understand why the input dimension is vocab_size\n", + " # and the output is num_labels!\n", + " self.linear = nn.Linear(vocab_size, num_labels)\n", + " \n", + " # NOTE! The non-linearity log softmax does not have parameters! So we don't need\n", + " # to worry about that here\n", + " \n", + " def forward(self, bow_vec):\n", + " # Pass the input through the linear layer,\n", + " # then pass that through log_softmax over the label dimension (dim=1).\n", + " # Many non-linearities and other functions are in torch.nn.functional\n", + " return F.log_softmax(self.linear(bow_vec), dim=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "collapsed": true, + "id": "_tdQMvRCvH-O" + }, + "outputs": [], + "source": [ + "def make_bow_vector(sentence, word_to_ix):\n", + " vec = torch.zeros(len(word_to_ix))\n", + " for word in sentence:\n", + " vec[word_to_ix[word]] += 1\n", + " return vec.view(1, -1)\n", + "\n", + "def make_target(label, label_to_ix):\n", + " return torch.LongTensor([label_to_ix[label]])" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "HBoU0HMCvH-P", + "outputId": "dd57207f-8406-4608-913d-a885c6e2dd86" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Parameter containing:\n", + "tensor([[ 0.0555, 0.0597, 0.0466, 0.1627, -0.0815, -0.0828, -0.1699, -0.0080,\n", + " -0.0929, 0.0079, -0.0402, 0.0651, 0.1697, 0.0579, -0.0632, -0.0962,\n", + " -0.1710, 0.1650, -0.0372, 0.0396, 0.0073, -0.1250, 0.1104, 0.1099,\n", + " 0.0099, -0.1115],\n", + " [-0.0833, 0.0027, -0.1120, -0.1094, -0.0293, -0.0565, 0.0481, -0.0515,\n", + " -0.0260, -0.0749, -0.1792, 0.1710, 0.0374, 0.1754, -0.0316, -0.0493,\n", + " -0.1844, -0.0744, 0.1286, -0.1921, -0.0686, 0.1195, 0.1130, 0.0724,\n", + "
-0.0388, -0.0148]], requires_grad=True)\n", + "Parameter containing:\n", + "tensor([-0.0372, -0.0723], requires_grad=True)\n" + ] + } + ], + "source": [ + "model = BoWClassifier(NUM_LABELS, VOCAB_SIZE)\n", + "\n", + "# the model knows its parameters. The first output below is A, the second is b.\n", + "# Whenever you assign a component to a class variable in the __init__ function of a module,\n", + "# which was done with the line\n", + "# self.linear = nn.Linear(...)\n", + "# Then through some Python magic from the Pytorch devs, your module (in this case, BoWClassifier)\n", + "# will store knowledge of the nn.Linear's parameters\n", + "for param in model.parameters():\n", + " print(param)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "qXLD5QanvH-P", + "outputId": "f00d547e-c200-4bf9-9353-4defef4d8e86" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.4435, -1.0266]], grad_fn=)\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + ":22: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.\n", + " return F.log_softmax(self.linear(bow_vec))\n" + ] + } + ], + "source": [ + "# To run the model, pass in a BoW vector, but wrapped in an autograd.Variable\n", + "sample = data[0]\n", + "bow_vector = make_bow_vector(sample[0], word_to_ix)\n", + "log_probs = model(autograd.Variable(bow_vector))\n", + "print(log_probs)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6ikctAMgvH-P" + }, + "source": [ + "Which of the above values corresponds to the log probability of ENGLISH, and which to SPANISH? We never defined it, but we need to if we want to train the thing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "collapsed": true, + "id": "FdmwPRUmvH-Q" + }, + "outputs": [], + "source": [ + "label_to_ix = { \"SPANISH\": 0, \"ENGLISH\": 1 }" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bqYY152YvH-Q" + }, + "source": [ + "So let's train! To do this, we pass instances through to get log probabilities, compute a loss function, compute the gradient of the loss function, and then update the parameters with a gradient step. Loss functions are provided by Torch in the nn package. nn.NLLLoss() is the negative log likelihood loss we want. Torch also provides optimization functions in torch.optim. Here, we will just use SGD.\n", + "\n", + "Note that NLLLoss takes as *input* a vector of log probabilities and a target label. It doesn't compute the log probabilities for us. This is why the last layer of our network is log softmax.\n", + "The loss function nn.CrossEntropyLoss() is the same as NLLLoss(), except it does the log softmax for you." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "qcw0Z70JvH-R", + "outputId": "342a4e1d-d8b6-4839-b48a-7810ce3b15b8" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.6190, -0.7733]], grad_fn=)\n", + "tensor([[-0.7499, -0.6395]], grad_fn=)\n", + "tensor([-0.0402, -0.1792], grad_fn=)\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + ":22: UserWarning: Implicit dimension choice for log_softmax has been deprecated.
Change the call to include dim=X as an argument.\n", + " return F.log_softmax(self.linear(bow_vec))\n" + ] + } + ], + "source": [ + "# Run on test data before we train, just to see a before-and-after\n", + "for instance, label in test_data:\n", + " bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))\n", + " log_probs = model(bow_vec)\n", + " print(log_probs)\n", + "print(next(model.parameters())[:,word_to_ix[\"creo\"]]) # print the matrix column corresponding to \"creo\"" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": { + "collapsed": true, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "U5TlojdNvH-R", + "outputId": "b16f30ad-2626-471c-e8c0-efc58ccdaa43" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + ":22: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.\n", + " return F.log_softmax(self.linear(bow_vec))\n" + ] + } + ], + "source": [ + "loss_function = nn.NLLLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.1)\n", + "\n", + "# Usually you want to pass over the training data several times.\n", + "# 100 is much more than you would use on a real data set, but real datasets have far\n", + "# more than four instances. Usually, somewhere between 5 and 30 epochs is reasonable.\n", + "for epoch in range(100):\n", + " for instance, label in data:\n", + " # Step 1. Remember that Pytorch accumulates gradients. We need to clear them out\n", + " # before each instance\n", + " model.zero_grad()\n", + " \n", + " # Step 2. Make our BOW vector and also we must wrap the target in a Variable\n", + " # as an integer. For example, if the target is SPANISH, then we wrap the integer\n", + " # 0.
The loss function then knows that the 0th element of the log probabilities is\n", + " # the log probability corresponding to SPANISH\n", + " bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))\n", + " target = autograd.Variable(make_target(label, label_to_ix))\n", + " \n", + " # Step 3. Run our forward pass.\n", + " log_probs = model(bow_vec)\n", + " \n", + " # Step 4. Compute the loss, gradients, and update the parameters by calling\n", + " # optimizer.step()\n", + " loss = loss_function(log_probs, target)\n", + " loss.backward()\n", + " optimizer.step()" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "chM-jo8tvH-R", + "outputId": "ea077997-26c9-4f69-c2a3-0aa3d65b3cad" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.1241, -2.1481]], grad_fn=)\n", + "tensor([[-2.9001, -0.0566]], grad_fn=)\n", + "tensor([ 0.3966, -0.6160], grad_fn=)\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + ":22: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.\n", + " return F.log_softmax(self.linear(bow_vec))\n" + ] + } + ], + "source": [ + "for instance, label in test_data:\n", + " bow_vec = autograd.Variable(make_bow_vector(instance, word_to_ix))\n", + " log_probs = model(bow_vec)\n", + " print(log_probs)\n", + "print(next(model.parameters())[:,word_to_ix[\"creo\"]]) # Index corresponding to Spanish goes up, English goes down!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eQFwKNu4vH-R" + }, + "source": [ + "We got the right answer! 
You can see that, on the test data, the log probability for Spanish is much higher in the first example and the log probability for English is much higher in the second, as it should be.\n", + "\n", + "Now you see how to make a Pytorch component, pass some data through it and do gradient updates.\n", + "We are ready to dig deeper into what deep NLP has to offer." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "58kv_UkAvH-S" + }, + "source": [ + "# 6. Word Embeddings: Encoding Lexical Semantics" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "n0Nd9AS-vH-S" + }, + "source": [ + "Word embeddings are dense vectors of real numbers, one per word in your vocabulary.\n", + "In NLP, it is almost always the case that your features are words! But how should you represent a word in a computer?\n", + "You could store its ASCII character representation, but that only tells you what the word *is*; it doesn't say much about what it *means* (you might be able to derive its part of speech from its affixes, or properties from its capitalization, but not much). Moreover, in what sense could you combine these representations?\n", + "We often want dense outputs from our neural networks: the inputs are $|V|$-dimensional, where $V$ is our vocabulary, but the outputs are often only a few dimensions (if we are only predicting a handful of labels, for instance). How do we get from a very high-dimensional space to a much lower-dimensional one?\n", + "\n", + "What if, instead of ASCII representations, we use a one-hot encoding? That is, we represent the word $w$ by\n", + "$$ \\overbrace{\\left[ 0, 0, \\dots, 1, \\dots, 0, 0 \\right]}^\\text{|V| elements} $$\n", + "where the 1 is in a location unique to $w$. Any other word will have a 1 in some other location, and a 0 everywhere else.\n", + "\n", + "There is an enormous drawback to this representation, besides just how huge it is.
It basically treats all words as independent entities with no relation to each other. What we really want is some notion of *similarity* between words. Why? Let's see an example." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r-fNjIicvH-T" + }, + "source": [ + "Suppose we are building a language model. Suppose we have seen the sentences\n", + "* The mathematician ran to the store.\n", + "* The physicist ran to the store.\n", + "* The mathematician solved the open problem.\n", + "\n", + "in our training data.\n", + "Now suppose we get a new sentence never before seen in our training data:\n", + "* The physicist solved the open problem.\n", + "\n", + "Our language model might do OK on this sentence, but wouldn't it be much better if we could use the following two facts:\n", + "* We have seen mathematician and physicist in the same role in a sentence. Somehow they have a semantic relation.\n", + "* We have seen mathematician in the same role in this new unseen sentence as we are now seeing physicist.\n", + "\n", + "and then infer that physicist is actually a good fit in the new unseen sentence? This is what we mean by a notion of similarity: we mean *semantic similarity*, not simply having similar orthographic representations. It is a technique to combat the sparsity of linguistic data, by connecting the dots between what we have seen and what we haven't. This example of course relies on a fundamental linguistic assumption: that words appearing in similar contexts are related to each other semantically. This is called the [distributional hypothesis](https://en.wikipedia.org/wiki/Distributional_semantics)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bk_d4DLbvH-T" + }, + "source": [ + "### Getting Dense Word Embeddings\n", + "\n", + "How can we solve this problem? That is, how could we actually encode semantic similarity in words?\n", + "Maybe we think up some semantic attributes. 
For example, we see that both mathematicians and physicists can run, so maybe we give these words a high score for the \"is able to run\" semantic attribute. Think of some other attributes, and imagine how you might score some common words on those attributes.\n", + "\n", + "If each attribute is a dimension, then we might give each word a vector, like this:\n", + "$$ q_\\text{mathematician} = \\left[ \\overbrace{2.3}^\\text{can run},\n", + "\\overbrace{9.4}^\\text{likes coffee}, \\overbrace{-5.5}^\\text{majored in Physics}, \\dots \\right] $$\n", + "$$ q_\\text{physicist} = \\left[ \\overbrace{2.5}^\\text{can run},\n", + "\\overbrace{9.1}^\\text{likes coffee}, \\overbrace{6.4}^\\text{majored in Physics}, \\dots \\right] $$\n", + "\n", + "Then we can measure the similarity between these words with their dot product:\n", + "$$ \\text{Similarity}(\\text{physicist}, \\text{mathematician}) = q_\\text{physicist} \\cdot q_\\text{mathematician} $$\n", + "\n", + "Although it is more common to normalize by the lengths:\n", + "$$ \\text{Similarity}(\\text{physicist}, \\text{mathematician}) = \\frac{q_\\text{physicist} \\cdot q_\\text{mathematician}}\n", + "{\\| q_\\text{physicist} \\| \\| q_\\text{mathematician} \\|} = \\cos (\\phi) $$\n", + "where $\\phi$ is the angle between the two vectors. That way, extremely similar words (words whose embeddings point in the same direction) will have similarity 1, and extremely dissimilar words (words whose embeddings point in opposite directions) will have similarity -1." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cOHnX6d9vH-T" + }, + "source": [ + "You can think of the sparse one-hot vectors from the beginning of this section as a special case of these new vectors we have defined, where any two distinct words have similarity 0, and we gave each word some unique semantic attribute.
These new vectors are *dense*, which is to say their entries are (typically) non-zero.\n", + "\n", + "But these new vectors are a big pain: you could think of thousands of different semantic attributes that might be relevant to determining similarity, and how on earth would you set the values of the different attributes? Central to the idea of deep learning is that the neural network learns representations of the features, rather than requiring the programmer to design them herself. So why not just make the word embeddings parameters of our model and let them be updated during training? This is exactly what we will do. We will have some *latent semantic attributes* that the network can, in principle, learn. Note that the word embeddings will probably not be interpretable. That is, although with our hand-crafted vectors above we can see that mathematicians and physicists are similar in that they both like coffee, if we allow a neural network to learn the embeddings and see that both mathematicians and physicists have a large value in the second dimension, it is not clear what that means. They are similar in some latent semantic dimension, but this probably has no interpretation to us." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5CKpy6sYvH-T" + }, + "source": [ + "In summary, **word embeddings are a representation of the *semantics* of a word, efficiently encoding semantic information that might be relevant to the task at hand**. You can embed other things too: part-of-speech tags, parse trees, anything! The idea of feature embeddings is central to the field."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2bb0ccYHvH-U" + }, + "source": [ + "### Word Embeddings in Pytorch\n", + "Before we get to a worked example and an exercise, a few quick notes about how to use embeddings in Pytorch and in deep learning programming in general.\n", + "Similar to how we defined a unique index for each word when making one-hot vectors, we also need to define an index for each word when using embeddings. These will be keys into a lookup table. That is, embeddings are stored as a $|V| \\times D$ matrix, where $D$ is the dimensionality of the embeddings, such that the word assigned index $i$ has its embedding stored in the $i$'th row of the matrix. In all of my code, the mapping from words to indices is a dictionary named word_to_ix.\n", + "\n", + "The module that allows you to use embeddings is torch.nn.Embedding, which takes two arguments: the vocabulary size, and the dimensionality of the embeddings.\n", + "\n", + "To index into this table, you must use torch.LongTensor (since the indices are integers, not floats)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "aw8AQsDjvH-U", + "outputId": "375e5b4f-b018-40d7-e5c7-91bf20dcc1ee" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.8120, -1.4617, 0.2328, 0.1896, -0.2204]],\n", + " grad_fn=)\n" + ] + } + ], + "source": [ + "word_to_ix = { \"hello\": 0, \"world\": 1 }\n", + "embeds = nn.Embedding(2, 5) # 2 words in vocab, 5 dimensional embeddings\n", + "lookup_tensor = torch.LongTensor([word_to_ix[\"hello\"]])\n", + "hello_embed = embeds( autograd.Variable(lookup_tensor) )\n", + "print(hello_embed)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0wCyslM6vH-U" + }, + "source": [ + "### An Example: N-Gram Language Modeling\n", + "Recall that in an n-gram language model, given a sequence of words $w$, we want to compute\n", + "$$ P(w_i | w_{i-1}, w_{i-2}, \\dots, w_{i-n+1} ) $$\n", + "Where $w_i$ is the ith word of the sequence.\n", + "\n", + "In this example, we will compute the loss function on some training examples and update the parameters with backpropagation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ly0ES4N_vH-V", + "outputId": "18b9d27f-fba9-4e5a-c198-f8a547c5337f" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[(['When', 'forty'], 'winters'), (['forty', 'winters'], 'shall'), (['winters', 'shall'], 'besiege')]\n" + ] + } + ], + "source": [ + "CONTEXT_SIZE = 2\n", + "EMBEDDING_DIM = 10\n", + "# We will use Shakespeare Sonnet 2\n", + "test_sentence = \"\"\"When forty winters shall besiege thy brow,\n", + "And dig deep trenches in thy beauty's field,\n", + "Thy youth's proud livery so gazed on now,\n", + "Will be a totter'd weed of small worth held:\n", + "Then being asked, where all thy beauty lies,\n", + "Where all the treasure of thy lusty days;\n", + "To say, within thine own deep sunken eyes,\n", + "Were an all-eating shame, and thriftless praise.\n", + "How much more praise deserv'd thy beauty's use,\n", + "If thou couldst answer 'This fair child of mine\n", + "Shall sum my count, and make my old excuse,'\n", + "Proving his beauty by succession thine!\n", + "This were to be new made when thou art old,\n", + "And see thy blood warm when thou feel'st it cold.\"\"\".split()\n", + "# we should tokenize the input, but we will ignore that for now\n", + "# build a list of tuples. 
Each tuple is ([ word_i-2, word_i-1 ], target word)\n", + "trigrams = [ ([test_sentence[i], test_sentence[i+1]], test_sentence[i+2]) for i in range(len(test_sentence) - 2) ]\n", + "print(trigrams[:3]) # print the first 3, just so you can see what they look like" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "collapsed": true, + "id": "zVLzeff6vH-W" + }, + "outputs": [], + "source": [ + "vocab = set(test_sentence)\n", + "word_to_ix = { word: i for i, word in enumerate(vocab) }" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "collapsed": true, + "id": "45ri5p0CvH-W" + }, + "outputs": [], + "source": [ + "class NGramLanguageModeler(nn.Module):\n", + " \n", + " def __init__(self, vocab_size, embedding_dim, context_size):\n", + " super(NGramLanguageModeler, self).__init__()\n", + " self.embeddings = nn.Embedding(vocab_size, embedding_dim)\n", + " self.linear1 = nn.Linear(context_size * embedding_dim, 128)\n", + " self.linear2 = nn.Linear(128, vocab_size)\n", + " \n", + " def forward(self, inputs):\n", + " embeds = self.embeddings(inputs).view((1, -1))\n", + " out = F.relu(self.linear1(embeds))\n", + " out = self.linear2(out)\n", + " log_probs = F.log_softmax(out, dim=1)\n", + " return log_probs" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "VWeGU1zavH-X", + "outputId": "1a164b5b-cefa-4145-a876-7beb546f5ae1" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + ":13: UserWarning: Implicit dimension choice for log_softmax has been deprecated.
Change the call to include dim=X as an argument.\n", + " log_probs = F.log_softmax(out)\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[tensor([518.9899]), tensor([516.5867]), tensor([514.2001]), tensor([511.8294]), tensor([509.4730]), tensor([507.1305]), tensor([504.8020]), tensor([502.4858]), tensor([500.1811]), tensor([497.8864])]\n" + ] + } + ], + "source": [ + "losses = []\n", + "loss_function = nn.NLLLoss()\n", + "model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)\n", + "optimizer = optim.SGD(model.parameters(), lr=0.001)\n", + "\n", + "for epoch in range(10):\n", + " total_loss = torch.Tensor([0])\n", + " for context, target in trigrams:\n", + " \n", + " # Step 1. Prepare the inputs to be passed to the model (i.e., turn the words\n", + " # into integer indices and wrap them in variables)\n", + " context_idxs = list(map(lambda w: word_to_ix[w], context))\n", + " context_var = autograd.Variable( torch.LongTensor(context_idxs) )\n", + " \n", + " # Step 2. Recall that torch *accumulates* gradients. Before passing in a new instance,\n", + " # you need to zero out the gradients from the old instance\n", + " model.zero_grad()\n", + " \n", + " # Step 3. Run the forward pass, getting log probabilities over next words\n", + " log_probs = model(context_var)\n", + " \n", + " # Step 4. Compute your loss function. (Again, Torch wants the target word wrapped in a variable)\n", + " loss = loss_function(log_probs, autograd.Variable(torch.LongTensor([word_to_ix[target]])))\n", + " \n", + " # Step 5. Do the backward pass and update the parameters\n", + " loss.backward()\n", + " optimizer.step()\n", + " \n", + " total_loss += loss.data\n", + " losses.append(total_loss)\n", + "print(losses) # The loss decreased every iteration over the training data!"
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "j4ke3l3VvH-Y" + }, + "source": [ + "### Exercise: Computing Word Embeddings: Continuous Bag-of-Words" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A7rZyCPavH-Y" + }, + "source": [ + "The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep learning. It is a model that tries to predict words given the context of a few words before and a few words after the target word. This is distinct from language modeling, since CBOW is not sequential and does not have to be probabilistic. Typically, CBOW is used to quickly train word embeddings, and these embeddings are used to initialize the embeddings of some more complicated model. Usually, this is referred to as *pretraining embeddings*. It often improves performance by a couple of percent.\n", + "\n", + "The CBOW model is as follows. Given a target word $w_i$ and a context window of $N$ words on each side, $w_{i-1}, \\dots, w_{i-N}$ and $w_{i+1}, \\dots, w_{i+N}$, referring to all context words collectively as $C$, CBOW tries to minimize\n", + "$$ -\\log p(w_i | C) = -\\left(\\log \\text{Softmax}(A(\\sum_{w \\in C} q_w) + b)\\right)_{w_i} $$\n", + "where $q_w$ is the embedding for word $w$, and the subscript $w_i$ selects the entry corresponding to the target word.\n", + "\n", + "Implement this model in Pytorch by filling in the class below. Some tips:\n", + "* Think about which parameters you need to define.\n", + "* Make sure you know what shape each operation expects. Use .view() if you need to reshape."
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "6zxV6CaavH-Y", + "outputId": "6d371aec-f8eb-49e0-986d-c91cb4848cc0" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "[(['We', 'are', 'to', 'study'], 'about'), (['are', 'about', 'study', 'the'], 'to'), (['about', 'to', 'the', 'idea'], 'study'), (['to', 'study', 'idea', 'of'], 'the'), (['study', 'the', 'of', 'a'], 'idea')]\n" + ] + } + ], + "source": [ + "CONTEXT_SIZE = 2 # 2 words to the left, 2 to the right\n", + "raw_text = \"\"\"We are about to study the idea of a computational process. Computational processes are abstract\n", + "beings that inhabit computers. As they evolve, processes manipulate other abstract\n", + "things called data. The evolution of a process is directed by a pattern of rules\n", + "called a program. People create programs to direct processes. In effect,\n", + "we conjure the spirits of the computer with our spells.\"\"\".split()\n", + "word_to_ix = { word: i for i, word in enumerate(set(raw_text)) }\n", + "data = []\n", + "for i in range(2, len(raw_text) - 2):\n", + " context = [ raw_text[i-2], raw_text[i-1], raw_text[i+1], raw_text[i+2] ]\n", + " target = raw_text[i]\n", + " data.append( (context, target) )\n", + "print(data[:5])" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "collapsed": true, + "id": "OWd6uMCgvH-Z" + }, + "outputs": [], + "source": [ + "class CBOW(nn.Module):\n", + " \n", + " def __init__(self):\n", + " pass\n", + " \n", + " def forward(self, inputs):\n", + " pass" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "E432t2D0vH-Z", + "outputId": "926352c3-eb78-4ca7-f1d6-b8138c5ce064" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([23, 36, 31, 47])" + ] + }, + "metadata": 
{}, + "execution_count": 32 + } + ], + "source": [ + "# Create your model and train. Here are some functions to help you make the data ready for use by your module\n", + "def make_context_vector(context, word_to_ix):\n", + " idxs = list(map(lambda w: word_to_ix[w], context))\n", + " tensor = torch.LongTensor(idxs)\n", + " return autograd.Variable(tensor)\n", + "\n", + "make_context_vector(data[0][0], word_to_ix) # example" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CYicIo2svH-Z" + }, + "source": [ + "# 7. Sequence Models and Long Short-Term Memory Networks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XzLGShhNvH-a" + }, + "source": [ + "At this point, we have seen various feed-forward networks.\n", + "That is, there is no state maintained by the network at all.\n", + "This might not be the behavior we want.\n", + "Sequence models are central to NLP: they are models where there is some sort of dependence through time between your inputs.\n", + "The classical example of a sequence model is the Hidden Markov Model for part-of-speech tagging. Another example is the conditional random field.\n", + "\n", + "A recurrent neural network is a network that maintains some kind of state.\n", + "For example, its output could be used as part of the next input, so that information can propagate along as the network passes over the sequence.\n", + "In the case of an LSTM, for each element in the sequence, there is a corresponding *hidden state* $h_t$, which in principle can contain information from arbitrary points earlier in the sequence.\n", + "We can use the hidden state to predict words in a language model, part-of-speech tags, and a myriad of other things."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bb0JE-RhvH-a" + }, + "source": [ + "### LSTMs in Pytorch\n", + "\n", + "Before getting to the example, note a few things.\n", + "Pytorch's LSTM expects all of its inputs to be 3D tensors.\n", + "The semantics of the axes of these tensors are important.\n", + "The first axis is the sequence itself, the second indexes instances in the mini-batch, and the third indexes elements of the input.\n", + "We haven't discussed mini-batching, so let's just ignore that and assume the second axis always has size 1.\n", + "If we want to run the sequence model over the sentence \"The cow jumped\", our input should look like\n", + "$$ \n", + "\\begin{bmatrix}\n", + "\\overbrace{q_\\text{The}}^\\text{row vector} \\\\\n", + "q_\\text{cow} \\\\\n", + "q_\\text{jumped}\n", + "\\end{bmatrix}\n", + "$$\n", + "Except remember there is an additional 2nd dimension with size 1.\n", + "\n", + "Alternatively, you could step through the sequence one element at a time, in which case the 1st axis will also have size 1.\n", + "\n", + "Let's see a quick example."
+ ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hGZvG85avH-b", + "outputId": "77d5d890-5b55-4941-e10b-9e779a89e935" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[[-0.0955, -0.2526, -0.2748]],\n", + "\n", + " [[-0.3970, -0.0469, -0.1467]],\n", + "\n", + " [[-0.1144, -0.0950, -0.1793]],\n", + "\n", + " [[-0.0543, 0.0513, -0.1117]],\n", + "\n", + " [[-0.2153, 0.0141, -0.1913]]], grad_fn=)\n", + "(tensor([[[-0.2153, 0.0141, -0.1913]]], grad_fn=), tensor([[[-0.5291, 0.0546, -0.3974]]], grad_fn=))\n" + ] + } + ], + "source": [ + "lstm = nn.LSTM(3, 3) # Input dim is 3, output dim is 3\n", + "inputs = [ autograd.Variable(torch.randn((1,3))) for _ in range(5) ] # make a sequence of length 5\n", + "\n", + "# initialize the hidden state. \n", + "hidden = (autograd.Variable(torch.randn(1,1,3)), autograd.Variable(torch.randn((1,1,3))))\n", + "for i in inputs:\n", + " # Step through the sequence one element at a time.\n", + " # after each step, hidden contains the hidden state.\n", + " out, hidden = lstm(i.view(1,1,-1), hidden)\n", + " \n", + "# Alternatively, we can process the entire sequence all at once.\n", + "# The first value returned by the LSTM is all of the hidden states throughout the sequence.\n", + "# The second is a tuple (h_n, c_n) of the most recent hidden and cell states (compare the last slice of \"out\"\n", + "# with the first element of \"hidden\" below; they are the same)\n", + "# The reason for this is that:\n", + "# \"out\" will give you access to all hidden states in the sequence\n", + "# \"hidden\" will allow you to continue the sequence and backpropagate, by passing it as an argument\n", + "# to the lstm at a later time\n", + "inputs = torch.cat(inputs).view(len(inputs), 1, -1) # Add the extra 2nd dimension\n", + "hidden = (autograd.Variable(torch.randn(1,1,3)), autograd.Variable(torch.randn((1,1,3)))) # re-initialize the hidden state\n", + "out, hidden = lstm(inputs,
hidden)\n", + "print(out)\n", + "print(hidden)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "5UL4RTo-vH-b" + }, + "source": [ + "### Example: An LSTM for Part-of-Speech Tagging" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7TZrH0_zvH-b" + }, + "source": [ + "In this section, we will use an LSTM to get part-of-speech tags.\n", + "We will not use Viterbi or Forward-Backward or anything like that, but as a (challenging) exercise to the reader, think about how Viterbi could be used after you have seen what is going on.\n", + "\n", + "The model is as follows: let our input sentence be $w_1, \\dots, w_M$, where $w_i \\in V$, our vocab.\n", + "Also, let $T$ be our tag set, and $y_i$ the tag of word $w_i$. Denote our prediction of the tag of word $w_i$ by $\\hat{y}_i$.\n", + "\n", + "This is a structured prediction model, where our output is a sequence $\\hat{y}_1, \\dots, \\hat{y}_M$, where $\\hat{y}_i \\in T$.\n", + "\n", + "To do the prediction, pass an LSTM over the sentence. Denote the hidden state at timestep $i$ as $h_i$. Also, assign each tag a unique index (like how we had word_to_ix in the word embeddings section).\n", + "Then our prediction rule for $\\hat{y}_i$ is\n", + "$$ \\hat{y}_i = \\text{argmax}_j \\ (\\log \\text{Softmax}(Ah_i + b))_j $$\n", + "That is, take the log softmax of the affine map of the hidden state, and the predicted tag is the tag that has the maximum value in this vector. Note this implies immediately that the dimensionality of the target space of $A$ is $|T|$."
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "collapsed": true, + "id": "eR9kWCSFvH-c" + }, + "outputs": [], + "source": [ + "def prepare_sequence(seq, to_ix):\n", + " idxs = list(map(lambda w: to_ix[w], seq))\n", + " tensor = torch.LongTensor(idxs)\n", + " return autograd.Variable(tensor)" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8YDIsb_nvH-c", + "outputId": "3eeb9815-f9e3-4a69-ad09-53c4c0f750b5" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "{'The': 0, 'dog': 1, 'ate': 2, 'the': 3, 'apple': 4, 'Everybody': 5, 'read': 6, 'that': 7, 'book': 8}\n" + ] + } + ], + "source": [ + "training_data = [\n", + " (\"The dog ate the apple\".split(), [\"DET\", \"NN\", \"V\", \"DET\", \"NN\"]),\n", + " (\"Everybody read that book\".split(), [\"NN\", \"V\", \"DET\", \"NN\"])\n", + "]\n", + "word_to_ix = {}\n", + "for sent, tags in training_data:\n", + " for word in sent:\n", + " if word not in word_to_ix:\n", + " word_to_ix[word] = len(word_to_ix)\n", + "print(word_to_ix)\n", + "tag_to_ix = {\"DET\": 0, \"NN\": 1, \"V\": 2}\n", + "\n", + "# These will usually be more like 32 or 64 dimensional.\n", + "# We will keep them small, so we can see how the weights change as we train.\n", + "EMBEDDING_DIM = 6\n", + "HIDDEN_DIM = 6" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "collapsed": true, + "id": "3doBH6M0vH-c" + }, + "outputs": [], + "source": [ + "class LSTMTagger(nn.Module):\n", + " \n", + " def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):\n", + " super(LSTMTagger, self).__init__()\n", + " self.hidden_dim = hidden_dim\n", + " \n", + " self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)\n", + " \n", + " # The LSTM takes word embeddings as inputs, and outputs hidden states\n", + " # with dimensionality hidden_dim.\n", + " self.lstm = 
nn.LSTM(embedding_dim, hidden_dim)\n", + " \n", + " # The linear layer that maps from hidden state space to tag space\n", + " self.hidden2tag = nn.Linear(hidden_dim, tagset_size)\n", + " self.hidden = self.init_hidden()\n", + " \n", + " def init_hidden(self):\n", + " # Before we've done anything, we don't have any hidden state.\n", + " # Refer to the Pytorch documentation to see exactly why they have this dimensionality.\n", + " # The axes semantics are (num_layers, minibatch_size, hidden_dim)\n", + " return (autograd.Variable(torch.zeros(1, 1, self.hidden_dim)),\n", + " autograd.Variable(torch.zeros(1, 1, self.hidden_dim)))\n", + " \n", + " def forward(self, sentence):\n", + " embeds = self.word_embeddings(sentence)\n", + " lstm_out, self.hidden = self.lstm(embeds.view(len(sentence), 1, -1), self.hidden)\n", + " tag_space = self.hidden2tag(lstm_out.view(len(sentence), -1))\n", + " # log-softmax over the tag dimension (dim=1), one row per word\n", + " tag_scores = F.log_softmax(tag_space, dim=1)\n", + " return tag_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": { + "collapsed": true, + "id": "iCHTkTHXvH-d" + }, + "outputs": [], + "source": [ + "model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word_to_ix), len(tag_to_ix))\n", + "loss_function = nn.NLLLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.1)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "7aRFWaHxvH-d", + "outputId": "4ba212ba-865e-42b7-ea6d-67e1dd6c2939" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-1.2503, -1.2347, -0.8612],\n", + " [-1.1501, -1.3444, -0.8611],\n", + " [-1.1286, -1.3385, -0.8812],\n", + " [-1.0812, -1.3481, -0.9136],\n", + " [-1.1147, -1.3992, -0.8552]], grad_fn=)\n" + ] + } + ], + "source": [ + "# See what the scores are before training\n", + "# Note that element i,j of the output is the score for tag j for word i.\n", + "inputs = prepare_sequence(training_data[0][0], word_to_ix)\n", + "tag_scores = model(inputs)\n", + "print(tag_scores)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "collapsed": true, + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "PKQ7zkKzvH-e", + "outputId": "bf16770d-e10c-4c94-953f-bd246585cece" + }, + "outputs": [], + "source": [ + "for epoch in range(300): # again, normally you would NOT do 300 epochs, it is toy data\n", + " for sentence, tags in training_data:\n", + " # Step 1. Remember that Pytorch accumulates gradients. We need to clear them out\n", + " # before each instance\n", + " model.zero_grad()\n", + " \n", + " # Also, we need to clear out the hidden state of the LSTM, detaching it from its\n", + " # history on the last instance.\n", + " model.hidden = model.init_hidden()\n", + " \n", + " # Step 2. Get our inputs ready for the network, that is, turn them into Variables\n", + " # of word indices.\n", + " sentence_in = prepare_sequence(sentence, word_to_ix)\n", + " targets = prepare_sequence(tags, tag_to_ix)\n", + " \n", + " # Step 3. Run our forward pass.\n", + " tag_scores = model(sentence_in)\n", + " \n", + " # Step 4. Compute the loss, gradients, and update the parameters by calling\n", + " # optimizer.step()\n", + " loss = loss_function(tag_scores, targets)\n", + " loss.backward()\n", + " optimizer.step()" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_MPatfm7vH-e", + "outputId": "7ab8ee74-129c-4b5b-c729-125a5d71564a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([[-0.2538, -1.6349, -3.5352],\n", + " [-3.7887, -0.0456, -3.8188],\n", + " [-3.0550, -3.3311, -0.0865],\n", + " [-0.0394, -4.2496, -3.7141],\n", + " [-4.2480, -0.0164, -6.2326]], grad_fn=)\n" + ] + } + ], + "source": [ + "# See what the scores are after training\n", + "inputs = prepare_sequence(training_data[0][0], word_to_ix)\n", + "tag_scores = model(inputs)\n", + "# The sentence is \"the dog ate the apple\".
i,j corresponds to the score for tag j for word i.\n", + "# The predicted tag is the maximum-scoring tag.\n", + "# Here, we can see the predicted sequence below is 0 1 2 0 1\n", + "# since 0 is the index of the maximum value of row 1,\n", + "# 1 is the index of the maximum value of row 2, etc.\n", + "# This is DET NN V DET NN, the correct sequence!\n", + "print(tag_scores)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D3GHSZFNvH-e" + }, + "source": [ + "### Exercise: Augmenting the LSTM part-of-speech tagger with character-level features\n", + "In the example above, each word had an embedding, which served as the input to our sequence model.\n", + "Let's augment the word embeddings with a representation derived from the characters of the word.\n", + "We expect that this should help significantly, since character-level information such as affixes has\n", + "a large bearing on part of speech. For example, words with the suffix *-ly* are usually tagged as adverbs in English.\n", + "\n", + "To do this, let $c_w$ be the character-level representation of word $w$. Let $x_w$ be the word embedding as before.\n", + "Then the input to our sequence model is the concatenation of $x_w$ and $c_w$. So if $x_w$ has dimension 5, and $c_w$ dimension 3, then our LSTM should accept an input of dimension 8.\n", + "\n", + "To get the character-level representation, run an LSTM over the characters of a word, and let $c_w$ be the final hidden state of this LSTM.\n", + "Hints:\n", + "* There are going to be two LSTMs in your new model: the original one that outputs POS tag scores, and a new one that outputs a character-level representation of each word.\n", + "* To do a sequence model over characters, you will have to embed characters. The character embeddings will be the input to the character LSTM." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hG2WClFlvH-e" + }, + "source": [ + "# 8.
Advanced: Dynamic Toolkits, Dynamic Programming, and the BiLSTM-CRF" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true, + "id": "xmAXlT6gvH-f" + }, + "source": [ + "### Dynamic versus Static Deep Learning Toolkits\n", + "\n", + "Pytorch is a *dynamic* neural network kit. Another example of a dynamic kit is [Dynet](https://github.com/clab/dynet) (I mention this because working with Pytorch and Dynet is similar. If you see an example in Dynet, it will probably help you implement it in Pytorch). The opposite is the *static* toolkit, which includes Theano, Keras, TensorFlow 1.x, etc.\n", + "The core difference is the following:\n", + "* In a static toolkit, you define a computation graph once, compile it, and then stream instances to it.\n", + "* In a dynamic toolkit, you define a computation graph *for each instance*. It is never compiled and is executed on the fly.\n", + "\n", + "Without a lot of experience, it is difficult to appreciate the difference.\n", + "As an example, suppose we want to build a deep constituency parser.\n", + "Our model might involve roughly the following steps:\n", + "* Build the tree bottom-up.\n", + "* Tag the leaf nodes (the words of the sentence).\n", + "* From there, use a neural network and the embeddings of the words\n", + "to find combinations that form constituents. Whenever you form a new constituent,\n", + "use some sort of technique to get an embedding of the constituent.\n", + "\n", + "In this case, our network architecture will depend completely on the input sentence.\n", + "In the sentence \"The green cat scratched the wall\", at some point in the model, we will want to combine\n", + "the span $(i,j,r) = (1, 3, \\text{NP})$ (that is, an NP constituent spans word 1 to word 3, in this case \"The green cat\").\n", + "\n", + "However, another sentence might be \"Somewhere, the big fat cat scratched the wall\".
In this sentence, we will want to form the constituent $(2, 5, \\text{NP})$ (\"the big fat cat\") at some point.\n", + "The constituents we will want to form will depend on the instance. If we just compile the computation graph once, as in a static toolkit, it will be exceptionally difficult or impossible to program this logic. In a dynamic toolkit, though, there isn't just one predefined computation graph. There can be a new computation graph for each instance, so this problem goes away.\n", + "\n", + "Dynamic toolkits also have the advantage of being easier to debug, and their code more closely resembles the host language (by that I mean that Pytorch and Dynet code looks more like actual Python code than Keras or Theano code).\n", + "\n", + "I mention this distinction here because the exercise in this section is to implement a model which closely resembles a structured perceptron, and I believe this model would be difficult to implement in a static toolkit. I think that the advantage of dynamic toolkits for linguistic structure prediction cannot be overstated." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4KPEDPoTvH-f" + }, + "source": [ + "### Bi-LSTM Conditional Random Field Discussion\n", + "\n", + "For this section, we will see a full, complicated example of a Bi-LSTM Conditional Random Field for named-entity recognition. The LSTM tagger above is typically sufficient for part-of-speech tagging, but a sequence model like the CRF is essential for strong performance on NER. Familiarity with CRFs is assumed. Although the name sounds intimidating, the model is just a CRF in which an LSTM provides the features. It is an advanced model, though, far more complicated than any earlier model in this tutorial. If you want to skip it, that is fine.
To see if you're ready, see if you can:\n", + "\n", + "* Write the recurrence for the Viterbi variable at step i for tag k.\n", + "* Modify the above recurrence to compute the forward variables instead.\n", + "* Modify the recurrence again to compute the forward variables in log-space (hint: log-sum-exp).\n", + "\n", + "If you can do those three things, you should be able to understand the code below.\n", + "Recall that the CRF computes a conditional probability. Let $y$ be a tag sequence and $x$ an input sequence of words. Then we compute\n", + "$$ P(y|x) = \\frac{\\exp(\\text{Score}(x, y))}{\\sum_{y'} \\exp(\\text{Score}(x, y'))} $$\n", + "\n", + "where the score is determined by defining some log potentials $\\log \\psi_i(x,y)$ such that\n", + "$$ \\text{Score}(x,y) = \\sum_i \\log \\psi_i(x,y) $$\n", + "To make the partition function tractable, the potentials must look only at local features.\n", + "\n", + "In the Bi-LSTM CRF, we define two kinds of potentials: emission and transition. The emission potential for the word at index $i$ comes from the hidden state of the Bi-LSTM at timestep $i$. The transition scores are stored in a $|T| \\times |T|$ matrix $\\textbf{P}$, where $T$ is the tag set. In my implementation, $\\textbf{P}_{j,k}$ is the score of transitioning to tag $j$ from tag $k$. So:\n", + "\n", + "$$ \\text{Score}(x,y) = \\sum_i \\left( \\log \\psi_\\text{EMIT}(y_i \\rightarrow x_i) + \\log \\psi_\\text{TRANS}(y_{i-1} \\rightarrow y_i) \\right) $$\n", + "$$ = \\sum_i \\left( h_i[y_i] + \\textbf{P}_{y_i, y_{i-1}} \\right) $$\n", + "where in this second expression, we think of the tags as being assigned unique non-negative indices.\n", + "\n", + "If the above discussion was too brief, you can check out [this](http://www.cs.columbia.edu/%7Emcollins/crf.pdf) write-up from Michael Collins on CRFs.\n", + "\n", + "### The Forward Algorithm in Log-Space and the Log-Sum-Exp Trick\n", + "\n", + "As hinted at above, computing the forward variables requires using a log-sum-exp.
I want to explain why, since it was a little confusing to me at first, and many resources just present the forward algorithm in potential space. The recurrence for the forward variable at the $i$-th word for the tag $j$, $\\alpha_i(j)$, is\n", + "$$ \\alpha_i(j) = \\sum_{j' \\in T} \\psi_\\text{EMIT}(j \\rightarrow i) \\times \\psi_\\text{TRANS}(j' \\rightarrow j) \\times \\alpha_{i-1}(j') $$\n", + "\n", + "This is numerically unstable, and underflow is likely. It is also inconvenient to work with proper non-negative potentials in our model. We instead want to compute $\\log \\alpha_i(j)$. What we need to do is to multiply the potentials, which corresponds to adding log potentials. Then, we have to sum over tags, but what is the corresponding operation to summing over tags in log space? It is not clear. Instead, we need to transform out of log-space, take the product of potentials, do the sum over tags, and then transform back to log space. This is broken down in the revised recurrence below:\n", + "\n", + "$$ \\log \\alpha_i(j) = \\log \\overbrace{\\sum_{j' \\in T} \\exp{(\\log \\psi_\\text{EMIT}(j \\rightarrow i) + \\log \\psi_\\text{TRANS}(j' \\rightarrow j) + \\log \\alpha_{i-1}(j'))}}^\\text{transform out of log-space and compute forward variable} $$\n", + "\n", + "If you apply elementary exponential and logarithm identities to the expression under the overbrace, you will see that it computes the same thing as the first recurrence; the outer $\\log$ then just takes its logarithm. Log-sum-exp appears a fair bit in machine learning, and there is a [well-known trick](https://en.wikipedia.org/wiki/LogSumExp) for computing it in a numerically stable way: subtract the maximum entry before exponentiating, then add it back afterwards. I use this trick in my log_sum_exp function below (recent versions of PyTorch also provide `torch.logsumexp`).\n", + "\n", + "### Implementation Notes\n", + "\n", + "The example below implements the forward algorithm in log space to compute the partition function, and the Viterbi algorithm to decode.
Backpropagation will compute the gradients automatically for us. We don't have to do anything by hand.\n", + "\n", + "The implementation is not optimized. If you understand what is going on, you will probably see quickly that the loop over the next tag in the forward algorithm could be replaced by one big vectorized operation. I wanted the code to be more readable. If you make that change, you could probably use this tagger for real tasks." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Fg3iSfukvH-g" + }, + "source": [ + "### Example: Bidirectional LSTM Conditional Random Field for Named-Entity Recognition" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": { + "collapsed": true, + "id": "4SNc1h2yvH-g" + }, + "outputs": [], + "source": [ + "# Helper functions to make the code more readable.\n", + "def to_scalar(var):\n", + " # returns the first element as a Python number (float or int)\n", + " return var.view(-1).data.tolist()[0]\n", + "\n", + "def argmax(vec):\n", + " # return the argmax as a python int\n", + " _, idx = torch.max(vec, 1)\n", + " return to_scalar(idx)\n", + "\n", + "# Compute log sum exp in a numerically stable way for the forward algorithm\n", + "def log_sum_exp(vec):\n", + " max_score = vec[0, argmax(vec)]\n", + " max_score_broadcast = max_score.view(1, -1).expand(1, vec.size()[1])\n", + " return max_score + torch.log(torch.sum(torch.exp(vec - max_score_broadcast)))\n", + " \n", + "\n", + "class BiLSTM_CRF(nn.Module):\n", + " \n", + " def __init__(self, vocab_size, tag_to_ix, embedding_dim, hidden_dim):\n", + " super(BiLSTM_CRF, self).__init__()\n", + " self.embedding_dim = embedding_dim\n", + " self.hidden_dim = hidden_dim\n", + " self.vocab_size = vocab_size\n", + " self.tag_to_ix = tag_to_ix\n", + " self.tagset_size = len(tag_to_ix)\n", + " \n", + " self.word_embeds = nn.Embedding(vocab_size, embedding_dim)\n", + " self.lstm = nn.LSTM(embedding_dim, int(hidden_dim/2), num_layers=1, bidirectional=True)\n", + " \n", + " # Maps the
output of the LSTM into tag space.\n", + " self.hidden2tag = nn.Linear(hidden_dim, self.tagset_size)\n", + " \n", + " # Matrix of transition parameters. Entry i,j is the score of transitioning *to* i *from* j.\n", + " self.transitions = nn.Parameter(torch.randn(self.tagset_size, self.tagset_size))\n", + " \n", + " # These two statements enforce the constraint that we never transfer *to* the start tag,\n", + " # and we never transfer *from* the stop tag (the model would probably learn this anyway,\n", + " # so this enforcement is likely unimportant)\n", + " self.transitions.data[tag_to_ix[START_TAG], :] = -10000\n", + " self.transitions.data[:, tag_to_ix[STOP_TAG]] = -10000\n", + " \n", + " self.hidden = self.init_hidden()\n", + " \n", + " def init_hidden(self):\n", + " # Shape is (num_layers * num_directions, minibatch_size, hidden_dim // 2):\n", + " # the bidirectional LSTM has hidden_dim // 2 units per direction.\n", + " return ( autograd.Variable( torch.randn(2, 1, self.hidden_dim // 2)),\n", + " autograd.Variable( torch.randn(2, 1, self.hidden_dim // 2)) )\n", + " \n", + " \n", + " def _forward_alg(self, feats):\n", + " # Do the forward algorithm to compute the partition function\n", + " init_alphas = torch.Tensor(1, self.tagset_size).fill_(-10000.)\n", + " # START_TAG has all of the score.\n", + " init_alphas[0][self.tag_to_ix[START_TAG]] = 0\n", + " \n", + " # Wrap in a variable so that we will get automatic backprop\n", + " forward_var = autograd.Variable(init_alphas)\n", + " \n", + " # Iterate through the sentence\n", + " for feat in feats:\n", + " alphas_t = [] # The forward variables at this timestep\n", + " for next_tag in range(self.tagset_size):\n", + " # broadcast the emission score: it is the same regardless of the previous tag\n", + " emit_score = feat[next_tag].view(1, -1).expand(1, self.tagset_size)\n", + " # the ith entry of trans_score is the score of transitioning to next_tag from i\n", + " trans_score = self.transitions[next_tag].view(1, -1)\n", + " # The ith entry of next_tag_var is the value for the edge (i -> next_tag)\n", + " # before we do log-sum-exp\n", + " next_tag_var = forward_var + trans_score +
emit_score\n", + " # The forward variable for this tag is log-sum-exp of all the scores;\n", + " # .view(1) makes it a 1-element tensor, since torch.cat cannot join 0-dim tensors.\n", + " alphas_t.append(log_sum_exp(next_tag_var).view(1))\n", + " forward_var = torch.cat(alphas_t).view(1, -1)\n", + " terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]\n", + " alpha = log_sum_exp(terminal_var)\n", + " return alpha\n", + " \n", + " def _get_lstm_features(self, sentence):\n", + " self.hidden = self.init_hidden()\n", + " embeds = self.word_embeds(sentence).view(len(sentence), 1, -1)\n", + " lstm_out, self.hidden = self.lstm(embeds)\n", + " lstm_out = lstm_out.view(len(sentence), self.hidden_dim)\n", + " lstm_feats = self.hidden2tag(lstm_out)\n", + " return lstm_feats\n", + " \n", + " def _score_sentence(self, feats, tags):\n", + " # Gives the score of a provided tag sequence\n", + " score = autograd.Variable( torch.Tensor([0]) )\n", + " tags = torch.cat( [torch.LongTensor([self.tag_to_ix[START_TAG]]), tags] )\n", + " for i, feat in enumerate(feats):\n", + " score = score + self.transitions[tags[i+1], tags[i]] + feat[tags[i+1]]\n", + " score = score + self.transitions[self.tag_to_ix[STOP_TAG], tags[-1]]\n", + " return score\n", + " \n", + " def _viterbi_decode(self, feats):\n", + " backpointers = []\n", + " \n", + " # Initialize the viterbi variables in log space\n", + " init_vvars = torch.Tensor(1, self.tagset_size).fill_(-10000.)\n", + " init_vvars[0][self.tag_to_ix[START_TAG]] = 0\n", + " \n", + " # forward_var at step i holds the viterbi variables for step i-1\n", + " forward_var = autograd.Variable(init_vvars)\n", + " for feat in feats:\n", + " bptrs_t = [] # holds the backpointers for this step\n", + " viterbivars_t = [] # holds the viterbi variables for this step\n", + " \n", + " for next_tag in range(self.tagset_size):\n", + " # next_tag_var[i] holds the viterbi variable for tag i at the previous step,\n", + " # plus the score of transitioning from tag i to next_tag.\n", + " # We don't include the emission scores here because the max\n", +
" # does not depend on them (we add them in below)\n", + " next_tag_var = forward_var + self.transitions[next_tag]\n", + " best_tag_id = argmax(next_tag_var)\n", + " bptrs_t.append(best_tag_id)\n", + " # .view(1): torch.cat below cannot join 0-dim tensors\n", + " viterbivars_t.append(next_tag_var[0][best_tag_id].view(1))\n", + " # Now add in the emission scores, and assign forward_var to the set\n", + " # of viterbi variables we just computed\n", + " forward_var = (torch.cat(viterbivars_t) + feat).view(1, -1)\n", + " backpointers.append(bptrs_t)\n", + " \n", + " # Transition to STOP_TAG\n", + " terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]\n", + " best_tag_id = argmax(terminal_var)\n", + " path_score = terminal_var[0][best_tag_id]\n", + " \n", + " # Follow the back pointers to decode the best path.\n", + " best_path = [best_tag_id]\n", + " for bptrs_t in reversed(backpointers):\n", + " best_tag_id = bptrs_t[best_tag_id]\n", + " best_path.append(best_tag_id)\n", + " # Pop off the start tag (we don't want to return that to the caller)\n", + " start = best_path.pop()\n", + " assert start == self.tag_to_ix[START_TAG] # Sanity check\n", + " best_path.reverse()\n", + " return path_score, best_path\n", + " \n", + " def neg_log_likelihood(self, sentence, tags):\n", + " self.hidden = self.init_hidden()\n", + " feats = self._get_lstm_features(sentence)\n", + " forward_score = self._forward_alg(feats)\n", + " gold_score = self._score_sentence(feats, tags)\n", + " return forward_score - gold_score\n", + " \n", + " def forward(self, sentence): # don't confuse this with _forward_alg above.\n", + " self.hidden = self.init_hidden()\n", + " # Get the emission scores from the BiLSTM\n", + " lstm_feats = self._get_lstm_features(sentence)\n", + " \n", + " # Find the best path, given the features.\n", + " score, tag_seq = self._viterbi_decode(lstm_feats)\n", + " return score, tag_seq\n" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": { + "collapsed": true, + "colab": { + "base_uri":
"https://localhost:8080/" + }, + "id": "McTsJA8VvH-h", + "outputId": "73731660-29c6-46e4-8f23-76b5c7ffafb6" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "17\n", + "{'B': 0, 'I': 1, 'O': 2, '<START>': 3, '<STOP>': 4}\n" + ] + } + ], + "source": [ + "START_TAG = \"<START>\"\n", + "STOP_TAG = \"<STOP>\"\n", + "EMBEDDING_DIM = 5\n", + "HIDDEN_DIM = 4\n", + "\n", + "# Make up some training data\n", + "training_data = [ (\n", + " \"the wall street journal reported today that apple corporation made money\".split(),\n", + " \"B I I I O O O B I O O\".split()\n", + "), (\n", + " \"georgia tech is a university in georgia\".split(),\n", + " \"B I O O O O B\".split()\n", + ") ]\n", + "\n", + "word_to_ix = {}\n", + "for sentence, tags in training_data:\n", + " for word in sentence:\n", + " if word not in word_to_ix:\n", + " word_to_ix[word] = len(word_to_ix)\n", + " \n", + "tag_to_ix = { \"B\": 0, \"I\": 1, \"O\": 2, START_TAG: 3, STOP_TAG: 4 }\n", + "\n", + "print(len(word_to_ix))\n", + "print(tag_to_ix)" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "collapsed": true, + "id": "8SajbqK4vH-h" + }, + "outputs": [], + "source": [ + "model = BiLSTM_CRF( len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)\n", + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 348 + }, + "id": "X7CTWZIivH-i", + "outputId": "9e4e858e-4d0b-4745-a497-65ebafeea9bd" + }, + "outputs": [], + "source": [ + "# Check predictions before training\n", + "precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)\n", + "precheck_tags = torch.LongTensor([ tag_to_ix[t] for t in training_data[0][1] ])\n", + "print(model(precheck_sent))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "id": "mcw4UPj2vH-i" + }, + "outputs": [], + "source": [ + "# Make sure prepare_sequence from earlier in the LSTM section is loaded\n", + "for epoch in range(300): # again, normally you would NOT do 300 epochs, it is toy data\n", + " for sentence, tags in training_data:\n", + " # Step 1. Remember that Pytorch accumulates gradients. We need to clear them out\n", + " # before each instance\n", + " model.zero_grad()\n", + " \n", + " # Step 2. Get our inputs ready for the network, that is, turn them into Variables\n", + " # of word indices.\n", + " sentence_in = prepare_sequence(sentence, word_to_ix)\n", + " targets = torch.LongTensor([ tag_to_ix[t] for t in tags ])\n", + " \n", + " # Step 3. Run our forward pass.\n", + " neg_log_likelihood = model.neg_log_likelihood(sentence_in, targets)\n", + " \n", + " # Step 4.
Compute the loss, gradients, and update the parameters by calling\n", + " # optimizer.step()\n", + " neg_log_likelihood.backward()\n", + " optimizer.step()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "ukwdEqmjvH-j" + }, + "outputs": [], + "source": [ + "# Check predictions after training\n", + "precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)\n", + "print(model(precheck_sent))\n", + "# We got it!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MxMLDHIzvH-j" + }, + "source": [ + "### Exercise: A new loss function for discriminative tagging\n", + "It wasn't really necessary for us to create a computation graph when doing decoding, since we do not backpropagate from the viterbi path score. Since we have it anyway, try training the tagger where the loss function is the difference between the Viterbi path score and the score of the gold-standard path. It should be clear that this function is non-negative and 0 when the predicted tag sequence is the correct tag sequence. This is essentially *structured perceptron*.\n", + "\n", + "This modification should be short, since Viterbi and score_sentence are already implemented. This is an example of the shape of the computation graph *depending on the training instance*. Although I haven't tried implementing this in a static toolkit, I imagine that it is possible but much less straightforward.\n", + "\n", + "Pick up some real data and do a comparison!" 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.3" + }, + "colab": { + "provenance": [] + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file From 7e9a9a60205c7c6a5df44f9c45a1b774d9a506c2 Mon Sep 17 00:00:00 2001 From: andrewrgarcia Date: Sun, 22 Jan 2023 15:31:32 -0500 Subject: [PATCH 2/6] readme --- README.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/README.md b/README.md index ca13f2f..6453650 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,10 @@ +# Files + +- [Deep Learning for Natural Language Processing with Pytorch.ipynb](https://github.com/andrewrgarcia/DeepLearningForNLPInPytorch/blob/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb) - 2017 +- [Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb](https://github.com/andrewrgarcia/DeepLearningForNLPInPytorch/blob/master/Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb) - Google Colab (2023) adaptation. Preview at [**Colab Google**](https://colab.research.google.com/drive/1aomR0tLaPRuFLtnHeDIU3pYbY99loMf1?usp=sharing) + + + # Table of Contents: 1. Introduction to Torch's Tensor Library 2. 
Computation Graphs and Automatic Differentiation From 43a894b9dc342792988dcc544a01c8c2052b58ab Mon Sep 17 00:00:00 2001 From: andrewrgarcia Date: Sun, 22 Jan 2023 15:38:07 -0500 Subject: [PATCH 3/6] readme --- Google_Colaboratory.png | Bin 0 -> 31715 bytes README.md | 6 ++---- 2 files changed, 2 insertions(+), 4 deletions(-) create mode 100644 Google_Colaboratory.png diff --git a/Google_Colaboratory.png b/Google_Colaboratory.png new file mode 100644 index 0000000000000000000000000000000000000000..2a30d7a18c666bbb9e0a995dcb6a700c92504861 GIT binary patch literal 31715 zcmeFY^;eW%xCcrOjetlu131#%AR#bxcO&4?p|pgABHby73@F__z|c4dC`i}PA|Oae z!+rUlbMGH;e?Cjra?Q-U_kQ;0>3CyxwN*(7=?F0}Fi4IJFoUSnK#8;2W&n z>q_819A^bB1q_V3B%*6OT;MZ{gPOh;21XDk21a-k2F4}uRroIqj8_5}7{6^XFr>0E zFsQxq+x29CA3*jRs>&F5_y0b2eog~cT|$)=397GRvM0S>ti++wMS%Seo{LGIbega z9LeCDTQx%@2E^CtpWOCM5EB!5FQl;PY<~Ajwj&%2P5r;W|3l#a5cod?{(m8G%0UwW zhIraj^WHq-q%>0{Cid@kB+TbN$m|P5B4>2it>A8gnRjhvwXeHtOBgPaQrlE zH*8Hz73W_LXXK{ZC?PD3h)o_2Ry_ncoR{0iW?CaaT8I|NNlai00Ep&?0ira`l-uhnMXrk7Q^8txCb3xD~oz<)?~$P#g-v|&y0$K7)DW)=HRibyjRUS1!; z0WG&)^_O$Nqt7vq$cl-|qL?F^rGFvx;j<|v+ZlN)b8F=A>jbavtW!01ap#m;YKVan zdf;irj+-uS01fanKk`P*vNnet(P)sYOERpFWV>rPJ>sj4ja{AFFY?*$K+iL(sfpc zlmd%!Tus;uv?%634R(H#2ov4BEFR}6HgpV4TOEeK2=YJ)&(R?T?9M}PnhM{h?4U6q zMNrK?2oOEgM-U5eGrjZD3a=Q!t)>-CC{bZkD2wZ~@1Hum@P zWfy)91{?&h^73$dUM08uG*&lB%E%MkYPX!RJ{2}-kPtu=7~`^iwu~%5DV{+QeTp+{ODtaBt1W9 z_`)dP;$%ZN7?Cdw%3L#O+HkYQC5EH--!}BCfy?A(k*RQV{@bxnbs_~bn3Ji)(yz@w zihabnju{?*&wDMypk9;mLV24v&v;(q&cmta**+tW7zMQ43ebcym5h!6+z57dhM26m z7wc8He2FQ?xoEmV!qQcWnv$Z%H{x%HgCkn7+)F=c=@O*4LR&uymTy%>CyZmo=hhB*;xuo|i^AkOVus0)dgmKYNfPAHN7-I}lOolm z5ih{0>0!Y6kZr(tZN3t>lTStVJnb5gX}_ZNQ<|WINe(@*m};m zA=?4A9C5rjIZ3fP06(@fb<0+&7g;30#tVBu>V3d|sp}C2_10O>SH=f0oLhT`A4K<=c`&4=KcJFJO~-cSb5gA9#S6cCd-hHb zB>sb@l?8gP?M>%T`?f&Llaa{Hl@0ZgB7zUyUrM{^3@mfMzjPmpC?HVtLgLWl<5%VM zi|A(?^OV{yxnDDuCEDapZH7Id5lPR^`9BUQr{f9wuoG~JX>I##^FdL+q$csfkYG^z 
zEI&yIC?F5^`12o9FZK25>u*dns>-nl$4`Kzm$v+%R`tgq;VPAy^7VrD|1Hven@`T5-*jIF$g4F-)8e$Uax zxIE(WFa@spZ8Q~5&Wkbr$=jCU?Euo57sIGwL6YZzms1^b$BvF$6TH1X2h}e?y+rRK zt2sZ>HG|Hz>mod3{OeE>VA#gJprky(;4f}6TkTefn8ig1JXDQ!iO zO@k?a19w-PEnh#HT~an2GRAP89ZrR>-aD9>4h-Uu$l)j!`iiV$B?dwHoz5(L!2)aXx49zZB<0|2T~6uw?BeA92JBhQ1n+ zLfLou#E2;oc*4Idd@hmk4WHs<`0rZ4mQ6juFwkc6`T|qS+bkuMsFsLaNU#s+kE4~> zUDY6$BNWMg*GM~Hv>{HO==mGXG?@%%EpSM9ni^A!aZTX9NhOCHLiIw395&(y!P+K) zhqQM_4h-s*B##3&ATucjX%a5T$>v_yLvz0Rw3k{{%c0C$ZQ}d=$x84rIY6Okk))-p zBxOwK4SV{P!&_6M>=xU^2bsy`_Kyxn?dg3#bbw(aa{$+P4gah?gP|8N|A~{7mETgW z1{?@BY^q!BJzDKx^$Lfn8J(TsGrJaitw~T!6)Oas)4XO~oOcP*1e*byQd5~u?2h`O z+z*)#c=7fRx2qan@FcAVpt%Eb8blViGKfMJM+iA!$M^D~#f!miXDg@r?3_wqSs70L zgPz^S?aibj-lO#Z4{WQwEYKL%KOquWzYIBBp?m(lV0Suze~ko}RJ9JNK4OQU z$z-9`B-W!=N^>XB8b=4UQ;96J-@(B>5qlzd*hZ6eN8nK5cAvDMdIB$ox zpO3HvxrcFw-3U-b1Peb9qq;aaB%S{0fKd|o4QC8%uw7EEEw^LrMSKXVp$-h|bF+m` z#1JBh=F+g_v8?Ur^!a3ym6sQ0dqz)>q8+FmP!Yo6XHwVt9!pH@2kw8+{3@(a9Y;U| z3QA>?*x5tyUI>#wW~P_uT{&!shXiy-Y#M9gW;i04*E_4>xw9n>8{K(MNA87&kD+%? 
zz_RH$t9S@k{_`%GWZ zq&^3hVf?Z{DnL;>hkLkX##IC<58pH9TXEZb`8i;XGlplqaj4z|=@nvQ_V|OQE@drS z20%1iA?kC5J?7sgLQ;pniB%Kz3>Eaz{AKItv0bJ1nAg_$-8KLBy=3)Qwbl_0xw(>Z zfG2~gm74%SPF39!%}G*9Fwf42z}t`1C?r4CMbi72^cM{M@_D$8HOZh-_UAfh_@tEd zk2oRpbU2tpty}{@m>Bmqe3ZM4X+IfM4H;iA4!r!n85d~RmLqR_tX^&sOx9ZRbAY%w z*)T~4TSMO})`}7pQG;FRN1Be4ZxYYIDD`XP=q=CaIzE~wM5-M84pZCH@+(2pZIlrZ zTEvM%PaEbtvj1X7TGC+S<;B2H)dlTpwsvHazXrN4y}s-Mg$x#@o}^!}`rNP}IsmN& zpkN9hD1x`^0(!Pd*fNJmJrVi*yeV*MVW4?)(=ut48%j@xg(m{?VE(mcbpJS@EQZJG zRt!wo;#kr-F>tOx1kHsCmHaFci}W#$2Og0Dblhb@#OpPzAi)FLG_HZqF~!5*nR2q1 zhNCL->ad~mb>;2FH`0`}a1b7Jg457X{z)mrL#fB~G4T2PEpQpt?nE4|4c*2XnFLY< zd^5yA-c|?Wyn6fe+c)Rmp9GQx#8u?}-~);-{s=EXRU&^=g|%$@{t~&9-Qn9=+=~E| z6T)g~SPjC{2p4|5P!HfURnV)pl)Nkc=i&gku7FUNLpCq=^5BcadvDX5-^&+OEq=qrAr%D(l5|#H{=V2_#C}by7$~=bA-Dccjn1k2 ziI}h~Ai8>LD0aYF9htrJ`yq$X{VpO>Q)xn_6%ON^RC&4lJ}LtDTWrcY11XeU=t0jx zt~!Djku_Dy3h@NMLanUrh74xLT~h!84|b2zC*GTsiR@08MNB;_VvwLe|JcV0i(1lS zKx?gfWzt8(W&U#hN%e8uAI~=BJF;-IhRz49(EZq&Y7$AumhaulfV&02+ER9d_-x+; zbUg&<@+o%4Orxi7qYH?Dsm-e;J-RyIQtJ#vyeT~r+QK||^%(3pY)wL9IT?((033h- zya|N*k`En7lCpwnprtgyd2T3?T96X39za`#Eo-TQ{y-WDC!X*8jy$;Rm>qZw-i$ay zrltp$TVn+JxQ@X8ehayg`=KLaGzaV&Lxo>Q!{V|yy`?%Mgkz3z1Xo;@9i z0?X#%+mOC(2r*qb=Aj#j2qd@O#aBbcqJDSedaBQe*4)A@Kxq}3uiIY%nIG!j(+N4Q za;dsArUht$_ij(ZW?E`jr*0lR@*LpIggg6%O{#Z^=X=t0JWd}vHb7Sm@7WeufI$h?XEg;{}|Y0&GLsTEnu^+kWqLV33iMI zynjyZ;R|z(%_B!|c8`$7Gt>3ZnGPHyS_yTqxJu*$}JwJcPF}=j=?=%ND-@ zZMt~92{j-Cg(>sycznQSXm7iFy;7WTbtb=+W2*llxN>#`+IQ+y!naHkGCIq#sR&=Z z8QO9MU8O2Dqm4+dCYCN4`_e^H0l}mbjB{@J29oP4shK=eTMrehgG*7JKL^xiHJ~Db z?;rc20}BlT{>@?r(Jr;GwJd^D^Ur2?#v@fkQ9}+&hZ*P8)Fq0;hfAn*n#(Iqy!ail zgsiBO~lQ*aQ#qUZd(h!dLHlif`v=6sW ztn87gF?r7l;C3xu#=9DdXttQzK-}H2{wIMtJY@xjsaVX+iCac{_LlFifqWKj`7Ygf zMRa47I~+_gW)osL(iOu4YXv;GNaekBs@pB)n7B^`+$^iopS8qtOliCxG#h{|8l9d= z#tlXwW!O4$+5u7$Usv4C9_^|hB#YPFmPLcFLsq?crD>Yst3UuTQET>Zr%Kfb)XLWD zHx*KNzZbu((?gAN?|YToET4%;(PbK!B)4|nNEPCr=(tScH8HyEEiwxI424pJUi4R1 zH4D9j_6>kC;j3mkqi|i?U$V>rt6m7319kxD+yVnE8{)SOi1`D9g7K_a#CSi4W0*bM 
z|J3TgjTYYSFrNL^K~WEC(-GTDcmeheX3DcveY$`w>bofDOcMEH6NLC1^yT)(8%$Dh zl{@Hb+_VB0Nn#J@r)`0Mi#w^jbr>q$+ia&scr)r$QZst(x`cho-ECw&ih`Af_LH99_Y9*R#8wEl)KUW2wd>p#-u zfvj9kD9Bk&nF~;$BCo##j0Z0D@~&}Lx$iUSnO$lQQD7lkd4 zQ&5;O{_{$0{q_bxSw^~UQ-G;_i#~n(I1p{6!V6g^s(LkA(IvpEROs>+TxCh-tnU>L zHj2F8qVA>fF7-Qvuxnw(IA>wL?pi=wz0%nvcVkV&>Bz|BO(X|hpJ(n92`(y-zDG`fK+TA1 zG07m?bWuT_|D#n=zw;$&|I4jhmC&N4ypXWtfs*OKRz+z>c_-bmF-{T#9GXXRFJ+{G}1rM$;-dVx?3bE zNASoyEi>JY+hpa3&xNgx&+}@zR~wtfbmlsK=U|e#_V4&$l4y>G1=7TRo9$QDW?_=6 z5sC2hs!vs>wk-k)kER0t77%4md`B)J1B!vRmQLk8F+E)yU-Cn5X|O%>LR=0Wk(R)D zH)%~%n?oE$T`Q{C{xbM1Qrd)ZPQOYlO=o01igppLu?LHY9AX1O2I`QaLV5+T6J%`i z0ecR``K}!9f`H#@HiZy}P|va+tQ4PN=9%ipEY7r&tkldSL<^I!J;j`$W|VejXUw_$ z+Q&>H6-BUC3fpdrGbN<9#EjusFHvph! zlpBkcm@M=uiC7KtewQymA*R%v-`aXCs}^8rBo@zpSMkNR=sW_8=HZjCf!>_BsumMl z_r|1wN3v_P6~nW9^Wc|k9i04cLgtR)=FG)Wh&3o=`W=U(Ryc>w;*aZNb%~ri@}eDxt!PgI&KO@!A+TPs#cl(Zks%bC*9+jFRUcrq zdx!n6s^wtm_-t*rGhes70AogEbYsm0;dfzgO|!aQ$^&s=vN(O4S!kkuzXeOLM&`>F zhFwk;*u$B))JE^j&~kTuDVjSRH=`QbM!9a_O5pn|nMM!Cxq+ww>ogxfIv1?+v zJZnYwggXw`k7&|V@F)KwP~fDaYh|W!_?9ZPmhxx8f=_FtXJq@F{T!B#E+1*2*1y7!PpZ|z zl2(}|53r76#CBUJbDKNo@UmUk3>Dwt4}lb;uoqRI2Rj(YoIGqRjOxZr>K`t~VN$1H zf-yX26*gnS@>cs7!AB??>1O3-zd$p`TDq5w;}>5mTYGic{hst|t&!zjz*=+Wh?})+ zMLL;{@XpUJi+IHeVV%#ZCz?cP?NpAXDshY&p#BVUNVJ}b z-G|$Y%PUN?wFR;n@m+!A;n3N=xv(=K+#}J%wWdU8kaJ4CRQhwUf@!C-b?3 zD{<#F?AP<}kHPM(;AFUSZGhkI+mBLNXFoqRFKKl1260a7ngVT@Xe5cCmxDo0Uf(Lu z_~_dMdwr$e5v7e1)}vgS021rm^Ov2lUTfaTz<@6{ts;eG%74~YvW%fqVl?b6Knn*s znlW5e#VAuS#6RJWIj%t$XcYe1VM7$Daaho1vF2If>Dkt(Ps&9zFc^DS_tiJ3+kqlg ztRK)`5_2HNo-Cd2#!Az2SZ_MeYp?zxTqkA}_HGfVi)a#BPDu=Rnn`NdsZp(4QQ{FH zWD*b&!}JkZcD|b$TEZ7D07N25HdvlME&3U^s!@x7J#bVT-I8E=G*RO+z{Vj#zRuhBTn=_@?47k3NOxe&ehE~3h712j&K-qq>7{9M`2-W=PNnKE?dgHSMcjC zxa4bVAIOe;!1t6B=KrEt$M9~T%~2PKsJjBmQlPh4g=*|L#Yv||-}`ndAAAzPuX!Jl zLoW5@?UnPhQqfmlP|NhH)$v;RXa%c;ZN5C9H)nKF@Xt)2 zICh|!3^V`UWcSOXWrJdruL??5#1gAIgzF|~+9V?iMVh^=Wv!Sb4DG2MKIhx=1Yzz5MK>OL5MULi@BX zvFrAAg@@}0W<11AudchcpySHd&!@{yl&QC00g3!D0$SOa! 
z={@pnGX0?W<>5zmc@dNq(ELlpO~lCcL7>n5yg7%A?%n?{oH!Nt%xB2C0j$q&VU+@j zMGaf_oRr)l80QkTRtYACVdk%cT}?_hWJKU~Fc}i%) zG@79O;Ed3PR-@5^*i#HIzq*J9ms6-EEjK6$w|R7K$VQK+HdvGnN0umiG5N^k<5HC} zJ$Ek&tB<)ieJB&D{deW^?9)dgBqxV|@EgLZXV4~y;8XW1Y7@|}myH#=l`6}k-W~_z zz$g)r)}D;xZfO!~G`ruhM|5fxOSv3sR<2;*Ss;=|6uHXSWi+pTw~0Jx^j6k9_hgga zhGR&U&i2er1?=BmT(c<4ZDbZD%!jdw>wjzrl1czQ_pHYII|kYU=e7wOudKGWu4y> z;)#-b)A#-V>*LVge(|Sb=YPOx7;_m7#95KUodM%=vNLbQkMb~VV0&oLBu(JWae(i; zkXR(Q@3rV3%U2e=B|n5~%DeACa#DYPE+6DZfr0q%oiK!FxqAQh>EftJdClvZi37Tp z^1Z^oq)^qUv8^1-KaZVW+{psD;$H;6N*BgndQ(!CGU*rN$&VsQ$ENLBSA)|T>R7uq|lO1u+c2M>BrxO$V3yHOXq9N1E zD~)%^@?KBhlN4s66o>a`e+lzU<{l>6eUyk~rZTAye#h|8E-*%JdAxi|;@5#aHG0Vd zbN(=~rx}Z8u)6Mi!l`3r{cpRDf z+cF<&x8uiq_N@0$sUX6~%i-+4yRy;$U7?UYGwpSIocAX3bHrTy9{XPK(R1bpH1#Dy+r7c2*Ddc_Mm` zmN1`cHJN<=eb@J&Ju?-5CvK?geIFk>f9y$_PP4R*4S_dB4kfU42X)LglbwqR@H#N> zC;nN;T5(e!L3x4H_2&m4yOH|8LVhuvvZiDPN(CM7_`}PGdB);30Y8cOz-_ z<7`@VJpRL&zwXSBs~ug=F>Z`5w2vjThV>GwZ`(QthKF*c&olIXCf;5!(4jTi7M#)R zi$89zw&Erg@=VzGRv$S8`5*J}l!XR9;3-SwHJdu}0KWI!ztO7UmDg+ah>{<)4fmuD z4lEk2n4b_y3rukQ_b#HrZnl6fFr+Ehk8mK-SXKKC$A+&9pe5l^gUFx~VIl88lg*G29Ucgkh7XDgt2Y!}J+o@`M%`-C*f z`QFKN(WBWDt(VLx*_ls#dfE#+mbK6w$k`{=Y1O-3Rh#t^WL=J(=>xCa&UfL0vcWN9 zT2dS6DAzF1(CF~n6vu%k5fT3Nn-AO zyAcoPCg>5Wa@5NMEIyRxYgJEWV$re@-M;dn}+d^us`-n>)UmK>#S*1yJP7 z0X;fU11+ljb#=t6cWw3XKauf+uYHLpbE_kgLuTxt&9SV zN#ZdgTx$}U`inezlUFOtK`EmZ`MhcNGSigsOM7zesT9Y;}V_}xMj>HsmAvuL<^W1CrQAdh8fz;pO}i}com zZk`6H1&Wf*WevOM{Vlw^j9qE5rofY6%rgl*ULN3T)DZye3g8ugt2x$gDI811^^`?SAc4nzTD- ziaXxII1U2S<@sBJU5K34-n zNC^zYjL(>!pp+--|2q73?IkI8S3P;ywMm7C`%PfHSdEXIZ^&Mk_InWP6dsprnKqBv z(q%K?{J=h)YDBIFu=%js&t82E*JT;djgO_YmbSL@`lF2bDjQz2PnB2?lkBB^%P($S zJF;=Uy#%{h163p2Nh>8*9DAV5=dh6`b3AFpZ}ZA^ciaJ$Gp!WJ>{S|XGbOad0>4B@ z@lM74T90vcU?6b);}J9(GuBU7D?%CYLELpl_g;X7qO0foQTgrxrI4@+@mfBUPE?8iDfdl)Z9|S)AH9NiJtRS z6hvWug*9b!((WstU%qw--fY^v`^tv(R%8?{*QX~+%!3XW*%Y#U`R*EsURb9>x&wGg zHaBIkG{tUmrDF?)^^AynO<;j<}j_#{LSPupWuNDycqN3W{mt+C5wGy0$H$OH@9D&llNt2s9 
zU0}s1w~@u9<`~xmrsa7*Z%<8o;cRI2t{_Yi6o3@qtXF53CO0$@= z%2Ie%+Hq#GA{f>IPuZ+_pr2sTY_-4&9A=bcptb_^Li=D6xh8+@2Ok0hJ2P;f_-~DY zC9BN_TsIkHHh3~_bfPfJV26=~1ch>7ItWge)r+TD#K{{6J&J*HoyotTEEg?FXAkF0 z`y%-W>)^OvAtBveXFph^%#ak5Do4ML^C%*+pd{)MXX2wDM;?o$a)xxAh~s-=7-y@D2CmoZv}gma-4Af>E~aIcQdlnp zV{<2{?L&^!+d9Jp2&XYzU}X|k6@(vGzh%>-cDj3nmR8MKZ zT}}Q4p+3pMx(drw>9dA~&shAzv2+x@+gzhN7dZ-Fj0gDM!#2FqzCgO&=*EZQ70g6- zelCwF5f_Ta(n%$f`Tf_7;AEZR1cIT_@&GC&cS-+=GT9XY&{Vw;LKG>+sVDr0M@6bv zftW73-eqmEAWaFanf8TIkU%x(jbR?plW7V-$B^MOa_VvSzTzG{3GHVv#H&d4r&k;` zwTxw;lQMOsM70wX2RjcSGdT6wD>$=hrC|aDWmH-3Q(b`Wp1C!qnm2rL~GUQAi!FoXIp#iLcBQzZ|jt2 zIaEp;zSHHiS~^m9NOl3L|7BZ*@6P#@hCAQV;pHU}o8pwqr4M?O355|`sH zj2H0}Fks3(^#Pq zM-^VRZ1&}8v9^S+1Te1%1JW7md&!I+r40=1%2F48N2CGz12l>Z*$D@FW;%4Xr~4=` zyPSy2#XGIsG@k3E4VUzgNpikSrHgi1iW&;qQfZw8A4yqo)QIDhiJ9)Hi?T&|y@PDL zIy^szr|k&hPvwEUmQgjHkEG9w1;6l-P9A?r_8|_aY02kHm7P`-1!}m4+-$e)=|iHA z?3UpM!TjO`Q=5$$vq;oV|EX?WQVo67@2?w^L7v!Mt`u!D^9VUV^E34(B%@h5riT2F zMsD=!&Bdev%LjMj@(~|4ig$=H@R3BoH$v5*=`X)^)@_fsBRDuyq}QomwRH8lI^4-2OnJ0p)!kj4`DGQ!| z2^w=E(~nP!$<;2JU%ndQ1_tO-xWjTK+RrQL026JxtG`NQ*p%WZa1oaEjj2>Qq0YJ-4_m3M!b!QkagDGua5jLh7Taif@!f`lwn z`iMEjW|6B00z-%|oDEge$>0ouE%ANI z5BR&1Sn?G>9_U?f&VDl64_2lDSo!xewD^jOCi1sgd=w1;$naB53E-(akso5iYhvXx`pr<>qD#sy1jQ-f1?r~3bOknvD)HXpkp3f8J?{?O}ogt+Y=Z}CJ zm;jWmOU~fKuL%XaP?SFFDX|u(qrxDwJu{gNHR8x`qM)4|t^5Ul-}psaiol#H3Vyq+ zBSfMI7G&tnS<2VZAV`ex>!Xhwhzwy9vCg;hw2>$Isuz>rSi~JRlOF3Q0^Z5T1z;0q zaGo;Xi4iE(H(ue*uTA_kVcZ7=cv|bdt z_ltup3MFedH9nuZAwFG#YV@8%o__M=Y;u(x%!qU=6a?^9+PM6EUxy@cL!T2)bpONK zt_oqWKUaq-K^$!6qLQ z#f433&?i=g%` z^!f6-7NlDc1e?TthZ*WZF_#r1FhvQ!t^-ce#eC3yVLHqyFNO>IFd--N;DvDPHyulP z5=KK}#d(OnftHrQftY`lRb&ELFU4MD|f@~u5 zkNT>;!LUpc^uO(8ew7olS=tk(9vhEwR$s7^ktog!!76!7TPSrilc3~p04@Vl+^C!q z<`Tx)A7CLX-0B_XrQOZQ2?IYOe%5c-p9I#-*eA%cWm@ivF#L0sOA`XGXEWUs6i{>K z1NiWeGLfde=x!QE3pKtKz{W==PXi-=_3PC8BzCi9>{c0f`*e^|qrfm<~I_UF&d5u24QkG43Cy`y#f9{|IlfbHNT>s7A`AdtY@4@?kj^3K z5t$ZSv0@`|w|EjBi0Zwde1JYTE-9J~iWF4S$?G^Y`JdsL% 
zj@Ek+=J(CiH~FX!uoUb`H7|0oL~@m#mRd%kBQrFSCyf+6!nH!~wcif36v|U+JO!in zT=*UYg{1lbU5Iq{gaoPx+L{Sotdh?&9#)j$?pkaOOfP^6`XGx@*{phPM8>8ArM1fA zO&4uZEQl)J+|Lm&A>K`6rR0M0FstM89>z#L_L z1*=aXM9rv>U}?NwN~j+K@Dd1HAgWk)2*`7T-)n|JF@bU3rO9SQN_gBrR3bKmPrD_k zjv@!B?V@_h^jz4Ygo{tRgk;p53eU{1hFq<$PUUf{$}nG&o_->5xXiH2h8g)bFN{W0 zDsW%lwhj7RfBi_ubIF5PU)zO`J^9FP^+zKX;H02C^@1g;fN9YYM0Z}*OL=q106WwM z4&&FuY7x`x>FE!FI0`{Yr^Q<^kZcuK>ZdOTf<^KW3q2vWU$>&;B&Q#oW6bLpZT|1U z{1MU^<1sT1MEVJy_rjgXd+zSkJw z4={iN5tnv#6N;{+dgEOJQKO#~r^s&+vZ$@=rG&2pgLQvGPvB)6s7kFwP{O^~-+AWlf1T4?Tg111c5>5mQ|25iQmy1OOxPT}HX| zfW|B)1m^vAa*dw4P?%XdT(Q@sQ~w7ieUG9*!$a5!F%K;AnSy#TTRpu^?P{CN<;Zl@UF4}S`GCejE5-uOa|^i1s@3s(q2 zL7EElwt8qp#+>3jC{tw5*FOG_e@^}bD~JDiaP* z))hw|OEpB4?td)uAgx};=B_X|_74VSHrk0T3!{M9%`@p6-;{k?fP?os)bI-?5bgFv zR8J5WhAoOosma>FjeLPW;Z?KwMn1sN*A{@V&8$lerOHa<^&?HMDso*$T6oE*9!vV%1pKtI4ovr z?a{tNge^sN9f!NZk|U13TvWfJKnGFbPk<3culqp8nfLsEto24~y|TJ45|CB?ds&ri zEiGE1tT!*YXIPI_aX26-W||s(Us#FT#~n{iqwk~6{}C0@exuUlxty2v_o7$!26#|Q z5DO*|9qW|*BZZ)D0Z@~4VBpHf%gT1yBGibFHtcWrO8G;&&nE$WQ)@%Wr(GS_Q7w0VW+%V#KIbb3zVSwUi9cPLV16@-wavD#H9K+=;_@*mhfJf@} z^$X64^i-Y^DZnBDg#N?4P6fZ`^=MNQoE7jzM=TXZW(9xTK@*^V@jnB(`}}}X!E<8$ zCL=dn$1G<84axgxCXGyH*va1uU=Tf7XOzGQ9m;iXlm=d2Lk>K3O7rYcl?tMIuP0K& zrAc{;PVSw1lT`-s3PI*`fJn?SeWa#&W4M2sm$mEELX)VWseqc4gMu!FzNnC;&k}Iqx^2dvju#z(H@C z!&l8dW}*Lx*L?C)@#831sQu;fkR8P2UXbSP90F!3g)H$?P4U%r_(@5fCBHjRm@%?g zQ+B)%#4l6tFdg9ZZl<28t}3zIuLopa6+(HP@py7=#DR-si$%8c%`x>KVpoQfBhIp* z$wFsm=j+4!`2F6UC%VZypQ^ue)nschr>=52jTxSpe8xj2REbq1gj?panR%8oQhRG3 zDPOA=n5&oY6h(g!nc;}ryQ`@XR1Sojb8_X#T`dK=p!*pxB5#wl;mRrL(1t4lAjP>4 zT)Gwe6a^-ugJk+C;h#q>mcin?r1AHBnfOk}5*P`2p6iUKAqmuKP&q#%-KQB3f4O6dm1+JY- z5idyzegsXYjQGeD1i=t!WE{7G5g^+ns# z;D0IpT0DKt_&7(obpL*iP7Vlg1`6X@CKbc|X(L5M*ffQs-MGO!oq$8U-`?|)3Rfi3 zMw4xwXol=FkXLPdB_fL6Yd~klAlDw|`eRwYpvak)4b@&z1i%C46w^l(1p-jM({CRG zA5IL%EOO)|NVSLyJ z#ox51fu__{*vox7^J^U8&UV}3^5trQmS1Qh()*(IHbM=CyLwC|0H6Gd4#Z1!#1eg$ z`=;??+D7i2Fm1T~mmn0y3T#apt@~FqCPcHAIbIl&9fbqD27zblLP5*SHGBUio+gmK 
z2x={Ndzk1qb9MALHKvbWs_QSi374K=e0QdxwdwEm(`fb#u7axF%|osSq?u)iH0BZO zq4;H2I(+ZZZ-&?*Sy$CSdElWGG%^W39(w|?yRu~SNneF z3i#FaS3nj-P4a^?Aj4(Is+eRV-pz>6@!r&Mnhz~LzVBi~ep12{&iZ zt^C<9D0_xkp~XgB(DJ-vx`j6Kw-vrJS}bjNh)xi4=P# z@?F#G*ngd)3a30TYbnkFT$>@&slRLBNQy^p?CoZ4IDZG^$=-4*$=5ec6wh*XnT780kWaa3dQSw0TiAB zxi^ChhRupd3xQ7e&I;A2N8qz?@f_uXt-4>Ek@{oz#f0i2pA6QG>X5MU-nlA%VZ5n| zHO)V`R5{AkdxnG|KWX5nsy@9lTCC2|p1G|~nMYmB^bzTv54(*n@s{zp{e3+ZCFW=h zTcT}PumGr*3H`b#`JE8ZFU!0IOF@slYM~qIl4AcBZHe{A;vTanyYza-vv{sW#3gyY z?vVUqjeD4Ncugy<+k>a~r7LOqg9yfoK+i~C+fw31Ie)tf*bw1W(U zNvjz8g(-u*8wBubF93V!pMPB(^oupG^;8hCz!ca=%paGZIo%}!DCL5PV+LKr-$OU5 z$cQ`mVXl7*gvvF5SJF9s3ZWa?f;ieDpsC`FLRx7~oRA}1rWs4tpr$v5*npQ2wY`$8 zFrhe)glrUl;Q*YHO^jc41w<8~)k^=*HO5*W#f6r@goAsy+~>45^&KWjljBT=|zX2(jcL=*kB z?D6mh#$a4EuXDzA&g1x*{S<^_Xrn-JzhwwgdFKR+f}q}e2Ml- zdeQRojmg$eRyu3a#_TVFpmz^E@0S4sN^Ina!<9$V=YL?pv(ehgrt6aF*GL3LQr;bY zl0V=kyJ^ajut^6o?iD8fRQT%fszebi#q7KK%g5(1KZ``oK6yJ;RL@{TP|CTe?8gEU zArj?ymioTvd;)l5-@1Q0RYRI5*$yAY&fTe$J^;OhMxZuHa88qBsE(2Q%+LDOSl-WA zruusVtJ#wZBM)`UQUgk1>(7pRb}l^cl1DEYz@CjL>Wa!t-27p5@#YJxsMnVTEzaqB zz6Kd#bKg=}ZM&liv3C#LWc!HRm-{dv?EWMieWFL35PbKJCG_+x**u$KbLS$Fu={HY zl0EqZ#QPYxuSpwEOIm%K$EkE3QY-9l0%8u(lKLuTX?R7kO1x0l>}w#){S)U3T6#J( zy{kIrx#t=wYu}llAl@}w&d|%e7v(R-%TCw@_7#%)YH>LK-MPpUP!_3K;KwMb3l07* zygDyQF=T)9L7>=Kp6{R-E->3_@~Yx_F$K3&<~Uo|5nOS!@1TbKQ!hnH59)eqH22cb zLO2pV;6jvx_9q)=rTFUfv&7fn?)Ul*18#p(C3}aBj|(wxTax;kt6VHc{`DA&*!Dki zNR?{E7-xdc& z*e}^CQK`z0FE6yV-&E++s~L}A>&L!+mxy&>=jcSQc5j~j%$qTv*y zec!2cP`x7o)jOP|i}u_W&cMFat=+-)xLoY>THdxH^^c10dfVE;`^*e=|I*@x9HzqI z!}Jo2&8%`nTHlCKh3NXDdWbPzPcF6Wz(X`jp3L8e_wrsi86;-A;EJq{uJYdI3Ow8b z0_+^<0!dPFiX0w(hty-RcgBDUb{cMxY~p0KPqsCf9Yd516JJBLiuE5z0y>E)hn8H} ztA!aL^a_DKETb5(zY=z>f&TjSuSW&7axD<=>E_i*dJMe7P0Ny3&&Xz5e8*|)RGXqz zZ2L1Vlrc+I9fr&QT{xGLD(tK6j{}}jxkxaWOy7?T%8)7=nt81(g4#G4O<}sl(DN@$ zmGzxNrL7<8MuWbEgRS_-U8C7U% zO4fq;<%f0XwXuyBw|i~Y?3)f39`16`A;>=J 
z;&H-yykVY7H-BhPrd7+6IdBgoLb%agz#IZQ5HAKJa{m%of=h1;2R91MC9M{+_3CS+pY>qfsRcE@(%+I)SC(_yI4I}SkoBXlDu?~5G3Lmg_Adf$1?UpGd|`5k?7E6w-3qLw z;ucjgQK0(REU_javs%c6V>S1D5Y|PRtyyH&H{!E2#YMdI&h8(_CgLC1?$){y+{+$) zE4O8xe?UunKA|q)IZ>_TBlcvEu>ISAqUJ?0|B_lZyH<3@=ZH<5 zNF^vDcl^^TT&VP>_6%=-O$x7WBBkQ^`JGB`dE;33bv_)(;A7_)5CwcI%?iM=S>RS; zS+BKO$mVo*d@F_Bdd(g5S($(m=s9agyJ`wRx6DW^T%1@K<#h=9$8xc2uJr$=Z+ua0rk)?Q-`cuD(2<;%Q zosb2Ic1yB6VzVeAxay79&s-Ku@6o<7HF_mY9#eCw7OjD@BFXv&gms4i?`}=vQCC6C z^9;JGf9WuW~QHG^iz%FQ%BtB0;UV~U$=dhw;|4ZcnFe86DHBA{ z=Xhj@Wb?GO1M1E(ow``yP8e7QTV&{pE^F9zb}%W%%1z!y>FdKU(pZCrCrzm~n796s zW?!XAc#VMMntLz0pvYLT>Ew?vp#fRC70rq0lP|?O+{S!w_L~%uopm{*XAQ84`Ukynw9=C=E}ydv3depk<)VLr zQg{%`mhu5l&)7$l0)}dVG^uapA8z_;?~Tn;k;-vB_#Q8@yTQDanZv?K8F0cu``^p` z0*=$!IOuBoD=(w!gZ&-t1(_HS=`RWa+ z7M!O=<@r|vJqN>C5Y!VkvVB?tKX!I}WoNG-YP*N5{ZZV%6<)SR;Ob|+qVqB2E93pY;#p=WkzXMF)$ttnR z@pAM)A;(JT%gVOJC1^5#TEBQ& zh-VpW&4~>#4P>dVrApt%LaK8Oep~ zmk^KAL(@RnnP>RK^63kGar<6a;1)`1=@#=?F9cW)n$v~pWLA?!e$uB*AO3MPqLJOQ zlU=EPmK61nG%i!BXuTIKJ~&Ywb~3b4I*83Z0~)h!#0>$RFB~`bVZE#Ooz&by)vq|D zlx^y}Yj*iK+w{ya0avT$^aG%UcV&~lui^Ph>06|U_u3e6v7*T`HS4Yfq5Og%tsb+50I~294|SFQf=c@N2hU z{oWudpsp5@dzzvHrb$U%ah{GX#ibiawpsQM&eE^T*8m@YPYlNOLim-}F8mNEL% zkdixrF)?WMHFT$ph`_Iv5j1`84b+>k#U(w^8~piKC3{xe^gdxp^S)a5mg_TLG# zG^y_#e-zS|23m!vuw2KtFob)SM@JQO=8X5|tJ@oPKK0ICxs@QAy>%c~`nCA;we=!5 zjjHv_s5d|Q2(xA;(pp{qZql!w&2V0$!_w+tpjp)j*%)>7kc@oxu_&0y7T~9;2UuNoU#5N zORAne3C-DG+62tad=ZRyaFUKRPN`{+=jlq_BmWLhFQ1RslU#tk{;cqvFgtT723R5+ zxg5ChTp>cU-cy!7cFHxnv{_D9nd6&i#yPil!#;9)6>A{}o*Spf?PVw*M$@bFyr=rx z*75dIzXiPn3hB;qbl1ZVjB-8t4a=Niu#o+^Y(D=xip4EdY%xBee)CG{wf{d=ue`Qq zg89(-gzihP=g^+_0n^ElARd&W6me^Fuw56;L*|S%%SefMKcS;0iW*QWM;4-b7Q+Q9F{etXi zgZF2)c=Q9bE zVs;}h8HQTqy*#+6mmC44A`@>hlIG@}D#n2l)cX8Xl=Fo5O;?%qh7v4Zy*xrG9SZ=( zrn~G<=#w;dZo-}`8;P+iBa^r@f4jRg1t8hN)#kvY$^H$#*BiPgs=1CH5%?Nd$Cs!i zD8#d+#L`&PL_GIT?VI0}M0?+Fm@K!!jrgz688OkB5jPn!nb3?LCqs))esBIEjzv8i z`QycO<#RrZ?5Nj5Dq~w50~@~O63UkopDlpTaXZS;*h%l+>i1=7n{e6br?tmHv9(}%-NL>aAh+E{?wbk&DXX(o}`*x;2ds#pI{& 
z1&hwS@)C~omuuLh$CjpedY`8|kQ&$R(|}>%9T*1c$9##(tQP)}?boEWb&nQ%8V;Sb5|9kogk&nT9r4dcs zw9vkIAdB*-)#M9Eb;9mI8uC=egr1T45cBuw)~wFxU2m*Nyj98s9xz^Z_Kl6`@I06a za@>ot%T(RE8WotlE>t?cOzzt z)6V6y;~c@P7ZAU-F=q0#q+?)o2zlWwiz|+(NBzIt!t(az9; z(U!CAZS12UBKqUlmy7e@lA+SUlxR9p+l1TJ%dhfozEJrZ{-L_evQBIAF(LAWUeWbj9YUlwUxfBi!8(e&uf!N^(N zNsLJs|8q4fG7wgAhuL7glz3^Wpeo7L@$uMs+o4C5dTjpZxh$)35&zTtn0P#6&cydD z+qhCLH|Ey#a2~qveM-^v0rfPMrn^g}RN|A#xt6F0084>-LbzCFPvjq4Q|sGB5+8B( zPF85gl5i{+=~93DWR;+BmDORf2$$F?oHBlpLAoLe*&wXm(@WCB9_l8LPsTKrNa9V9 zNIyPgxg2XwtqUbPeebTaN3Y1dVK{z^INqiha`q(-#aCOBad?(x(1;}78k)TKh75`) zObRo<+nYJKSUnoOA1k5LbxptDQi);_;I}mFlHoq z5`T<5VL1G@ruQ>$vKf)ge4jT*xadqfDWOIyT9Flir#RwvFDP6Mq$h$!EAvZzJbj-= z7%Yd;+8s(R%=!_X%@36AKM@mwPTil-#a?z>V}e<2K|swc3tGKpBu4LOK3)~hhS!19DfL*98Kwg=sx?1ADqoY00iEF2xj}i&o?)gzjW0W48j#es z*z-MY>kU?@Ww0_*37mbabmZLTFq_Y2qj|jtY=FB0IM6Hk;bt-CHOv~>|kk?*Ng@t28nIdv}d|FAOoM`m9Eu*_jR8_EoW4+ z6Fd|i_^hev<`~?OWNqW~xD|Q4PtCCKCa$GAGoo+cLpYy;(u<;Si z;KDbd%0g1##;kYwF)C3%5@&`CTaW-d05qnE@GjJ1jDVRz?lxx^)fR6)4sbon!%s;T zS~47K{Fe_T6$zS&U%sz5u?lh@%$&(I$87$$TwdENY?Ncb)IU~2nFitc@xC(6V=~Ph zV-oB1q`Y_&n6dZN!u`0MWJCXyo};_clO?b)q;+MJux~9|UEwUd9lpM|5a!H)5ZoRY z36Z`7&lCO*Jdd0`QeQe3kJ%(QB;E)%llMjo_=NNS#ZRoP*s%gF@*QZuvBgD*4Ruw1 zXg0Y36l7Jrad@8lfX3F>>uMhIjKi11mE-bn3AU?3u}{etZ{ zUw%9aXcwLEh1bvvl7*1ficM)4Xq$xvCi;YdHGj|tz5%XG8p-5L`E%`_w{A-?X#<*& z{EuK0Oj{4U@F3He#Utvgmnm9Jse#A_&(#|+wVeB`n%k%K=QftfE9+=Rs=~e^+bk!7@ z&%cxPDs1CHuVWbG6C2LDcHDVCa78*f^)WR$?fude|kd2KHWIz$&11l2J^Y`}O zTl$o#T=5n#iaGDtglH2|s#kIiSR2*O7rAkKhr}73rH+aGy*^`>IqXP9TVr-Y*U?Y& zFUjM-6O94yg$%lhS$F-*iRR1~68#byrEc)X1I`+~D<`lLX5O#0Z|(h_X|NxFqpwZG ztulI^>c;tW#0f-ibocxG#d5KAtw+Lj3`Dc0VQG~$d)k+YUqR82wle-s@;=`FDZ%b>M+vA{09gr+8+FNS(3j=u2r_-yUs3K*ViA z&1o{Gk?kR4fq9b(ZfJoIAUK3$4n&{T1M5zA*9;&-9uL`9pWRc-Ld}GE3Bt47hitKt z*? 
zPj&jSrK`_LSpP% zsLryXFj23+$oGgK@`d+~Yt;FW-(g4k^P>T1a^85gYVWQKMA;_Qk< z%RF<}RMvT}-n19*d*j(}hs^7OQ2A5t*qLqHa}I~>AL_7`oo9E2*WRnU9SRP6PJZ<8 zM?{(V*=kh=XR#Y^HPh=A?EY0nQXBz_rl9q~=kdz>=huqYBOrXEa(vl09*q-lY|n|l zSLi$m2M;NYxz_2Cd!x6@Yx2v(W%L%@_i7&xwB#%SZ{D(8OPK~n5eUu$QXhO>B$|P7 zGWxKYf@3)ls-hOUa{XfAM}%3D=UDJgpE&;u!q1PuS2CQTIZb7{nI4=!qJBVii8*Z|mj0&nKHWZDwo; zxREgeJwtr|KGobOU2O)=`(}a(`|XQ?Giw>y8Y@5VuJ1z1?j8B>f+pwx@jNj(ej};( zwKDP4TK-%ZfSrbTB(vF|68yd1^=8PgqirDrNjP!nG-?V+MI%G*D&zE?cZ-3`;44d_ zp|t8K^D4^>-*M*!z-HRO)QGYLy?1hBWVI>4Vl5eVAf=d=buMHU$O_Y;lz;2^v;Cj`4TmR{jPh57P~!# zOpNUr`D$36#03XPS&!Fjgo&$A(s_}^yp3eup6M(w9xY3%? z5n1u8GCBaMLmq4PMl(B%jH`)J!(`^_ey;Gt60af&XsH{-l#U8QsgRCp$U2hCmu?*M zKc?4>?m|0`;7)Yx(4pF2@beO=$EAkQJ*Y^lb5j2FL{sq2y+=oXqcKTJi_W6nRs3i5 zX*&Z@=QM?f4`Vg=TOy7L7>EU0{g-B+?%g~pS@R|HhI)SVd=md);?QUzqL6vRoMj?~ z#@ClZW*YUWB?#rO0c+N0vMWcPAfyI0{ zVuiE5ve?O+^V>_js~RpEp;798zhi^U-%n#d&;YR+1^U{FVdJ3PJT4`vZw^e^v;N$W zl(bMxgVVr{hx>Y^SQ=6D6>ajTR`TZfCdm>GXU=qe8^A%+@-Zjtro9tts7*g)S-==g z*i?#{6&LUOc{JXPk?!7~DMLwPY)*}PLY?pjkP5{r57NBf<(0!|1$!NDw;q%XC1PkKq7aK`Rl=>Msa;DO!&SuR3U z_Zr0-eXfBO=U$QQzJBNqCyveRLHb=H{}P28XM?aq&%WU)H+uSKlFx~OZJ;1g0KtD8 zEeQ3a0Eui;(m{?xDS+#DQNRuO3Q#-~9=JdikU?#A&cHvYF54g5{!Gf|BE6MQeJ9_M z4so0d##*FDWZH<)?z(Cl-m`Z3e6>si-u(kV$*s){ZYjV3Wm`1Ctpu?9Blf9z+>J!A zQf%txX4^2dRb@@ckFCC)dntnkDhNChq9DES(ZNZh>JBmr5*^QI6D?%6ae<*PHj(>1 z729~_7bAzrK@HN=m~ISv{cOg;OCF-YOQ|^A&lEcTviE^)%;xZtNZ%fP>L% z`>*leJ;a@Jl4XNn4zk4X>iv61R?GnbVC$#FAeYT)T^5zB)Er#)iW^^^8$<$}k5@Dg24oX7l^z-|w11e+5zh+^Tc z|B7kQPj(9D51jf04~Yye6fu*;=J`=Ds5`uNkreGZm$0abQa0u$48=;xKP*9IbHFPW z76z|a`#~3|GjMXK#gdC(RU4&DTb|YzrEsqQ=fjYA_z((JP(rY8vml3h|G& z6wkx|_B91UxCV8+NZJ>WjG6tHGwNS##f-?~7;Khn{@8$86yV5u1;zNbBSpF_t_~v$ z=??DbAm*6?_VZHJ|0~SV`Q>v<`oZeGT#PtKrn>xP8u?2S)?5dW-s(xDM|BFis3J5z ze!>lEo{00>DC^5Xg7*Y85o~(#AY5n^|HHmf%C3Yw3yCU@Q&_l@h4YaYxuvCBmQKXR@#%?&FK$4I zc!*>y3unRQh%m2{izIi?xq<~HnHIe<{$d34YS7;m9jLK#4w5?g_kYcI?=tUGFW`cB zEw#Mg02l(ApcD_Sk_OI!c`SD;nwtSQ)ak|nBcSMeiO*8p{B~f66gV0znDPl;&UEMZ 
zv~e%I{4gf3E2q8iE@5A$dt=s?PDC46$zK&D4&f|0qpG_+A^C0`qdrfElIbibA3;2|>_3idIHt{S5fb(Ecb;SAvK9oXM4-DkbV6q`$2| zU~#(;Wo|5B^9z>PV ztVwWV2SN_Afv=kv4}@bB8%oO$Ek*wleF_iV$ZwXxCc-4u85wponZ?gGFL4drvXruC zqPfQvM=$1A_2qbwYWXU7qKQNAq_Aq=K(;wp_+|u!R?UcHTTKI1mP!R$@Fy!0+%e(A zq@YAk4K)LC6_fm$h?i@KTXAM8{d~;~Q&^|m%1zsmn!KEaC!1Mj!u6|;?3^;lT&^Yp zCHsYzT7qI(o}rzr&G_k7EPfQf+W@MtdCk9JP!&^T7Vx*eittowjP5^Oz-{I>%HqIs zCfZMW`A0ek`Qw-h6w9IjokN#1Y;nPCI}5uIeKdGI>s>HF9L4tyMLRUI1Ox}NYEgU^ z_nQ3!4{xo<>^R1C>9I`}{fu$&jjFRdmiZPC7T?Cxf-~`=&J>|?l%ev9V_$TkfJ57W zAr8(3|Cyz~Uj`~ZD1PGh|Bvt@$KqzoM0d~8=JCl}Tu30!!C^jglNZ zdhD;jm=uak0s{m1KxyUmQo*Ps+ApP=a=D>4$~oY_X5esDjhRJ|H`E=N&eH!AAC;El zh>{A0skr01Du+VgPHOBrepU9M$b~;9cs)v`{d=?8LGP3%Y5}}l(h*PXW2nk?KI?Ed%@;e(hq_x1-Ppm zTZ;g+!FsB=bzT#!5I_~XSX9v);e0v%LeAxvoRCPSGrlkxcS`Xd8F?Wd{or$*He1znF zW{>?~Q*Fdq6jtROv!RQt6QzcAiqVrkh<&1TuETU2l`yLsa~2Wa-$K@^=hyZV9n@w* z)s=&Fbt(U8O4yNwNK&#$_O=bS4a$45@MEQ;39s!hd}Hg7YwK*2Lub%{5Qim4yyKA~ zCn}g2NjtPg=}Lv!ewYb-^(JuRv(U6qve1_PTm+Fy?2v=+Q_41rb@>Dp=bt%O7auVMnNV-wGy20skXBwK#rw-z3#*{9&RPLQq69*7GjtVRG=? 
zNzloCLyebb29cj5tAyLW3Rg9g7;2SNH9uM$8D3nTEAVqQ=j<+T%BshQrfWy3g|A#N z_pC#Pa0=0eAVwZ9NTkrE&=%pi(N-4B`NUl$46jSb?*zPdOZDRF*d9T8xu6JpO^~R( z#D1-MPqPVngvJb=+}sFy8n%m@vWBidGHBt0@dWME;RD<5^YG-1YD)}f+= z_Oc7a#52c)yrBzqdv>4d!3SngVc_a(cVSg49>s*nx}3%DYx0kbw`2U~v=$a%E5D4z zl8MAX_9eBY4O0byfMEuy0Y_I#$QMf+L{CZ*MWyM6q3-|c^B&*Lgveolios~l@67!> zQGfj3pa0te|37MhYs{3N?~jdRbA7-=vASh``1vVL!q@*_2ROOBg4z21e;;61{L~oLY)}Oa`3hN!fBp~4t@BI( literal 0 HcmV?d00001 diff --git a/README.md b/README.md index 6453650..4867ae3 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,6 @@ -# Files - -- [Deep Learning for Natural Language Processing with Pytorch.ipynb](https://github.com/andrewrgarcia/DeepLearningForNLPInPytorch/blob/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb) - 2017 -- [Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb](https://github.com/andrewrgarcia/DeepLearningForNLPInPytorch/blob/master/Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb) - Google Colab (2023) adaptation. 
Preview at [**Colab Google**](https://colab.research.google.com/drive/1aomR0tLaPRuFLtnHeDIU3pYbY99loMf1?usp=sharing) +# 2023 Google Colab Adapation +[![](Google_Colaboratory.png)](https://colab.research.google.com/drive/1aomR0tLaPRuFLtnHeDIU3pYbY99loMf1?usp=sharing) # Table of Contents: From 7c054ad47fbd640227b9dc12b8279290e656a0ce Mon Sep 17 00:00:00 2001 From: andrewrgarcia Date: Sun, 22 Jan 2023 16:33:30 -0500 Subject: [PATCH 4/6] final revision -2023 Google Colab adaptation --- ...ral_Language_Processing_with_Pytorch.ipynb | 246 +++++++++--------- 1 file changed, 129 insertions(+), 117 deletions(-) rename Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb => Colab_Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb (91%) diff --git a/Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb b/Colab_Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb similarity index 91% rename from Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb rename to Colab_Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb index 4148e6d..654cba1 100644 --- a/Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb +++ b/Colab_Deep_Learning_for_Natural_Language_Processing_with_Pytorch.ipynb @@ -17,35 +17,20 @@ }, { "cell_type": "code", - "source": [ - "# install needed modules\n", - "# import sys\n", - "# if 'google.colab' in sys.modules:\n", - "# %pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 -f https://download.pytorch.org/whl/torch_stable.html\n", - "# %pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html" - ], - "metadata": { - "id": "SPomhYdxyS90" - }, - "execution_count": 1, - "outputs": [] - }, - { - "cell_type": "code", - "execution_count": 2, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GjlsQX1nvH9x", - "outputId": "bffd4fde-ec86-428b-bd20-dba28b0279a1" + "outputId": 
"73ffbab3-6668-497d-8518-2d0f3b89cd15" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "" + "" ] }, "metadata": {}, @@ -92,13 +77,13 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0jx6vh43vH95", - "outputId": "5db25a53-a6af-401c-d77e-f270aca23de7" + "outputId": "6ca503cc-d24d-43c4-abf9-dd4a1d32a8a2" }, "outputs": [ { @@ -149,13 +134,13 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d_lmP8pkvH97", - "outputId": "151df143-e9b9-4e4e-82dc-48da9addd058" + "outputId": "76362d13-0f51-430f-988f-3603e1da53dd" }, "outputs": [ { @@ -201,13 +186,13 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wZK_FfwtvH9-", - "outputId": "41106698-554b-4a19-80dc-1325284c784e" + "outputId": "6cc5c840-369a-4446-eb79-a888130f799a" }, "outputs": [ { @@ -248,13 +233,13 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Pw3S0abWvH-A", - "outputId": "b9c3fa41-49e3-4f1c-e654-4b941d552023" + "outputId": "e15d93e0-65a7-4b83-bbfb-d255cc7ed2a0" }, "outputs": [ { @@ -285,13 +270,13 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "xQXgGgqjvH-B", - "outputId": "a2b49a6d-be3f-47d9-e7fa-5df4de7ba637" + "outputId": "2e7dba33-acdb-4834-c5f1-e125103c75e8" }, "outputs": [ { @@ -339,13 +324,13 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "fFpq2JnZvH-D", - "outputId": "9e59b975-70a4-4bdd-dc1e-5a75aa3c866e" + "outputId": "4a06d1d6-0f8b-4063-b37f-ec654154d408" }, 
"outputs": [ { @@ -403,13 +388,13 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1wm4TJtEvH-E", - "outputId": "37aa0b1c-ab36-4073-c4cc-3a0a3f0bcfc0" + "outputId": "73a7aac4-8ed0-4cda-eca0-8b695a43bee1" }, "outputs": [ { @@ -418,7 +403,7 @@ "text": [ "tensor([1., 2., 3.])\n", "tensor([5., 7., 9.])\n", - "\n" + "\n" ] } ], @@ -450,13 +435,13 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0kDriTUVvH-F", - "outputId": "4624ce21-6097-493f-9e41-84e9d216c5e6" + "outputId": "78c09cde-1335-46c6-d4c2-3c498400b9bf" }, "outputs": [ { @@ -464,7 +449,7 @@ "name": "stdout", "text": [ "tensor(21., grad_fn=)\n", - "\n" + "\n" ] } ], @@ -502,13 +487,13 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "xDoeZQ1qvH-H", - "outputId": "390b74d4-c536-4895-a709-0e8ef72877cb" + "outputId": "98486662-67cc-4ae1-928b-57e66c065ae3" }, "outputs": [ { @@ -535,13 +520,13 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KHJ2-qbavH-I", - "outputId": "9bc6d25b-2c85-41eb-ae38-737cfeb1c348" + "outputId": "f066bd4d-8e1a-4150-e573-83c7f110a79e" }, "outputs": [ { @@ -628,13 +613,13 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JJGlaziPvH-K", - "outputId": "d53ba1ac-c587-41f5-800b-65bb5d0aba3a" + "outputId": "280e9809-da2b-42a8-a8ee-622c28ffa189" }, "outputs": [ { @@ -678,13 +663,13 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qS5597VEvH-L", - "outputId": 
"2b13a682-348e-44dd-e194-038a4c8fa0bb" + "outputId": "5efbe789-2cf0-40d4-a419-fc572128d1a3" }, "outputs": [ { @@ -723,13 +708,13 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "pJ-XchHvvH-M", - "outputId": "3b7db16e-c41e-46fc-aa92-3b38f7e307b6" + "outputId": "46db081b-55ad-498c-83a6-f9b8782f819c" }, "outputs": [ { @@ -839,13 +824,13 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "14KQExwRvH-N", - "outputId": "4660da0c-1a3a-4430-952b-aaf5b0882bb7" + "outputId": "fa3c7c6f-e345-40f5-8271-1436105b0a1e" }, "outputs": [ { @@ -880,7 +865,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": null, "metadata": { "collapsed": true, "id": "uhu6a7KovH-O" @@ -913,7 +898,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": null, "metadata": { "collapsed": true, "id": "_tdQMvRCvH-O" @@ -932,13 +917,13 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HBoU0HMCvH-P", - "outputId": "dd57207f-8406-4608-913d-a885c6e2dd86" + "outputId": "4b5826dc-84ab-405c-ae65-d2635bb49156" }, "outputs": [ { @@ -974,13 +959,13 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qXLD5QanvH-P", - "outputId": "f00d547e-c200-4bf9-9353-4defef4d8e86" + "outputId": "f8a3da41-93dc-4fe3-9b10-fcdc5788147d" }, "outputs": [ { @@ -1018,7 +1003,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": null, "metadata": { "collapsed": true, "id": "FdmwPRUmvH-Q" @@ -1042,13 +1027,13 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": 
"qcw0Z70JvH-R", - "outputId": "342a4e1d-d8b6-4839-b48a-7810ce3b15b8" + "outputId": "de742f8d-f6d1-4fb5-b9b3-84925717e8cf" }, "outputs": [ { @@ -1080,14 +1065,14 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "metadata": { "collapsed": true, "colab": { "base_uri": "https://localhost:8080/" }, "id": "U5TlojdNvH-R", - "outputId": "b16f30ad-2626-471c-e8c0-efc58ccdaa43" + "outputId": "24b97f13-bfe4-4f85-ec2f-3342cc0d0d74" }, "outputs": [ { @@ -1131,13 +1116,13 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "chM-jo8tvH-R", - "outputId": "ea077997-26c9-4f69-c2a3-0aa3d65b3cad" + "outputId": "a4b4cf7b-a590-45e5-a273-1a6485c7de1c" }, "outputs": [ { @@ -1290,13 +1275,13 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "aw8AQsDjvH-U", - "outputId": "375e5b4f-b018-40d7-e5c7-91bf20dcc1ee" + "outputId": "1817b7ec-d886-4540-bff2-97b6ba5d4fb5" }, "outputs": [ { @@ -1332,13 +1317,13 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ly0ES4N_vH-V", - "outputId": "18b9d27f-fba9-4e5a-c198-f8a547c5337f" + "outputId": "73b4623c-187d-4a26-8d54-6f50e0a34a36" }, "outputs": [ { @@ -1375,7 +1360,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": null, "metadata": { "collapsed": true, "id": "zVLzeff6vH-W" @@ -1388,7 +1373,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": null, "metadata": { "collapsed": true, "id": "45ri5p0CvH-W" @@ -1413,13 +1398,13 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "VWeGU1zavH-X", - "outputId": "1a164b5b-cefa-4145-a876-7beb546f5ae1" + "outputId": 
"f074ccb4-a464-44f3-dfff-0b8caecef6c8" }, "outputs": [ { @@ -1434,7 +1419,7 @@ "output_type": "stream", "name": "stdout", "text": [ - "[tensor([518.9899]), tensor([516.5867]), tensor([514.2001]), tensor([511.8294]), tensor([509.4730]), tensor([507.1305]), tensor([504.8020]), tensor([502.4858]), tensor([500.1811]), tensor([497.8864])]\n" + "[tensor([518.3880]), tensor([516.0305]), tensor([513.6876]), tensor([511.3594]), tensor([509.0451]), tensor([506.7459]), tensor([504.4583]), tensor([502.1824]), tensor([499.9146]), tensor([497.6555])]\n" ] } ], @@ -1500,13 +1485,13 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6zxV6CaavH-Y", - "outputId": "6d371aec-f8eb-49e0-986d-c91cb4848cc0" + "outputId": "6f0c2158-ebcf-4bcd-a9ff-1d25a0e16e57" }, "outputs": [ { @@ -1535,7 +1520,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": null, "metadata": { "collapsed": true, "id": "OWd6uMCgvH-Z" @@ -1553,20 +1538,20 @@ }, { "cell_type": "code", - "execution_count": 32, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E432t2D0vH-Z", - "outputId": "926352c3-eb78-4ca7-f1d6-b8138c5ce064" + "outputId": "c334e44f-16ad-40c8-b8cc-cad9d65888c8" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "tensor([23, 36, 31, 47])" + "tensor([ 0, 40, 13, 21])" ] }, "metadata": {}, @@ -1640,13 +1625,13 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hGZvG85avH-b", - "outputId": "77d5d890-5b55-4941-e10b-9e779a89e935" + "outputId": "9f9dc41c-f4cf-4319-ccb9-ab2657d054f2" }, "outputs": [ { @@ -1723,7 +1708,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": null, "metadata": { "collapsed": true, "id": "eR9kWCSFvH-c" @@ -1738,13 +1723,13 @@ }, { "cell_type": "code", - 
"execution_count": 35, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "8YDIsb_nvH-c", - "outputId": "3eeb9815-f9e3-4a69-ad09-53c4c0f750b5" + "outputId": "72c37b82-f5db-482f-8fa2-225c12637026" }, "outputs": [ { @@ -1776,7 +1761,7 @@ }, { "cell_type": "code", - "execution_count": 36, + "execution_count": null, "metadata": { "collapsed": true, "id": "3doBH6M0vH-c" @@ -1816,7 +1801,7 @@ }, { "cell_type": "code", - "execution_count": 37, + "execution_count": null, "metadata": { "collapsed": true, "id": "iCHTkTHXvH-d" @@ -1830,13 +1815,13 @@ }, { "cell_type": "code", - "execution_count": 38, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7aRFWaHxvH-d", - "outputId": "4ba212ba-865e-42b7-ea6d-67e1dd6c2939" + "outputId": "bde2fbbd-1bcc-40bd-e4d9-9cada0d2bcda" }, "outputs": [ { @@ -1869,14 +1854,14 @@ }, { "cell_type": "code", - "execution_count": 39, + "execution_count": null, "metadata": { "collapsed": true, "colab": { "base_uri": "https://localhost:8080/" }, "id": "PKQ7zkKzvH-e", - "outputId": "bf16770d-e10c-4c94-953f-bd246585cece" + "outputId": "eed35a1a-45b1-4543-db20-1f7bf9a6edfc" }, "outputs": [ { @@ -1916,13 +1901,13 @@ }, { "cell_type": "code", - "execution_count": 40, + "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "_MPatfm7vH-e", - "outputId": "7ab8ee74-129c-4b5b-c729-125a5d71564a" + "outputId": "e5a1b002-a29a-42a8-eab8-ab43132ed073" }, "outputs": [ { @@ -2081,7 +2066,7 @@ }, { "cell_type": "code", - "execution_count": 41, + "execution_count": null, "metadata": { "collapsed": true, "id": "4SNc1h2yvH-g" @@ -2159,7 +2144,7 @@ " next_tag_var = forward_var + trans_score + emit_score\n", " # The forward variable for this tag is log-sum-exp of all the scores.\n", " alphas_t.append(log_sum_exp(next_tag_var))\n", - " forward_var = torch.cat(alphas_t).view(1, -1)\n", + " forward_var = torch.stack(alphas_t).view(1, 
-1)\n", " terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]\n", " alpha = log_sum_exp(terminal_var)\n", " return alpha\n", @@ -2205,7 +2190,7 @@ " viterbivars_t.append(next_tag_var[0][best_tag_id])\n", " # Now add in the emission scores, and assign forward_var to the set\n", " # of viterbi variables we just computed\n", - " forward_var = (torch.cat(viterbivars_t) + feat).view(1, -1)\n", + " forward_var = (torch.stack(viterbivars_t) + feat).view(1, -1)\n", " backpointers.append(bptrs_t)\n", " \n", " # Transition to STOP_TAG\n", @@ -2243,14 +2228,14 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": null, "metadata": { "collapsed": true, "colab": { "base_uri": "https://localhost:8080/" }, "id": "McTsJA8VvH-h", - "outputId": "73731660-29c6-46e4-8f23-76b5c7ffafb6" + "outputId": "e7eaaca3-d34b-46f5-a7a1-256927cce3f9" }, "outputs": [ { @@ -2291,46 +2276,61 @@ }, { "cell_type": "code", - "execution_count": 43, + "execution_count": null, "metadata": { "collapsed": true, - "id": "8SajbqK4vH-h" + "id": "8SajbqK4vH-h", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "2ae72970-b49b-4c2d-f36f-daec473e54bc" }, - "outputs": [], + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "SGD (\n", + "Parameter Group 0\n", + " dampening: 0\n", + " lr: 0.01\n", + " momentum: 0\n", + " nesterov: False\n", + " weight_decay: 0.0001\n", + ")\n" + ] + } + ], "source": [ "model = BiLSTM_CRF( len(word_to_ix), tag_to_ix, EMBEDDING_DIM, HIDDEN_DIM)\n", - "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)" + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)\n", + "print(optimizer)" ] }, { "cell_type": "code", - "execution_count": 44, + "execution_count": null, "metadata": { "colab": { - "base_uri": "https://localhost:8080/", - "height": 348 + "base_uri": "https://localhost:8080/" }, "id": "X7CTWZIivH-i", - "outputId": "9e4e858e-4d0b-4745-a497-65ebafeea9bd" + 
"outputId": "fb18297d-008f-431f-9064-314a90a15f8b" }, "outputs": [ { - "output_type": "error", - "ename": "RuntimeError", - "evalue": "ignored", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mRuntimeError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mprecheck_sent\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mprepare_sequence\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtraining_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mword_to_ix\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mprecheck_tags\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mLongTensor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m \u001b[0mtag_to_ix\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mt\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtraining_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mprecheck_sent\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", - "\u001b[0;32m/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py\u001b[0m in \u001b[0;36m_call_impl\u001b[0;34m(self, *input, **kwargs)\u001b[0m\n\u001b[1;32m 1100\u001b[0m if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or 
_global_backward_hooks\n\u001b[1;32m 1101\u001b[0m or _global_forward_hooks or _global_forward_pre_hooks):\n\u001b[0;32m-> 1102\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mforward_call\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0minput\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1103\u001b[0m \u001b[0;31m# Do not call functions when jit is used\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1104\u001b[0m \u001b[0mfull_backward_hooks\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnon_full_backward_hooks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m\u001b[0m in \u001b[0;36mforward\u001b[0;34m(self, sentence)\u001b[0m\n\u001b[1;32m 148\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 149\u001b[0m \u001b[0;31m# Find the best path, given the features.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 150\u001b[0;31m \u001b[0mscore\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtag_seq\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_viterbi_decode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlstm_feats\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 151\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mscore\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtag_seq\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m\u001b[0m in \u001b[0;36m_viterbi_decode\u001b[0;34m(self, feats)\u001b[0m\n\u001b[1;32m 116\u001b[0m \u001b[0;31m# Now add in the emission scores, and assign forward_var to the 
set\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 117\u001b[0m \u001b[0;31m# of viterbi variables we just computed\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 118\u001b[0;31m \u001b[0mforward_var\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mviterbivars_t\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mfeat\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mview\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 119\u001b[0m \u001b[0mbackpointers\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbptrs_t\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 120\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mRuntimeError\u001b[0m: zero-dimensional tensor (at position 0) cannot be concatenated" + "output_type": "stream", + "name": "stdout", + "text": [ + "tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])\n", + "(tensor(12.4681, grad_fn=), [0, 1, 2, 1, 2, 1, 2, 1, 2, 1, 0])\n" ] } ], "source": [ + "\n", "# Check predictions before training\n", + "print(prepare_sequence(training_data[0][0], word_to_ix))\n", "precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)\n", "precheck_tags = torch.LongTensor([ tag_to_ix[t] for t in training_data[0][1] ])\n", "print(model(precheck_sent))" @@ -2355,8 +2355,8 @@ " # Step 2. 
Get our inputs ready for the network, that is, turn them into Variables\n", " # of word indices.\n", " sentence_in = prepare_sequence(sentence, word_to_ix)\n", " targets = torch.LongTensor([ tag_to_ix[t] for t in tags ])\n", - " \n", + " \n", " # Step 3. Run our forward pass.\n", " neg_log_likelihood = model.neg_log_likelihood(sentence_in, targets)\n", " \n", @@ -2370,9 +2370,21 @@ "cell_type": "code", "execution_count": null, "metadata": { - "id": "ukwdEqmjvH-j" + "id": "ukwdEqmjvH-j", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "41d0eacd-57fc-4ef8-9e54-0600bb4d9613" }, - "outputs": [], + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(tensor(30.4068, grad_fn=), [0, 1, 1, 1, 2, 2, 2, 0, 1, 2, 2])\n" + ] + } + ], "source": [ "# Check predictions after training\n", "precheck_sent = prepare_sequence(training_data[0][0], word_to_ix)\n", From 11a6811d1be1690ffa54cb8275a50caf2700684d Mon Sep 17 00:00:00 2001 From: andrewrgarcia Date: Sun, 22 Jan 2023 16:35:09 -0500 Subject: [PATCH 5/6] readme --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 4867ae3..8350625 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,7 @@ -# 2023 Google Colab Adapation +# Google Colab Adaptation (2023) [![](Google_Colaboratory.png)](https://colab.research.google.com/drive/1aomR0tLaPRuFLtnHeDIU3pYbY99loMf1?usp=sharing) - # Table of Contents: 1. Introduction to Torch's Tensor Library 2. Computation Graphs and Automatic Differentiation From 44ade03310020324cb3013180488220d0dac60a8 Mon Sep 17 00:00:00 2001 From: "Andrew R.
Garcia, Ph.D" Date: Sun, 22 Jan 2023 16:38:17 -0500 Subject: [PATCH 6/6] Update README.md --- README.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 8350625..bf7e968 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,9 @@ -# Google Colab Adaptation (2023) +# Deep Learning for Natural Language Processing with Pytorch + +## Google Colab Adaptation (2023) + +
-[![](Google_Colaboratory.png)](https://colab.research.google.com/drive/1aomR0tLaPRuFLtnHeDIU3pYbY99loMf1?usp=sharing) # Table of Contents: 1. Introduction to Torch's Tensor Library