vikasgupta-github
diff --git a/attention/Attention_Basics.ipynb b/attention/Attention_Basics.ipynb
@@ -0,0 +1,293 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Attention Basics\n",
+    "In this notebook, we look at how attention is implemented. We will focus on implementing attention in isolation from a larger model. That's because when implementing attention in a real-world model, a lot of the focus goes into piping the data and juggling the various vectors rather than the concepts of attention themselves.\n",
+    "\n",
+    "We will implement attention scoring as well as calculating an attention context vector.\n",
+    "\n",
+    "## Attention Scoring\n",
+    "### Inputs to the scoring function\n",
+    "Let's start by looking at the inputs we'll give to the scoring function. We will assume we're in the first step of the decoding phase. The first input to the scoring function is the hidden state of the decoder (assuming a toy RNN with three hidden nodes -- not usable in real life, but easier to illustrate):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "dec_hidden_state = [5,1,20]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's visualize this vector:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns\n",
+    "\n",
+    "# Let's visualize our decoder hidden state\n",
+    "plt.figure(figsize=(1.5, 4.5))\n",
+    "sns.heatmap(np.transpose(np.matrix(dec_hidden_state)), annot=True, cmap=sns.light_palette(\"purple\", as_cmap=True), linewidths=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Our first scoring function will score a single annotation (encoder hidden state), which looks like this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "annotation = [3,12,45] #e.g. Encoder hidden state"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's visualize the single annotation\n",
+    "plt.figure(figsize=(1.5, 4.5))\n",
+    "sns.heatmap(np.transpose(np.matrix(annotation)), annot=True, cmap=sns.light_palette(\"orange\", as_cmap=True), linewidths=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### IMPLEMENT: Scoring a Single Annotation\n",
+    "Let's calculate the dot product of the decoder hidden state and a single annotation. NumPy's [dot()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html) is a good candidate for this operation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def single_dot_attention_score(dec_hidden_state, enc_hidden_state):\n",
+    "    # TODO: return the dot product of the two vectors\n",
+    "    return \n",
+    "    \n",
+    "single_dot_attention_score(dec_hidden_state, annotation)"
+   ]
+  },
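+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a point of comparison for the exercise above, here is a minimal sketch of one way to compute the dot score. The name `single_dot_attention_score_sketch` is hypothetical and the vectors are copied from the earlier cells; this is one possible solution, not necessarily the intended one."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def single_dot_attention_score_sketch(dec_hidden_state, enc_hidden_state):\n",
+    "    # Dot product: multiply matching components, then sum them\n",
+    "    return np.dot(dec_hidden_state, enc_hidden_state)\n",
+    "\n",
+    "# 5*3 + 1*12 + 20*45 = 927\n",
+    "single_dot_attention_score_sketch([5, 1, 20], [3, 12, 45])"
+   ]
+  },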
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### Annotations Matrix\n",
+    "Let's now look at scoring all the annotations at once. To do that, here's our annotation matrix:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "annotations = np.transpose([[3,12,45], [59,2,5], [1,43,5], [4,3,45.3]])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "And it can be visualized like this (each column is a hidden state of an encoder time step):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's visualize our annotations (each column is an annotation)\n",
+    "ax = sns.heatmap(annotations, annot=True, cmap=sns.light_palette(\"orange\", as_cmap=True), linewidths=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### IMPLEMENT: Scoring All Annotations at Once\n",
+    "Let's calculate the scores of all the annotations in one step using matrix multiplication. Let's continue to use the dot scoring method.\n",
+    "\n",
+    "<img src=\"images/scoring_functions.png\" />\n",
+    "\n",
+    "To do that, we'll have to transpose `dec_hidden_state` and [matrix multiply](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matmul.html) it with `annotations`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def dot_attention_score(dec_hidden_state, annotations):\n",
+    "    # TODO: return the product of dec_hidden_state transpose and annotations\n",
+    "    return \n",
+    "    \n",
+    "attention_weights_raw = dot_attention_score(dec_hidden_state, annotations)\n",
+    "attention_weights_raw"
+   ]
+  },
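+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A possible solution sketch for the exercise above, assuming `annotations` holds one annotation per column (shape 3 x 4) as defined earlier. The names `dot_attention_score_sketch` and `annotations_sketch` are hypothetical. Note that transposing a 1-D NumPy array is a no-op, so `np.matmul` simply treats the vector as a row."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def dot_attention_score_sketch(dec_hidden_state, annotations):\n",
+    "    # (3,) @ (3, 4) -> (4,): one dot score per annotation column\n",
+    "    return np.matmul(np.transpose(dec_hidden_state), annotations)\n",
+    "\n",
+    "annotations_sketch = np.transpose([[3, 12, 45], [59, 2, 5], [1, 43, 5], [4, 3, 45.3]])\n",
+    "# Scores come out as approximately [927, 397, 148, 929]\n",
+    "dot_attention_score_sketch([5, 1, 20], annotations_sketch)"
+   ]
+  },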
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Looking at these scores, can you guess which of the four vectors will get the most attention from the decoder at this time step?\n",
+    "\n",
+    "## Softmax\n",
+    "Now that we have our scores, let's apply softmax:\n",
+    "<img src=\"images/softmax.png\" />"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def softmax(x):\n",
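+    "    # np.float128 is not available on every platform, so use float64 and\n",
+    "    # shift by the max score: np.exp cannot overflow and the result is unchanged\n",
+    "    x = np.asarray(x, dtype=np.float64)\n",
+    "    e_x = np.exp(x - x.max(axis=0))\n",
+    "    return e_x / e_x.sum(axis=0)\n",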
+    "\n",
+    "attention_weights = softmax(attention_weights_raw)\n",
+    "attention_weights"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Even knowing which annotation will get the most focus, it's interesting to see how drastically softmax changes the final scores. The first and last annotations had scores of 927 and 929, respectively, but after softmax the attention they'll get is 0.12 and 0.88. Softmax depends only on the differences between scores, so a gap of 2 becomes a ratio of e^2, or about 7.4.\n",
+    "\n",
+    "## Applying the scores back on the annotations\n",
+    "Now that we have our scores, let's multiply each annotation by its score to move closer to the attention context vector. This is the multiplication part of this formula (we'll tackle the summation part in later cells):\n",
+    "\n",
+    "<img src=\"images/Context_vector.png\" />"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def apply_attention_scores(attention_weights, annotations):\n",
+    "    # TODO: Multiply the annotations by their weights\n",
+    "    return\n",
+    "\n",
+    "applied_attention = apply_attention_scores(attention_weights, annotations)\n",
+    "applied_attention"
+   ]
+  },
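+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One way to fill in the exercise above, as a sketch: NumPy broadcasting multiplies a length-4 weight vector against a 3 x 4 matrix column by column, so each annotation is scaled by its own weight. The name `apply_attention_scores_sketch` and the toy inputs are made up for illustration and are not the values computed in this notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "def apply_attention_scores_sketch(attention_weights, annotations):\n",
+    "    # Shapes (4,) * (3, 4): weight j scales column j (annotation j)\n",
+    "    return attention_weights * annotations\n",
+    "\n",
+    "toy_weights = np.array([0.5, 0.0, 0.0, 0.5])  # illustrative weights, not the softmax output\n",
+    "toy_annotations = np.ones((3, 4))\n",
+    "# Columns 0 and 3 are scaled to 0.5; columns 1 and 2 are zeroed out\n",
+    "apply_attention_scores_sketch(toy_weights, toy_annotations)"
+   ]
+  },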
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's visualize how the annotations look now that we've applied the attention weights to them:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Let's visualize our annotations after applying attention to them\n",
+    "ax = sns.heatmap(applied_attention, annot=True, cmap=sns.light_palette(\"orange\", as_cmap=True), linewidths=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Contrast this with the raw annotations visualized earlier in the notebook, and we can see that the second and third annotations (columns) have been nearly wiped out. The first annotation maintains some of its value, and the fourth annotation is the most pronounced.\n",
+    "\n",
+    "## Calculating the Attention Context Vector\n",
+    "All that remains is to sum the four columns to produce a single attention context vector.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def calculate_attention_vector(applied_attention):\n",
+    "    return np.sum(applied_attention, axis=1)\n",
+    "\n",
+    "attention_vector = calculate_attention_vector(applied_attention)\n",
+    "attention_vector"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "# Let's visualize the attention context vector\n",
+    "plt.figure(figsize=(1.5, 4.5))\n",
+    "sns.heatmap(np.transpose(np.matrix(attention_vector)), annot=True, cmap=sns.light_palette(\"blue\", as_cmap=True), linewidths=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that we have the context vector, we can concatenate it with the decoder hidden state and pass it through a hidden layer to produce the result of this decoding time step."
+   ]
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}