docs: matching image extension available #1

Open · wants to merge 5 commits into master
4 changes: 2 additions & 2 deletions _posts/2019-1-12-ColabTPU.md
@@ -89,7 +89,7 @@ CPUs are made to run pretty much any calculation. Therefore, CPU store values in

For an MXU, matrix multiplication reuses both inputs many times, as illustrated below :

- ![image](https://maelfabien.github.io/assets/images/systolic.gif)
+ ![image](https://maelfabien.github.io/assets/images/systolic.jpg)

Data flows in through the chip in waves.

@@ -287,4 +287,4 @@ y_pred = model.predict(X_test)

We now have the prediction of our model, but the model's training is now around 20 times faster!

- > **Conclusion** : I hope this introduction to Google Colab TPU's was helpful. If you have any question, don't hesitate to drop a comment!
+ > **Conclusion** : I hope this introduction to Google Colab TPU's was helpful. If you have any question, don't hesitate to drop a comment!
10 changes: 5 additions & 5 deletions _posts/2019-8-19-NLP_Gen.md
@@ -198,7 +198,7 @@ In Python, it's as simple as that :
X, y = input_sequences[:,:-1],input_sequences[:,-1]
```

- We will now see this problem as a multi-class classification task. As usual, we must first one-hot encode the `y` to get a sparse matrix that contains a 1 in the column that corresponds to the token, and 0 eslewhere :
+ We will now see this problem as a multi-class classification task. As usual, we must first one-hot encode the `y` to get a sparse matrix that contains a 1 in the column that corresponds to the token, and 0 elsewhere :

![image](https://maelfabien.github.io/assets/images/lgen_7.png)
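As a minimal sketch of this step, assuming Keras is imported and that a `total_words` variable holds the tokenizer's vocabulary size (neither is shown in this hunk), the one-hot encoding could be done with `to_categorical`:

```python
from keras.utils import to_categorical

# One-hot encode the labels: each row has a single 1 in the column of the true
# next-word token and 0 everywhere else. `total_words` is assumed to be the
# vocabulary size used by the tokenizer.
y = to_categorical(y, num_classes=total_words)
```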

@@ -226,7 +226,7 @@ We have 165'000 training samples. X is 199 columns wide since it corresponds to

## Build the model

- We will be using Long Short-Term Memory networks (LSTM). LSTM have the important advantage of being able to understand depenence over a whole sequence, and therefore, the beginning of a sentence might have an impact on the 15th word to predict. On the other hand, Recurrent Neural Networks (RNN) only imply a dependence on the previous state of the network, and only the previous word would help predict the next one. We would quickly miss context if we chose RNNs, and therefore, LSTMs seem to be the right choice.
+ We will be using Long Short-Term Memory networks (LSTM). LSTM have the important advantage of being able to understand dependence over a whole sequence, and therefore, the beginning of a sentence might have an impact on the 15th word to predict. On the other hand, Recurrent Neural Networks (RNN) only imply a dependence on the previous state of the network, and only the previous word would help predict the next one. We would quickly miss context if we chose RNNs, and therefore, LSTMs seem to be the right choice.

### Model architecture

@@ -303,13 +303,13 @@ On a CPU, a single epoch takes around 8 minutes. On a GPU, you should modify the
# Modify Import
from keras.layers import Embedding, LSTM, Dense, Dropout, CuDNNLSTM

- # In the Moddel
+ # In the Model
...
model.add(CuDNNLSTM(100))
...
```

- This reduces training time to 2 minutes per epoch, which makes it acceptable. I have personnaly trained this model on Google Colab. I tend to stop the training at several steps to make so sample predictions and control the quality of the model given several values of the cross entropy.
+ This reduces training time to 2 minutes per epoch, which makes it acceptable. I have personally trained this model on Google Colab. I tend to stop the training at several steps to make so sample predictions and control the quality of the model given several values of the cross entropy.
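
As a rough sketch of how such intermediate sample predictions could be produced, assuming `model`, `tokenizer` and `max_sequence_len` are defined as elsewhere in the post (none of them appear in this diff):

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences

def generate_text(seed_text, n_words, model, tokenizer, max_sequence_len):
    """Greedily append the most likely next word, n_words times."""
    for _ in range(n_words):
        # Encode and left-pad the current seed the same way the training sequences were built
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
        predicted = int(np.argmax(model.predict(token_list), axis=-1)[0])
        # Map the predicted index back to its word
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                seed_text += " " + word
                break
    return seed_text
```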

Here are my observations :

@@ -350,7 +350,7 @@ When the loss is around 3.1, here is the sentence it generates with "Google" as

`Google is a large amount of data produced worldwide`

- It does not really mean anything, but it sucessfully associates Google to the notion of large amount of data. It's quite impressive since it simply relies on the co-occurence of words, and does not integrate any grammatical notion. If we wait a bit longer in the training and let the loss decrease to 2.6, and give it the input "In this article" :
+ It does not really mean anything, but it successfully associates Google to the notion of large amount of data. It's quite impressive since it simply relies on the co-occurrence of words, and does not integrate any grammatical notion. If we wait a bit longer in the training and let the loss decrease to 2.6, and give it the input "In this article" :

`In this article we'll cover the main concepts of the data and the dwell time is proposed mentioning the number of nodes`

4 changes: 2 additions & 2 deletions _posts/2019-8-9-intro.md
@@ -165,7 +165,7 @@ This activation function is smooth, differentiable (allows back-propagation) and

The perceptron can be seen as an error minimization algorithm. We choose the softmax function, a differentiable and continuous error function and try to minimize it by applying gradient descent.

- We usually apply a log-loss error function. This error function applies a penalty to miscalssified points that is proportional to the distance of the boundary.
+ We usually apply a log-loss error function. This error function applies a penalty to misclassified points that is proportional to the distance of the boundary.
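
For reference, the binary log-loss over $$ m $$ points, with $$ y_i $$ the true label and $$ \hat{y}_i $$ the predicted probability, is usually written as:

$$ E = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \right] $$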

## Multi-class

@@ -316,7 +316,7 @@ How do we combine those models ?

![image](https://maelfabien.github.io/assets/images/nn_12.jpg)

- We can weight the models indiviually to assign more weight to a model than to another. Say that we want $$ 2/3 $$ of the overall weight on the first one. We simply apply a factor of 2 to the probabilites in the first model :
+ We can weight the models individually to assign more weight to a model than to another. Say that we want $$ 2/3 $$ of the overall weight on the first one. We simply apply a factor of 2 to the probabilities in the first model :

$$ 2 * 0.7 + 1 * 0.8 $$
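
Read as a normalized weighted average (one interpretation of the $$ 2/3 $$ weighting above; the excerpt stops before spelling it out), this would give:

$$ \frac{2 \times 0.7 + 1 \times 0.8}{2 + 1} = \frac{2.2}{3} \approx 0.73 $$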

8 changes: 4 additions & 4 deletions _posts/2019-9-6-NLP_6.md
@@ -1,6 +1,6 @@
---
published: true
- title: Improved Few-Shot Text classification
+ title: Improved Few-Shot Text Classification
collection: ml
layout: single
author_profile: true
@@ -17,7 +17,7 @@ sidebar:
nav: sidebar-sample
---

- In the [previous article](https://maelfabien.github.io/machinelearning/NLP_5/), we replicated the paper "Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop" by Katherine Bailey and Sunny Chopra Acquia. This article addresses the problem of few-shot text classification using distance metrics and pre-trainened embeddings. We saw that a K-NN classifier could outperform the cosine similarity classifier if the number of classes increases.
+ In the [previous article](https://maelfabien.github.io/machinelearning/NLP_5/), we replicated the paper "Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop" by Katherine Bailey and Sunny Chopra Acquia. This article addresses the problem of few-shot text classification using distance metrics and pre-trained embeddings. We saw that a K-NN classifier could outperform the cosine similarity classifier if the number of classes increases.

We saw that the number of samples could have a large impact on the classification accuracy (up to 30% for the same class), and therefore, gaining new samples is essential.

@@ -43,7 +43,7 @@ Then, using a pre-trained Word Embedding model (Word2Vec, Glove..), we compute t

![image](https://maelfabien.github.io/assets/images/nlp_fs_2.png)

- At this point, we compute the avereage embedding for each class :
+ At this point, we compute the average embedding for each class :

![image](https://maelfabien.github.io/assets/images/nlp_fs_3.png)
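
A minimal sketch of this averaging and of the cosine-similarity classification that follows, assuming a hypothetical `embed(sentence)` helper that returns the averaged pre-trained word vectors of a sentence, and a `samples` dict mapping each class to its few labeled sentences:

```python
import numpy as np

def class_average_embeddings(samples, embed):
    """Average the sentence embeddings of the few available samples per class."""
    return {label: np.mean([embed(s) for s in sentences], axis=0)
            for label, sentences in samples.items()}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(sentence, class_embeddings, embed):
    """Pick the class whose average embedding is closest in cosine similarity."""
    v = embed(sentence)
    return max(class_embeddings, key=lambda label: cosine(v, class_embeddings[label]))
```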

@@ -57,7 +57,7 @@ Here is the process when a new sentence to classify comes in :

## K-NN

- We also explored a K-NN classifier on the pre-trained embeddings. Let's suppose that the embeding dimension is only 2 (or that we apply a PCA with 2 components) to represent this problem graphically. The classification task with the KNN is the following :
+ We also explored a K-NN classifier on the pre-trained embeddings. Let's suppose that the embedding dimension is only 2 (or that we apply a PCA with 2 components) to represent this problem graphically. The classification task with the KNN is the following :

![image](https://maelfabien.github.io/assets/images/nlp_fs_6.png)
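
For illustration, a hedged sketch of this K-NN setup with scikit-learn, where `X` (sentence embeddings), `y` (class labels) and `X_new` (embeddings of new sentences) are all assumed names; the 2-component PCA is only there to make the problem plottable:

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Project the pre-trained embeddings to 2 dimensions, then classify with K-NN.
knn = make_pipeline(PCA(n_components=2), KNeighborsClassifier(n_neighbors=3))
knn.fit(X, y)
pred = knn.predict(X_new)
```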
