docs: matching image extension available #1

Open · wants to merge 5 commits into master
4 changes: 2 additions & 2 deletions _posts/2019-1-12-ColabTPU.md
@@ -89,7 +89,7 @@ CPUs are made to run pretty much any calculation. Therefore, CPU store values in

For an MXU, matrix multiplication reuses both inputs many times, as illustrated below :

- ![image](https://maelfabien.github.io/assets/images/systolic.gif)
+ ![image](https://maelfabien.github.io/assets/images/systolic.jpg)

Data flows in through the chip in waves.

@@ -287,4 +287,4 @@ y_pred = model.predict(X_test)

We now have the prediction of our model, but the model's training is now around 20 times faster!

- > **Conclusion** : I hope this introduction to Google Colab TPU's was helpful. If you have any question, don't hesitate to drop a comment!
+ > **Conclusion** : I hope this introduction to Google Colab TPU's was helpful. If you have any question, don't hesitate to drop a comment!
10 changes: 5 additions & 5 deletions _posts/2019-8-19-NLP_Gen.md
@@ -198,7 +198,7 @@ In Python, it's as simple as that :
X, y = input_sequences[:,:-1],input_sequences[:,-1]
```

- We will now see this problem as a multi-class classification task. As usual, we must first one-hot encode the `y` to get a sparse matrix that contains a 1 in the column that corresponds to the token, and 0 eslewhere :
+ We will now see this problem as a multi-class classification task. As usual, we must first one-hot encode the `y` to get a sparse matrix that contains a 1 in the column that corresponds to the token, and 0 elsewhere :

![image](https://maelfabien.github.io/assets/images/lgen_7.png)
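As a minimal sketch of this step, assuming Keras is imported and that a `total_words` variable holds the tokenizer's vocabulary size (neither is shown in this hunk), the one-hot encoding could be done with `to_categorical`:

```python
from keras.utils import to_categorical

# One-hot encode the labels: each row has a single 1 in the column of the true
# next-word token and 0 everywhere else. `total_words` is assumed to be the
# vocabulary size used by the tokenizer.
y = to_categorical(y, num_classes=total_words)
```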

@@ -226,7 +226,7 @@ We have 165'000 training samples. X is 199 columns wide since it corresponds to

## Build the model

- We will be using Long Short-Term Memory networks (LSTM). LSTM have the important advantage of being able to understand depenence over a whole sequence, and therefore, the beginning of a sentence might have an impact on the 15th word to predict. On the other hand, Recurrent Neural Networks (RNN) only imply a dependence on the previous state of the network, and only the previous word would help predict the next one. We would quickly miss context if we chose RNNs, and therefore, LSTMs seem to be the right choice.
+ We will be using Long Short-Term Memory networks (LSTM). LSTM have the important advantage of being able to understand dependence over a whole sequence, and therefore, the beginning of a sentence might have an impact on the 15th word to predict. On the other hand, Recurrent Neural Networks (RNN) only imply a dependence on the previous state of the network, and only the previous word would help predict the next one. We would quickly miss context if we chose RNNs, and therefore, LSTMs seem to be the right choice.

### Model architecture

@@ -303,13 +303,13 @@ On a CPU, a single epoch takes around 8 minutes. On a GPU, you should modify the
# Modify Import
from keras.layers import Embedding, LSTM, Dense, Dropout, CuDNNLSTM

- # In the Moddel
+ # In the Model
...
model.add(CuDNNLSTM(100))
...
```

- This reduces training time to 2 minutes per epoch, which makes it acceptable. I have personnaly trained this model on Google Colab. I tend to stop the training at several steps to make so sample predictions and control the quality of the model given several values of the cross entropy.
+ This reduces training time to 2 minutes per epoch, which makes it acceptable. I have personally trained this model on Google Colab. I tend to stop the training at several steps to make so sample predictions and control the quality of the model given several values of the cross entropy.
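
As a rough sketch of how such intermediate sample predictions could be produced, assuming `model`, `tokenizer` and `max_sequence_len` are defined as elsewhere in the post (none of them appear in this diff):

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences

def generate_text(seed_text, n_words, model, tokenizer, max_sequence_len):
    """Greedily append the most likely next word, n_words times."""
    for _ in range(n_words):
        # Encode and left-pad the current seed the same way the training sequences were built
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
        predicted = int(np.argmax(model.predict(token_list), axis=-1)[0])
        # Map the predicted index back to its word
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                seed_text += " " + word
                break
    return seed_text
```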

Here are my observations :

@@ -350,7 +350,7 @@ When the loss is around 3.1, here is the sentence it generates with "Google" as

`Google is a large amount of data produced worldwide`

- It does not really mean anything, but it sucessfully associates Google to the notion of large amount of data. It's quite impressive since it simply relies on the co-occurence of words, and does not integrate any grammatical notion. If we wait a bit longer in the training and let the loss decrease to 2.6, and give it the input "In this article" :
+ It does not really mean anything, but it successfully associates Google to the notion of large amount of data. It's quite impressive since it simply relies on the co-occurrence of words, and does not integrate any grammatical notion. If we wait a bit longer in the training and let the loss decrease to 2.6, and give it the input "In this article" :

`In this article we'll cover the main concepts of the data and the dwell time is proposed mentioning the number of nodes`

4 changes: 2 additions & 2 deletions _posts/2019-8-9-intro.md
@@ -165,7 +165,7 @@ This activation function is smooth, differentiable (allows back-propagation) and

The perceptron can be seen as an error minimization algorithm. We choose the softmax function, a differentiable and continuous error function and try to minimize it by applying gradient descent.

- We usually apply a log-loss error function. This error function applies a penalty to miscalssified points that is proportional to the distance of the boundary.
+ We usually apply a log-loss error function. This error function applies a penalty to misclassified points that is proportional to the distance of the boundary.
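
For reference, the binary log-loss over $$ m $$ points, with $$ y_i $$ the true label and $$ \hat{y}_i $$ the predicted probability, is usually written as:

$$ E = - \frac{1}{m} \sum_{i=1}^{m} \left[ y_i \ln(\hat{y}_i) + (1 - y_i) \ln(1 - \hat{y}_i) \right] $$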

## Multi-class

@@ -316,7 +316,7 @@ How do we combine those models ?

![image](https://maelfabien.github.io/assets/images/nn_12.jpg)

- We can weight the models indiviually to assign more weight to a model than to another. Say that we want $$ 2/3 $$ of the overall weight on the first one. We simply apply a factor of 2 to the probabilites in the first model :
+ We can weight the models individually to assign more weight to a model than to another. Say that we want $$ 2/3 $$ of the overall weight on the first one. We simply apply a factor of 2 to the probabilities in the first model :

$$ 2 * 0.7 + 1 * 0.8 $$
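
Read as a normalized weighted average (one interpretation of the $$ 2/3 $$ weighting above; the excerpt stops before spelling it out), this would give:

$$ \frac{2 \times 0.7 + 1 \times 0.8}{2 + 1} = \frac{2.2}{3} \approx 0.73 $$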

8 changes: 4 additions & 4 deletions _posts/2019-9-6-NLP_6.md
@@ -1,6 +1,6 @@
---
published: true
- title: Improved Few-Shot Text classification
+ title: Improved Few-Shot Text Classification
collection: ml
layout: single
author_profile: true
@@ -17,7 +17,7 @@ sidebar:
nav: sidebar-sample
---

- In the [previous article](https://maelfabien.github.io/machinelearning/NLP_5/), we replicated the paper "Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop" by Katherine Bailey and Sunny Chopra Acquia. This article addresses the problem of few-shot text classification using distance metrics and pre-trainened embeddings. We saw that a K-NN classifier could outperform the cosine similarity classifier if the number of classes increases.
+ In the [previous article](https://maelfabien.github.io/machinelearning/NLP_5/), we replicated the paper "Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop" by Katherine Bailey and Sunny Chopra Acquia. This article addresses the problem of few-shot text classification using distance metrics and pre-trained embeddings. We saw that a K-NN classifier could outperform the cosine similarity classifier if the number of classes increases.

We saw that the number of samples could have a large impact on the classification accuracy (up to 30% for the same class), and therefore, gaining new samples is essential.

@@ -43,7 +43,7 @@ Then, using a pre-trained Word Embedding model (Word2Vec, Glove..), we compute t

![image](https://maelfabien.github.io/assets/images/nlp_fs_2.png)

- At this point, we compute the avereage embedding for each class :
+ At this point, we compute the average embedding for each class :

![image](https://maelfabien.github.io/assets/images/nlp_fs_3.png)
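
A minimal sketch of this averaging and of the cosine-similarity classification that follows, assuming a hypothetical `embed(sentence)` helper that returns the averaged pre-trained word vectors of a sentence, and a `samples` dict mapping each class to its few labeled sentences:

```python
import numpy as np

def class_average_embeddings(samples, embed):
    """Average the sentence embeddings of the few available samples per class."""
    return {label: np.mean([embed(s) for s in sentences], axis=0)
            for label, sentences in samples.items()}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def classify(sentence, class_embeddings, embed):
    """Pick the class whose average embedding is closest in cosine similarity."""
    v = embed(sentence)
    return max(class_embeddings, key=lambda label: cosine(v, class_embeddings[label]))
```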

@@ -57,7 +57,7 @@ Here is the process when a new sentence to classify comes in :

## K-NN

- We also explored a K-NN classifier on the pre-trained embeddings. Let's suppose that the embeding dimension is only 2 (or that we apply a PCA with 2 components) to represent this problem graphically. The classification task with the KNN is the following :
+ We also explored a K-NN classifier on the pre-trained embeddings. Let's suppose that the embedding dimension is only 2 (or that we apply a PCA with 2 components) to represent this problem graphically. The classification task with the KNN is the following :

![image](https://maelfabien.github.io/assets/images/nlp_fs_6.png)
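
For illustration, a hedged sketch of this K-NN setup with scikit-learn, where `X` (sentence embeddings), `y` (class labels) and `X_new` (embeddings of new sentences) are all assumed names; the 2-component PCA is only there to make the problem plottable:

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Project the pre-trained embeddings to 2 dimensions, then classify with K-NN.
knn = make_pipeline(PCA(n_components=2), KNeighborsClassifier(n_neighbors=3))
knn.fit(X, y)
pred = knn.predict(X_new)
```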
