From fde326c359222ded56100de64893d0c28213ea5a Mon Sep 17 00:00:00 2001
From: Gabriel Godefroy
Date: Wed, 17 Oct 2018 22:30:14 +0100
Subject: [PATCH] Update README.md

- Adding an update for the expression of the derivative of sigma in the NeuralNetwork paper, as discussed with the author.
- Fixing a typo (virtial -> virtual).
---
 1808_Neural_networks/README.md | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/1808_Neural_networks/README.md b/1808_Neural_networks/README.md
index d1592ed..e64a810 100644
--- a/1808_Neural_networks/README.md
+++ b/1808_Neural_networks/README.md
@@ -25,10 +25,22 @@ To run the [load_and_process_data.ipynb](load_and_process_data.ipynb) you will n
 
     pip install scikit-learn
 
-It's a good idea to make a virtial environment for your projects. You can easily do this with `conda`:
+It's a good idea to make a virtual environment for your projects. You can easily do this with `conda`:
 
     conda env create -f environment.yml
 
+### Update: derivative of the logistic function
+
+In the published version of the article, the derivative of the activation function (i.e., the logistic function _sigma(z) = 1 / (1+exp(-z))_) was expressed as _sigma'(z) = z(1-z)_. Evaluating it at a single point shows that this expression is wrong: at _z = 0_ it gives _z(1-z) = 0_, yet the sigmoid is not flat at _z = 0_ (its slope there is _sigma(0)(1-sigma(0)) = 0.25_).
+
+One can show that _sigma'(z) = exp(-z)/(1+exp(-z))^2 = sigma(z)*(1-sigma(z))_. A more detailed derivation can be found on [Wikipedia](https://en.wikipedia.org/wiki/Logistic_function#Derivative). The expression of the derivative in the second equation of the paper should therefore be corrected from _z*(1-z)_ to _sigma(z)*(1-sigma(z))_.
+
+However, the Python expression of the backward sigmoid function, _x*(1-x)_, is applied to the output of the forward pass rather than to _z_ itself. This makes it possible to compute both the forward and the backward values while evaluating the exponential only once, using the following composition:
+
+* _a1 = sigma(z)_ (the only place the exponential is computed)
+* _derivative = sigma(a1, False) = a1*(1-a1) = sigma(z)*(1-sigma(z)) = sigma'(z)_
+
+The code provided in the paper is thus correct; only the equation in the text needs fixing. It is also probably faster than an implementation that computes the sigmoid function and its derivative independently, since the exponential is evaluated only once.
 
 ### So Long and Thanks for All the Fish
 
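
The correction can be checked numerically. The snippet below is a minimal sketch, not code from the paper. `sigma(x, forward=True)` is a hypothetical stand-in for the paper's helper. It is assumed to return the logistic function in forward mode and `x*(1-x)` in backward mode. The `forward` name is an assumption, since the update only shows a call of the form `sigma(a1, False)`. `sigma_prime` and `central_difference` are likewise illustrative names. The script compares four quantities: the closed-form derivative, the backward trick `sigma(sigma(z), False)`, a central finite difference, and the published `z*(1-z)`.

```python
import math


def sigma(x, forward=True):
    # Hypothetical helper modeled on the update's description (flag name assumed):
    # forward=True  -> logistic function of x
    # forward=False -> x*(1-x), where x is expected to already be sigma(z)
    if forward:
        return 1.0 / (1.0 + math.exp(-x))
    return x * (1.0 - x)


def sigma_prime(z):
    # Closed form exp(-z)/(1+exp(-z))^2, which equals sigma(z)*(1-sigma(z)).
    e = math.exp(-z)
    return e / (1.0 + e) ** 2


def central_difference(f, z, h=1e-6):
    # Independent numerical estimate of the slope of f at z.
    return (f(z + h) - f(z - h)) / (2.0 * h)


for z in (-2.0, 0.0, 1.5):
    a1 = sigma(z)                # forward pass: the only exp() needed for both values
    backward = sigma(a1, False)  # a1*(1-a1), i.e. sigma'(z)
    published = z * (1.0 - z)    # the published expression, evaluated at z
    numeric = central_difference(sigma, z)
    print(f"z={z:+.1f}  sigma'(z)={sigma_prime(z):.6f}  "
          f"a1*(1-a1)={backward:.6f}  numeric={numeric:.6f}  z*(1-z)={published:.6f}")
```

At _z = 0_ the first three columns all come out at about 0.25, while _z*(1-z)_ gives 0. This is the counterexample used in the update. Writing the backward mode in terms of the forward output is what lets the exponential be evaluated only once per layer.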