Not necessarily, @marxav. If the derivative of the cost function with respect to the activation of the output layer already takes the 1/m factor into account, i.e. d(cost_fn)/d(activation) = (1/m) * ((1 - y)/(1 - a) - y/a), then there is no need to divide the parameters or other gradients by m again: once the 1/m is introduced at the output, it propagates through the chain rule to all the parameters and gradients.
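A minimal numpy sketch of that equivalence, assuming a single sigmoid output unit and a binary cross-entropy cost (the names `A_prev`, `W`, `Y` and the shapes are illustrative, not taken from the example's code): putting the 1/m factor in dJ/dA and letting it propagate, or keeping dJ/dA per-example and dividing by m when averaging dW, yields the same parameter gradient.

```python
import numpy as np

np.random.seed(0)
m = 4                                   # batch size (illustrative)
A_prev = np.random.rand(3, m)           # activations of the previous layer
W = np.random.rand(1, 3)                # output-layer weights
Y = np.array([[0, 1, 1, 0]])            # labels

Z = W @ A_prev
A = 1.0 / (1.0 + np.exp(-Z))            # sigmoid output

# Convention 1: include 1/m in dJ/dA and let it propagate.
dA = (1.0 / m) * ((1 - Y) / (1 - A) - Y / A)
dZ = dA * A * (1 - A)                   # chain rule through the sigmoid
dW_1 = dZ @ A_prev.T                    # no extra division by m here

# Convention 2: keep dJ/dA per-example, divide by m when averaging dW.
dA = (1 - Y) / (1 - A) - Y / A
dZ = dA * A * (1 - A)
dW_2 = (1.0 / m) * (dZ @ A_prev.T)

assert np.allclose(dW_1, dW_2)          # both conventions give the same gradient
```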
Thank you for this wonderful example, which helped me understand the gradient descent implementation.
I just noticed a minor mistake:
should be:
In addition:
should also be:
Otherwise, the code will not work if one wants to extend it, for instance to implement a regression use case instead of a classification use case (i.e. "none" instead of "softmax" in the final layer, plus bypassing the final activation function in the code).
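As a rough sketch of that regression extension (not the actual code from this example; `forward_last_layer` and its arguments are hypothetical names for this illustration), the final layer could select its activation by name, with "none" returning the raw linear output instead of a softmax:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=0, keepdims=True)

def forward_last_layer(A_prev, W, b, activation="softmax"):
    """Compute the output layer; activation="none" bypasses the
    non-linearity so the same code serves a regression use case."""
    Z = W @ A_prev + b
    if activation == "softmax":
        return softmax(Z)              # classification output
    if activation == "none":
        return Z                       # regression: raw linear output
    raise ValueError(f"unsupported activation: {activation}")

# Example: 2 output units, batch of 3
A_prev = np.random.rand(4, 3)
W, b = np.random.rand(2, 4), np.zeros((2, 1))
print(forward_last_layer(A_prev, W, b, activation="none"))
```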