Thank you for this wonderful example, which helped me understand the gradient descent implementation.
I just noticed a minor mistake:
- dW_curr = np.dot(dZ_curr, A_prev.T) / m
- db_curr = np.sum(dZ_curr, axis=1, keepdims=True) / m
should be:
- dW_curr = np.dot(dZ_curr, A_prev.T)
- db_curr = np.sum(dZ_curr, axis=1, keepdims=True)
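For reference, here is a rough sketch of how the single-layer backward step could look with that change, i.e. with the raw (unscaled) gradients. The function and variable names (single_layer_backward_propagation, relu_backward, Z_curr) are my assumptions based on the snippets above, not necessarily the exact signatures in the repository:

```python
import numpy as np

def relu_backward(dA, Z):
    # derivative of ReLU: pass the gradient only where Z > 0
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def single_layer_backward_propagation(dA_curr, W_curr, b_curr, Z_curr, A_prev,
                                       backward_activation_func=relu_backward):
    # b_curr is kept only for signature consistency; it is not needed here
    # gradient w.r.t. the pre-activation of the current layer
    dZ_curr = backward_activation_func(dA_curr, Z_curr)
    # proposed change: accumulate raw gradients, without dividing by m here
    dW_curr = np.dot(dZ_curr, A_prev.T)
    db_curr = np.sum(dZ_curr, axis=1, keepdims=True)
    # gradient passed back to the previous layer
    dA_prev = np.dot(W_curr.T, dZ_curr)
    return dA_prev, dW_curr, db_curr
```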
In addition:
- params_values["W" + str(layer_idx)] -= learning_rate * grads_values["dW" + str(layer_idx)]
- params_values["b" + str(layer_idx)] -= learning_rate * grads_values["db" + str(layer_idx)]
should also be:
- params_values["W" + str(layer_idx)] -= learning_rate / m * grads_values["dW" + str(layer_idx)]
- params_values["b" + str(layer_idx)] -= learning_rate / m * grads_values["db" + str(layer_idx)]
Otherwise, the code will not work correctly, for instance if one wants to extend it to a regression use case instead of a classification use case (i.e. "none" instead of "softmax" in the final layer, plus short-circuiting the final activation function in the code).
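And a minimal sketch of the corresponding update step, with the averaging over the batch size m moved into the learning-rate scaling. The dictionary layout of params_values / grads_values follows the snippets above; the loop bounds and the function name are my assumptions:

```python
def update(params_values, grads_values, nb_layers, learning_rate, m):
    # proposed change: divide by the batch size m here instead of in the gradients
    for layer_idx in range(1, nb_layers + 1):
        params_values["W" + str(layer_idx)] -= learning_rate / m * grads_values["dW" + str(layer_idx)]
        params_values["b" + str(layer_idx)] -= learning_rate / m * grads_values["db" + str(layer_idx)]
    return params_values
```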