- Sigmoid / Logistic Function
- Tanh
- Like logistic function but shifted to range $[-1, +1]$
- ReLU is often used in vision tasks
- rectified linear unit
- Linear with cutoff at zero
- Soft version (softplus): $\log(1 + e^x)$; see the sketch below
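A minimal NumPy sketch of these activations and their derivatives (the function names and the pairing of softplus with the sigmoid derivative are mine, not from the notes):

```python
import numpy as np

# Common activation functions and their derivatives
# (needed when backpropagating through a hidden layer).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh(x):
    return np.tanh(x)           # like the sigmoid, but shifted to range [-1, +1]

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)   # linear with cutoff at zero

def d_relu(x):
    return (x > 0).astype(float)

def softplus(x):
    return np.log1p(np.exp(x))  # "soft" version of ReLU

def d_softplus(x):
    return sigmoid(x)           # derivative of softplus is the sigmoid
```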
- Quadratic Loss
- the same objective as Linear Regression
- i.e. MSE
- Cross Entropy
- the same objective as Logistic Regression
- i.e. negative log likelihood
- this requires probabilities, so we add an additional "softmax" layer at the end of our network
- cross entropy is steeper than quadratic loss when the prediction is far from the true label, so it gives larger gradients
Loss | Forward | Backward |
---|---|---|
Quadratic | $J = \frac{1}{2}(y - y^*)^2$ | $\frac{dJ}{dy} = y - y^*$ |
Cross Entropy | $J = y^*\log{(y)} + (1-y^*)\log{(1-y)}$ | $\frac{dJ}{dy} = \frac{y^*}{y} + \frac{(1-y^*)}{y-1}$ |
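A short sketch of both losses' forward and backward passes, following the table's sign convention (the function names are placeholders of mine):

```python
import numpy as np

def quadratic_loss(y, y_star):
    """Forward: J = 1/2 (y - y*)^2."""
    return 0.5 * (y - y_star) ** 2

def d_quadratic_loss(y, y_star):
    """Backward: dJ/dy = y - y*."""
    return y - y_star

def cross_entropy(y, y_star):
    """Forward: J = y* log(y) + (1 - y*) log(1 - y), as in the table."""
    return y_star * np.log(y) + (1.0 - y_star) * np.log(1.0 - y)

def d_cross_entropy(y, y_star):
    """Backward: dJ/dy = y*/y + (1 - y*)/(y - 1)."""
    return y_star / y + (1.0 - y_star) / (y - 1.0)
```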
- Softmax: $y_k = \frac{\exp(b_k)}{\sum_{l} \exp(b_l)}$
- Loss: cross entropy against the true label, $J = -\sum_{k} y^*_k \log(y_k)$
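A sketch of the softmax layer and the resulting loss; the max-subtraction for numerical stability is an extra detail of mine, not from the notes:

```python
import numpy as np

def softmax(b):
    """Map scores b to probabilities: y_k = exp(b_k) / sum_l exp(b_l)."""
    e = np.exp(b - np.max(b))   # subtract the max for numerical stability
    return e / np.sum(e)

def softmax_cross_entropy(b, y_star):
    """Negative log likelihood of the one-hot label y_star under softmax(b)."""
    y = softmax(b)
    return -np.sum(y_star * np.log(y))
```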
- Def #1 Chain Rule
- Def #2 Chain Rule
- Def #3 Chain Rule
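The three definitions are not reproduced in these notes; the standard scalar and vector forms of the chain rule they presumably correspond to are (the assignment to #1/#2/#3 is a guess):

$$\text{(scalar) } y = f(u),\; u = g(x): \quad \frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}$$

$$\text{(vector input) } y = f(u),\; u = g(\mathbf{x}): \quad \frac{\partial y}{\partial x_i} = \frac{dy}{du}\frac{\partial u}{\partial x_i}$$

$$\text{(vector intermediate) } y = f(\mathbf{u}),\; \mathbf{u} = g(\mathbf{x}): \quad \frac{\partial y}{\partial x_i} = \sum_{j} \frac{\partial y}{\partial u_j}\frac{\partial u_j}{\partial x_i}$$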
- Backpropagation is just repeated application of the chain rule
- Computation Graphs
- not a Neural Network diagram: its nodes are intermediate quantities in the computation, not neurons
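A tiny made-up computation graph, worked forward and backward by hand, to show that backprop is just the chain rule applied node by node:

```python
import numpy as np

# Computation graph for y = exp(a) with a = x * w (an illustrative example,
# not one from the lecture). Forward stores intermediate values; backward
# reuses them while applying the chain rule from the output to the inputs.

x, w = 2.0, -0.5

# Forward
a = x * w          # node 1
y = np.exp(a)      # node 2

# Backward (chain rule, output to inputs)
dy_dy = 1.0
dy_da = dy_dy * np.exp(a)   # d/da exp(a), reuses the forward value a
dy_dw = dy_da * x           # d/dw (x * w)
dy_dx = dy_da * w           # d/dx (x * w)

print(dy_dw, dy_dx)
```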
- Backprop Ex #1
- Forward Computation
- Given
- Given
- Backward Computation
- Updates for Backprop
- Reuse forward computation in backward computation
- Reuse backward computation within itself
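The concrete functions from Ex #1 are not reproduced here, so this stand-in graph (my own choice) illustrates both kinds of reuse: forward values reused in the backward pass, and a backward value reused for several inputs:

```python
import numpy as np

# Stand-in example (not the one from the lecture):
# J = (u + v)^2, with u = x * y and v = sin(x).

x, y = 1.5, -2.0

# Forward computation (store every intermediate value)
u = x * y
v = np.sin(x)
s = u + v
J = s ** 2

# Backward computation
dJ_ds = 2.0 * s          # reuses the forward value s
dJ_du = dJ_ds            # reuses dJ_ds (backward value reused within itself)
dJ_dv = dJ_ds
dJ_dy = dJ_du * x        # reuses the forward value x
dJ_dx = dJ_du * y + dJ_dv * np.cos(x)   # x feeds two nodes, so gradients add

print(dJ_dx, dJ_dy)
```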
- Consider a 2-hidden-layer neural net
- parameters are the weights and biases of each layer
- SGD training (a code sketch follows this list)
- Iterate until convergence:
- Sample a training example $(x^{(i)}, y^{(i)})$
- Compute gradient by backprop
- Step opposite the gradient
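A minimal NumPy sketch of this SGD loop for a 2-hidden-layer network with sigmoid units and the cross entropy above (written with a minus sign so it is minimized); the layer sizes, step size, and data are placeholders I chose:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: N examples, D features, binary labels.
N, D, H1, H2 = 100, 5, 8, 6
X = rng.normal(size=(N, D))
Y = (X[:, 0] > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameters: one weight matrix and bias vector per layer.
W1, b1 = rng.normal(scale=0.1, size=(D, H1)), np.zeros(H1)
W2, b2 = rng.normal(scale=0.1, size=(H1, H2)), np.zeros(H2)
W3, b3 = rng.normal(scale=0.1, size=(H2, 1)), np.zeros(1)

lr = 0.1
for _ in range(5000):                        # "iterate until convergence"
    i = rng.integers(N)                      # sample a training example
    x, y_star = X[i:i+1], Y[i]

    # Forward pass (keep intermediates for reuse in the backward pass)
    z1 = sigmoid(x @ W1 + b1)
    z2 = sigmoid(z1 @ W2 + b2)
    y_hat = sigmoid(z2 @ W3 + b3)[0, 0]

    # Backward pass: gradient of J = -[y* log(y) + (1 - y*) log(1 - y)]
    dJ_da3 = y_hat - y_star                  # gradient w.r.t. pre-sigmoid output
    dW3 = z2.T * dJ_da3
    db3 = np.array([dJ_da3])
    dz2 = dJ_da3 * W3.T
    da2 = dz2 * z2 * (1 - z2)
    dW2, db2 = z1.T @ da2, da2[0]
    dz1 = da2 @ W2.T
    da1 = dz1 * z1 * (1 - z1)
    dW1, db1 = x.T @ da1, da1[0]

    # Step opposite the gradient
    W3 -= lr * dW3; b3 -= lr * db3
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```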
- Backprop Ex #2: for a neural network
- Given: decision function
- loss function $J = \ell(\hat{y}, y^*) = y^*\log(\hat{y}) + (1-y^*)\log(1-\hat{y})$
- Forward
- Given
- for
- Given
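The decision function itself is not reproduced in these notes; assuming a one-hidden-layer network with sigmoid units and parameters $\alpha, \beta$ (a common choice for this example), the forward and chain-rule backward passes would be:

$$\text{Forward:}\quad a_j = \sum_i \alpha_{ji} x_i,\quad z_j = \sigma(a_j),\quad b = \sum_j \beta_j z_j,\quad \hat{y} = \sigma(b)$$

$$\text{Backward:}\quad \frac{dJ}{d\hat{y}} = \frac{y^*}{\hat{y}} + \frac{1-y^*}{\hat{y}-1},\quad \frac{dJ}{d\beta_j} = \frac{dJ}{d\hat{y}}\,\hat{y}(1-\hat{y})\,z_j,\quad \frac{dJ}{d\alpha_{ji}} = \frac{dJ}{d\hat{y}}\,\hat{y}(1-\hat{y})\,\beta_j\,z_j(1-z_j)\,x_i$$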