
Commit 5b9c2d4

[cmu-10601] Supplement for Lecture 13 Neural Network + Backpropagation
includes some math expression format fixes
1 parent 9d4b470 commit 5b9c2d4

Machine-Learning/cmu-10601/lecture13-backpropagation.md

## Activation Functions

* Sigmoid / Logistic Function
  * $\frac{1}{1 + \exp(-\alpha)}$
* Tanh
  * Like the logistic function but shifted to range $[-1, +1]$
* ReLU (rectified linear unit), often used in vision tasks
  * Linear with a cutoff at zero
  * $\max(0, wx + b)$
  * Soft version ("softplus"): $\log(\exp(x) + 1)$
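As a quick reference, here are the four activations as a NumPy sketch (the function names and vectorized signatures are my choice, not the lecture's):

```python
import numpy as np

def sigmoid(a):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    """Like the logistic function but with range (-1, +1)."""
    return np.tanh(a)

def relu(a):
    """Rectified linear unit: linear with a cutoff at zero."""
    return np.maximum(0.0, a)

def softplus(a):
    """Soft version of ReLU: log(exp(a) + 1), smooth everywhere."""
    return np.log(np.exp(a) + 1.0)
```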

## Objective Function

* Cross entropy requires probabilities, so we add an additional "softmax" layer at the end of our network
* steeper than the quadratic loss

| Loss          | Forward                               | Backward                                            |
| ------------- | ------------------------------------- | --------------------------------------------------- |
| Quadratic     | $J = \frac{1}{2} (y - y^*)^2$         | $\frac{dJ}{dy} = y - y^*$                           |
| Cross Entropy | $J = y^*\log(y) + (1-y^*)\log(1-y)$   | $\frac{dJ}{dy} = \frac{y^*}{y} + \frac{1-y^*}{y-1}$ |
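Both table rows written out as a minimal sketch; the sign convention follows the table above (no leading minus on the cross entropy):

```python
import numpy as np

def quadratic_loss(y, y_star):
    """Forward: J = 1/2 (y - y*)^2.  Backward: dJ/dy = y - y*."""
    J = 0.5 * (y - y_star) ** 2
    dJ_dy = y - y_star
    return J, dJ_dy

def cross_entropy_loss(y, y_star):
    """Forward: J = y* log(y) + (1 - y*) log(1 - y).
    Backward: dJ/dy = y*/y + (1 - y*)/(y - 1)."""
    J = y_star * np.log(y) + (1.0 - y_star) * np.log(1.0 - y)
    dJ_dy = y_star / y + (1.0 - y_star) / (y - 1.0)
    return J, dJ_dy
```
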
## Multi-class Output

* Softmax: $y_k = \frac{\exp(b_k)}{\sum_{l=1}^{K} \exp(b_l)}$
* Loss: $J = \sum_{k=1}^K y_k^* \log(y_k)$
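A small sketch of the softmax layer and the multi-class loss above (subtracting the max before exponentiating is a standard numerical-stability trick, my addition, and does not change the output):

```python
import numpy as np

def softmax(b):
    """y_k = exp(b_k) / sum_l exp(b_l)."""
    e = np.exp(b - np.max(b))  # shift by max(b) for numerical stability
    return e / np.sum(e)

def multiclass_cross_entropy(y, y_star):
    """J = sum_k y*_k log(y_k), with y_star one-hot (sign convention as in the notes)."""
    return np.sum(y_star * np.log(y))
```
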

## Chain Rule

* Def #1 Chain Rule (scalar)
  * $y = f(u)$
  * $u = g(x)$
  * $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$
* Def #2 Chain Rule (two intermediate quantities)
  * $y = f(u_1, u_2)$
  * $u_2 = g_2(x)$
  * $u_1 = g_1(x)$
  * $\frac{dy}{dx} = \frac{dy}{du_1} \cdot \frac{du_1}{dx} + \frac{dy}{du_2} \cdot \frac{du_2}{dx}$
* Def #3 Chain Rule (vector)
  * $\mathbf{y} = f(\mathbf{u})$
  * $\mathbf{u} = g(\mathbf{x})$
  * $\frac{dy_i}{dx_k} = \sum_{j=1}^J \frac{dy_i}{du_j} \cdot \frac{du_j}{dx_k}, \quad \forall i, k$
  * Holds for any intermediate quantities
* Backpropagation is just repeated application of the chain rule (a numerical check follows this list)
* Computation Graphs
  * not a Neural Network diagram
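Def #3 can be sanity-checked numerically. Below is a made-up example (the choice of $f$, $g$, and all names are assumptions for illustration, not from the lecture) comparing the chain-rule gradient against finite differences:

```python
import numpy as np

# Concrete case of Def #3: y = f(u) = sum(u^2), u = g(x) = A x.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
x = rng.normal(size=2)

def y_of_x(x):
    u = A @ x
    return np.sum(u ** 2)

# Chain rule: dy/dx_k = sum_j (dy/du_j)(du_j/dx_k) = sum_j (2 u_j) A[j, k]
u = A @ x
grad_chain = A.T @ (2 * u)

# Central finite-difference check of each component
eps = 1e-6
grad_fd = np.array([
    (y_of_x(x + eps * np.eye(2)[k]) - y_of_x(x - eps * np.eye(2)[k])) / (2 * eps)
    for k in range(2)
])
assert np.allclose(grad_chain, grad_fd, atol=1e-5)
```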

## Backpropagation

* Backprop Ex #1
  * $y = f(x,z) = \exp(xz) + \frac{xz}{\log(x)} + \frac{\sin(\log(x))}{xz}$
  * Forward Computation
    * Given $x = 2, z = 3$
    * $a = xz,\ b = \log(x),\ c = \sin(b),\ d = \exp(a),\ e = a/b,\ f = c/a$
    * $y = d + e + f$
  * Backward Computation
    * $g_y = \frac{dy}{dy} = 1$
    * $g_f = \frac{dy}{df} = 1,\ g_e = \frac{dy}{de} = 1,\ g_d = \frac{dy}{dd} = 1$
    * $g_c = \frac{dy}{dc} = \frac{dy}{df} \cdot \frac{df}{dc} = (g_f)(1/a)$
    * $g_b = \frac{dy}{db} = \frac{dy}{de} \cdot \frac{de}{db} + \frac{dy}{dc} \cdot \frac{dc}{db} = (g_e)(-a/b^2) + (g_c)(\cos(b))$
    * $g_a = \frac{dy}{da} = \frac{dy}{de} \cdot \frac{de}{da} + \frac{dy}{dd} \cdot \frac{dd}{da} + \frac{dy}{df} \cdot \frac{df}{da} = (g_e)(1/b) + (g_d)(\exp(a)) + (g_f)(-c/a^2)$
    * $g_x = (g_a)(z) + (g_b)(1/x)$
    * $g_z = (g_a)(x)$
  * Updates for Backprop
    * $g_x = \frac{dy}{dx} = \sum_{k=1}^K \frac{dy}{du_k} \cdot \frac{du_k}{dx} = \sum_{k=1}^K (g_{u_k})\left(\frac{du_k}{dx}\right)$
  * Reuse forward computation in backward computation
  * Reuse backward computation within itself (a runnable version of this example follows this list)
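Ex #1 start to finish as runnable NumPy. Variable names mirror the notes; the finite-difference check at the end is my addition:

```python
import numpy as np

def forward_backward(x, z):
    # Forward: y = exp(xz) + xz/log(x) + sin(log(x))/(xz)
    a = x * z
    b = np.log(x)
    c = np.sin(b)
    d = np.exp(a)
    e = a / b
    f = c / a
    y = d + e + f

    # Backward: reuses a, b, c from the forward pass
    gy = 1.0
    gd = ge = gf = 1.0                       # y = d + e + f
    gc = gf * (1.0 / a)                      # f = c / a
    gb = ge * (-a / b**2) + gc * np.cos(b)   # e = a/b, c = sin(b)
    ga = ge * (1.0 / b) + gd * np.exp(a) + gf * (-c / a**2)
    gx = ga * z + gb * (1.0 / x)             # a = xz, b = log(x)
    gz = ga * x
    return y, gx, gz

y, gx, gz = forward_backward(2.0, 3.0)

# Central finite-difference check on gx
eps = 1e-6
fd = (forward_backward(2.0 + eps, 3.0)[0] - forward_backward(2.0 - eps, 3.0)[0]) / (2 * eps)
assert np.isclose(gx, fd, rtol=1e-4)
```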

## Neural Network Training

* Consider a 2-hidden-layer neural network
  * parameters are $\theta = [\alpha^{(1)}, \alpha^{(2)}, \beta]$
* SGD training (see the sketch after this list)
  * Iterate until convergence:
    * Sample $i \in \{1, \ldots, N\}$
    * Compute gradient by backprop
      * $g_{\alpha^{(1)}} = \nabla_{\alpha^{(1)}} J^{(i)}(\theta)$
      * $g_{\alpha^{(2)}} = \nabla_{\alpha^{(2)}} J^{(i)}(\theta)$
      * $g_\beta = \nabla_{\beta} J^{(i)}(\theta)$
      * where $J^{(i)}(\theta) = \ell(h_\theta(x^{(i)}), y^{(i)})$
    * Step opposite the gradient
      * $\alpha^{(1)} \leftarrow \alpha^{(1)} - \gamma g_{\alpha^{(1)}}$
      * $\alpha^{(2)} \leftarrow \alpha^{(2)} - \gamma g_{\alpha^{(2)}}$
      * $\beta \leftarrow \beta - \gamma g_\beta$
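A generic sketch of this SGD loop. The `grad_fn` backprop callback, the dict-of-arrays parameter layout, and the fixed step size `gamma` are assumptions for illustration, not from the notes:

```python
import numpy as np

def sgd(params, data, grad_fn, gamma=0.1, n_steps=1000, seed=0):
    """Generic SGD: sample an example, backprop, step opposite the gradient.
    `params` is a dict of arrays; `grad_fn(params, x, y)` (assumed, supplied
    by the caller) must return a matching dict of gradients."""
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        i = rng.integers(len(data))          # Sample i from {1, ..., N}
        x_i, y_i = data[i]
        grads = grad_fn(params, x_i, y_i)    # Compute gradients by backprop
        for name in params:                  # Step opposite the gradient
            params[name] -= gamma * grads[name]
    return params
```
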
* Backprop Ex #2: for a neural network (see the sketch after this list)
  * Given: decision function $\hat{y} = h_\theta(x) = \sigma((\alpha^{(3)})^T \sigma((\alpha^{(2)})^T \sigma((\alpha^{(1)})^T x)))$
  * loss function $J = \ell(\hat{y}, y^*) = y^*\log(\hat{y}) + (1-y^*)\log(1-\hat{y})$
  * Forward
    * Given $x, \alpha^{(1)}, \alpha^{(2)}, \alpha^{(3)}, y^*$
    * $z^{(0)} = x$
    * for $i = 1, 2, 3$:
      * $u^{(i)} = (\alpha^{(i)})^T z^{(i-1)}$
      * $z^{(i)} = \sigma(u^{(i)})$
    * $\hat{y} = z^{(3)}$
    * $J = \ell(\hat{y}, y^*)$
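The forward pass above as a short sketch. The `forward` helper and its list-of-matrices signature are my own; the loss again follows the notes' sign convention:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, alphas, y_star):
    """Forward pass for the 3-layer network above.
    `alphas` is a list [alpha1, alpha2, alpha3] of weight matrices."""
    z = x                                    # z^(0) = x
    for alpha in alphas:                     # for i = 1, 2, 3
        u = alpha.T @ z                      # u^(i) = (alpha^(i))^T z^(i-1)
        z = sigmoid(u)                       # z^(i) = sigma(u^(i))
    y_hat = z                                # y_hat = z^(3)
    J = y_star * np.log(y_hat) + (1 - y_star) * np.log(1 - y_hat)
    return y_hat, J
```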