## Lecture 12: Neural Networks

## Background

* Neural Network Model
  * Independent variables
  * Weights
  * Hidden layer
  * Weights
  * Dependent variable (prediction)
* Artificial Model
  * Neuron: a node in a directed acyclic graph (DAG)
  * Weight: a multiplier on each edge
  * Activation Function: a nonlinear thresholding function, which allows a neuron to "fire" when the input value is sufficiently high
  * Artificial Neural Network: a collection of neurons arranged in a DAG, which together define some differentiable function

## Example #1: Neural Network with 1 Hidden Layer and 2 Hidden Units

* Let σ be the activation function
  * If σ is the sigmoid: σ(a) = 1 / (1 + exp(-a))
* xi ∈ R
* zi ∈ (0, 1) if σ is the sigmoid; zi ∈ R more generally
* z1 = σ(α11 x1 + α12 x2 + α10)
* z2 = σ(α21 x1 + α22 x2 + α20)
* y = σ(β1 z1 + β2 z2 + β0) = σ(β1 σ(α11 x1 + α12 x2 + α10) + β2 σ(α21 x1 + α22 x2 + α20) + β0)
  * (Each of these is a logistic regression model)
  * (Don't forget the intercept terms)
* Interpret y as P(Y = 1 | x, α, β), then predict with the Bayes optimal classifier: ŷ = h_αβ(x) = 1 if y ≥ 0.5, 0 otherwise (a numerical sketch follows this list)
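
A minimal numerical sketch of this forward pass in Python. The weight values below are invented for illustration; only the structure comes from the equations above.

```python
import numpy as np

def sigmoid(a):
    """Sigmoid activation: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical weights for the 1-hidden-layer, 2-hidden-unit network above.
alpha = np.array([[1.0, -2.0],    # alpha_11, alpha_12  (hidden unit 1)
                  [0.5,  1.5]])   # alpha_21, alpha_22  (hidden unit 2)
alpha_0 = np.array([0.1, -0.3])   # intercepts alpha_10, alpha_20
beta = np.array([2.0, -1.0])      # beta_1, beta_2
beta_0 = 0.5                      # intercept beta_0

x = np.array([1.0, 2.0])          # input (x1, x2)

z = sigmoid(alpha @ x + alpha_0)  # z1, z2: each is a logistic regression of x
y = sigmoid(beta @ z + beta_0)    # y: a logistic regression of (z1, z2)

y_hat = 1 if y >= 0.5 else 0      # threshold at 0.5, as in the Bayes optimal rule
print(z, y, y_hat)
```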

## Example #2: 1D Face Recognition

* D = {(1+μ, 0), (3+μ, 1)}
* Is D for classification or regression? Both! (The labels {0, 1} can be read either as class labels or as real-valued regression targets)
* Which line is learned by linear regression on this data set? Z_B(x)
  * Z_A(x) = wA x + bA
  * Z_B(x) = wB x + bB
  * Z_C(x) = wC x + bC
* Which sigmoid is learned by logistic regression?
  * h_A(x) = σ(Z_A(x))
  * h_B(x) = σ(Z_B(x))
  * h_C(x) = σ(Z_C(x))
* What happens if the intercept b is increased?
  * to z(x)? It shifts up (equivalently, shifts left when w > 0)
  * to h(x)? It shifts left
* What changes in h_A(x) if wA is increased? The sigmoid becomes steeper
* What is the decision boundary for h_C(x)? The point x = 2
* What is h_E(x) = σ((h_C(x) + h_D(x)) / 2)?
  * Note: it is a sigmoid of sigmoids, not σ((Z_C(x) + Z_D(x)) / 2)
  * h_E is our first neural network
  * Its decision boundary is a nonlinear function of x (see the sketch after this list)
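
A small sketch of h_E as a sigmoid of sigmoids. The slopes and intercepts for Z_C and Z_D below are assumed values, chosen so that h_C increases and h_D decreases with x; only the form of h_E comes from the example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical parameters: Z_C increases with x, Z_D decreases with x.
w_C, b_C = 2.0, -4.0      # Z_C(x) = 2x - 4   (decision boundary of h_C at x = 2)
w_D, b_D = -2.0, 8.0      # Z_D(x) = -2x + 8  (decision boundary of h_D at x = 4)

def h_C(x): return sigmoid(w_C * x + b_C)
def h_D(x): return sigmoid(w_D * x + b_D)
def h_E(x): return sigmoid((h_C(x) + h_D(x)) / 2.0)  # sigmoid of sigmoids, not of the Z's

xs = np.array([0.0, 2.0, 3.0, 4.0, 6.0])
print(h_E(xs))  # rises and then falls as x grows
```

The output rises and then falls with x, a shape that no single sigmoid of a linear function σ(wx + b) can produce; that is what makes h_E a (small) neural network rather than another logistic regression.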

## Neural Network Parameters

* The objective function is nonconvex in the parameters
* There is no unique optimal set of parameters (for example, swapping two hidden units together with their weights yields the same function; see the check after this list)
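
A quick numerical check of the non-uniqueness claim, assuming the two-hidden-unit architecture of Example #1 with random weights: swapping the two hidden units along with their incoming and outgoing weights changes the parameter vector but not the function the network computes.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(alpha, alpha_0, beta, beta_0, x):
    """Forward pass of the 1-hidden-layer, 2-hidden-unit network from Example #1."""
    return sigmoid(beta @ sigmoid(alpha @ x + alpha_0) + beta_0)

rng = np.random.default_rng(0)
alpha, alpha_0 = rng.normal(size=(2, 2)), rng.normal(size=2)
beta, beta_0 = rng.normal(size=2), rng.normal()
x = rng.normal(size=2)

# Swap hidden unit 1 and hidden unit 2 (rows of alpha, entries of alpha_0 and beta).
perm = [1, 0]
original = forward(alpha, alpha_0, beta, beta_0, x)
swapped = forward(alpha[perm], alpha_0[perm], beta[perm], beta_0, x)
print(np.isclose(original, swapped))  # True: different parameters, same function
```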

## Architectures

* Number of hidden layers (depth)
* Number of units per hidden layer (width)
* Type of activation function (nonlinearity)
* Form of objective function
* How to initialize parameters
* (These choices are bundled into a single configuration in the sketch after this list)
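
One way these design choices might be bundled is a configuration object; this is a hypothetical sketch, and all names and default values below are invented.

```python
from dataclasses import dataclass

@dataclass
class MLPConfig:
    num_hidden_layers: int = 2        # depth
    hidden_units: int = 128           # width: units per hidden layer
    activation: str = "sigmoid"       # nonlinearity, e.g. "sigmoid", "tanh", "relu"
    objective: str = "cross_entropy"  # form of the objective function
    init: str = "small_random"        # how to initialize parameters

config = MLPConfig(num_hidden_layers=1, hidden_units=2)  # Example #1's architecture
```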

## Example #3: Arbitrary Feedforward Neural Network (Matrix Form)

* Parameters
  * Inputs x1 ... xM
  * Hidden layer sizes D1 and D2
  * α^(1) ∈ R^(M×D1), α^(2) ∈ R^(D1×D2)
  * β ∈ R^(D2)
* Computation
  * z^(1) = σ((α^(1))^T x + b^(1))
    * (σ applied elementwise to the vector (α^(1))^T x + b^(1))
  * z^(2) = σ((α^(2))^T z^(1) + b^(2))
  * y = σ(β^T z^(2) + β0)
* Folding in the intercept terms
  * Assume x1 = 1, z1^(1) = 1, z1^(2) = 1
  * Then drop β0, b^(1), b^(2) (they are absorbed into α^(1), α^(2), β)
  * Caution: tricky to implement
* (A NumPy sketch of this forward pass follows this list)
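
A NumPy sketch of the matrix-form forward pass above. The sizes M = 4, D1 = 3, D2 = 2 and the random parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

M, D1, D2 = 4, 3, 2
rng = np.random.default_rng(0)

alpha1 = rng.normal(size=(M, D1))   # alpha^(1) in R^(M x D1)
b1 = rng.normal(size=D1)            # b^(1)
alpha2 = rng.normal(size=(D1, D2))  # alpha^(2) in R^(D1 x D2)
b2 = rng.normal(size=D2)            # b^(2)
beta = rng.normal(size=D2)          # beta in R^(D2)
beta_0 = rng.normal()               # beta_0

x = rng.normal(size=M)              # input vector

z1 = sigmoid(alpha1.T @ x + b1)     # z^(1) = sigma((alpha^(1))^T x + b^(1)), elementwise
z2 = sigmoid(alpha2.T @ z1 + b2)    # z^(2) = sigma((alpha^(2))^T z^(1) + b^(2))
y = sigmoid(beta @ z2 + beta_0)     # y = sigma(beta^T z^(2) + beta_0)
print(y)
```

Folding in the intercepts would instead prepend a constant 1 to x, z^(1), and z^(2) and absorb b^(1), b^(2), and β0 as extra rows of α^(1), α^(2), and β; as the notes warn, the indexing is easy to get wrong.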

## Building a Neural Net

* Number of hidden units D relative to the number of inputs M:
  * D = M
  * D < M
  * D > M => feature engineering
* Theoretical answer:
  * A neural network with 1 hidden layer is a universal function approximator
  * For any continuous function g(x) and any ε > 0, there exists a 1-hidden-layer neural net hθ(x) such that |hθ(x) - g(x)| < ε for all x, assuming a sigmoid activation
* Empirical answer:
  * After 2006, deep networks became easier to train than shallow networks for many problems
* (A constructive illustration of the universal approximation idea follows this list)
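
A hand-built illustration of the universal-approximation claim (a sketch, not the theorem's proof and not how networks are trained): pairs of steep sigmoid hidden units form approximate indicator "bumps", and one hidden layer of such bumps with a linear output unit tracks a target function closely. The target g(x) = sin(x), the grid, and the steepness k are all assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    # Clip the argument to avoid harmless overflow warnings from exp.
    return 1.0 / (1.0 + np.exp(-np.clip(a, -500.0, 500.0)))

g = np.sin                                   # target: any continuous function (assumed here)
centers = np.linspace(0.0, 2 * np.pi, 80)    # left edges of 80 narrow cells
width = centers[1] - centers[0]
k = 200.0                                    # sigmoid steepness; larger k gives sharper bumps

def h_theta(x):
    """One hidden layer of sigmoids, linear output unit.
    sigma(k(x - c)) - sigma(k(x - c - width)) is ~1 on [c, c + width] and ~0 elsewhere,
    so the network is a weighted sum of bumps, weighted by g at each cell's midpoint."""
    x = np.atleast_1d(x)[:, None]
    bumps = sigmoid(k * (x - centers)) - sigmoid(k * (x - centers - width))
    return bumps @ g(centers + width / 2)

xs = np.linspace(0.0, 2 * np.pi, 1000)
err = np.max(np.abs(h_theta(xs) - g(xs)))
print(err)  # small; shrinks further as the grid is refined and k is increased
```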
