# Function minimization with autograd
**Learning objectives:**
- Apply concepts learned in the previous two chapters
## An Optimization Classic {.unnumbered}
**Example**:
*Rosenbrock function*: A function of two variables with minimum at $(a,a^2)$, which lies inside a narrow valley:
$$
(a - x_1)^2 + b(x_2 - x_1^2)^2
$$
<center>[![rosenbrock function](images/rosenbrock.png)](https://skeydan.github.io/Deep-Learning-and-Scientific-Computing-with-R-torch/optim_1.html#an-optimization-classic)</center>
Below we set values for `a` and `b` and define the `rosenbrock` function. We expect the minimum of the function to be at $(1,1)$ (when `a=1`).
```{r}
a <- 1
b <- 5
rosenbrock <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  (a - x1)^2 + b * (x2 - x1^2)^2
}
```
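For reference, and as a check on what autograd will compute for us below, the gradient of the Rosenbrock function can be worked out by hand:
$$
\nabla f(x_1, x_2) =
\begin{pmatrix}
-2(a - x_1) - 4 b \, x_1 (x_2 - x_1^2) \\
2 b \, (x_2 - x_1^2)
\end{pmatrix}
$$
Both components vanish at $(x_1, x_2) = (a, a^2)$, confirming where the minimum lies.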
## Minimization from Scratch {.unnumbered}
**Goal**: Starting from a point `(x1, x2)`, find the minimum of the Rosenbrock function.
**Approach**: Use the function's gradient.
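In symbols, each iteration takes a step against the gradient, with the learning rate $\eta$ (called `lr` in the code below) controlling the step size:
$$
x^{(t+1)} = x^{(t)} - \eta \, \nabla f\!\left(x^{(t)}\right)
$$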
**Setup**:
```{r}
library(torch)
lr <- 0.01 # learning rate
num_iterations <- 1000 # number of gradient descent steps
x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
```
`x` is the parameter with respect to which we want to compute the function's derivative. Thus, we set `requires_grad = TRUE`. We have arbitrarily chosen `x = (-1, 1)` as a starting point of our search.
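As a quick refresher of the autograd mechanics from the previous chapters (a minimal sketch, not part of this chapter's code): for $f(z) = \sum_i z_i^2$ the gradient is $2z$, and calling `backward()` on the result stores that gradient in the tensor's `grad` field.
```{r}
# Illustrative sketch only: f(z) = sum(z^2) has gradient 2 * z.
z <- torch_tensor(c(2, 3), requires_grad = TRUE)
f <- sum(z^2)
f$backward()
z$grad # expect a tensor holding 4 and 6
```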
Next we perform the minimization. For each iteration we will:
1. Compute the value of the `rosenbrock` function at the current value of `x`.
2. Compute the gradient at `x` (i.e. direction of steepest ascent).
3. Take a step of size `lr` in the (negative) direction of the gradient.
4. Repeat.
A few things to point out about the code below:
- We wrap the update in `with_no_grad()`. Because we set `requires_grad = TRUE` when defining `x`, torch would otherwise record the update operations on `x` in the graph used for the derivative calculation, which we don't want.
- Recall from [Chapter 3](https://r4ds.github.io/bookclub-torch/operations-on-tensors.html) that methods ending in an underscore, such as `x$sub_()`, modify their tensor in place; `x$grad$zero_()` likewise modifies `x$grad` in place.
- We use `x$grad$zero_()` to zero out the `grad` field of `x` after each update. By default, torch accumulates gradients across calls to `backward()`.
```{r}
for (i in 1:num_iterations) {
  if (i %% 200 == 0) cat("Iteration: ", i, "\n")

  # Compute the value of the function at the current x:
  value <- rosenbrock(x)
  if (i %% 200 == 0) cat("Value is: ", as.numeric(value), "\n")

  # Compute the gradient of value with respect to x:
  value$backward()
  if (i %% 200 == 0) cat("Gradient is: ", as.matrix(x$grad), "\n\n")

  with_no_grad({
    x$sub_(lr * x$grad) # Take a step of size lr in the (negative) direction of the gradient
    x$grad$zero_()      # Zero out the grad field of x
  })
}
```
Let's check the value of `x`:
```{r}
x
```
It's close to $(1, 1)$, the true minimum!
> Exercise: What kind of difference does the learning rate make? Try `lr = 0.001` and then `lr = 0.1`.
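One way to explore the exercise is to wrap the descent loop in a small helper; `run_descent()` below is an illustrative sketch (not code from the book) that reuses the `rosenbrock` function defined above.
```{r}
# Illustrative sketch: rerun the gradient descent loop for a given learning rate.
run_descent <- function(lr, num_iterations = 1000) {
  x <- torch_tensor(c(-1, 1), requires_grad = TRUE)
  for (i in 1:num_iterations) {
    value <- rosenbrock(x)
    value$backward()
    with_no_grad({
      x$sub_(lr * x$grad)
      x$grad$zero_()
    })
  }
  x
}

run_descent(lr = 0.001) # smaller steps: converges much more slowly
run_descent(lr = 0.1)   # larger steps: likely overshoots and diverges for this function
```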
## Meeting Videos {.unnumbered}
### Cohort 1 {.unnumbered}
`r knitr::include_url("https://www.youtube.com/embed/URL")`
<details>
<summary>Meeting chat log</summary>
```
LOG
```
</details>