---
title: "functions"
author: "John Little"
date: "`r Sys.Date()`"
output: html_notebook
---
## Functions
Custom functions let you package an operation so you can reuse it, and they help clean up code. A common rule of thumb: if you have typed the same code three or more times, it is worth turning that code into a function. Experienced coders recoup that effort quickly. Newer coders may at first spend more time writing a function than they save by calling it. Even so, the time saved by not troubleshooting copy-and-paste typos can be greater still. Bottom line: writing your own functions is a skill worth building. For a more complete treatment, see the [functions chapter](https://r4ds.had.co.nz/functions.html) in _R for Data Science_ by Wickham and Grolemund.
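Here is a minimal sketch of the rule of three. It uses a hypothetical `rescale01()` helper and made-up data, neither of which appears elsewhere in this notebook. Instead of pasting the same min-max formula once per column and risking a typo in one copy, the formula lives in one function that is applied to every column.

```{r}
# Hypothetical helper: rescale a numeric vector to the range [0, 1]
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Illustrative data: three columns that each need the same rescaling
scores <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6), c = c(0, 50, 100))

# One function applied to every column, instead of three hand-edited copies
scores[] <- lapply(scores, rescale01)
scores
```

If the formula ever needs a fix (say, handling a constant column), you change it in one place.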
```{r}
library(tidyverse)
```
## First look
```{r}
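# Return the input multiplied by two (works element-wise on vectors, too)
multiplybytwo <- function(n) {
  return(2 * n)
}

# Draw a base-R plot of a data frame (a scatterplot when it has two columns)
plotcars <- function(df) {
  plot(df)
}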
```
```{r}
multiplybytwo(9)
plotcars(cars)
```
```{r}
# Keep two numeric columns and drop rows with missing values before plotting
my_hm <- starwars |>
  select(height, mass) |>
  drop_na()
plotcars(my_hm)
```
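A tidyverse way to make a function: a magrittr pipeline that starts with `.` creates a *functional sequence*, a function that applies each step in turn to whatever you pass it. This shortcut works only with the magrittr pipe `%>%`, not with the native pipe `|>`.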
```{r}
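# `. %>%` turns the pipeline into a reusable one-argument function
make_key <- . %>%
  mutate(key = name) %>%
  mutate(key = str_to_lower(key)) %>%
  # keep only the last word of the name
  mutate(key = str_extract(key, "\\w+$")) %>%
  # first one to three word characters of the name, lower-cased
  mutate(first_part = str_to_lower(str_extract(name, "^\\w{1,3}"))) %>%
  mutate(key = str_glue("{key}_{first_part}")) %>%
  select(-first_part)
starwars |>
  make_key() |>
  select(name, key)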
```
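To see what `. %>%` produces, here is a small sketch with a hypothetical `shout_name` sequence and a two-row tibble (both illustrative). The result is an ordinary function of one argument, so you can call it directly or pipe data into it.

```{r}
library(dplyr)
library(stringr)

# A functional sequence: `.` stands in for whatever data is passed in later
shout_name <- . %>%
  mutate(name = str_to_upper(name))

# It is a regular function
is.function(shout_name)

shouted <- tibble(name = c("Luke Skywalker", "Leia Organa")) |>
  shout_name()
shouted
```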
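The same operation as a conventional function, defined with `function()` and the native pipe `|>`. Note that this is not pure base R: `mutate()`, `select()`, and the `str_*()` functions still come from dplyr and stringr.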
```{r}
my_key_func <- function(df) {
  df |>
    mutate(key = name) |>
    mutate(key = str_to_lower(key)) |>
    mutate(key = str_extract(key, "\\w+$")) |>
    mutate(first_part = str_to_lower(str_extract(name, "^\\w{1,3}"))) |>
    mutate(key = str_glue("{key}_{first_part}")) |>
    select(-first_part)
}
starwars |>
  my_key_func() |>
  select(name, key)
```
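One advantage of the `function()` form is that it can take extra arguments. Here is a minimal sketch, assuming a hypothetical `make_key_n()` whose `n_chars` argument sets how many leading characters go into the key. The default of 3 matches the functions above. It builds the key with `str_c()` rather than `str_glue()`, because the regex pattern itself contains braces.

```{r}
library(dplyr)
library(stringr)

# Hypothetical variant of my_key_func() with an extra argument:
# n_chars sets how many leading word characters of `name` are appended
make_key_n <- function(df, n_chars = 3) {
  df |>
    mutate(
      last_word  = str_to_lower(str_extract(name, "\\w+$")),
      first_part = str_to_lower(str_extract(name, str_c("^\\w{1,", n_chars, "}"))),
      key        = str_c(last_word, "_", first_part)
    ) |>
    select(-last_word, -first_part)
}

keys <- tibble(name = c("Luke Skywalker", "Obi-Wan Kenobi")) |>
  make_key_n(n_chars = 2)
keys
```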
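Or, use a single, more complicated regex that accomplishes nearly the same goal as the custom functions above. The difference is in single-word names such as "Chewbacca". With no space or hyphen to match, `str_replace()` leaves the name unchanged, whereas the functions above produce `chewbacca_che`.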
```{r}
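# Group 1: first 1-3 word characters; group 2: the last word after a space or hyphen.
# The replacement "\\2_\\1" swaps them into "last_first".
starwars |>
  mutate(foo = str_to_lower(str_replace(name, "(^\\w{1,3}).*[-\\s](\\w+$)", "\\2_\\1"))) |>
  select(name, foo)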
```
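To see where "nearly" matters, this sketch puts both approaches side by side on three hand-picked names. The `keys_demo` tibble and its column names are illustrative.

```{r}
library(dplyr)
library(stringr)

# Three illustrative names: two multi-word, one single-word
keys_demo <- tibble(name = c("Luke Skywalker", "Obi-Wan Kenobi", "Chewbacca")) |>
  mutate(
    # the one-regex approach from the chunk above
    regex_key = str_to_lower(str_replace(name, "(^\\w{1,3}).*[-\\s](\\w+$)", "\\2_\\1")),
    # the step-by-step logic of make_key(): last word, "_", first 1-3 characters
    step_key = str_to_lower(str_c(str_extract(name, "\\w+$"), "_", str_extract(name, "^\\w{1,3}")))
  )
keys_demo
```

The two keys agree for the multi-word names and differ only for "Chewbacca".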
## Tidy evaluation
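Read more about [data masking, env-variables, and indirection](https://dplyr.tidyverse.org/articles/programming.html#data-masking).
In short, when a function argument holds the name of a data-variable (a column of the tibble), wrap that argument in double curly braces, `{{ }}` (i.e. _embrace_ it), wherever your function passes it to a tidyverse verb. Column names typed directly into the function body, as in the next two examples, need no embracing.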
```{r}
foo_df <- starwars
make_me_a_plot <- function(df) {
  df |>
    ggplot(aes(height, mass)) +
    geom_point()
}
make_me_a_plot(foo_df)
```
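`make_me_a_plot()` hard-codes `height` and `mass`. Below is a sketch of a hypothetical `make_me_a_plot2()` that lets the caller choose the columns. It embraces the `x` and `y` arguments inside `aes()`, which ggplot2 accepts because `aes()` uses the same tidy evaluation as dplyr.

```{r}
library(dplyr)
library(ggplot2)

# Hypothetical variant: the caller supplies bare column names for x and y.
# Embracing {{ x }} and {{ y }} forwards those names to aes().
make_me_a_plot2 <- function(df, x, y) {
  df |>
    ggplot(aes({{ x }}, {{ y }})) +
    geom_point()
}

p <- make_me_a_plot2(starwars, height, mass)
p
```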
```{r}
multiply_by_two <- function(my_x) {
  mutate(my_x, doubled = height * 2)
}
multiply_by_two(starwars) |>
  select(height, doubled)
```
## Data masking and indirection
Data masking makes tidyverse code easy and efficient to write, because you can refer to columns by their bare names. Paradoxically, it makes some things harder, most notably writing functions that take column names as arguments. I recommend reading about [tidy evaluation](https://dplyr.tidyverse.org/articles/programming.html) to understand the difference between environment variables and data variables.
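Here is what "harder" looks like in practice. The sketch below uses a hypothetical `double_unembraced()` that forgets the braces. Because no column is literally named `my_var`, data masking falls back to the function argument. R then tries to evaluate the caller's bare `height` as an ordinary object, which does not exist, so the call fails. The error is caught here only so the chunk can show it.

```{r}
library(dplyr)

# Hypothetical function that forgets to embrace its argument
double_unembraced <- function(df, my_var) {
  mutate(df, doubled = my_var * 2)
}

# Capture the error message instead of stopping the notebook
outcome <- tryCatch(
  double_unembraced(starwars, height),
  error = function(e) conditionMessage(e)
)
outcome
```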
```{r}
# Embrace `my_var` so dplyr treats it as a column of `my_df`
foo_two <- function(my_df, my_var) {
  mutate(my_df, doubled = {{ my_var }} * 2) |>
    select({{ my_var }}, doubled)
}
foo_two(starwars, height)
foo_two(starwars, mass)
```
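`foo_two()` always names its new column `doubled`. dplyr's programming article also describes *name injection*, where a glue-style string on the left of `:=` builds the column name from the argument. Here is a minimal sketch with a hypothetical `double_named()`:

```{r}
library(dplyr)

# Hypothetical variant of foo_two(): "{{my_var}}_doubled" becomes, e.g., mass_doubled
double_named <- function(my_df, my_var) {
  my_df |>
    mutate("{{my_var}}_doubled" := {{ my_var }} * 2) |>
    select({{ my_var }}, ends_with("_doubled"))
}

mass_df <- double_named(starwars, mass)
mass_df
```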
```{r}
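# Arithmetic is vectorized, so this doubles every height at once
starwars$height * 2
# map_dbl() applies the lambda to one element at a time; same values here
map_dbl(starwars$height, ~ .x * 2)
starwars |>
  mutate(double_vec = map_dbl(height, ~ .x * 2)) |>
  select(height, double_vec)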
```
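For `* 2` the `map_dbl()` call is unnecessary, because `*` is already vectorized. Mapping earns its keep when a function handles only one value at a time. A sketch with a hypothetical `size_class()` helper built on `if` (the 180 cm cutoff is arbitrary):

```{r}
library(purrr)

# Hypothetical helper built on `if`, which needs a single TRUE/FALSE
size_class <- function(h) {
  if (is.na(h)) {
    NA_character_
  } else if (h > 180) {
    "tall"
  } else {
    "short"
  }
}

heights <- c(150, 200, NA)
# size_class(heights) is an error in R 4.2 and later (condition has length > 1),
# so apply the function element by element instead
classes <- map_chr(heights, size_class)
classes
```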