03_functions.Rmd

---
title: "functions"
author: "John Little"
date: "`r Sys.Date()`"
output: html_notebook
---

## Functions

Custom functions can be used to accomplish different operations -- or clean up code.  A standard rule of thumb is that if you have to type the same code three or more times, it's worth the effort to make that code into a function.  While this is certainly true for experienced coders, uninitiated coders may spend more time writing the function.  Nonetheless the time saved by not troubleshooting typos could be even greater. Bottom line: functional programming is useful.  For a more complete treatment of [functions, see the chapter](https://r4ds.had.co.nz/functions.html) in _R for Data Science_ by Wickham and Grolemund.

```{r}
library(tidyverse)
```


## first look

```{r}
multiplybytwo <- function(n) {
    return(2 * n)
}

plotcars <- function(df) {
  plot(df)
}

```

```{r}
multiplybytwo(9)
plotcars(cars)
```


```{r}
my_hm <- starwars |>  
  select(height, mass) |>  
  drop_na()

plotcars(my_hm)
```
A tidyverse way to make a function:

```{r}
make_key <- . %>% 
  mutate(key = name) %>%  
  mutate(key = str_to_lower(key)) %>%  
  mutate(key = str_extract(key, "\\w+$")) %>% 
  mutate(first_part = str_to_lower(str_extract(name, "^\\w{1,3}")))  %>% 
  mutate(key = str_glue("{key}_{first_part}")) %>% 
  select(-first_part)

starwars |>
  make_key() |>
  select(name, key)
```

The base-R way to do the same as above:

```{r}
my_key_func <- function(df) {
  df |> 
  mutate(key = name) |>  
  mutate(key = str_to_lower(key)) |>  
  mutate(key = str_extract(key, "\\w+$")) |> 
  mutate(first_part = str_to_lower(str_extract(name, "^\\w{1,3}")))  |> 
  mutate(key = str_glue("{key}_{first_part}")) |> 
  select(-first_part)
}

starwars |>  
  my_key_func() |>
  select(name, key)
```

Or, a more complicated regex that accomplishes nearly the same goals as the custom functions above. 

```{r}
starwars |> 
  mutate(foo = str_to_lower(str_replace(name, "(^\\w{1,3}).*[-\\s](\\w+$)", "\\2_\\1"))) |>
  select(name, foo)
```

## Tidy evaluation

Read more about [Data maksing, env-variables, and indirection](https://dplyr.tidyverse.org/articles/programming.html#data-masking)

In short, you need to use curly-curly braces (i.e. _embrace_) around the tibble's variable names when referencing those variable names within your function.

```{r}
foo_df <- starwars

make_me_a_plot <- function(df) {
  df |>  
    ggplot(aes(height, mass)) +
    geom_point() 
}

make_me_a_plot(foo_df)
```


```{r}
multiply_by_two <- function(my_x) {
  mutate(my_x, doubled = height * 2)
}

multiply_by_two(starwars) |>
  select(height, doubled)
```

## Data masking and indirection

Data masking makes it easy and efficient to code in R/Tidyverse. But it paradoxically makes some things harder.  I recommend reading about [Tidy evaluation](
https://dplyr.tidyverse.org/articles/programming.html) to understand the differences between environment variables and data variables.

```{r}
foo_two <- function(my_df, my_var, ...) {
  mutate(my_df, doubled = {{ my_var }} * 2) |>
  select({{ my_var}}, doubled)
}

foo_two(starwars, height)

foo_two(starwars, mass) 
```

```{r}
starwars$height * 2

map_dbl(starwars$height, ~ .x * 2)

starwars |>
  mutate(double_vec = map_dbl(height, ~ .x * 2)) |>
  select(height, double_vec) 
```