---
title: 'Working with character strings'
---
## Working with character strings {- #string-handling}
```{r, include=FALSE}
library(tidyverse)
library(tufte)
library(pander)
library(reshape2)
```
### Searching and replacing
If you want to search inside a string, the `stringr` package provides lots of
useful functions. These replicate some functionality in base R but, like other
packages in the 'tidyverse', they tend to be more consistent and easier to
use. For example:
```{r}
cheese <- c("Stilton", "Brie", "Cheddar")
stringr::str_detect(cheese, "Br")
stringr::str_locate(cheese, "i")
stringr::str_replace(cheese, "Stil", "Mil")
```
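One detail worth knowing is that `str_replace` only changes the first match in each string, whereas `str_replace_all` changes every match. The patterns are also treated as regular expressions, so anchors such as `^` work. The chunk below is a small sketch of this using the same `cheese` vector; the object names `first.only`, `all.matches` and `starts.with.b` are made up for illustration.

```{r}
library(stringr)

cheese <- c("Stilton", "Brie", "Cheddar")

# str_replace changes only the first "d" in each string
first.only <- str_replace(cheese, "d", "D")
first.only

# str_replace_all changes every "d"
all.matches <- str_replace_all(cheese, "d", "D")
all.matches

# patterns are regular expressions: "^B" matches strings that start with "B"
starts.with.b <- str_detect(cheese, "^B")
starts.with.b
```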
### Using `paste` to make labels {- #paste}
`paste()` can combine character strings with other types of variable to produce
a new character vector:
```{r}
paste(mtcars$cyl, "cylinders")[1:10]
```
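By default `paste` puts a space between the pieces it joins. As a short sketch of the other common options, the `sep` argument changes the separator, `paste0` joins with no separator at all, and `collapse` squashes a whole vector into a single string. The object names here are only illustrative.

```{r}
# the default separator is a single space
with.space <- paste("Day", 1:3)
with.space

# sep changes what goes between the pieces
with.dot <- paste("Day", 1:3, sep = ".")
with.dot

# paste0 is a shortcut for sep = ""
no.sep <- paste0("Day.", 1:3)
no.sep

# collapse joins the elements of a vector into one string
one.string <- paste(c("Stilton", "Brie", "Cheddar"), collapse = ", ")
one.string
```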
This can be a useful way to label graphs:
```{r}
mtcars %>%
  # refer to cyl directly: inside aes() columns are looked up in the piped data
  ggplot(aes(paste(cyl, "cylinders"), mpg)) +
  geom_boxplot() + xlab("")
```
### Fixing up `variable` after melting {- #separate-and-extract}
In this example `melt()` creates a new column called `variable`.
```{r, include=F}
sleep.wide <- readRDS('data/sleep.wide.RDS')
```
```{r}
sleep.wide %>%
  melt(id.var="Subject") %>%
  arrange(Subject, variable) %>%
  head
```
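Because `sleep.wide` is loaded from a file, it may help to see the same step on a tiny data frame you can type in yourself. The sketch below uses a made-up `toy.wide` (the values are invented for illustration and are not the real sleep data) to show how `melt()` stacks the `Day.1` and `Day.2` columns and records where each value came from in `variable`.

```{r}
library(reshape2)

# a made-up wide data frame: one row per subject, one column per day
toy.wide <- data.frame(
  Subject = c(1, 2),
  Day.1 = c(250, 222),
  Day.2 = c(259, 205)
)

# melt stacks the Day columns; the old column names end up in `variable`
toy.melted <- melt(toy.wide, id.vars = "Subject")
toy.melted
```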
However, the contents of `variable` are now text labels (i.e. sequences of
letters and numbers) rather than numeric values (see
[column types](#factors-and-numerics)). In this instance we know that the
values `Day.1`, `Day.2`... are not really separate categories but actually form
a linear sequence, from 1 to 9.
We can use the `extract` or `separate` functions to split up `variable` and
create a numeric column for `Day`:
```{r include=F}
sleep.long <- readRDS('data/sleep.long.RDS')
```
```{r}
sleep.long %>%
  separate(variable, c("variable", "Day")) %>%
  mutate(Day=as.numeric(Day)) %>%
  arrange(Subject) %>%
  head %>% pander
```
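The same idea can be tried on a small made-up data frame. In the sketch below `toy.long` and its values are invented for illustration. By default `separate` splits wherever it finds a non-alphanumeric character (here the `.`), and setting `convert = TRUE` asks it to convert the new columns to a sensible type, which is an alternative to the `mutate` step above. `extract` does a similar job, but uses a regular expression with a capture group to pick out just the part you want.

```{r}
library(tidyr)

# a made-up long data frame, shaped like the output of melt()
toy.long <- data.frame(
  Subject = c(1, 1, 2, 2),
  variable = c("Day.1", "Day.2", "Day.1", "Day.2"),
  value = c(250, 259, 222, 205),
  stringsAsFactors = FALSE
)

# split "Day.1" into "Day" and "1"; convert = TRUE makes Day a number
sep.result <- separate(toy.long, variable, c("variable", "Day"), convert = TRUE)
sep.result

# keep only the digits after "Day." using a capture group
ext.result <- tidyr::extract(toy.long, variable, into = "Day",
                             regex = "Day\\.(\\d+)", convert = TRUE)
ext.result
```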
See the user guide for `separate` and `extract` for more details.
[If you are familiar with
[regular expressions](https://code.tutsplus.com/tutorials/you-dont-know-anything-about-regular-expressions-a-complete-guide--net-7869),
you will be happy to know that you can use them with `extract` and `separate`
to split up variables.
[See this guide for more details on how `separate` and `extract` work](https://rpubs.com/bradleyboehmke/data_wrangling).]{.explainer}