### Exercises (Chapter 1)
1. How many rows are in `penguins`? How many columns?
::: {.callout-note icon="false" title="Answer"}
```{r}
#| message: false
#| warning: false
library(tidyverse)
library(palmerpenguins)
# Your R code here
```
*Your answer here.*
:::
2. What does the `bill_depth_mm` variable in the `penguins` data frame describe? Read the help for `?penguins` to find out.
::: {.callout-note icon="false" title="Answer"}
*Type your answer here.*
:::
3. Make a scatterplot of `bill_depth_mm` vs. `bill_length_mm`. That is, make a scatterplot with `bill_depth_mm` on the y-axis and `bill_length_mm` on the x-axis. Describe the relationship between these two variables.
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
4. What happens if you make a scatterplot of `species` vs. `bill_depth_mm`? What might be a better choice of geom?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
```{r}
# Your R code here
```
:::
5. Why does the following give an error and how would you fix it?
```{r}
#| eval: false
library(tidyverse)
ggplot(data = penguins) +
  geom_point()
```
::: {.callout-note icon="false" title="Answer"}
*Your answer here.*
```{r}
# Correct code here
```
:::
6. What does the `na.rm` argument do in `geom_point()`? What is the default value of the argument? Create a scatterplot where you successfully use this argument set to `TRUE`.
::: {.callout-note icon="false" title="Answer"}
*Your answer here.*
```{r}
# Your R code here
```
:::
7. Add the following caption to the plot you made in the previous exercise: "Data come from the `palmerpenguins` package." Hint: Take a look at the documentation for `labs()`.
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
:::
8. Recreate the following visualization. What aesthetic should `bill_depth_mm` be mapped to? And should it be mapped at the global level or at the geom level?
```{r}
#| echo: false
#| warning: false
#| fig-alt: |
#|   A scatterplot of body mass vs. flipper length of penguins, colored
#|   by bill depth. A smooth curve of the relationship between body mass
#|   and flipper length is overlaid. The relationship is positive,
#|   fairly linear, and moderately strong.
library(tidyverse)
library(palmerpenguins)
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = bill_depth_mm)) +
  geom_smooth()
```
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
9. Run this code in your head and predict what the output will look like. Then, run the code in R and check your predictions.
```{r}
#| eval: false
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = island)
) +
  geom_point() +
  geom_smooth(se = FALSE)
```
::: {.callout-note icon="false" title="Answer"}
*Your answer here.*
```{r}
# Your R code here
```
:::
10. Will these two graphs look different? Why/why not?
```{r}
#| eval: false
ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point() +
  geom_smooth()

ggplot() +
  geom_point(
    data = penguins,
    mapping = aes(x = flipper_length_mm, y = body_mass_g)
  ) +
  geom_smooth(
    data = penguins,
    mapping = aes(x = flipper_length_mm, y = body_mass_g)
  )
```
::: {.callout-note icon="false" title="Answer"}
*Your answer here.*
```{r}
#| warning: false
library(patchwork)
# Your R code here
```
:::
11. Make a bar plot of `species` of `penguins`, where you assign `species` to the `y` aesthetic. How is this plot different?
::: {.callout-note icon="false" title="Answer"}
*Your answer here.*
```{r}
# Your R code here
```
:::
12. How are the following two plots different? Which aesthetic, `color` or `fill`, is more useful for changing the color of bars?
```{r}
#| eval: false
ggplot(penguins, aes(x = species)) +
  geom_bar(color = "red")

ggplot(penguins, aes(x = species)) +
  geom_bar(fill = "red")
```
::: {.callout-note icon="false" title="Answer"}
```{r}
# Store each plot, then place them side by side with patchwork
p1 <- ggplot(penguins, aes(x = species)) +
  geom_bar(color = "red")
p2 <- ggplot(penguins, aes(x = species)) +
  geom_bar(fill = "red")
p1 + p2
```
*Your answer here.*
:::
13. What does the `bins` argument in `geom_histogram()` do?
::: {.callout-note icon="false" title="Answer"}
*Your answer here.*
:::
14. Make a histogram of the `carat` variable in the `diamonds` dataset that is available when you load the `tidyverse` package. Experiment with different binwidths. What binwidth reveals the most interesting patterns?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
15. The `mpg` data frame that is bundled with the `ggplot2` package contains `r nrow(mpg)` observations collected by the US Environmental Protection Agency on `r mpg |> distinct(model) |> nrow()` car models. Which variables in `mpg` are categorical? Which variables are numerical? (Hint: Type `?mpg` to read the documentation for the dataset.) How can you see this information when you run `mpg`?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
16. Make a scatterplot of `hwy` vs. `displ` using the `mpg` data frame. Next, map a third, numerical variable to `color`, then `size`, then both `color` and `size`, then `shape`. How do these aesthetics behave differently for categorical vs. numerical variables?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
17. In the scatterplot of `hwy` vs. `displ`, what happens if you map a third variable to `linewidth`?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
18. What happens if you map the same variable to multiple aesthetics?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
19. Make a scatterplot of `bill_depth_mm` vs. `bill_length_mm` and color the points by `species`. What does adding coloring by species reveal about the relationship between these two variables? What about faceting by `species`?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
20. Why does the following yield two separate legends? How would you fix it to combine the two legends?
```{r}
#| warning: false
#| fig-show: hide
ggplot(
  data = penguins,
  mapping = aes(
    x = bill_length_mm, y = bill_depth_mm,
    color = species, shape = species
  )
) +
  geom_point() +
  labs(color = "Species")
```
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R fix here
```
*Your answer here.*
:::
21. Create the following two stacked bar plots. Which question can you answer with the first one? Which question can you answer with the second one?
```{r}
#| fig-show: hide
ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = "fill")

ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar(position = "fill")
```
::: {.callout-note icon="false" title="Answer"}
```{r}
# Store each plot, then stack them vertically with patchwork
p1 <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = "fill")
p2 <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar(position = "fill")
p1 / p2
```
*Your answer here.*
:::
22. Run the following lines of code. Which of the two plots is saved as `mpg-plot.png`? Why?
```{r}
#| eval: false
ggplot(mpg, aes(x = class)) +
  geom_bar()
ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point()
ggsave("mpg-plot.png")
```
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your R code here
```
*Your answer here.*
:::
23. What do you need to change in the code above to save the plot as a PDF instead of a PNG? How could you find out what types of image files would work in `ggsave()`?
::: {.callout-note icon="false" title="Answer"}
*Your answer here.*
:::