```{r}
#| include: false
library(tidyverse)
library(nycflights13)
```
### Exercises (Chapter 3)
1. In a single pipeline for each condition, find all flights that meet the condition:
- Had an arrival delay of two or more hours
::: {.callout-note icon="false" title="Answer"}
```{r}
#| message: false
#| warning: false
library(tidyverse)
library(nycflights13)
# Your code here
```
:::
- Flew to Houston (`IAH` or `HOU`)
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
- Were operated by United, American, or Delta
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
- Departed in summer (July, August, and September)
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
- Arrived more than two hours late, but didn't leave late
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
- Were delayed by at least an hour, but made up over 30 minutes in flight
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
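
One possible set of pipelines for the six conditions above, offered as a sketch rather than a definitive answer. It assumes the delay columns are recorded in minutes, that United, American, and Delta appear under the carrier codes `UA`, `AA`, and `DL`, and that "made up over 30 minutes" means the arrival delay is more than 30 minutes smaller than the departure delay. The result names (`late_arrivals`, `to_houston`, and so on) are illustrative only.

```r
library(dplyr)
library(nycflights13)

# Arrival delay of two or more hours (120 minutes)
late_arrivals <- flights |>
  filter(arr_delay >= 120)

# Flew to either Houston airport
to_houston <- flights |>
  filter(dest %in% c("IAH", "HOU"))

# Operated by United, American, or Delta (assumed carrier codes)
big_three <- flights |>
  filter(carrier %in% c("UA", "AA", "DL"))

# Departed in July, August, or September
summer <- flights |>
  filter(month %in% 7:9)

# Arrived more than two hours late, but did not leave late
late_but_left_on_time <- flights |>
  filter(arr_delay > 120, dep_delay <= 0)

# Left at least an hour late, but made up more than 30 minutes in the air
made_up_time <- flights |>
  filter(dep_delay >= 60, dep_delay - arr_delay > 30)
```

Because `filter()` drops rows where a condition evaluates to `NA`, flights with missing delays are excluded from these results.
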
2. Sort `flights` to find the flights with the longest departure delays. Find the flights that left earliest in the morning.
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
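
A minimal sketch of both sorts, assuming "left earliest" refers to the actual departure time in `dep_time` (stored as HHMM, so `1` means 00:01) rather than the scheduled time. The object names are illustrative.

```r
library(dplyr)
library(nycflights13)

# Longest departure delays first
longest_delays <- flights |>
  arrange(desc(dep_delay))

# Earliest actual departure times first; rows with a missing dep_time sort last
earliest_departures <- flights |>
  arrange(dep_time)
```
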
3. Sort `flights` to find the fastest flights. (Hint: Try including a math calculation inside of your function.)
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
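
One way to follow the hint is to compute average speed inside `arrange()`. This sketch assumes "fastest" means the highest average speed, taken here as `distance / air_time` (miles per minute); the name `fastest` is illustrative.

```r
library(dplyr)
library(nycflights13)

# Sort by average speed, highest first; flights with missing air_time sort last
fastest <- flights |>
  arrange(desc(distance / air_time))
```

If "fastest" were instead read as the shortest time in the air, `arrange(air_time)` would be the corresponding sort.
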
4. Was there a flight on every day of 2013?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
*Your text answer here.*
:::
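
A possible check, sketched under the assumption that counting distinct `year`, `month`, `day` combinations is an acceptable proxy for "a flight on every day". The name `days_with_flights` is illustrative.

```r
library(dplyr)
library(nycflights13)

# Count the distinct calendar days that appear in the data
days_with_flights <- flights |>
  distinct(year, month, day) |>
  nrow()

days_with_flights
```

Since 2013 was not a leap year, a count of 365 would mean every day had at least one flight.
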
5. Which flights traveled the farthest distance? Which traveled the least distance?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
6. Does it matter in what order you use `filter()` and `arrange()` if you're using both? Why or why not? Think about the results and how much work the functions would have to do.
```{r}
#| eval: false
#| echo: false
# Your code here
```
::: {.callout-note icon="false" title="Answer"}
*Your text answer here.*
:::
7. Compare `dep_time`, `sched_dep_time`, and `dep_delay`. How would you expect those three numbers to be related?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
*Your text answer here.*
:::
8. Brainstorm as many ways as possible to select `dep_time`, `dep_delay`, `arr_time`, and `arr_delay` from `flights`.
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
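
A few candidate approaches, sketched here without any claim to completeness. The positional version assumes the column order of `flights` as shipped in nycflights13 and is the most fragile of the four. The object names are illustrative.

```r
library(dplyr)
library(nycflights13)

# 1. Name the columns explicitly
by_name <- flights |>
  select(dep_time, dep_delay, arr_time, arr_delay)

# 2. Use a prefix helper for each group of columns
by_prefix <- flights |>
  select(starts_with("dep"), starts_with("arr"))

# 3. Take a range of columns, then drop the scheduled times
by_range <- flights |>
  select(dep_time:arr_delay, -contains("sched"))

# 4. Select by position (depends on the current column order)
by_position <- flights |>
  select(4, 6, 7, 9)
```
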
9. What happens if you specify the name of the same variable multiple times in a `select()` call?
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
*Your text answer here.*
:::
10. What does the `any_of()` function do? Why might it be helpful in conjunction with this vector?
```{r}
variables <- c("year", "month", "day", "dep_delay", "arr_delay")
# Try below first
flights |>
  select(variables)
# Or
flights |>
  select(any_of(variables))
```
::: {.callout-note icon="false" title="Answer"}
*Your text answer here.*
:::
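
To see where `any_of()` differs from stricter selection, it can help to include a name that does not exist in the data. The sketch below uses a hypothetical vector `wanted` with a made-up entry, `"not_a_column"`, added purely for illustration.

```r
library(dplyr)
library(nycflights13)

# "not_a_column" is a deliberately invalid name, added for illustration
wanted <- c("year", "month", "day", "dep_delay", "arr_delay", "not_a_column")

# any_of() keeps the names that exist and silently ignores the rest
present <- flights |>
  select(any_of(wanted))

names(present)

# all_of() is the strict counterpart: the line below would raise an error
# because "not_a_column" is missing from flights
# flights |> select(all_of(wanted))
```
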
11. Does the result of running the following code surprise you? How do the select helpers deal with upper and lower case by default? How can you change that default?
```{r}
#| eval: false
flights |>
  select(contains("TIME"))
```
::: {.callout-note icon="false" title="Answer"}
```{r}
flights |>
  select(contains("TIME"))
```
*Your text answer here.*
:::
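
The select helpers take an `ignore.case` argument, which defaults to `TRUE`. A small sketch comparing the default with case-sensitive matching (object names are illustrative):

```r
library(dplyr)
library(nycflights13)

# Default: matching ignores case, so "TIME" matches names containing "time"
default_match <- flights |>
  select(contains("TIME"))

# Case-sensitive: no column name in flights contains upper-case "TIME"
case_sensitive <- flights |>
  select(contains("TIME", ignore.case = FALSE))

names(default_match)
ncol(case_sensitive)
```
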
12. Rename `air_time` to `air_time_min` to indicate units of measurement and move it to the beginning of the data frame.
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
:::
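
One possible pipeline, sketched with `rename()` followed by `relocate()`, which moves the named column to the front by default. The name `renamed` is illustrative.

```r
library(dplyr)
library(nycflights13)

# Rename the column, then move it to the first position
renamed <- flights |>
  rename(air_time_min = air_time) |>
  relocate(air_time_min)
```
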
13. Why doesn't the following work, and what does the error mean?
```{r}
#| error: true
flights |>
  select(tailnum) |>
  arrange(arr_delay)
```
```{r}
flights |>
  select(tailnum)
```
::: {.callout-note icon="false" title="Answer"}
*Your text answer here.*
:::
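
After `select(tailnum)`, the data frame no longer has an `arr_delay` column for `arrange()` to use. One possible fix, sketched here, is to sort before narrowing the columns; keeping `arr_delay` in the `select()` call would work as well. The name `by_delay` is illustrative.

```r
library(dplyr)
library(nycflights13)

# Sort while arr_delay is still available, then keep only tailnum
by_delay <- flights |>
  arrange(arr_delay) |>
  select(tailnum)
```
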
14. Which carrier has the worst average delays? Challenge: can you disentangle the effects of bad airports vs. bad carriers? Why/why not? (Hint: think about `flights |> group_by(carrier, dest) |> summarize(n())`)
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
*Your text answer here.*
:::
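
A starting point for the first part, sketched under the assumption that "delays" means arrival delay; swapping in `dep_delay` would be an equally reasonable reading. The second pipeline follows the hint and counts flights per carrier and destination. Object and column names such as `carrier_delays` and `avg_arr_delay` are illustrative.

```r
library(dplyr)
library(nycflights13)

# Average arrival delay per carrier, worst first
carrier_delays <- flights |>
  group_by(carrier) |>
  summarize(
    avg_arr_delay = mean(arr_delay, na.rm = TRUE),
    n_flights = n()
  ) |>
  arrange(desc(avg_arr_delay))

# How many flights each carrier operates to each destination
carrier_dest <- flights |>
  group_by(carrier, dest) |>
  summarize(n = n(), .groups = "drop")
```

If `carrier_dest` shows that many destinations are served by only one or two carriers, carrier effects and airport effects will be hard to separate from these averages alone.
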
15. Find the flights that are most delayed upon departure from each destination.
::: {.callout-note icon="false" title="Answer"}
```{r}
# Your code here
```
*Your text answer here.*
:::
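
A possible approach, sketched with `group_by()` and `slice_max()`. With the default `with_ties = TRUE`, a destination can return more than one row when several flights share the maximum delay. The name `most_delayed` is illustrative.

```r
library(dplyr)
library(nycflights13)

# Within each destination, keep the flight(s) with the largest departure delay
most_delayed <- flights |>
  group_by(dest) |>
  slice_max(dep_delay, n = 1) |>
  relocate(dest, dep_delay)
```
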
16. How do delays vary over the course of the day? Illustrate your answer with a plot.
::: {.callout-note icon="false" title="Answer"}
```{r}
#| warning: false
# Your code here
```
*Your text answer here.*
:::
17. What happens if you supply a negative `n` to `slice_min()` and friends?
::: {.callout-note icon="false" title="Answer"}
```{r}
flights |>
  slice_min(dep_delay, n = -5) |>
  relocate(dep_delay)

flights |>
  slice_min(dep_delay, n = 5) |>
  relocate(dep_delay)

flights |>
  slice_max(dep_delay, n = -5) |>
  relocate(dep_delay)

flights |>
  slice_max(dep_delay, n = 5) |>
  relocate(dep_delay)
```
*Your text answer here.*
:::
18. Explain what `count()` does in terms of the dplyr verbs you just learned. What does the `sort` argument to `count()` do?
::: {.callout-note icon="false" title="Answer"}
```{r}
flights |>
  count(origin, dest, sort = FALSE) # sort = FALSE by default

flights |>
  count(origin, dest, sort = TRUE)
```
*Your text answer here.*
:::
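
One way to think about `count()` is as a shortcut for grouping, summarizing with `n()`, and ungrouping, with `sort = TRUE` adding a descending `arrange()` on the counts. The sketch below builds that equivalent by hand so the two can be compared; the object names are illustrative.

```r
library(dplyr)
library(nycflights13)

# count() ...
by_count <- flights |>
  count(origin, dest)

# ... and a hand-built equivalent using group_by() + summarize()
by_hand <- flights |>
  group_by(origin, dest) |>
  summarize(n = n(), .groups = "drop")

# sort = TRUE orders the result by n, largest first
sorted_count <- flights |>
  count(origin, dest, sort = TRUE)

# Hand-built equivalent of the sorted version
sorted_by_hand <- by_hand |>
  arrange(desc(n))
```
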
19. Suppose we have the following tiny data frame:
```{r}
df <- tibble(
  x = 1:5,
  y = c("a", "b", "a", "a", "b"),
  z = c("K", "K", "L", "L", "K")
)
```
a. Write down what you think the output will look like, then check if you were correct, and describe what `group_by()` does.
```{r}
#| eval: false
df |>
  group_by(y)
```
::: {.callout-note icon="false" title="Answer"}
```{r}
df |>
  group_by(y)
```
*Your text answer here.*
:::
b. Write down what you think the output will look like, then check if you were correct, and describe what `arrange()` does. Also comment on how it's different from the `group_by()` in part (a).
```{r}
#| eval: false
df |>
  arrange(y)
```
::: {.callout-note icon="false" title="Answer"}
```{r}
df |>
  arrange(y)
```
*Your text answer here.*
:::
c. Write down what you think the output will look like, then check if you were correct, and describe what the pipeline does.
```{r}
#| eval: false
df |>
  group_by(y) |>
  summarize(mean_x = mean(x))
```
::: {.callout-note icon="false" title="Answer"}
```{r}
df |>
  group_by(y) |>
  summarize(mean_x = mean(x))
```
*Your text answer here.*
:::
d. Write down what you think the output will look like, then check if you were correct, and describe what the pipeline does. Then, comment on what the message says.
```{r}
#| eval: false
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))
```
::: {.callout-note icon="false" title="Answer"}
```{r}
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))
```
*Your text answer here.*
:::
e. Write down what you think the output will look like, then check if you were correct, and describe what the pipeline does. How is the output different from the one in part (d)?
```{r}
#| eval: false
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x), .groups = "drop")
```
::: {.callout-note icon="false" title="Answer"}
```{r}
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x), .groups = "drop")
```
*Your text answer here.*
:::
f. Write down what you think the outputs will look like, then check if you were correct, and describe what each pipeline does. How are the outputs of the two pipelines different?
```{r}
#| eval: false
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))

df |>
  group_by(y, z) |>
  mutate(mean_x = mean(x))
```
::: {.callout-note icon="false" title="Answer"}
```{r}
df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))

df |>
  group_by(y, z) |>
  mutate(mean_x = mean(x))
```
*Your text answer here.*
:::
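
To check predictions for part (f) without relying on the printed output alone, the row counts and grouping variables of each result can be inspected directly. This is a small sketch; `summarized` and `mutated` are illustrative names.

```r
library(dplyr)

df <- tibble(
  x = 1:5,
  y = c("a", "b", "a", "a", "b"),
  z = c("K", "K", "L", "L", "K")
)

# summarize() collapses each (y, z) group to a single row
summarized <- df |>
  group_by(y, z) |>
  summarize(mean_x = mean(x))

# mutate() keeps every row and adds the group mean as a new column
mutated <- df |>
  group_by(y, z) |>
  mutate(mean_x = mean(x))

nrow(summarized)        # 3: one row per (y, z) combination
group_vars(summarized)  # "y": the last grouping level is dropped by default
nrow(mutated)           # 5: same rows as df
group_vars(mutated)     # "y" "z": grouping is unchanged
```
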