-
Notifications
You must be signed in to change notification settings - Fork 8
/
README.Rmd
126 lines (100 loc) · 3.75 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
source(file.path("vignettes", "_common.R"))
knitr::opts_chunk$set(
fig.path = "man/figures/README-"
)
```
# epiprocess
The `{epiprocess}` package works with epidemiological time series data and
provides tools to manage, analyze, and process the data in preparation for
modeling. It is designed to work in tandem with
[epipredict](https://cmu-delphi.github.io/epipredict/), which provides
pre-built epiforecasting models and as well as tools to build custom models.
Both packages are designed to lower the barrier to entry and implementation cost
for epidemiological time series analysis and forecasting.
`{epiprocess}` contains:
- `epi_df()` and `epi_archive()`, two data frame classes (that work like a
`{tibble}` with `{dplyr}` verbs) for working with epidemiological time
series data
- `epi_df` is for working with a snapshot of data at a single point in time
- `epi_archive` is for working with histories of data that changes over time
- one of the most common uses of `epi_archive` is for accurate backtesting of
forecasting models, see `vignette("backtesting", package="epipredict")`
- signal processing tools building on these data structures such as
- `epi_slide()` for sliding window operations (aids with feature creation)
- `epix_slide()` for sliding window operations on archives (aids with
backtesting)
- `growth_rate()` for computing growth rates
- `detect_outlr()` for outlier detection
- `epi_cor()` for computing correlations
If you are new to this set of tools, you may be interested learning through a
book format: [Introduction to Epidemiological
Forecasting](https://cmu-delphi.github.io/delphi-tooling-book/).
You may also be interested in:
- `{epidatr}`, for accessing wide range
of epidemiological data sets, including COVID-19 data, flu data, and more.
- [rtestim](https://github.com/dajmcdon/rtestim), a package for estimating
the time-varying reproduction number of an epidemic.
This package is provided by the [Delphi group](https://delphi.cmu.edu/) at
Carnegie Mellon University.
## Installation
To install:
```{r, eval=FALSE}
# Stable version
pak::pkg_install("cmu-delphi/epiprocess@main")
# Dev version
pak::pkg_install("cmu-delphi/epiprocess@dev")
```
The package is not yet on CRAN.
## Usage
Once `epiprocess` and `epidatr` are installed, you can use the following code to
get started:
```{r, results=FALSE, warning=FALSE, message=FALSE}
library(epiprocess)
library(epidatr)
library(dplyr)
library(magrittr)
```
Get COVID-19 confirmed cumulative case data from JHU CSSE for California,
Florida, New York, and Texas, from March 1, 2020 to January 31, 2022
```{r warning=FALSE, message=FALSE}
df <- pub_covidcast(
source = "jhu-csse",
signals = "confirmed_cumulative_num",
geo_type = "state",
time_type = "day",
geo_values = "ca,fl,ny,tx",
time_values = epirange(20200301, 20220131),
as_of = as.Date("2024-01-01")
) %>%
select(geo_value, time_value, cases_cumulative = value)
df
```
Convert the data to an epi_df object and sort by geo_value and time_value. You
can work with an `epi_df` like you can with a `{tibble}` by using `{dplyr}`
verbs
```{r}
edf <- df %>%
as_epi_df(as_of = as.Date("2024-01-01")) %>%
arrange_canonical() %>%
group_by(geo_value) %>%
mutate(cases_daily = cases_cumulative - lag(cases_cumulative, default = 0))
edf
```
Compute the 7 day moving average of the confirmed daily cases for each geo_value
```{r}
edf <- edf %>%
group_by(geo_value) %>%
epi_slide_mean(cases_daily, .window_size = 7, na.rm = TRUE) %>%
rename(smoothed_cases_daily = slide_value_cases_daily)
edf
```
Autoplot the confirmed daily cases for each geo_value
```{r}
edf %>%
autoplot(smoothed_cases_daily)
```