diff --git a/dev/articles/compactify_files/figure-html/unnamed-chunk-6-1.png b/dev/articles/compactify_files/figure-html/unnamed-chunk-6-1.png
index 8c6396fd..feae9911 100644
Binary files a/dev/articles/compactify_files/figure-html/unnamed-chunk-6-1.png and b/dev/articles/compactify_files/figure-html/unnamed-chunk-6-1.png differ
diff --git a/dev/articles/slide.html b/dev/articles/slide.html
index 2567c50b..de293231 100644
--- a/dev/articles/slide.html
+++ b/dev/articles/slide.html
@@ -107,9 +107,8 @@
 as.Date("2022-01-02"). Alternatively, the time step can be specified manually in the call to epi_slide(); you can read the documentation for more details. Furthermore, the alignment of the
-running window used in epi_slide() can be “right”,
-“center”, or “left”; the default is “right”, and is what we use in this
-vignette.
+running window used in epi_slide() is specified by
+before and after.

 As in getting started guide, we’ll fetch daily reported COVID-19 cases from CA, FL, NY, and TX (note: here we’re using new, not cumulative cases) using the epidatr
diff --git a/dev/news/index.html b/dev/news/index.html
index a8eb03a6..eb68e026 100644
--- a/dev/news/index.html
+++ b/dev/news/index.html
@@ -81,6 +81,11 @@

Improvements
  • Fix logic to auto-assign epi_df time_type to week (#416) and year (#441).
  • Clarified “Get started” example of getting Ebola line list data into epi_df format.
  • Improved documentation web site landing page’s introduction.
+  • Fixed documentation referring to old epi_slide() interface (#466, thanks @XuedaShen!).

    Cleanup

diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml
index b638275a..0264263f 100644
--- a/dev/pkgdown.yml
+++ b/dev/pkgdown.yml
@@ -11,7 +11,7 @@ articles:
   growth_rate: growth_rate.html
   outliers: outliers.html
   slide: slide.html
-last_built: 2024-06-20T23:02Z
+last_built: 2024-06-21T11:34Z
 urls:
   reference: https://cmu-delphi.github.io/epiprocess/reference
   article: https://cmu-delphi.github.io/epiprocess/articles
diff --git a/dev/reference/as_epi_df.html b/dev/reference/as_epi_df.html
index ef3fe7e1..2efce053 100644
--- a/dev/reference/as_epi_df.html
+++ b/dev/reference/as_epi_df.html
@@ -251,7 +251,7 @@ Examples
 #> [1] "week"
 #>
 #> $as_of
-#> [1] "2024-06-20 23:03:00 UTC"
+#> [1] "2024-06-21 11:34:26 UTC"
 #>
 #> $other_keys
 #> [1] "state" "pol"
diff --git a/dev/search.json b/dev/search.json
index 932ba384..82daea43 100644
--- a/dev/search.json
+++ b/dev/search.json
@@ -1 +1 @@
-[{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"setting-up-the-development-environment","dir":"","previous_headings":"","what":"Setting up the development environment","title":"NA","text":"","code":"install.packages(c('devtools', 'pkgdown', 'styler', 'lintr')) # install dev dependencies devtools::install_deps(dependencies = TRUE) # install package dependencies devtools::document() # generate package meta data and man files devtools::build() # build package"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"validating-the-package","dir":"","previous_headings":"","what":"Validating the package","title":"NA","text":"","code":"styler::style_pkg() # format code lintr::lint_package() # lint code devtools::test() # test package devtools::check() # check package for errors"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"developing-the-documentation-site","dir":"","previous_headings":"","what":"Developing the documentation site","title":"NA","text":"CI builds two version documentation: https://cmu-delphi.github.io/epiprocess/ main branch https://cmu-delphi.github.io/epiprocess/dev dev branch. 
documentation site can previewed locally running R: open browser, can try using Python server command line:","code":"# Should automatically open a browser pkgdown::build_site(preview=TRUE) R -e 'devtools::document()' R -e 'pkgdown::build_site()' python -m http.server -d docs"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"versioning","dir":"","previous_headings":"","what":"Versioning","title":"NA","text":"Please follow guidelines PR template document.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"planned-cran-release-process","dir":"","previous_headings":"","what":"Planned CRAN release process","title":"NA","text":"Open release issue copy follow checklist issue (modified checklist generated usethis::use_release_issue(version = \"1.0.2\")): git pull dev branch. Make sure changes committed pushed. Check current CRAN check results. Aim 10/10, notes. check works well enough, merge main. Otherwise open PR fix . guidelines. git checkout main git pull may choke MIT license url, ’s ok. devtools::build_readme() devtools::check_win_devel() maintainer (“cre” description) check email problems. may choke, sensitive binary versions packages given system. Either bypass ask someone else run ’re concerned. Update cran-comments.md PR changes (go list ) dev run list . Submit CRAN: devtools::submit_cran(). Maintainer approves email. Wait CRAN… accepted 🎉, move next steps. rejected, fix resubmit. Open merge PR containing updates made main back dev. usethis::use_github_release(publish = FALSE) (publish , otherwise won’t push) create draft release based commit hash CRAN-SUBMISSION push tag GitHub repo. 
Go repo, verify release notes, publish ready.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 epiprocess authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"recycling-outputs","dir":"Articles","previous_headings":"","what":"Recycling outputs","title":"Advanced sliding with nonstandard outputs","text":"computation returns single atomic value, epi_slide() internally try recycle output size stable (sense described ). can use advantage, example, order compute trailing average marginally geo values, demonstrate simple synthetic example. slide computation returns atomic vector (rather single value) epi_slide() checks whether return length ensures size stability, , uses fill new column. example, next computation gives result last one. However, output atomic vector (rather single value) size stable, epi_slide() throws error. 
example, trying return 2 things 3 states.","code":"library(epiprocess) library(dplyr) set.seed(123) edf <- tibble( geo_value = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), time_value = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(geo_value)), x = seq_along(geo_value) + 0.01 * rnorm(length(geo_value)), ) %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) # 2-day trailing average, per geo value edf %>% group_by(geo_value) %>% epi_slide(x_2dav = mean(x), before = 1) %>% ungroup() ## An `epi_df` object, 9 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 4 ## geo_value time_value x x_2dav ## * ## 1 ca 2020-06-01 0.994 0.994 ## 2 ca 2020-06-02 2.00 1.50 ## 3 ca 2020-06-03 3.02 2.51 ## 4 fl 2020-06-01 4.00 4.00 ## 5 fl 2020-06-02 5.00 4.50 ## 6 fl 2020-06-03 6.02 5.51 ## 7 pa 2020-06-01 7.00 7.00 ## 8 pa 2020-06-02 7.99 7.50 ## 9 pa 2020-06-03 8.99 8.49 # 2-day trailing average, marginally edf %>% epi_slide(x_2dav = mean(x), before = 1) ## An `epi_df` object, 9 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 4 ## geo_value time_value x x_2dav ## * ## 1 ca 2020-06-01 0.994 4.00 ## 2 fl 2020-06-01 4.00 4.00 ## 3 pa 2020-06-01 7.00 4.00 ## 4 ca 2020-06-02 2.00 4.50 ## 5 fl 2020-06-02 5.00 4.50 ## 6 pa 2020-06-02 7.99 4.50 ## 7 ca 2020-06-03 3.02 5.50 ## 8 fl 2020-06-03 6.02 5.50 ## 9 pa 2020-06-03 8.99 5.50 edf %>% epi_slide(y_2dav = rep(mean(x), 3), before = 1) ## An `epi_df` object, 9 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 4 ## geo_value time_value x y_2dav ## * ## 1 ca 2020-06-01 0.994 4.00 ## 2 fl 2020-06-01 4.00 4.00 ## 3 pa 2020-06-01 7.00 4.00 ## 4 ca 2020-06-02 2.00 4.50 ## 5 fl 2020-06-02 5.00 4.50 ## 6 pa 2020-06-02 7.99 4.50 ## 7 ca 2020-06-03 3.02 5.50 ## 8 fl 2020-06-03 6.02 5.50 ## 9 pa 2020-06-03 8.99 5.50 edf %>% epi_slide(x_2dav = 
rep(mean(x), 2), before = 1) ## Error in `.f()`: ## ! The slide computations must either (a) output a single element/row ## each, or (b) one element/row per appearance of the reference time value in ## the local window."},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"multi-column-outputs","dir":"Articles","previous_headings":"","what":"Multi-column outputs","title":"Advanced sliding with nonstandard outputs","text":"Now move outputs data frames single row multiple columns. Working type output structure fact already demonstrated slide vignette. set as_list_col = TRUE call epi_slide(), resulting epi_df object returned epi_slide() list column containing slide values. use as_list_col = FALSE (default epi_slide()), function unnests (sense tidyr::unnest()) list column , resulting epi_df multiple new columns containing slide values. default name unnested columns prefixing name assigned list column () onto column names output data frame slide computation (x_2dav x_2dma) separated “_“. can use names_sep = NULL (gets passed tidyr::unnest()) drop prefix associated list column name, naming unnested columns. 
Furthermore, epi_slide() recycle single row data frame needed order make result size stable, just like case atomic values.","code":"edf2 <- edf %>% group_by(geo_value) %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = TRUE ) %>% ungroup() class(edf2$a) ## [1] \"list\" length(edf2$a) ## [1] 9 edf2$a[[2]] ## x_2dav x_2dma ## 1 1.496047 0.7437485 edf %>% group_by(geo_value) %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = FALSE ) %>% ungroup() ## An `epi_df` object, 9 x 5 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 5 ## geo_value time_value x a_x_2dav a_x_2dma ## * ## 1 ca 2020-06-01 0.994 0.994 0 ## 2 ca 2020-06-02 2.00 1.50 0.744 ## 3 ca 2020-06-03 3.02 2.51 0.755 ## 4 fl 2020-06-01 4.00 4.00 0 ## 5 fl 2020-06-02 5.00 4.50 0.742 ## 6 fl 2020-06-03 6.02 5.51 0.753 ## 7 pa 2020-06-01 7.00 7.00 0 ## 8 pa 2020-06-02 7.99 7.50 0.729 ## 9 pa 2020-06-03 8.99 8.49 0.746 edf %>% group_by(geo_value) %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = FALSE, names_sep = NULL ) %>% ungroup() ## An `epi_df` object, 9 x 5 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 5 ## geo_value time_value x x_2dav x_2dma ## * ## 1 ca 2020-06-01 0.994 0.994 0 ## 2 ca 2020-06-02 2.00 1.50 0.744 ## 3 ca 2020-06-03 3.02 2.51 0.755 ## 4 fl 2020-06-01 4.00 4.00 0 ## 5 fl 2020-06-02 5.00 4.50 0.742 ## 6 fl 2020-06-03 6.02 5.51 0.753 ## 7 pa 2020-06-01 7.00 7.00 0 ## 8 pa 2020-06-02 7.99 7.50 0.729 ## 9 pa 2020-06-03 8.99 8.49 0.746 edf %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = FALSE, names_sep = NULL ) ## An `epi_df` object, 9 x 5 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 5 ## geo_value time_value x x_2dav x_2dma ## * ## 1 ca 2020-06-01 0.994 4.00 
4.45 ## 2 fl 2020-06-01 4.00 4.00 4.45 ## 3 pa 2020-06-01 7.00 4.00 4.45 ## 4 ca 2020-06-02 2.00 4.50 3.71 ## 5 fl 2020-06-02 5.00 4.50 3.71 ## 6 pa 2020-06-02 7.99 4.50 3.71 ## 7 ca 2020-06-03 3.02 5.50 3.69 ## 8 fl 2020-06-03 6.02 5.50 3.69 ## 9 pa 2020-06-03 8.99 5.50 3.69"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"multi-row-outputs","dir":"Articles","previous_headings":"","what":"Multi-row outputs","title":"Advanced sliding with nonstandard outputs","text":"slide computation outputs data frame one row, behavior analogous slide computation outputs atomic vector. Meaning, epi_slide() check result size stable, , fill new column(s) resulting epi_df object appropriately. can convenient modeling following sense: can, example, fit sliding, data-versioning-unaware nowcasting forecasting model pooling data different locations, return separate forecasts common model location. use synthetic example demonstrate idea abstractly simply forecasting (actually, nowcasting) y x fitting time-windowed linear model pooling data across locations. example focused simplicity show work multi-row outputs. Note however, following issues example: lm fitting data includes testing instances, training-test split performed. Adding simple training-test split factor reporting latency properly. Data revisions taken account. three factors contribute unrealistic retrospective forecasts overly optimistic retrospective performance evaluations. Instead, one favor epix_slide realistic “pseudoprospective” forecasts. Using epix_slide also makes easier express certain types forecasts; epi_slide, forecasts additional aheads quantile levels need expressed additional columns, nested inside list columns, epix_slide perform size stability checks recycling, allowing computations output number rows.","code":"edf$y <- 2 * edf$x + 0.05 * rnorm(length(edf$x)) edf %>% epi_slide(function(d, ...) 
{ obj <- lm(y ~ x, data = d) return( as.data.frame( predict(obj, newdata = d %>% group_by(geo_value) %>% filter(time_value == max(time_value)), interval = \"prediction\", level = 0.9 ) ) ) }, before = 1, new_col_name = \"fc\", names_sep = NULL) ## Warning: `f` might not have enough positional arguments before its `...`; in the current ## `epi[x]_slide` call, the group key and reference time value will be included in ## `f`'s `...`; if `f` doesn't expect those arguments, it may produce confusing ## error messages ## An `epi_df` object, 9 x 7 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 7 ## geo_value time_value x y fit lwr upr ## * ## 1 ca 2020-06-01 0.994 1.97 1.96 1.87 2.06 ## 2 fl 2020-06-01 4.00 8.02 8.03 7.95 8.11 ## 3 pa 2020-06-01 7.00 14.1 14.1 14.0 14.2 ## 4 ca 2020-06-02 2.00 4.06 4.01 3.91 4.11 ## 5 fl 2020-06-02 5.00 10.0 10.0 9.94 10.1 ## 6 pa 2020-06-02 7.99 16.0 16.0 15.9 16.1 ## 7 ca 2020-06-03 3.02 6.05 6.07 5.96 6.17 ## 8 fl 2020-06-03 6.02 12.0 12.0 11.9 12.1 ## 9 pa 2020-06-03 8.99 17.9 17.9 17.8 18.0"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"version-aware-forecasting-revisited","dir":"Articles","previous_headings":"","what":"Version-aware forecasting, revisited","title":"Advanced sliding with nonstandard outputs","text":"revisit COVID-19 forecasting example archive vignette order demonstrate preceding points regarding forecast evaluation realistic setting. First, fetch versioned data build archive. Next, extend ARX function handle multiple geo values, since present case, grouping geo value slide computation run multiple geo values . Note , epix_slide() returns grouping variables, time_value, slide computations eventual returned tibble, need include geo_value column output data frame ARX computation. now make forecasts archive compare forecasts latest data. 
can see forecasts, come training ARX model jointly CA FL, exhibit generally less variability wider prediction bands compared ones archive vignette, come training separate ARX model state. archive vignette, can see difference version-aware (right column) -unaware (left column) forecasting, well.","code":"library(epidatr) library(data.table) library(ggplot2) theme_set(theme_bw()) y1 <- pub_covidcast( source = \"doctor-visits\", signals = \"smoothed_adj_cli\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl\", time_value = epirange(20200601, 20211201), issues = epirange(20200601, 20211201) ) y2 <- pub_covidcast( source = \"jhu-csse\", signal = \"confirmed_7dav_incidence_prop\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl\", time_value = epirange(20200601, 20211201), issues = epirange(20200601, 20211201) ) x <- y1 %>% select(geo_value, time_value, version = issue, percent_cli = value ) %>% as_epi_archive(compactify = FALSE) # mutating merge operation: x <- epix_merge( x, y2 %>% select(geo_value, time_value, version = issue, case_rate_7d_av = value ) %>% as_epi_archive(compactify = FALSE), sync = \"locf\", compactify = FALSE ) library(tidyr) library(purrr) ## ## Attaching package: 'purrr' ## The following object is masked from 'package:data.table': ## ## transpose prob_arx_args <- function(lags = c(0, 7, 14), ahead = 7, min_train_window = 20, lower_level = 0.05, upper_level = 0.95, symmetrize = TRUE, intercept = FALSE, nonneg = TRUE) { return(list( lags = lags, ahead = ahead, min_train_window = min_train_window, lower_level = lower_level, upper_level = upper_level, symmetrize = symmetrize, intercept = intercept, nonneg = nonneg )) } prob_arx <- function(x, y, geo_value, time_value, args = prob_arx_args()) { # Return NA if insufficient training data if (length(y) < args$min_train_window + max(args$lags) + args$ahead) { return(data.frame( geo_value = unique(geo_value), # Return geo value! 
point = NA, lower = NA, upper = NA )) } # Set up x, y, lags list if (!missing(x)) { x <- data.frame(x, y) } else { x <- data.frame(y) } if (!is.list(args$lags)) args$lags <- list(args$lags) args$lags <- rep(args$lags, length.out = ncol(x)) # Build features and response for the AR model, and then fit it dat <- tibble(i = seq_len(ncol(x)), lag = args$lags) %>% unnest(lag) %>% mutate(name = paste0(\"x\", seq_len(nrow(.)))) %>% # nolint: object_usage_linter # One list element for each lagged feature pmap(function(i, lag, name) { tibble( geo_value = geo_value, time_value = time_value + lag, # Shift back !!name := x[, i] ) }) %>% # One list element for the response vector c(list( tibble( geo_value = geo_value, time_value = time_value - args$ahead, # Shift forward y = y ) )) %>% # Combine them together into one data frame reduce(full_join, by = c(\"geo_value\", \"time_value\")) %>% arrange(time_value) if (args$intercept) dat$x0 <- rep(1, nrow(dat)) obj <- lm(y ~ . + 0, data = select(dat, -geo_value, -time_value)) # Use LOCF to fill NAs in the latest feature values (do this by geo value) setDT(dat) # Convert to a data.table object by reference cols <- setdiff(names(dat), c(\"geo_value\", \"time_value\")) dat[, (cols) := nafill(.SD, type = \"locf\"), .SDcols = cols, by = \"geo_value\"] # Make predictions test_time_value <- max(time_value) point <- predict( obj, newdata = dat %>% dplyr::group_by(geo_value) %>% dplyr::filter(time_value == test_time_value) ) # Compute bands r <- residuals(obj) s <- ifelse(args$symmetrize, -1, NA) # Should the residuals be symmetrized? q <- quantile(c(r, s * r), probs = c(args$lower, args$upper), na.rm = TRUE) lower <- point + q[1] upper <- point + q[2] # Clip at zero if we need to, then return if (args$nonneg) { point <- pmax(point, 0) lower <- pmax(lower, 0) upper <- pmax(upper, 0) } return(data.frame( geo_value = unique(geo_value), # Return geo value! 
point = point, lower = lower, upper = upper )) } # Latest snapshot of data, and forecast dates x_latest <- epix_as_of(x, max_version = max(x$DT$version)) fc_time_values <- seq(as.Date(\"2020-08-01\"), as.Date(\"2021-11-30\"), by = \"1 month\" ) # Simple function to produce forecasts k weeks ahead k_week_ahead <- function(x, ahead = 7, as_of = TRUE) { if (as_of) { x %>% epix_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, .data$geo_value, .data$time_value, args = prob_arx_args(ahead = ahead) ), before = 119, ref_time_values = fc_time_values ) %>% mutate( target_date = .data$time_value + ahead, as_of = TRUE, geo_value = .data$fc_geo_value ) } else { x_latest %>% epi_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, .data$geo_value, .data$time_value, args = prob_arx_args(ahead = ahead) ), before = 119, ref_time_values = fc_time_values ) %>% mutate(target_date = .data$time_value + ahead, as_of = FALSE) } } # Generate the forecasts, and bind them together fc <- bind_rows( k_week_ahead(x, ahead = 7, as_of = TRUE), k_week_ahead(x, ahead = 14, as_of = TRUE), k_week_ahead(x, ahead = 21, as_of = TRUE), k_week_ahead(x, ahead = 28, as_of = TRUE), k_week_ahead(x, ahead = 7, as_of = FALSE), k_week_ahead(x, ahead = 14, as_of = FALSE), k_week_ahead(x, ahead = 21, as_of = FALSE), k_week_ahead(x, ahead = 28, as_of = FALSE) ) # Plot them, on top of latest COVID-19 case rates ggplot(fc, aes(x = target_date, group = time_value, fill = as_of)) + geom_ribbon(aes(ymin = fc_lower, ymax = fc_upper), alpha = 0.4) + geom_line( data = x_latest, aes(x = time_value, y = case_rate_7d_av), inherit.aes = FALSE, color = \"gray50\" ) + geom_line(aes(y = fc_point)) + geom_point(aes(y = fc_point), size = 0.5) + geom_vline(aes(xintercept = time_value), linetype = 2, alpha = 0.5) + facet_grid(vars(geo_value), vars(as_of), scales = \"free\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 case rates\") + 
theme(legend.position = \"none\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Advanced sliding with nonstandard outputs","text":"case_rate_7d_av data used document modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. percent_cli data modified part COVIDcast Epidata API Doctor Visits data. dataset licensed terms Creative Commons Attribution 4.0 International license. Copyright Delphi Research Group Carnegie Mellon University 2020.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"converting-to-tsibble-format","dir":"Articles","previous_headings":"","what":"Converting to tsibble format","title":"Aggregate signals over space and time","text":"manipulating wrangling time series data, tsibble already provides whole bunch useful tools. tsibble object (formerly, class tbl_ts) basically tibble (data frame) two specially-marked columns: index column representing time variable (defining order past present), key column identifying unique observational unit time point. fact, key can made number columns, just single one. epi_df object, index variable time_value, key variable typically geo_value (though need always case: example, age group variable another column, serve second key variable). epiprocess package thus provides implementation as_tsibble() epi_df objects, sets variables according defaults. can also set key variable(s) directly call as_tsibble(). 
Similar SQL keys, key uniquely identify time point (, key index together uniquely identify row), as_tsibble() throws error: can see, duplicate county names Massachusetts Vermont, caused error. Keying county name state name, however, work:","code":"library(tsibble) xt <- as_tsibble(x) head(xt) ## # A tsibble: 6 x 5 [1D] ## # Key: geo_value [1] ## geo_value time_value cases county_name state_name ## ## 1 25001 2020-06-01 4 Barnstable County Massachusetts ## 2 25001 2020-06-02 6 Barnstable County Massachusetts ## 3 25001 2020-06-03 5 Barnstable County Massachusetts ## 4 25001 2020-06-04 8 Barnstable County Massachusetts ## 5 25001 2020-06-05 3 Barnstable County Massachusetts ## 6 25001 2020-06-06 4 Barnstable County Massachusetts key(xt) ## [[1]] ## geo_value index(xt) ## time_value interval(xt) ## ## [1] 1D head(as_tsibble(x, key = \"county_name\")) ## Error in `validate_tsibble()`: ## ! A valid tsibble must have distinct rows identified by key and index. ## ℹ Please use `duplicates()` to check the duplicated rows. 
head(duplicates(x, key = \"county_name\")) ## # A tibble: 6 × 5 ## geo_value time_value cases county_name state_name ## ## 1 25009 2020-06-01 63 Essex County Massachusetts ## 2 25011 2020-06-01 0 Franklin County Massachusetts ## 3 50009 2020-06-01 0 Essex County Vermont ## 4 50011 2020-06-01 0 Franklin County Vermont ## 5 25009 2020-06-02 74 Essex County Massachusetts ## 6 25011 2020-06-02 0 Franklin County Massachusetts head(as_tsibble(x, key = c(\"county_name\", \"state_name\"))) ## # A tsibble: 6 x 5 [1D] ## # Key: county_name, state_name [1] ## geo_value time_value cases county_name state_name ## ## 1 50001 2020-06-01 0 Addison County Vermont ## 2 50001 2020-06-02 0 Addison County Vermont ## 3 50001 2020-06-03 0 Addison County Vermont ## 4 50001 2020-06-04 0 Addison County Vermont ## 5 50001 2020-06-05 0 Addison County Vermont ## 6 50001 2020-06-06 1 Addison County Vermont"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"detecting-and-filling-time-gaps","dir":"Articles","previous_headings":"","what":"Detecting and filling time gaps","title":"Aggregate signals over space and time","text":"One major advantages tsibble package ability handle implicit gaps time series data. words, can infer time scale ’re interested (say, daily data), detect apparent gaps (say, values reported January 1 3 January 2). can subsequently use functionality make missing entries explicit, generally help avoid bugs downstream data processing tasks. Let’s first remove certain dates data set create gaps: functions has_gaps(), scan_gaps(), count_gaps() tsibble package provide useful summaries, slightly different formats. can also visualize patterns missingness: Using fill_gaps() function tsibble, can replace gaps explicit value. default NA, current case, missingness random rather represents small value censored (hypothetical COVID-19 reports, certainly real phenomenon occurs signals), better replace zero, . 
(approaches, LOCF: last observation carried forward time, accomplished first filling NA values following second call tidyr::fill().) Note time series Addison, VT starts August 27, 2020, even though original (uncensored) data set drawn period went back June 6, 2020. setting .full = TRUE, can zero-fill entire span observed (censored) data. Explicit imputation missingness (zero-filling case) can important protecting bugs sorts downstream tasks. example, even something simple 7-day trailing average complicated missingness. function epi_slide() looks rows within window 7 days anchored right reference time point (= 6). days given week missing censored small case counts, taking average observed case counts can misleading unintentionally biased upwards. Meanwhile, running epi_slide() zero-filled data brings trailing averages (appropriately) downwards, can see inspecting Plymouth, MA around July 1, 2021.","code":"# First make geo value more readable for tables, plots, etc. x <- x %>% mutate(geo_value = paste( substr(county_name, 1, nchar(county_name) - 7), name_to_abbr(state_name), sep = \", \" )) %>% select(geo_value, time_value, cases) xt <- as_tsibble(x) %>% filter(cases >= 3) head(has_gaps(xt)) ## # A tibble: 6 × 2 ## geo_value .gaps ## ## 1 Addison, VT TRUE ## 2 Barnstable, MA TRUE ## 3 Bennington, VT TRUE ## 4 Berkshire, MA TRUE ## 5 Bristol, MA TRUE ## 6 Caledonia, VT TRUE head(scan_gaps(xt)) ## # A tsibble: 6 x 2 [1D] ## # Key: geo_value [1] ## geo_value time_value ## ## 1 Addison, VT 2020-08-28 ## 2 Addison, VT 2020-08-29 ## 3 Addison, VT 2020-08-30 ## 4 Addison, VT 2020-08-31 ## 5 Addison, VT 2020-09-01 ## 6 Addison, VT 2020-09-02 head(count_gaps(xt)) ## # A tibble: 6 × 4 ## geo_value .from .to .n ## ## 1 Addison, VT 2020-08-28 2020-10-04 38 ## 2 Addison, VT 2020-10-06 2020-10-23 18 ## 3 Addison, VT 2020-10-25 2020-11-04 11 ## 4 Addison, VT 2020-11-06 2020-11-10 5 ## 5 Addison, VT 2020-11-14 2020-11-18 5 ## 6 Addison, VT 2020-11-20 2020-11-20 1 library(ggplot2) 
theme_set(theme_bw()) ggplot( count_gaps(xt), aes( x = reorder(geo_value, desc(geo_value)), color = geo_value ) ) + geom_linerange(aes(ymin = .from, ymax = .to)) + geom_point(aes(y = .from)) + geom_point(aes(y = .to)) + coord_flip() + labs(x = \"County\", y = \"Date\") + theme(legend.position = \"none\") fill_gaps(xt, cases = 0) %>% head() ## # A tsibble: 6 x 3 [1D] ## # Key: geo_value [1] ## geo_value time_value cases ## ## 1 Addison, VT 2020-08-27 3 ## 2 Addison, VT 2020-08-28 0 ## 3 Addison, VT 2020-08-29 0 ## 4 Addison, VT 2020-08-30 0 ## 5 Addison, VT 2020-08-31 0 ## 6 Addison, VT 2020-09-01 0 xt_filled <- fill_gaps(xt, cases = 0, .full = TRUE) head(xt_filled) ## # A tsibble: 6 x 3 [1D] ## # Key: geo_value [1] ## geo_value time_value cases ## ## 1 Addison, VT 2020-06-01 0 ## 2 Addison, VT 2020-06-02 0 ## 3 Addison, VT 2020-06-03 0 ## 4 Addison, VT 2020-06-04 0 ## 5 Addison, VT 2020-06-05 0 ## 6 Addison, VT 2020-06-06 0 xt %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% ungroup() %>% filter( geo_value == \"Plymouth, MA\", abs(time_value - as.Date(\"2021-07-01\")) <= 3 ) %>% print(n = 7) ## An `epi_df` object, 4 x 4 with metadata: ## * geo_type = custom ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 4 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 Plymouth, MA 2021-06-28 3 4.25 ## 2 Plymouth, MA 2021-06-30 7 5 ## 3 Plymouth, MA 2021-07-01 6 5 ## 4 Plymouth, MA 2021-07-02 6 5.2 xt_filled %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% ungroup() %>% filter( geo_value == \"Plymouth, MA\", abs(time_value - as.Date(\"2021-07-01\")) <= 3 ) %>% print(n = 7) ## An `epi_df` object, 7 x 4 with metadata: ## * geo_type = custom ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 7 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 Plymouth, MA 2021-06-28 3 2.43 ## 2 Plymouth, MA 
2021-06-29 0 2.43 ## 3 Plymouth, MA 2021-06-30 7 2.86 ## 4 Plymouth, MA 2021-07-01 6 2.86 ## 5 Plymouth, MA 2021-07-02 6 3.71 ## 6 Plymouth, MA 2021-07-03 0 3.71 ## 7 Plymouth, MA 2021-07-04 0 3.14"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"aggregate-to-different-time-scales","dir":"Articles","previous_headings":"","what":"Aggregate to different time scales","title":"Aggregate signals over space and time","text":"Continuing useful tsibble functionality, can aggregate different time scales using index_by() tsibble, modifies index variable given object applying suitable time-coarsening transformation (say, moving days weeks, weeks months, ). common use case follow call dplyr verb like summarize() order perform kind aggregation measured variables new index variable. , use functions yearweek() yearmonth() provided tsibble package order aggregate weekly monthly resolutions. former call, set week_start = 7 coincide CDC definition epiweek (epidemiological week).","code":"# Aggregate to weekly xt_filled_week <- xt_filled %>% index_by(epiweek = ~ yearweek(., week_start = 7)) %>% group_by(geo_value) %>% summarize(cases = sum(cases, na.rm = TRUE)) head(xt_filled_week) ## # A tsibble: 6 x 3 [1W] ## # Key: geo_value [1] ## geo_value epiweek cases ## ## 1 Addison, VT 2020 W23 0 ## 2 Addison, VT 2020 W24 0 ## 3 Addison, VT 2020 W25 0 ## 4 Addison, VT 2020 W26 0 ## 5 Addison, VT 2020 W27 0 ## 6 Addison, VT 2020 W28 0 # Aggregate to monthly xt_filled_month <- xt_filled_week %>% index_by(month = ~ yearmonth(.)) %>% group_by(geo_value) %>% summarize(cases = sum(cases, na.rm = TRUE)) head(xt_filled_month) ## # A tsibble: 6 x 3 [1M] ## # Key: geo_value [1] ## geo_value month cases ## ## 1 Addison, VT 2020 May 0 ## 2 Addison, VT 2020 Jun 0 ## 3 Addison, VT 2020 Jul 0 ## 4 Addison, VT 2020 Aug 3 ## 5 Addison, VT 2020 Sep 0 ## 6 Addison, VT 2020 Oct 
29"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"geographic-aggregation","dir":"Articles","previous_headings":"","what":"Geographic aggregation","title":"Aggregate signals over space and time","text":"TODO","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Aggregate signals over space and time","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"getting-data-into-epi_archive-format","dir":"Articles","previous_headings":"","what":"Getting data into epi_archive format","title":"Work with archive objects and data revisions","text":"epi_archive() object can constructed data frame, data table, tibble, provided (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. version: time value specifying version row measurements. example, given row version January 15, 2022 time_value January 14, 2022, row contains measurements data January 14, 2022 available one day later. can see , data frame returned epidatr::pub_covidcast() columns required epi_archive format, issue playing role version. can now use as_epi_archive() bring epi_archive format. removal redundant version updates as_epi_archive using compactify, please refer compactify vignette. 
epi_archive consists primary field DT, data table (data.table package) columns geo_value, time_value, version (possibly additional ones), metadata fields, geo_type time_type. variables geo_value, time_value, version serve key variables data table, well specified metadata (described ). can single row per unique combination key variables, therefore key variables critical figuring generate snapshot data archive, given version (also described ). general, last version observation carried forward (LOCF) fill data recorded versions.","code":"x <- dv %>% select(geo_value, time_value, version = issue, percent_cli = value) %>% as_epi_archive(compactify = TRUE) class(x) print(x) ## [1] \"epi_archive\" ## → An `epi_archive` object, with metadata: ## ℹ Min/max time values: 2020-06-01 / 2021-11-30 ## ℹ First/last version with update: 2020-06-02 / 2021-12-01 ## ℹ Versions end: 2021-12-01 ## ℹ A preview of the table (119316 rows x 4 columns): ## Key: ## geo_value time_value version percent_cli ## ## 1: ca 2020-06-01 2020-06-02 NA ## 2: ca 2020-06-01 2020-06-06 2.140116 ## 3: ca 2020-06-01 2020-06-08 2.140379 ## 4: ca 2020-06-01 2020-06-09 2.114430 ## 5: ca 2020-06-01 2020-06-10 2.133677 ## --- ## 119312: tx 2021-11-26 2021-11-29 1.858596 ## 119313: tx 2021-11-27 2021-11-28 NA ## 119314: tx 2021-11-28 2021-11-29 NA ## 119315: tx 2021-11-29 2021-11-30 NA ## 119316: tx 2021-11-30 2021-12-01 NA class(x$DT) ## [1] \"data.table\" \"data.frame\" head(x$DT) ## Key: ## geo_value time_value version percent_cli ## ## 1: ca 2020-06-01 2020-06-02 NA ## 2: ca 2020-06-01 2020-06-06 2.140116 ## 3: ca 2020-06-01 2020-06-08 2.140379 ## 4: ca 2020-06-01 2020-06-09 2.114430 ## 5: ca 2020-06-01 2020-06-10 2.133677 ## 6: ca 2020-06-01 2020-06-11 2.197207 key(x$DT) ## [1] \"geo_value\" \"time_value\" \"version\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"some-details-on-metadata","dir":"Articles","previous_headings":"","what":"Some details on 
metadata","title":"Work with archive objects and data revisions","text":"following pieces metadata included fields epi_archive object: geo_type: type geo values. time_type: type time values. additional_metadata: list additional metadata data archive. Metadata epi_archive object x can accessed (altered) directly, x$geo_type x$time_type, etc. Just like as_epi_df(), function as_epi_archive() attempts guess metadata fields epi_archive object instantiated, explicitly specified function call (case ).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"producing-snapshots-in-epi_df-form","dir":"Articles","previous_headings":"","what":"Producing snapshots in epi_df form","title":"Work with archive objects and data revisions","text":"key method epi_archive class epix_as_of(), generates snapshot archive epi_df format. represents --date values signal variables given version. can see max time value epi_df object x_snapshot generated archive May 29, 2021, even though specified version date June 1, 2021. can infer doctor’s visits signal 2 days latent June 1. Also, can see metadata epi_df object version date recorded as_of field. default, using maximum version column underlying data table epi_archive object generates snapshot latest values signal variables entire archive. epix_as_of() function issues warning case, since updates current version may still come later point time, due various reasons, synchronization issues. , pull several snapshots archive, spaced one month apart. overlay corresponding signal curves colored lines, version dates marked dotted vertical lines, draw latest curve black (latest snapshot x_latest archive can provide). can see interesting highly nontrivial revision behavior: points time provisional data snapshots grossly underestimate latest curve (look particular Florida close end 2021), others overestimate (states towards beginning 2021), though quite dramatically. 
Modeling revision process, often called backfill modeling, important statistical problem .","code":"x_snapshot <- epix_as_of(x, max_version = as.Date(\"2021-06-01\")) class(x_snapshot) ## [1] \"epi_df\" \"tbl_df\" \"tbl\" \"data.frame\" head(x_snapshot) ## An `epi_df` object, 6 x 3 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2021-06-01 ## ## # A tibble: 6 × 3 ## geo_value time_value percent_cli ## * ## 1 ca 2020-06-01 2.75 ## 2 ca 2020-06-02 2.57 ## 3 ca 2020-06-03 2.48 ## 4 ca 2020-06-04 2.41 ## 5 ca 2020-06-05 2.57 ## 6 ca 2020-06-06 2.63 max(x_snapshot$time_value) ## [1] \"2021-05-31\" attributes(x_snapshot)$metadata$as_of ## [1] \"2021-06-01\" x_latest <- epix_as_of(x, max_version = max(x$DT$version)) theme_set(theme_bw()) self_max <- max(x$DT$version) versions <- seq(as.Date(\"2020-06-01\"), self_max - 1, by = \"1 month\") snapshots <- map_dfr(versions, function(v) { epix_as_of(x, max_version = v) %>% mutate(version = v) }) %>% bind_rows( x_latest %>% mutate(version = self_max) ) %>% mutate(latest = version == self_max) ggplot( snapshots %>% filter(!latest), aes(x = time_value, y = percent_cli) ) + geom_line(aes(color = factor(version)), na.rm = TRUE) + geom_vline(aes(color = factor(version), xintercept = version), lty = 2) + facet_wrap(~geo_value, scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"% of doctor's visits with CLI\") + theme(legend.position = \"none\") + geom_line( data = snapshots %>% filter(latest), aes(x = time_value, y = percent_cli), inherit.aes = FALSE, color = \"black\", na.rm = TRUE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"merging-epi_archive-objects","dir":"Articles","previous_headings":"","what":"Merging epi_archive objects","title":"Work with archive objects and data revisions","text":"Now demonstrate merge two epi_archive objects together, e.g., grabbing data multiple sources particular 
version can performed single epix_as_of call. function epix_merge() made purpose. merge working epi_archive versioned percentage CLI outpatient visits another one versioned COVID-19 case reporting data, fetch COVIDcast API, rate scale (counts per 100,000 people population). merging archives, unless archives identical data release patterns, NAs can introduced non-key variables reasons: - represent “value” observation initial release (need pair additional observations archive released) - represent “value” observation recorded versions (sort situation) - requested via sync=\"na\", represent potential update data yet access (e.g., due encountering issues attempting download currently available version data one archives, ).","code":"y <- pub_covidcast( source = \"jhu-csse\", signals = \"confirmed_7dav_incidence_prop\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl,ny,tx\", time_values = epirange(20200601, 20211201), issues = epirange(20200601, 20211201) ) %>% select(geo_value, time_value, version = issue, case_rate_7d_av = value) %>% as_epi_archive(compactify = TRUE) x <- epix_merge(x, y, sync = \"locf\", compactify = TRUE) print(x) head(x$DT) ## → An `epi_archive` object, with metadata: ## ℹ Min/max time values: 2020-06-01 / 2021-11-30 ## ℹ First/last version with update: 2020-06-02 / 2021-12-01 ## ℹ Versions end: 2021-12-01 ## ℹ A preview of the table (129638 rows x 5 columns): ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## --- ## 129634: tx 2021-11-26 2021-11-29 1.858596 7.957657 ## 129635: tx 2021-11-27 2021-11-28 NA 7.174299 ## 129636: tx 2021-11-28 2021-11-29 NA 6.834681 ## 129637: tx 2021-11-29 2021-11-30 NA 8.841247 ## 129638: tx 2021-11-30 2021-12-01 NA 9.566218 ## Key: ## geo_value 
time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## 6: ca 2020-06-01 2020-06-10 2.133677 6.628329"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"sliding-version-aware-computations","dir":"Articles","previous_headings":"","what":"Sliding version-aware computations","title":"Work with archive objects and data revisions","text":"Lastly, demonstrate another key method archives, epix_slide(). works just like epi_slide() epi_df object, one key difference: performs version-aware computations. , computation given reference time t, uses data available t. demonstration, ’ll revisit forecasting example slide vignette, now ’ll build forecaster uses properly-versioned data (available real-time) forecast future COVID-19 case rates current past COVID-19 case rates, well current past values outpatient CLI signal medical claims. ’ll extend prob_ar() function slide vignette accommodate exogenous variables autoregressive model, often referred ARX model. Next slide forecaster working epi_archive object, order forecast COVID-19 case rates 7 days future. get back tibble z grouping variables (geo value), time values, three columns fc_point, fc_lower, fc_upper produced slide computation correspond point forecast, lower upper endpoints 90% prediction band, respectively. (instead set as_list_col = TRUE call epix_slide(), gotten list column fc, element fc data frame named columns point, lower, upper.) whole, epix_slide() works similarly epi_slide(), though notable differences, even apart version-aware aspect. can read documentation epix_slide() details. finish comparing version-aware -unaware forecasts various points time forecast horizons. 
former comes using epix_slide() epi_archive object x, latter applying epi_slide() latest snapshot data x_latest. row displays forecasts different location (CA, FL, NY, TX), column corresponds whether properly-versioned data used (FALSE means , TRUE means yes). can see properly-versioned forecaster , points time, problematic; example, massively overpredicts peak locations winter wave 2020. However, performance pretty poor across board , whether properly-versioned data used. Similar saw slide vignette, ARX forecasts can volatile, overconfident, . volatility can attenuated training ARX model jointly locations; advanced sliding vignette gives demonstration . really, epipredict package, builds data structures functionality current package, place look robust forecasting methodology. forecasters appear vignettes current package meant demo slide functionality basic forecasting methodology possible.","code":"prob_arx <- function(x, y, lags = c(0, 7, 14), ahead = 7, min_train_window = 20, lower_level = 0.05, upper_level = 0.95, symmetrize = TRUE, intercept = FALSE, nonneg = TRUE) { # Return NA if insufficient training data if (length(y) < min_train_window + max(lags) + ahead) { return(data.frame(point = NA, lower = NA, upper = NA)) } # Useful transformations if (!missing(x)) { x <- data.frame(x, y) } else { x <- data.frame(y) } if (!is.list(lags)) lags <- list(lags) lags <- rep(lags, length.out = ncol(x)) # Build features and response for the AR model, and then fit it dat <- do.call( data.frame, unlist( # Below we loop through and build the lagged features purrr::map(seq_len(ncol(x)), function(i) { purrr::map(lags[[i]], function(j) lag(x[, i], n = j)) }), recursive = FALSE ) ) names(dat) <- paste0(\"x\", seq_len(ncol(dat))) if (intercept) dat$x0 <- rep(1, nrow(dat)) dat$y <- lead(y, n = ahead) obj <- lm(y ~ . 
+ 0, data = dat) # Use LOCF to fill NAs in the latest feature values, make a prediction setDT(dat) setnafill(dat, type = \"locf\") point <- predict(obj, newdata = tail(dat, 1)) # Compute a band r <- residuals(obj) s <- ifelse(symmetrize, -1, NA) # Should the residuals be symmetrized? q <- quantile(c(r, s * r), probs = c(lower_level, upper_level), na.rm = TRUE) lower <- point + q[1] upper <- point + q[2] # Clip at zero if we need to, then return if (nonneg) { point <- max(point, 0) lower <- max(lower, 0) upper <- max(upper, 0) } return(data.frame(point = point, lower = lower, upper = upper)) } fc_time_values <- seq(as.Date(\"2020-08-01\"), as.Date(\"2021-11-30\"), by = \"1 month\" ) z <- x %>% group_by(geo_value) %>% epix_slide( fc = prob_arx(x = percent_cli, y = case_rate_7d_av), before = 119, ref_time_values = fc_time_values ) %>% ungroup() head(z, 10) ## # A tibble: 10 × 5 ## geo_value time_value fc_point fc_lower fc_upper ## ## 1 ca 2020-08-01 21.0 19.1 23.0 ## 2 fl 2020-08-01 44.5 38.9 50.0 ## 3 ny 2020-08-01 3.10 2.89 3.31 ## 4 tx 2020-08-01 35.5 33.6 37.4 ## 5 ca 2020-09-01 22.9 20.1 25.8 ## 6 fl 2020-09-01 15.5 10.5 20.6 ## 7 ny 2020-09-01 3.16 2.93 3.39 ## 8 tx 2020-09-01 17.5 14.3 20.7 ## 9 ca 2020-10-01 12.8 9.21 16.5 ## 10 fl 2020-10-01 14.7 8.72 20.6 x_latest <- epix_as_of(x, max_version = max(x$DT$version)) # Simple function to produce forecasts k weeks ahead k_week_ahead <- function(x, ahead = 7, as_of = TRUE) { if (as_of) { x %>% group_by(.data$geo_value) %>% epix_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, ahead = ahead), before = 119, ref_time_values = fc_time_values ) %>% mutate(target_date = .data$time_value + ahead, as_of = TRUE) %>% ungroup() } else { x_latest %>% group_by(.data$geo_value) %>% epi_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, ahead = ahead), before = 119, ref_time_values = fc_time_values ) %>% mutate(target_date = .data$time_value + ahead, as_of = FALSE) %>% ungroup() } } # Generate the 
forecasts, and bind them together fc <- bind_rows( k_week_ahead(x, ahead = 7, as_of = TRUE), k_week_ahead(x, ahead = 14, as_of = TRUE), k_week_ahead(x, ahead = 21, as_of = TRUE), k_week_ahead(x, ahead = 28, as_of = TRUE), k_week_ahead(x, ahead = 7, as_of = FALSE), k_week_ahead(x, ahead = 14, as_of = FALSE), k_week_ahead(x, ahead = 21, as_of = FALSE), k_week_ahead(x, ahead = 28, as_of = FALSE) ) # Plot them, on top of latest COVID-19 case rates ggplot(fc, aes(x = target_date, group = time_value, fill = as_of)) + geom_ribbon(aes(ymin = fc_lower, ymax = fc_upper), alpha = 0.4) + geom_line( data = x_latest, aes(x = time_value, y = case_rate_7d_av), inherit.aes = FALSE, color = \"gray50\" ) + geom_line(aes(y = fc_point)) + geom_point(aes(y = fc_point), size = 0.5) + geom_vline(aes(xintercept = time_value), linetype = 2, alpha = 0.5) + facet_grid(vars(geo_value), vars(as_of), scales = \"free\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 case rates\") + theme(legend.position = \"none\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Work with archive objects and data revisions","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. percent_cli data modified part COVIDcast Epidata API Doctor Visits data. dataset licensed terms Creative Commons Attribution 4.0 International license. 
Copyright Delphi Research Group Carnegie Mellon University 2020.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/compactify.html","id":"removing-redundant-update-data-to-save-space","dir":"Articles","previous_headings":"","what":"Removing redundant update data to save space","title":"Compactify to remove redundant archive data","text":"need store version update rows look like last version corresponding observations carried forward (LOCF) use epiprocess’s epi_archive-related functions, apply LOCF fill data explicit updates. default, even detect remove LOCF-redundant rows save space; impact results long directly work archive’s DT field way expects rows remain. three different values can assigned compactify: No argument: LOCF-redundant rows, removes issues warning information rows removed TRUE: removes LOCF-redundant rows without warning feedback FALSE: keeps LOCF-redundant rows without warning feedback example, one chart using LOCF values, another doesn’t use illustrate LOCF. Notice head first dataset differs second third value included. LOCF-redundant values can mar performance dataset operations. column case_rate_7d_av many LOCF-redundant values percent_cli, omit percent_cli column comparing performance. example, huge proportion original version update data LOCF-redundant, compactifying saves large amount space. proportion data LOCF-redundant can vary widely data sets, won’t always lucky. expect, performing 1000 iterations dplyr::filter faster LOCF values omitted. also like measure speed epi_archive methods. 
detailed performance comparison:","code":"library(epiprocess) library(dplyr) dt <- archive_cases_dv_subset$DT locf_omitted <- as_epi_archive(dt) ## Warning: Found rows that appear redundant based on last (version of each) observation carried forward; these rows have been removed to 'compactify' and save space: ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 2: ca 2020-06-01 2020-06-23 2.498918 6.628329 ## 3: ca 2020-06-01 2020-07-23 2.698157 6.603020 ## --- ## 4793: tx 2021-10-18 2021-10-22 NA 23.819450 ## 4794: tx 2021-10-19 2021-10-22 NA 24.705959 ## 4795: tx 2021-10-20 2021-10-22 NA 16.464639 ## Built-in `epi_archive` functionality should be unaffected, but results may change if you work directly with its fields (such as `DT`). See `?as_epi_archive` for details. To silence this warning but keep compactification, you can pass `compactify=TRUE` when constructing the archive. locf_included <- as_epi_archive(dt, compactify = FALSE) head(locf_omitted$DT) ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 4: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## 5: ca 2020-06-01 2020-06-10 2.133677 6.628329 ## 6: ca 2020-06-01 2020-06-11 2.197207 6.628329 head(locf_included$DT) ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## 6: ca 2020-06-01 2020-06-10 2.133677 6.628329 dt2 <- select(dt, -percent_cli) locf_included_2 <- as_epi_archive(dt2, compactify = FALSE) locf_omitted_2 <- as_epi_archive(dt2, compactify = TRUE) nrow(locf_included_2$DT) ## [1] 129638 nrow(locf_omitted_2$DT) ## 
[1] 9355 # Performance of filtering iterate_filter <- function(my_ea) { for (i in 1:1000) { filter(my_ea$DT, version >= as.Date(\"2020-01-01\") + i) } } elapsed_time <- function(fx) c(system.time(fx))[[3]] speed_test <- function(f, name) { data.frame( operation = name, locf = elapsed_time(f(locf_included_2)), no_locf = elapsed_time(f(locf_omitted_2)) ) } speeds <- speed_test(iterate_filter, \"filter_1000x\") # Performance of as_of iterated 1000 times iterate_as_of <- function(my_ea) { for (i in 1:1000) { my_ea %>% epix_as_of(min(my_ea$DT$time_value) + i - 1000) } } speeds <- rbind(speeds, speed_test(iterate_as_of, \"as_of_1000x\")) # Performance of slide slide_median <- function(my_ea) { my_ea %>% epix_slide(median = median(.data$case_rate_7d_av), before = 7) } speeds <- rbind(speeds, speed_test(slide_median, \"slide_median\")) speeds_tidy <- tidyr::gather(speeds, key = \"is_locf\", value = \"time_in_s\", locf, no_locf) library(ggplot2) ggplot(speeds_tidy) + geom_bar(aes(x = is_locf, y = time_in_s, fill = operation), stat = \"identity\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"correlations-grouped-by-time","dir":"Articles","previous_headings":"","what":"Correlations grouped by time","title":"Correlate signals over space and time","text":"epi_cor() function operates epi_df object, requires specification variables correlate, next two arguments (var1 var2). general, can specify grouping variable (combination variables) correlation computations call epi_cor(), via cor_by argument. potentially leads many ways compute correlations. always least two ways compute correlations epi_df: grouping time value, geo value. former obtained via cor_by = time_value. plot addresses question: “given day, case death rates linearly associated, across U.S. states?”. might interested broadening question, instead asking: “given day, higher case rates tend associate higher death rates?”, removing dependence linear relationship. 
latter can addressed using Spearman correlation, accomplished setting method = \"spearman\" call epi_cor(). Spearman correlation highly robust invariant monotone transformations.","code":"library(ggplot2) theme_set(theme_bw()) z1 <- epi_cor(x, case_rate, death_rate, cor_by = \"time_value\") ggplot(z1, aes(x = time_value, y = cor)) + geom_line() + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Correlation\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"lagged-correlations","dir":"Articles","previous_headings":"","what":"Lagged correlations","title":"Correlate signals over space and time","text":"might also interested case rates associate death rates future. Using dt1 parameter epi_cor(), can lag case rates back number days want, calculating correlations. , set dt1 = -10. means var1 = case_rate lagged 10 days, case rates June 1st correlated death rates June 11th. (might also help think way: death rates certain day correlated case rates offset -10 days.) Note epi_cor() takes argument shift_by determines grouping use time shifts. default geo_value, makes sense problem hand (another setting, may want group geo value another variable—say, age—time shifting). 
can see , generally, lagging case rates back 10 days improves correlations, confirming case rates better correlated death rates 10 days now.","code":"z2 <- epi_cor(x, case_rate, death_rate, cor_by = time_value, dt1 = -10) z <- rbind( z1 %>% mutate(lag = 0), z2 %>% mutate(lag = 10) ) %>% mutate(lag = as.factor(lag)) ggplot(z, aes(x = time_value, y = cor)) + geom_line(aes(color = lag)) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Correlation\", col = \"Lag\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"correlations-grouped-by-state","dir":"Articles","previous_headings":"","what":"Correlations grouped by state","title":"Correlate signals over space and time","text":"second option group geo value, obtained setting cor_by = geo_value. ’ll look correlations 0- 10-day lagged case rates. can see , generally speaking, lagging case rates back 10 days improves correlations.","code":"z1 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value) z2 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -10) z <- rbind( z1 %>% mutate(lag = 0), z2 %>% mutate(lag = 10) ) %>% mutate(lag = as.factor(lag)) ggplot(z, aes(cor)) + geom_density(aes(fill = lag, col = lag), alpha = 0.5) + labs(x = \"Correlation\", y = \"Density\", fill = \"Lag\", col = \"Lag\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"more-systematic-lag-analysis","dir":"Articles","previous_headings":"","what":"More systematic lag analysis","title":"Correlate signals over space and time","text":"Next perform systematic investigation correlations broad range lag values. can see pretty clear curvature mean correlation case death rates (correlations come grouping geo value) function lag. 
maximum occurs lag somewhere around 17 days.","code":"library(purrr) lags <- 0:35 z <- map_dfr(lags, function(lag) { epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -lag) %>% mutate(lag = .env$lag) }) z %>% group_by(lag) %>% summarize(mean = mean(cor, na.rm = TRUE)) %>% ggplot(aes(x = lag, y = mean)) + geom_line() + geom_point() + labs(x = \"Lag\", y = \"Mean correlation\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Correlate signals over space and time","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"motivation","dir":"Articles","previous_headings":"","what":"Motivation","title":"Get started with `epiprocess`","text":"{epiprocess} {epipredict} designed lower barrier entry implementation cost epidemiological time series analysis forecasting. Epidemiologists forecasting groups repeatedly separately rush implement type functionality much ad hoc manner; trying save effort future providing well-documented, tested, general packages can called many common tasks instead. {epiprocess} also provides tools help avoid particularly common pitfall analysis forecasting: ignoring reporting latency revisions data set. 
can, example, lead one retrospectively analyzing surveillance signal forecasting model concluding much accurate actually real time, producing always-decreasing forecasts data sets initial surveillance estimates systematically revised upward. Storing working version history can help avoid issues.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"intended-audience","dir":"Articles","previous_headings":"","what":"Intended audience","title":"Get started with `epiprocess`","text":"expect users proficient R, familiar {dplyr} {tidyr} packages.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"installing","dir":"Articles","previous_headings":"","what":"Installing","title":"Get started with `epiprocess`","text":"package CRAN yet, can installed using {devtools} package: Building vignettes, getting started guide, takes significant amount time. included package default. want include vignettes, use modified command:","code":"devtools::install_github(\"cmu-delphi/epiprocess\", ref = \"main\") devtools::install_github(\"cmu-delphi/epiprocess\", ref = \"main\", build_vignettes = TRUE, dependencies = TRUE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"getting-data-into-epi_df-format","dir":"Articles","previous_headings":"","what":"Getting data into epi_df format","title":"Get started with `epiprocess`","text":"’ll start showing get data epi_df format, just tibble bit special structure, format assumed functions epiprocess package. epi_df object (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. can number columns can serve measured variables, also broadly refer signal variables. documentation gives details data format. data frame tibble geo_value time_value columns can converted epi_df object, using function as_epi_df(). 
example, ’ll work daily cumulative COVID-19 cases four U.S. states: CA, FL, NY, TX, time span mid 2020 early 2022, ’ll use epidatr package fetch data COVIDcast API. can see, data frame returned epidatr::pub_covidcast() columns required epi_df object (along many others). can use as_epi_df(), specification relevant metadata, bring data frame epi_df format.","code":"library(epidatr) library(epiprocess) library(dplyr) library(tidyr) library(withr) cases <- pub_covidcast( source = \"jhu-csse\", signals = \"confirmed_cumulative_num\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl,ny,tx\", time_values = epirange(20200301, 20220131), ) colnames(cases) ## [1] \"geo_value\" \"signal\" \"source\" ## [4] \"geo_type\" \"time_type\" \"time_value\" ## [7] \"direction\" \"issue\" \"lag\" ## [10] \"missing_value\" \"missing_stderr\" \"missing_sample_size\" ## [13] \"value\" \"stderr\" \"sample_size\" x <- as_epi_df(cases, geo_type = \"state\", time_type = \"day\", as_of = max(cases$issue) ) %>% select(geo_value, time_value, total_cases = value) class(x) ## [1] \"epi_df\" \"tbl_df\" \"tbl\" \"data.frame\" summary(x) ## An `epi_df` x, with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2023-03-10 ## ---------- ## * min time value = 2020-03-01 ## * max time value = 2022-01-31 ## * average rows per time value = 4 head(x) ## An `epi_df` object, 6 x 3 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2023-03-10 ## ## # A tibble: 6 × 3 ## geo_value time_value total_cases ## * ## 1 ca 2020-03-01 19 ## 2 fl 2020-03-01 0 ## 3 ny 2020-03-01 0 ## 4 tx 2020-03-01 0 ## 5 ca 2020-03-02 23 ## 6 fl 2020-03-02 1 attributes(x)$metadata ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2023-03-10\" ## ## $other_keys ## character(0)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"some-details-on-metadata","dir":"Articles","previous_headings":"","what":"Some details on 
metadata","title":"Get started with `epiprocess`","text":"general, epi_df object following fields metadata: geo_type: type geo values. time_type: type time values. as_of: time value given data available. Metadata epi_df object x can accessed (altered) via attributes(x)$metadata. first two fields , geo_type time_type, currently used downstream functions epiprocess package, serve useful bits information convey data set hand. last field , as_of, one unique aspects epi_df object. brief, can think epi_df object single snapshot data set contains --date values signals interest, time specified as_of. example, as_of January 31, 2022, epi_df object --date version data available January 31, 2022. epiprocess package also provides companion data structure called epi_archive, stores full version history given data set. See archive vignette . geo_type, time_type, as_of arguments missing call as_epi_df(), function try infer passed object. Usually, geo_type time_type can inferred geo_value time_value columns, respectively, inferring as_of field easy. 
See documentation as_epi_df() details.","code":"x <- as_epi_df(cases, as_of = as.Date(\"2024-03-20\")) %>% select(geo_value, time_value, total_cases = value) attributes(x)$metadata ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2024-03-20\" ## ## $other_keys ## character(0)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"using-additional-key-columns-in-epi_df","dir":"Articles","previous_headings":"","what":"Using additional key columns in epi_df","title":"Get started with `epiprocess`","text":"following examples show create epi_df additional keys.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"converting-a-tsibble-that-has-county-code-as-an-extra-key","dir":"Articles","previous_headings":"Using additional key columns in epi_df","what":"Converting a tsibble that has county code as an extra key","title":"Get started with `epiprocess`","text":"metadata now includes county_code extra key.","code":"ex1 <- tibble( geo_value = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), county_code = c( \"06059\", \"06061\", \"06067\", \"12111\", \"12113\", \"12117\", \"42101\", \"42103\", \"42105\" ), time_value = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(geo_value)), value = seq_along(geo_value) + 0.01 * withr::with_rng_version(\"3.0.0\", withr::with_seed(42, length(geo_value))) ) %>% as_tsibble(index = time_value, key = c(geo_value, county_code)) ex1 <- as_epi_df(x = ex1, geo_type = \"state\", time_type = \"day\", as_of = \"2020-06-03\") attr(ex1, \"metadata\") ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2020-06-03\" ## ## $other_keys ## [1] \"county_code\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"dealing-with-misspecified-column-names","dir":"Articles","previous_headings":"Using additional key columns in epi_df","what":"Dealing 
with misspecified column names","title":"Get started with `epiprocess`","text":"epi_df requires columns geo_value time_value, exist as_epi_df() throws error. columns can renamed match epi_df format. example , notice also additional key pol.","code":"data.frame( # misnamed state = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), # extra key pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 3), # misnamed reported_date = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(geo_value)), value = seq_along(geo_value) + 0.01 * withr::with_rng_version(\"3.0.0\", withr::with_seed(42, length(geo_value))) ) %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) ## Error in eval(expr, envir, enclos): object 'geo_value' not found ex2 <- tibble( # misnamed state = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), # extra key pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 3), # misnamed reported_date = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(state)), value = seq_along(state) + 0.01 * withr::with_rng_version(\"3.0.0\", withr::with_seed(42, length(state))) ) %>% data.frame() head(ex2) ## state pol reported_date value ## 1 ca blue 2020-06-01 1.09 ## 2 ca blue 2020-06-02 2.09 ## 3 ca blue 2020-06-03 3.09 ## 4 fl swing 2020-06-01 4.09 ## 5 fl swing 2020-06-02 5.09 ## 6 fl swing 2020-06-03 6.09 ex2 <- ex2 %>% rename(geo_value = state, time_value = reported_date) %>% as_epi_df( geo_type = \"state\", as_of = \"2020-06-03\", additional_metadata = list(other_keys = \"pol\") ) attr(ex2, \"metadata\") ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2020-06-03\" ## ## $other_keys ## [1] \"pol\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"adding-additional-keys-to-an-epi_df-object","dir":"Articles","previous_headings":"Using additional key columns in epi_df","what":"Adding additional keys to an epi_df object","title":"Get started with 
`epiprocess`","text":"examples, keys added objects epi_df objects. illustrate add keys epi_df object. use toy data set included epiprocess prepared using covidcast library filtering single state simplicity. Now add state (MA) pol new columns data new keys metadata. Reminder lower case state name abbreviations expect geo_value column. Note two additional keys added, state pol, specified character vector other_keys component additional_metadata list. must specified manner downstream actions epi_df, like model fitting prediction, can recognize use keys. Currently other_keys metadata epi_df doesn’t impact epi_slide(), contrary other_keys as_epi_archive affects update data interpreted.","code":"ex3 <- jhu_csse_county_level_subset %>% filter(time_value > \"2021-12-01\", state_name == \"Massachusetts\") %>% slice_tail(n = 6) attr(ex3, \"metadata\") # geo_type is county currently ## $geo_type ## [1] \"county\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2022-05-23 21:35:45 UTC\" ex3 <- ex3 %>% as_tibble() %>% # needed to add the additional metadata mutate( state = rep(tolower(\"MA\"), 6), pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 2) ) %>% as_epi_df(additional_metadata = list(other_keys = c(\"state\", \"pol\")), as_of = as.Date(\"2024-03-20\")) attr(ex3, \"metadata\") ## $geo_type ## [1] \"county\" ## ## $time_type ## [1] \"week\" ## ## $as_of ## [1] \"2024-03-20\" ## ## $other_keys ## [1] \"state\" \"pol\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"working-with-epi_df-objects-downstream","dir":"Articles","previous_headings":"","what":"Working with epi_df objects downstream","title":"Get started with `epiprocess`","text":"Data epi_df format easy work downstream, since standard tabular data format; vignettes, ’ll walk basic signal processing tasks using functions provided epiprocess package. course, can also write custom code downstream uses, like plotting, pretty easy ggplot2. 
last couple examples, ’ll look data sets just show might get epi_df format. Data daily new (cumulative) SARS cases Canada 2003, outbreaks package: Get confirmed cases Ebola Sierra Leone 2014 2015 province date onset, prepared line list data package:","code":"library(ggplot2) theme_set(theme_bw()) ggplot(x, aes(x = time_value, y = total_cases, color = geo_value)) + geom_line() + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Cumulative COVID-19 cases\", color = \"State\") x <- outbreaks::sars_canada_2003 %>% mutate(geo_value = \"ca\") %>% select(geo_value, time_value = date, starts_with(\"cases\")) %>% as_epi_df(geo_type = \"nation\", as_of = as.Date(\"2024-03-20\")) head(x) ## An `epi_df` object, 6 x 6 with metadata: ## * geo_type = nation ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 6 × 6 ## geo_value time_value cases_travel cases_household cases_healthcare cases_other ## * ## 1 ca 2003-02-23 1 0 0 0 ## 2 ca 2003-02-24 0 0 0 0 ## 3 ca 2003-02-25 0 0 0 0 ## 4 ca 2003-02-26 0 1 0 0 ## 5 ca 2003-02-27 0 0 0 0 ## 6 ca 2003-02-28 1 0 0 0 library(tidyr) x <- x %>% pivot_longer(starts_with(\"cases\"), names_to = \"type\") %>% mutate(type = substring(type, 7)) yrange <- range( x %>% group_by(time_value) %>% summarize(value = sum(value)) %>% pull(value) ) ggplot(x, aes(x = time_value, y = value)) + geom_col(aes(fill = type)) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + scale_y_continuous(breaks = yrange[1]:yrange[2]) + labs(x = \"Date\", y = \"SARS cases in Canada\", fill = \"Type\") x <- outbreaks::ebola_sierraleone_2014 %>% select(district, date_of_onset, status) %>% mutate(province = case_when( district %in% c(\"Kailahun\", \"Kenema\", \"Kono\") ~ \"Eastern\", district %in% c( \"Bombali\", \"Kambia\", \"Koinadugu\", \"Port Loko\", \"Tonkolili\" ) ~ \"Northern\", district %in% c(\"Bo\", \"Bonthe\", \"Moyamba\", \"Pujehun\") ~ \"Southern\", district %in% c(\"Western Rural\", \"Western 
Urban\") ~ \"Western\" )) %>% group_by(geo_value = province, time_value = date_of_onset) %>% summarise(cases = sum(status == \"confirmed\"), .groups = \"drop\") %>% complete(geo_value, time_value = full_seq(time_value, period = 1), fill = list(cases = 0) ) %>% as_epi_df(geo_type = \"province\", as_of = as.Date(\"2024-03-20\")) ggplot(x, aes(x = time_value, y = cases)) + geom_col(aes(fill = geo_value), show.legend = FALSE) + facet_wrap(~geo_value, scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Confirmed cases of Ebola in Sierra Leone\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Get started with `epiprocess`","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"growth-rate-basics","dir":"Articles","previous_headings":"","what":"Growth rate basics","title":"Estimate growth rates in signals","text":"growth rate function \\(f\\) defined continuously-valued parameter \\(t\\) defined \\(f'(t)/f(t)\\), \\(f'(t)\\) derivative \\(f\\) \\(t\\). estimate growth rate signal discrete-time (can thought evaluations discretizations underlying function continuous-time), can estimate derivative divide signal value (possibly smoothed version signal value). 
growth_rate() function takes sequence underlying design points x corresponding sequence y signal values, allows us choose following methods estimating growth rate given reference point x0, setting method argument: “rel_change”: uses \\((\\bar B/\\bar A - 1) / h\\), \\(\\bar B\\) average y second half sliding window bandwidth h centered reference point x0, \\(\\bar A\\) average first half. can seen using first-difference approximation derivative. “linear_reg”: uses slope linear regression y x sliding window centered reference point x0, divided fitted value linear regression x0. “smooth_spline”: uses estimated derivative x0 smoothing spline fit x y, via stats::smooth.spline(), divided fitted value spline x0. “trend_filter”: uses estimated derivative x0 polynomial trend filtering (discrete spline) fit x y, via genlasso::trendfilter(), divided fitted value discrete spline x0. default growth_rate() x0 = x, returns estimate growth rate underlying design point.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"relative-change","dir":"Articles","previous_headings":"","what":"Relative change","title":"Estimate growth rates in signals","text":"default method “rel_change”, simplest way estimate growth rates. default bandwidth h = 7, daily data, considers relative change signal adjacent weeks. can wrap growth_rate() call dplyr::mutate() append new column epi_df object computed growth rates. can visualize growth rate estimates plotting signal values highlighting periods time relative change 1% (red) -1% (blue), faceting geo value. direct visualization, plot estimated growth rates , overlaying curves two states one plot. can see estimated growth rates relative change method somewhat volatile, appears bias towards right boundary time span—look estimated growth rate Georgia late December 2021, takes potentially suspicious dip. 
general, estimation derivatives difficult near boundary, relative changes can suffer particularly noticeable boundary bias based difference averages two halves local window, simplistic approach, one halves truncated near boundary.","code":"x <- x %>% group_by(geo_value) %>% mutate(cases_gr1 = growth_rate(time_value, cases)) head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## # Groups: geo_value [1] ## geo_value time_value cases cases_gr1 ## * ## 1 ga 2020-06-01 643. 0.00601 ## 2 ga 2020-06-02 603. 0.0185 ## 3 ga 2020-06-03 608 0.0240 ## 4 ga 2020-06-04 656. 0.0218 ## 5 ga 2020-06-05 677. 0.0193 ## 6 ga 2020-06-06 718. 0.0163 ## 7 ga 2020-06-07 691. 0.0180 ## 8 ga 2020-06-08 656. 0.0234 ## 9 ga 2020-06-09 720. 0.0227 ## 10 ga 2020-06-10 727. 0.0227 library(ggplot2) theme_set(theme_bw()) upper <- 0.01 lower <- -0.01 ggplot(x, aes(x = time_value, y = cases)) + geom_tile( data = x %>% filter(cases_gr1 >= upper), aes(x = time_value, y = 0, width = 7, height = Inf), fill = 2, alpha = 0.08 ) + geom_tile( data = x %>% filter(cases_gr1 <= lower), aes(x = time_value, y = 0, width = 7, height = Inf), fill = 4, alpha = 0.08 ) + geom_line() + facet_wrap(vars(geo_value), scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 cases\") ggplot(x, aes(x = time_value, y = cases_gr1)) + geom_line(aes(col = geo_value)) + geom_hline(yintercept = upper, linetype = 2, col = 2) + geom_hline(yintercept = lower, linetype = 2, col = 4) + scale_color_manual(values = c(3, 6)) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"State\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"linear-regression","dir":"Articles","previous_headings":"","what":"Linear regression","title":"Estimate growth rates in 
signals","text":"second simplest method available “linear_reg”, whose default bandwidth h = 7. Compared “rel_change”, appears behave similarly overall, thankfully avoids troublesome spikes:","code":"x <- x %>% group_by(geo_value) %>% mutate(cases_gr2 = growth_rate(time_value, cases, method = \"linear_reg\")) x %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr1 = \"rel_change\", cases_gr2 = \"linear_reg\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(2, 4)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"Method\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"nonparametric-estimation","dir":"Articles","previous_headings":"","what":"Nonparametric estimation","title":"Estimate growth rates in signals","text":"can also use nonparametric method estimate derivative, “smooth_spline” “trend_filter”. latter going generally computationally expensive, also able adapt better local level smoothness. (apparent efficiency actually compounded particular implementations default settings methods: “trend_filter” based full solution path algorithm provided genlasso package, performs cross-validation default order pick level regularization; read documentation growth_rate() details.) particular example, trend filtering estimates growth rate appear much stable smoothing spline, also much stable estimates local relative changes linear regressions. smoothing spline growth rate estimates based default settings stats::smooth.spline(), appear severely -regularized . arguments stats::smooth.spline() can customized passing additional arguments ... 
call growth_rate(); similarly, can also use additional arguments customize settings underlying trend filtering functions genlasso::trendfilter(), genlasso::cv.trendfilter(), documentation growth_rate() gives full details.","code":"x <- x %>% group_by(geo_value) %>% mutate( cases_gr3 = growth_rate(time_value, cases, method = \"smooth_spline\"), cases_gr4 = growth_rate(time_value, cases, method = \"trend_filter\") ) x %>% select(geo_value, time_value, cases_gr3, cases_gr4) %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr3 = \"smooth_spline\", cases_gr4 = \"trend_filter\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(3, 6)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"Method\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"log-scale-estimation","dir":"Articles","previous_headings":"","what":"Log scale estimation","title":"Estimate growth rates in signals","text":"general, alternative view growth rate function \\(f\\) given defining \\(g(t) = \\log(f(t))\\), observing \\(g'(t) = f'(t)/f(t)\\). Therefore, method estimates derivative can simply applied log signal interest, light, method (“rel_change”, “linear_reg”, “smooth_spline”, “trend_filter”) log scale analog, can used setting argument log_scale = TRUE call growth_rate(). Comparing rel_change_log curves rel_change counterparts (shown earlier figures), see former curves appear less volatile match linear regression estimates much closely. particular, rel_change upward spikes, rel_change_log less pronounced spikes. occur? estimate \\(g'(t)\\) can expressed \\(\\mathbb E[\\log(B)-\\log(A)]/h = \\mathbb E[\\log(1+hR)]/h\\), \\(R = ((B-A)/h) / A\\), expectation refers averaging \\(h\\) observations window. 
Consider following two relevant inequalities, due concavity logarithm function: \\[ \\mathbb E[\\log(1+hR)]/h \\leq \\log(1+h\\mathbb E[R])/h \\leq \\mathbb E[R]. \\] first inequality Jensen’s; second inequality tangent line concave function lies . Finally, observe \\(\\mathbb E[R] \\approx ((\\bar B-\\bar A)/h) / \\bar A\\), rel_change estimate. explains rel_change_log curve often lies rel_change curve.","code":"x <- x %>% group_by(geo_value) %>% mutate( cases_gr5 = growth_rate(time_value, cases, method = \"rel_change\", log_scale = TRUE ), cases_gr6 = growth_rate(time_value, cases, method = \"linear_reg\", log_scale = TRUE ), cases_gr7 = growth_rate(time_value, cases, method = \"smooth_spline\", log_scale = TRUE ), cases_gr8 = growth_rate(time_value, cases, method = \"trend_filter\", log_scale = TRUE ) ) x %>% select(geo_value, time_value, cases_gr5, cases_gr6) %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr5 = \"rel_change_log\", cases_gr6 = \"linear_reg_log\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(2, 4)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"Method\") x %>% select(geo_value, time_value, cases_gr7, cases_gr8) %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr7 = \"smooth_spline_log\", cases_gr8 = \"trend_filter_log\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(3, 6)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = 
\"Method\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Estimate growth rates in signals","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/outliers.html","id":"outlier-detection","dir":"Articles","previous_headings":"","what":"Outlier detection","title":"Detect and correct outliers in signals","text":"detect_outlr() function allows us run multiple outlier detection methods given signal, (optionally) combine results methods. , ’ll investigate outlier detection results following methods. Detection based rolling median, using detect_outlr_rm(), computes rolling median default window size n time points centered time point consideration, computes thresholds based multiplier times rolling IQR computed residuals. Detection based seasonal-trend decomposition using LOESS (STL), using detect_outlr_stl(), similar rolling median method replaces rolling median fitted values STL. Detection based STL decomposition, without seasonality term, amounts smoothing using LOESS. outlier detection methods specified using tibble passed detect_outlr(), one row per method, whose columms specify outlier detection function, input arguments (nondefault values need supplied), abbreviated name method used tracking results. Abbreviations “rm” “stl” can used built-detection functions detect_outlr_rm() detect_outlr_stl(), respectively. 
Additionally, ’ll form combined lower upper thresholds, calculated median lower upper thresholds methods time point. Note using combined median threshold equivalent using majority vote across base methods determine whether value outlier. visualize results, first define convenience function plotting. Now produce plots state time, faceting detection method.","code":"detection_methods <- bind_rows( tibble( method = \"rm\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5 )), abbr = \"rm\" ), tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = 7 )), abbr = \"stl_seasonal\" ), tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = NULL )), abbr = \"stl_nonseasonal\" ) ) detection_methods ## # A tibble: 3 × 3 ## method args abbr ## ## 1 rm rm ## 2 stl stl_seasonal ## 3 stl stl_nonseasonal x <- x %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr( x = time_value, y = cases, methods = detection_methods, combiner = \"median\" )) %>% ungroup() %>% unnest(outlier_info) head(x) ## An `epi_df` object, 6 x 15 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2022-05-21 22:17:14.962335 ## ## # A tibble: 6 × 15 ## geo_value time_value cases rm_lower rm_upper rm_replacement stl_seasonal_lower ## * ## 1 fl 2020-06-01 667 345 2195 667 0 ## 2 nj 2020-06-01 486 64.4 926. 486 221. ## 3 fl 2020-06-02 617 406. 2169. 617 0 ## 4 nj 2020-06-02 658 140. 841. 658 245. ## 5 fl 2020-06-03 1317 468. 2142. 1317 0 ## 6 nj 2020-06-03 541 216 756 541 227. 
## # ℹ 8 more variables: stl_seasonal_upper , stl_seasonal_replacement , ## # stl_nonseasonal_lower , stl_nonseasonal_upper , ## # stl_nonseasonal_replacement , combined_lower , ## # combined_upper , combined_replacement # Plot outlier detection bands and/or points identified as outliers plot_outlr <- function(x, signal, method_abbr, bands = TRUE, points = TRUE, facet_vars = vars(.data$geo_value), nrow = NULL, ncol = NULL, scales = \"fixed\") { # Convert outlier detection results to long format signal <- rlang::enquo(signal) x_long <- x %>% pivot_longer( cols = starts_with(method_abbr), names_to = c(\"method\", \".value\"), names_pattern = \"(.+)_(.+)\" ) # Start of plot with observed data p <- ggplot() + geom_line(data = x, mapping = aes(x = .data$time_value, y = !!signal)) # If requested, add bands if (bands) { p <- p + geom_ribbon( data = x_long, aes( x = .data$time_value, ymin = .data$lower, ymax = .data$upper, color = .data$method ), fill = NA ) } # If requested, add points if (points) { x_detected <- x_long %>% filter((!!signal < .data$lower) | (!!signal > .data$upper)) p <- p + geom_point( data = x_detected, aes( x = .data$time_value, y = !!signal, color = .data$method, shape = .data$method ) ) } # If requested, add faceting if (!is.null(facet_vars)) { p <- p + facet_wrap(facet_vars, nrow = nrow, ncol = ncol, scales = scales) } return(p) } method_abbr <- c(detection_methods$abbr, \"combined\") plot_outlr(x %>% filter(geo_value == \"fl\"), cases, method_abbr, facet_vars = vars(method), scales = \"free_y\", ncol = 1 ) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs( x = \"Date\", y = \"Reported COVID-19 counts\", color = \"Method\", shape = \"Method\" ) plot_outlr(x %>% filter(geo_value == \"nj\"), cases, method_abbr, facet_vars = vars(method), scales = \"free_y\", ncol = 1 ) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs( x = \"Date\", y = \"Reported COVID-19 counts\", color = \"Method\", shape = \"Method\" 
)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/outliers.html","id":"outlier-correction","dir":"Articles","previous_headings":"","what":"Outlier correction","title":"Detect and correct outliers in signals","text":"Finally, order correct outliers, can use posited replacement values returned outlier detection method. use replacement value combined method, defined median replacement values base methods time point. advanced correction functionality coming point future.","code":"y <- x %>% mutate(cases_corrected = combined_replacement) %>% select(geo_value, time_value, cases, cases_corrected) y %>% filter(cases != cases_corrected) ## An `epi_df` object, 22 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2022-05-21 22:17:14.962335 ## ## # A tibble: 22 × 4 ## geo_value time_value cases cases_corrected ## * ## 1 fl 2020-07-12 15300 10181 ## 2 nj 2020-07-19 -8 320. ## 3 nj 2020-08-13 694 404. ## 4 nj 2020-08-14 619 397. ## 5 nj 2020-08-16 40 366 ## 6 nj 2020-08-22 555 360 ## 7 fl 2020-09-01 7569 2861. ## 8 nj 2020-10-08 1415 873. ## 9 fl 2020-10-10 0 2660 ## 10 fl 2020-10-11 5570 2660 ## # ℹ 12 more rows ggplot(y, aes(x = time_value)) + geom_line(aes(y = cases), linetype = 2) + geom_line(aes(y = cases_corrected), col = 2) + geom_hline(yintercept = 0, linetype = 3) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 counts\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/outliers.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Detect and correct outliers in signals","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. 
data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"optimized-rolling-mean","dir":"Articles","previous_headings":"","what":"Optimized rolling mean","title":"Slide a computation over signal values","text":"first demonstrate apply 7-day trailing average daily cases order smooth signal, passing name column(s) want average first argument epi_slide_mean(). epi_slide_mean () can used averaging. computation per state, first call group_by(). calculation done using data.table::frollmean, whose behavior can adjusted passing relevant arguments via ....","code":"x %>% group_by(geo_value) %>% epi_slide_mean(\"cases\", before = 6) %>% ungroup() %>% head(10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases slide_value_cases ## * ## 1 ca 2020-03-01 6 NA ## 2 ca 2020-03-02 4 NA ## 3 ca 2020-03-03 6 NA ## 4 ca 2020-03-04 11 NA ## 5 ca 2020-03-05 10 NA ## 6 ca 2020-03-06 18 NA ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"slide-with-a-formula","dir":"Articles","previous_headings":"","what":"Slide with a formula","title":"Slide a computation over signal values","text":"previous computation can also performed using epi_slide(), flexible quite bit slower epi_slide_mean(). recommended use epi_slide_mean() possible. 7-day trailing average daily cases can computed passing formula first argument epi_slide(). per state, first call group_by(). 
formula specified access non-grouping columns present original epi_df object (must refer prefix .x$). can see, function epi_slide() returns epi_df object new column appended contains results (sliding), named slide_value default. can course change post hoc, can instead specify new name front using new_col_name argument: information available additional variables: .group_key one-row tibble containing values grouping variables associated group .ref_time_value reference time value time window based Like group_modify(), alternative names variables well: . can used instead .x, .y instead .group_key, .z instead .ref_time_value.","code":"x %>% group_by(geo_value) %>% epi_slide(~ mean(.x$cases), before = 6) %>% ungroup() %>% head(10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases slide_value ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4 x <- x %>% group_by(geo_value) %>% epi_slide(~ mean(.x$cases), before = 6, new_col_name = \"cases_7dav\") %>% ungroup() head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"slide-with-a-function","dir":"Articles","previous_headings":"","what":"Slide with a function","title":"Slide a computation over 
signal values","text":"can also pass function first argument epi_slide(). case, passed function f must accept following arguments: data frame column names original object, minus grouping variables, containing time window data one group-ref_time_value combination; followed one-row tibble containing values grouping variables associated group; followed associated ref_time_value. can accept additional arguments; epi_slide() forward ... args receives f. Recreating last example 7-day trailing average:","code":"x <- x %>% group_by(geo_value) %>% epi_slide(function(x, gk, rtv) mean(x$cases), before = 6, new_col_name = \"cases_7dav\") %>% ungroup() head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"slide-the-tidy-way","dir":"Articles","previous_headings":"","what":"Slide the tidy way","title":"Slide a computation over signal values","text":"Perhaps convenient way setup computation epi_slide() pass expression tidy evaluation. case, can simply define name new column directly part expression, setting equal computation can access columns x name, just call dplyr::mutate(), dplyr verbs. example: addition referring individual columns name, can refer time window data epi_df tibble using .x. Similarly, arguments function format available magic names .group_key .ref_time_value, tidyverse “pronouns” .data .env can also used. 
simple sanity check, visualize 7-day trailing averages computed top original counts: can see top right panel, looks like Texas moved weekly reporting COVID-19 cases summer 2021.","code":"x <- x %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% ungroup() head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4 library(ggplot2) theme_set(theme_bw()) ggplot(x, aes(x = time_value)) + geom_col(aes(y = cases, fill = geo_value), alpha = 0.5, show.legend = FALSE) + geom_line(aes(y = cases_7dav, col = geo_value), show.legend = FALSE) + facet_wrap(~geo_value, scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 cases\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"running-a-local-forecaster","dir":"Articles","previous_headings":"","what":"Running a local forecaster","title":"Slide a computation over signal values","text":"complex example, create forecaster based local (time) autoregression AR model. AR models can fit numerous ways (using base R functions various packages), define “hand” provides advanced example sliding function epi_df object, allows us bit flexible defining probabilistic forecaster: one outputs just point prediction, notion uncertainty around . particular, forecaster output point prediction along 90% uncertainty band, represented predictive quantiles 5% 95% levels (lower upper endpoints uncertainty band). function defined , prob_ar(), probabilistic AR forecaster. 
lags argument indicates lags use model, ahead indicates far ahead future make forecasts (encoded terms units time_value column; , days, working epi_df considered vignette). go ahead slide AR forecaster working epi_df COVID-19 cases. Note actually model cases_7dav column, operate scale smoothed COVID-19 cases. clearly equivalent, constant, modeling weekly sums COVID-19 cases. Note utilized argument ref_time_values perform sliding computation (, compute forecast) specific subset reference time values. get three columns fc_point, fc_lower, fc_upper correspond point forecast, lower upper endpoints 90% prediction band, respectively. (instead set as_list_col = TRUE call epi_slide(), gotten list column fc, element fc data frame named columns point, lower, upper.) finish , plot forecasts times (spaced months) last year, multiple horizons: 7, 14, 21, 28 days ahead. , encapsulate process generating forecasts simple function, can call times. Two points worth making. First, AR model’s performance pretty spotty. various points time, can see forecasts volatile (point predictions place), overconfident (bands narrow), time. meant simple demo entirely unexpected given way AR model set . epipredict package, companion package epiprocess, offers suite predictive modeling tools can improve shortcomings simple AR model. Second, AR forecaster using finalized data, meaning, uses latest versions signal values (reported COVID-19 cases) available, training models making predictions historically. However, reflective provisional nature data must cope true forecast task. Training making predictions finalized data can lead overly optimistic sense accuracy; see, example, McDonald et al. (2021), references therein. Fortunately, epiprocess package provides data structure called epi_archive can used store data revisions, furthermore, epi_archive object knows slide computations correct version-aware sense (computation reference time \\(t\\), uses data available \\(t\\)). 
revisit example archive vignette.","code":"prob_ar <- function(y, lags = c(0, 7, 14), ahead = 6, min_train_window = 20, lower_level = 0.05, upper_level = 0.95, symmetrize = TRUE, intercept = FALSE, nonneg = TRUE) { # Return NA if insufficient training data if (length(y) < min_train_window + max(lags) + ahead) { return(data.frame(point = NA, lower = NA, upper = NA)) } # Build features and response for the AR model dat <- do.call( data.frame, purrr::map(lags, function(j) lag(y, n = j)) ) names(dat) <- paste0(\"x\", seq_len(ncol(dat))) if (intercept) dat$x0 <- rep(1, nrow(dat)) dat$y <- lead(y, n = ahead) # Now fit the AR model and make a prediction obj <- lm(y ~ . + 0, data = dat) point <- predict(obj, newdata = tail(dat, 1)) # Compute a band r <- residuals(obj) s <- ifelse(symmetrize, -1, NA) # Should the residuals be symmetrized? q <- quantile(c(r, s * r), probs = c(lower_level, upper_level), na.rm = TRUE) lower <- point + q[1] upper <- point + q[2] # Clip at zero if we need to, then return if (nonneg) { point <- max(point, 0) lower <- max(lower, 0) upper <- max(upper, 0) } return(data.frame(point = point, lower = lower, upper = upper)) } fc_time_values <- seq(as.Date(\"2020-06-01\"), as.Date(\"2021-12-01\"), by = \"1 months\" ) x %>% group_by(geo_value) %>% epi_slide( fc = prob_ar(cases_7dav), before = 119, ref_time_values = fc_time_values ) %>% ungroup() %>% head(10) ## An `epi_df` object, 10 x 7 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 7 ## geo_value time_value cases cases_7dav fc_point fc_lower fc_upper ## * ## 1 ca 2020-06-01 2437 2694 2973. 2566. 3380. ## 2 ca 2020-07-01 7346 6722 7892. 7321. 8462. ## 3 ca 2020-08-01 8616 8284. 7188. 6153. 8223. ## 4 ca 2020-09-01 4248 4707. 4133. 2329. 5937. ## 5 ca 2020-10-01 3504 3360. 3257. 1449. 5064. ## 6 ca 2020-11-01 4210 4441. 3840. 2258. 5422. ## 7 ca 2020-12-01 23626 15690 17699. 16082. 19316. ## 8 ca 2021-01-01 50251 41097. 45534. 
38417. 52650. ## 9 ca 2021-02-01 13098 17952. 15266. 6725. 23808. ## 10 ca 2021-03-01 3031 5209 4482. 0 12982. # Note the use of all_rows = TRUE (keeps all original rows in the output) k_week_ahead <- function(x, ahead = 7) { x %>% group_by(.data$geo_value) %>% epi_slide( fc = prob_ar(.data$cases_7dav, ahead = ahead), before = 119, ref_time_values = fc_time_values, all_rows = TRUE ) %>% ungroup() %>% mutate(target_date = .data$time_value + ahead) } # First generate the forecasts, and bind them together z <- bind_rows( k_week_ahead(x, ahead = 7), k_week_ahead(x, ahead = 14), k_week_ahead(x, ahead = 21), k_week_ahead(x, ahead = 28) ) # Now plot them, on top of actual COVID-19 case counts ggplot(z) + geom_line(aes(x = time_value, y = cases_7dav), color = \"gray50\") + geom_ribbon(aes( x = target_date, ymin = fc_lower, ymax = fc_upper, group = time_value ), fill = 6, alpha = 0.4) + geom_line(aes(x = target_date, y = fc_point, group = time_value)) + geom_point(aes(x = target_date, y = fc_point, group = time_value), size = 0.5 ) + geom_vline( data = tibble(x = fc_time_values), aes(xintercept = x), linetype = 2, alpha = 0.5 ) + facet_wrap(vars(geo_value), scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 cases\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Slide a computation over signal values","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. 
COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Jacob Bien. Contributor. Logan Brooks. Author, maintainer. Rafael Catoia. Contributor. Nat DeFries. Contributor. Daniel McDonald. Author. Rachel Lobay. Contributor. Ken Mawer. Contributor. Chloe . Contributor. Quang Nguyen. Contributor. Evan Ray. Author. Dmitry Shemetov. Contributor. Ryan Tibshirani. Author. Lionel Henry. Contributor. Author included rlang fragments Hadley Wickham. Contributor. Author included rlang fragments Posit. Copyright holder. Copyright holder included rlang fragments","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Brooks L, McDonald D, Ray E, Tibshirani R (2024). epiprocess: Tools basic signal processing epidemiology. R package version 0.7.11, https://cmu-delphi.github.io/epiprocess/.","code":"@Manual{, title = {epiprocess: Tools for basic signal processing in epidemiology}, author = {Logan Brooks and Daniel McDonald and Evan Ray and Ryan Tibshirani}, year = {2024}, note = {R package version 0.7.11}, url = {https://cmu-delphi.github.io/epiprocess/}, }"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"epiprocess","dir":"","previous_headings":"","what":"Tools for basic signal processing in epidemiology","title":"Tools for basic signal processing in epidemiology","text":"package introduces common data structure epidemiological data sets measured space time, offers associated utilities perform basic signal processing tasks. 
See getting started guide vignettes examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tools for basic signal processing in epidemiology","text":"install (unless ’re making changes package, use stable version):","code":"# Stable version pak::pkg_install(\"cmu-delphi/epiprocess@main\") # Dev version pak::pkg_install(\"cmu-delphi/epiprocess@dev\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"epi_df-snapshot-of-a-data-set","dir":"","previous_headings":"","what":"epi_df: snapshot of a data set","title":"Tools for basic signal processing in epidemiology","text":"first main data structure epiprocess package called epi_df. simply tibble couple required columns, geo_value time_value. can number columns, can seen measured variables, also call signal variables. brief, epi_df object represents snapshot data set contains --date values signals variables, given time. convention, functions epiprocess package operate epi_df objects begin epi. example: epi_slide(), iteratively applying custom computation variable epi_df object sliding windows time; epi_cor(), computing lagged correlations variables epi_df object, (allowing grouping geo value, time value, variables). Functions package operate directly given variables begin epi. example: growth_rate(), estimating growth rate given signal given time values, using various methodologies; detect_outlr(), detecting outliers given signal time, using either built-custom methodologies.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"epi_archive-full-version-history-of-a-data-set","dir":"","previous_headings":"","what":"epi_archive: full version history of a data set","title":"Tools for basic signal processing in epidemiology","text":"second main data structure package called epi_archive. 
special class (R6 format) wrapped around data table stores archive (version history) signal variables interest. convention, functions epiprocess package operate epi_archive objects begin epix (“x” meant remind “archive”). just wrapper functions around public methods epi_archive R6 class. example: epix_as_of(), generating snapshot epi_df format data archive, represents --date values signal variables, specified version; epix_fill_through_version(), filling fake version data following simple rules, use downstream methods expect archive --date (e.g., forecasting deadline date one data sources accessed provide latest versions data) epix_merge(), merging two data archives , support various approaches handling one archives --date version-wise ; epix_slide(), sliding custom computation data archive local windows time, much like epi_slide epi_df object, one key difference: sliding computation given reference time t performed data available t.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"data source based information outpatient visits, provided us health system partners, also contains confirmed COVID-19 cases based reports made available Center Systems Science Engineering Johns Hopkins University. 
example data ranges June 1, 2020 Dec 1, 2021, also limited California, Florida, Texas, New York.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"","code":"archive_cases_dv_subset"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"epi_archive data format. data table DT 129,638 rows 5 columns: geo_value geographic value associated row measurements. time_value time value associated row measurements. version time value specifying version row measurements. percent_cli percentage doctor’s visits CLI (COVID-like illness) computed medical insurance claims case_rate_7d_av 7-day average signal number new confirmed cases due COVID-19 per 100,000 population, daily","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Doctor Visits API: signal percent_cli taken directly API without changes. 
COVIDcast Epidata API: case_rate_7d_av signal computed Delphi original JHU-CSSE data calculating moving averages preceding 7 days, signal June 7 average underlying data June 1 7, inclusive. Furthermore, data subset full dataset, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert to epi_df format — as_epi_df","title":"Convert to epi_df format — as_epi_df","text":"Converts data frame tibble epi_df object. See getting started guide examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert to epi_df format — as_epi_df","text":"","code":"as_epi_df(x, ...) # S3 method for epi_df as_epi_df(x, ...) # S3 method for tbl_df as_epi_df(x, geo_type, time_type, as_of, additional_metadata = list(), ...) # S3 method for data.frame as_epi_df(x, geo_type, time_type, as_of, additional_metadata = list(), ...) # S3 method for tbl_ts as_epi_df(x, geo_type, time_type, as_of, additional_metadata = list(), ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert to epi_df format — as_epi_df","text":"x data.frame, tibble::tibble, tsibble::tsibble converted ... Additional arguments passed methods. geo_type Type geo values. missing, function attempt infer geo values present; fails, set \"custom\". time_type Type time values. missing, function attempt infer time values present; fails, set \"custom\". as_of Time value representing time given data available. example, as_of January 31, 2022, epi_df object created represent --date version data available January 31, 2022. as_of argument missing, current day-time used. additional_metadata List additional metadata attach epi_df object. 
metadata geo_type, time_type, as_of fields; named entries passed list included well. tibble additional keys, sure specify character vector other_keys component additional_metadata.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert to epi_df format — as_epi_df","text":"epi_df object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"methods-by-class-","dir":"Reference","previous_headings":"","what":"Methods (by class)","title":"Convert to epi_df format — as_epi_df","text":"as_epi_df(epi_df): Simply returns epi_df object unchanged. as_epi_df(tbl_df): input tibble x must contain columns geo_value time_value. columns preserved , treated measured variables. as_of missing, function try guess as_of, issue, version column x (present), as_of field metadata (stored attributes); fails, current day-time used. as_epi_df(data.frame): Works analogously as_epi_df.tbl_df(). 
as_epi_df(tbl_ts): Works analogously as_epi_df.tbl_df(), except tbl_ts class dropped, key variables (\"geo_value\") added metadata returned object, other_keys field.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert to epi_df format — as_epi_df","text":"","code":"# Convert a `tsibble` that has county code as an extra key # Notice that county code should be a character string to preserve any leading zeroes ex1_input <- tibble::tibble( geo_value = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), county_code = c( \"06059\", \"06061\", \"06067\", \"12111\", \"12113\", \"12117\", \"42101\", \"42103\", \"42105\" ), time_value = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\" ), length.out = length(geo_value)), value = 1:length(geo_value) + 0.01 * rnorm(length(geo_value)) ) %>% tsibble::as_tsibble(index = time_value, key = c(geo_value, county_code)) # The `other_keys` metadata (`\"county_code\"` in this case) is automatically # inferred from the `tsibble`'s `key`: ex1 <- as_epi_df(x = ex1_input, geo_type = \"state\", time_type = \"day\", as_of = \"2020-06-03\") attr(ex1, \"metadata\")[[\"other_keys\"]] #> [1] \"county_code\" # Dealing with misspecified column names: # Geographical and temporal information must be provided in columns named # `geo_value` and `time_value`; if we start from a data frame with a # different format, it must be converted to use `geo_value` and `time_value` # before calling `as_epi_df`. 
ex2_input <- tibble::tibble( state = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), # misnamed pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 3), # extra key reported_date = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\" ), length.out = length(state)), # misnamed value = 1:length(state) + 0.01 * rnorm(length(state)) ) print(ex2_input) #> # A tibble: 9 × 4 #> state pol reported_date value #> #> 1 ca blue 2020-06-01 0.997 #> 2 ca blue 2020-06-02 1.99 #> 3 ca blue 2020-06-03 3.01 #> 4 fl swing 2020-06-01 4.02 #> 5 fl swing 2020-06-02 4.98 #> 6 fl swing 2020-06-03 6.01 #> 7 pa swing 2020-06-01 6.98 #> 8 pa swing 2020-06-02 7.99 #> 9 pa swing 2020-06-03 9.00 ex2 <- ex2_input %>% dplyr::rename(geo_value = state, time_value = reported_date) %>% as_epi_df( geo_type = \"state\", as_of = \"2020-06-03\", additional_metadata = list(other_keys = \"pol\") ) attr(ex2, \"metadata\") #> $geo_type #> [1] \"state\" #> #> $time_type #> [1] \"day\" #> #> $as_of #> [1] \"2020-06-03\" #> #> $other_keys #> [1] \"pol\" #> # Adding additional keys to an `epi_df` object ex3_input <- jhu_csse_county_level_subset %>% dplyr::filter(time_value > \"2021-12-01\", state_name == \"Massachusetts\") %>% dplyr::slice_tail(n = 6) ex3 <- ex3_input %>% tsibble::as_tsibble() %>% # needed to add the additional metadata # add 2 extra keys dplyr::mutate( state = rep(\"MA\", 6), pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 2) ) %>% # the 2 extra keys we added have to be specified in the other_keys # component of additional_metadata. 
as_epi_df(additional_metadata = list(other_keys = c(\"state\", \"pol\"))) attr(ex3, \"metadata\") #> $geo_type #> [1] \"county\" #> #> $time_type #> [1] \"week\" #> #> $as_of #> [1] \"2024-06-20 23:03:00 UTC\" #> #> $other_keys #> [1] \"state\" \"pol\" #>"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tibble.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert to tibble — as_tibble.epi_df","title":"Convert to tibble — as_tibble.epi_df","text":"Converts epi_df object tibble, dropping metadata grouping.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tibble.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert to tibble — as_tibble.epi_df","text":"","code":"# S3 method for epi_df as_tibble(x, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tibble.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert to tibble — as_tibble.epi_df","text":"x epi_df ... 
additional arguments forward NextMethod()","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tsibble.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert to tsibble format — as_tsibble.epi_df","title":"Convert to tsibble format — as_tsibble.epi_df","text":"Converts epi_df object tsibble, index taken time_value, key variables taken geo_value along others other_keys field metadata, else explicitly set.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tsibble.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert to tsibble format — as_tsibble.epi_df","text":"","code":"# S3 method for epi_df as_tsibble(x, key, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tsibble.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert to tsibble format — as_tsibble.epi_df","text":"x epi_df key Optional. additional keys (geo_value) add tsibble. ... 
additional arguments passed tsibble::as_tsibble()","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Automatically plot an epi_df — autoplot.epi_df","title":"Automatically plot an epi_df — autoplot.epi_df","text":"Automatically plot epi_df","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Automatically plot an epi_df — autoplot.epi_df","text":"","code":"# S3 method for epi_df autoplot( object, ..., .color_by = c(\"all_keys\", \"geo_value\", \"other_keys\", \".response\", \"all\", \"none\"), .facet_by = c(\".response\", \"other_keys\", \"all_keys\", \"geo_value\", \"all\", \"none\"), .base_color = \"#3A448F\", .max_facets = Inf )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Automatically plot an epi_df — autoplot.epi_df","text":"object epi_df ... One unquoted expressions separated commas. Variable names can used positions data frame, expressions like x:y can used select range variables. .color_by variables determine color(s) used plot lines. Options include: all_keys - default uses interaction key variables including geo_value geo_value - geo_value other_keys - available keys geo_value .response - numeric variables (y-axis) - uses interaction keys numeric variables none - coloring aesthetic applied .facet_by Similar .color_by except default display numeric variable separate facet .base_color Lines shown color. example, single numeric variable faceting geo_value, locations share color line. .max_facets Cut number facets displayed. 
Especially useful testing many geo_value's keys.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Automatically plot an epi_df — autoplot.epi_df","text":"ggplot object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Automatically plot an epi_df — autoplot.epi_df","text":"","code":"autoplot(jhu_csse_daily_subset, cases, death_rate_7d_av) autoplot(jhu_csse_daily_subset, case_rate_7d_av, .facet_by = \"geo_value\") autoplot(jhu_csse_daily_subset, case_rate_7d_av, .color_by = \"none\", .facet_by = \"geo_value\" ) autoplot(jhu_csse_daily_subset, case_rate_7d_av, .color_by = \"none\", .base_color = \"red\", .facet_by = \"geo_value\" ) # .base_color specification won't have any effect due .color_by default autoplot(jhu_csse_daily_subset, case_rate_7d_av, .base_color = \"red\", .facet_by = \"geo_value\" )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/clone.html","id":null,"dir":"Reference","previous_headings":"","what":"Clone an epi_archive object. — clone","title":"Clone an epi_archive object. — clone","text":"Clone epi_archive object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/clone.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clone an epi_archive object. — clone","text":"","code":"clone(x) # S3 method for epi_archive clone(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/clone.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Clone an epi_archive object. 
— clone","text":"x epi_archive object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/compactify.html","id":null,"dir":"Reference","previous_headings":"","what":"Compactify — compactify","title":"Compactify — compactify","text":"section describes internals compactification works epi_archive(). Compactification can potentially improve code speed memory usage, depending data.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/compactify.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compactify — compactify","text":"general, last version observation carried forward (LOCF) fill data recorded versions, last recorded update versions_end. One consequence DT contain full snapshot every version (although generally works), can instead contain rows new changed previous version (see compactify, automatically). Currently, deletions must represented revising row special state (e.g., making entries NA including special column flags data removed performing kind post-processing), archive unaware state . Note NAs can introduced epi_archive methods reasons, e.g., epix_fill_through_version epix_merge, requested, represent potential update data yet access ; epix_merge represent \"value\" observation version first released, version observation appears archive data .","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":null,"dir":"Reference","previous_headings":"","what":"Detect outliers — detect_outlr","title":"Detect outliers — detect_outlr","text":"Applies one outlier detection methods given signal variable, optionally aggregates outputs create consensus result. 
See outliers vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Detect outliers — detect_outlr","text":"","code":"detect_outlr( x = seq_along(y), y, methods = tibble::tibble(method = \"rm\", args = list(list()), abbr = \"rm\"), combiner = c(\"median\", \"mean\", \"none\") )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Detect outliers — detect_outlr","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. methods tibble specifying method(s) use outlier detection, one row per method, following columns: method: Either \"rm\" \"stl\", custom function outlier detection; see details explanation. args: Named list arguments passed detection method. abbr: Abbreviation use naming output columns results method. combiner String, one \"median\", \"mean\", \"none\", specifying combine results different outlier detection methods thresholds determining whether particular observation classified outlier, well replacement value outliers. \"none\", summarized results calculated. 
Note number methods (number rows) odd, \"median\" equivalent majority vote purposes determining whether given observation outlier.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Detect outliers — detect_outlr","text":"tibble number rows equal length(y) columns giving outlier detection thresholds (lower upper) replacement values detection method (replacement).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Detect outliers — detect_outlr","text":"outlier detection method, one per row passed methods tibble, function must take first two arguments x y, number additional arguments. function must return tibble number rows equal length(y), columns lower, upper, replacement, representing lower upper bounds considered outlier, posited replacement value, respectively. 
convenience, outlier detection method can specified (method column methods) string \"rm\", shorthand detect_outlr_rm(), detects outliers via rolling median; \"stl\", shorthand detect_outlr_stl(), detects outliers via STL decomposition.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Detect outliers — detect_outlr","text":"","code":"detection_methods <- dplyr::bind_rows( dplyr::tibble( method = \"rm\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5 )), abbr = \"rm\" ), dplyr::tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = 7 )), abbr = \"stl_seasonal\" ), dplyr::tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = NULL )), abbr = \"stl_nonseasonal\" ) ) x <- incidence_num_outlier_example %>% dplyr::select(geo_value, time_value, cases) %>% as_epi_df() %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr( x = time_value, y = cases, methods = detection_methods, combiner = \"median\" )) %>% unnest(outlier_info)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":null,"dir":"Reference","previous_headings":"","what":"Detect outliers based on a rolling median — detect_outlr_rm","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"Detects outliers based distance rolling median specified terms multiples rolling interquartile range (IQR).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"","code":"detect_outlr_rm( x = seq_along(y), y, n = 21, log_transform = FALSE, detect_negatives = FALSE, detection_multiplier = 2, min_radius 
= 0, replacement_multiplier = 0 )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. n Number time steps use rolling window. Default 21. value centrally aligned. n odd number, rolling window extends (n-1)/2 time steps design point (n-1)/2 time steps . n even, rolling range extends n/2-1 time steps n/2 time steps . log_transform log transform applied running outlier detection? Default FALSE. TRUE, zeros present, log transform padded 1. detect_negatives negative values automatically count outliers? Default FALSE. detection_multiplier Value determining far outlier detection thresholds rolling median, calculated (rolling median) +/- (detection multiplier) * (rolling IQR). Default 2. min_radius Minimum distance rolling median threshold, transformed scale. Default 0. replacement_multiplier Value determining far replacement values rolling median. replacement original value within detection thresholds, otherwise rounded nearest (rolling median) +/- (replacement multiplier) * (rolling IQR). 
Default 0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"tibble number rows equal length(y) columns giving outlier detection thresholds (lower upper) replacement values detection method (replacement).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"","code":"# Detect outliers based on a rolling median incidence_num_outlier_example %>% dplyr::select(geo_value, time_value, cases) %>% as_epi_df() %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr_rm( x = time_value, y = cases )) %>% unnest(outlier_info) #> An `epi_df` object, 730 x 6 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2022-05-21 22:17:14.962335 #> #> # A tibble: 730 × 6 #> # Groups: geo_value [2] #> geo_value time_value cases lower upper replacement #> * #> 1 fl 2020-06-01 667 530 2010 667 #> 2 nj 2020-06-01 486 150. 840. 486 #> 3 fl 2020-06-02 617 582. 1992. 617 #> 4 nj 2020-06-02 658 210. 771. 658 #> 5 fl 2020-06-03 1317 635 1975 1317 #> 6 nj 2020-06-03 541 270 702 541 #> 7 fl 2020-06-04 1419 713 1909 1419 #> 8 nj 2020-06-04 478 174. 790. 478 #> 9 fl 2020-06-05 1305 553 2081 1305 #> 10 nj 2020-06-05 825 118. 838. 
825 #> # ℹ 720 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":null,"dir":"Reference","previous_headings":"","what":"Detect outliers based on an STL decomposition — detect_outlr_stl","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"Detects outliers based seasonal-trend decomposition using LOESS (STL).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"","code":"detect_outlr_stl( x = seq_along(y), y, n_trend = 21, n_seasonal = 21, n_threshold = 21, seasonal_period = NULL, log_transform = FALSE, detect_negatives = FALSE, detection_multiplier = 2, min_radius = 0, replacement_multiplier = 0 )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. n_trend Number time steps use rolling window trend. Default 21. n_seasonal Number time steps use rolling window seasonality. Default 21. n_threshold Number time steps use rolling window IQR outlier thresholds. seasonal_period Integer specifying period seasonality. example, daily data, period 7 means weekly seasonality. default NULL, meaning seasonal term included STL decomposition. log_transform log transform applied running outlier detection? Default FALSE. TRUE, zeros present, log transform padded 1. detect_negatives negative values automatically count outliers? Default FALSE. 
detection_multiplier Value determining far outlier detection thresholds rolling median, calculated (rolling median) +/- (detection multiplier) * (rolling IQR). Default 2. min_radius Minimum distance rolling median threshold, transformed scale. Default 0. replacement_multiplier Value determining far replacement values rolling median. replacement original value within detection thresholds, otherwise rounded nearest (rolling median) +/- (replacement multiplier) * (rolling IQR). Default 0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"tibble number rows equal length(y) columns giving outlier detection thresholds (lower upper) replacement values detection method (replacement).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"STL decomposition computed using feasts package. computed, outlier detection method analogous rolling median method detect_outlr_rm(), except fitted values residuals STL decomposition taking place rolling median residuals rolling median, respectively. 
last set arguments, log_transform replacement_multiplier, exactly detect_outlr_rm().","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"","code":"# Detects outliers based on a seasonal-trend decomposition using LOESS incidence_num_outlier_example %>% dplyr::select(geo_value, time_value, cases) %>% as_epi_df() %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr_stl( x = time_value, y = cases, seasonal_period = 7 )) %>% # weekly seasonality for daily data unnest(outlier_info) #> An `epi_df` object, 730 x 6 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2022-05-21 22:17:14.962335 #> #> # A tibble: 730 × 6 #> # Groups: geo_value [2] #> geo_value time_value cases lower upper replacement #> * #> 1 fl 2020-06-01 667 -1193. 1233. 667 #> 2 nj 2020-06-01 486 281. 762. 486 #> 3 fl 2020-06-02 617 -691. 1890. 617 #> 4 nj 2020-06-02 658 317. 891. 658 #> 5 fl 2020-06-03 1317 -144. 2396. 1317 #> 6 nj 2020-06-03 541 292. 809. 541 #> 7 fl 2020-06-04 1419 260. 2696. 1419 #> 8 nj 2020-06-04 478 315. 792. 478 #> 9 fl 2020-06-05 1305 548. 2950. 1305 #> 10 nj 2020-06-05 825 382. 835. 825 #> # ℹ 720 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":null,"dir":"Reference","previous_headings":"","what":"epi_archive object — epi_archive","title":"epi_archive object — epi_archive","text":"epi_archive S3 class contains data table along several relevant pieces metadata. 
data table can seen full archive (version history) signal variables interest.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"epi_archive object — epi_archive","text":"","code":"new_epi_archive( x, geo_type = NULL, time_type = NULL, other_keys = NULL, additional_metadata = NULL, compactify = NULL, clobberable_versions_start = NULL, versions_end = NULL ) validate_epi_archive( x, geo_type = NULL, time_type = NULL, other_keys = NULL, additional_metadata = NULL, compactify = NULL, clobberable_versions_start = NULL, versions_end = NULL ) as_epi_archive( x, geo_type = NULL, time_type = NULL, other_keys = NULL, additional_metadata = NULL, compactify = NULL, clobberable_versions_start = NULL, versions_end = NULL )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"epi_archive object — epi_archive","text":"x data.frame, data.table, tibble, columns geo_value, time_value, version, additional number columns. geo_type Type geo values. missing, function attempt infer geo values present; fails, set \"custom\". time_type Type time values. missing, function attempt infer time values present; fails, set \"custom\". other_keys Character vector specifying names variables x considered key variables (language data.table) apart \"geo_value\", \"time_value\", \"version\". additional_metadata List additional metadata attach epi_archive object. metadata geo_type time_type fields; named entries passed list included well. compactify Optional; Boolean NULL. TRUE remove redundant rows, FALSE , missing NULL remove redundant rows, issue warning. See information compactify. 
clobberable_versions_start Optional; length-1; either value class typeof x$version, NA class typeof: specifically, either (a) earliest version subject \"clobbering\" (overwritten different update data, using version tag old update data), (b) NA, indicate no versions clobberable. variety reasons versions clobberable routine circumstances, (a) today's version one/more columns published initially filled NA LOCF, (b) buggy version today's data published fixed republished later day, (c) data pipeline delays (e.g., publisher uploading, periodic scraping, database syncing, periodic fetching, etc.) make events (a) (b) reflected later day (even different day) expected; potential causes vary different data pipelines. default value NA, consider versions clobberable. Another setting may appropriate pipelines max_version_with_row_in(x). versions_end Optional; length-1, class typeof x$version: last version observed? default max_version_with_row_in(x), values greater also valid, indicate observed additional versions data beyond max(x$version), contained empty updates. (default value clobberable_versions_start fully trust empty updates, assumes version >= max(x$version) clobbered.) nrow(x) == 0, argument mandatory.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"epi_archive object — epi_archive","text":"epi_archive object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"epi_archive object — epi_archive","text":"Epi Archive epi_archive contains data table DT, class data.table data.table package, (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. version: time value specifying version row measurements. 
example, given row version January 15, 2022 time_value January 14, 2022, row contains measurements data January 14, 2022 available one day later. data table DT key variables geo_value, time_value, version, well others (can specified instantiating epi_archive object via other_keys argument, /set operating DT directly). Refer documentation as_epi_archive() information examples relevant parameter names epi_archive object. Note can single row per unique combination key variables, thus key variables critical figuring generate snapshot data archive, given version.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"metadata","dir":"Reference","previous_headings":"","what":"Metadata","title":"epi_archive object — epi_archive","text":"following pieces metadata included fields epi_archive object: geo_type: type geo values. time_type: type time values. additional_metadata: list additional metadata data archive. Unlike epi_df object, metadata epi_archive object x can accessed (altered) directly, x$geo_type x$time_type, etc. Like epi_df object, geo_type time_type fields metadata epi_archive object currently used downstream functions epiprocess package, serve useful bits information convey data set hand.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"generating-snapshots","dir":"Reference","previous_headings":"","what":"Generating Snapshots","title":"epi_archive object — epi_archive","text":"epi_archive object can used generate snapshot data epi_df format, represents --date values signal variables, specified version. accomplished calling epix_as_of().","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"sliding-computations","dir":"Reference","previous_headings":"","what":"Sliding Computations","title":"epi_archive object — epi_archive","text":"can run sliding computation epi_archive object, much like epi_slide() epi_df object. 
accomplished calling slide() method epi_archive object, works similarly way epi_slide() works epi_df object, one key difference: version-aware. , epi_archive object, sliding computation given reference time point t performed data available t.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"epi_archive object — epi_archive","text":"","code":"# Simple ex. with necessary keys tib <- tibble::tibble( geo_value = rep(c(\"ca\", \"hi\"), each = 5), time_value = rep(seq(as.Date(\"2020-01-01\"), by = 1, length.out = 5 ), times = 2), version = rep(seq(as.Date(\"2020-01-02\"), by = 1, length.out = 5 ), times = 2), value = rnorm(10, mean = 2, sd = 1) ) toy_epi_archive <- tib %>% as_epi_archive( geo_type = \"state\", time_type = \"day\" ) toy_epi_archive #> → An `epi_archive` object, with metadata: #> ℹ Min/max time values: 2020-01-01 / 2020-01-05 #> ℹ First/last version with update: 2020-01-02 / 2020-01-06 #> ℹ Versions end: 2020-01-06 #> ℹ A preview of the table (10 rows x 4 columns): #> Key: #> geo_value time_value version value #> #> 1: ca 2020-01-01 2020-01-02 2.5429963 #> 2: ca 2020-01-02 2020-01-03 1.0859252 #> 3: ca 2020-01-03 2020-01-04 2.4681544 #> 4: ca 2020-01-04 2020-01-05 2.3629513 #> 5: ca 2020-01-05 2020-01-06 0.6954565 #> 6: hi 2020-01-01 2020-01-02 2.7377763 #> 7: hi 2020-01-02 2020-01-03 3.8885049 #> 8: hi 2020-01-03 2020-01-04 1.9025549 #> 9: hi 2020-01-04 2020-01-05 1.0641526 #> 10: hi 2020-01-05 2020-01-06 1.9840497 # Ex. 
with an additional key for county df <- data.frame( geo_value = c(replicate(2, \"ca\"), replicate(2, \"fl\")), county = c(1, 3, 2, 5), time_value = c( \"2020-06-01\", \"2020-06-02\", \"2020-06-01\", \"2020-06-02\" ), version = c( \"2020-06-02\", \"2020-06-03\", \"2020-06-02\", \"2020-06-03\" ), cases = c(1, 2, 3, 4), cases_rate = c(0.01, 0.02, 0.01, 0.05) ) x <- df %>% as_epi_archive( geo_type = \"state\", time_type = \"day\", other_keys = \"county\" )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlations between variables in an epi_df object — epi_cor","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"Computes correlations variables epi_df object, allowing grouping geo value, time value, variables. See correlation vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"","code":"epi_cor( x, var1, var2, dt1 = 0, dt2 = 0, shift_by = geo_value, cor_by = geo_value, use = \"na.or.complete\", method = c(\"pearson\", \"kendall\", \"spearman\") )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"x epi_df object consideration. var1, var2 variables x correlate. dt1, dt2 Time shifts consider two variables, respectively, computing correlations. Negative shifts translate lag value positive shifts lead value; example, dt = -1, new value June 2 original value June 1; dt = 1, new value June 2 original value June 3; dt = 0, values left . Default 0 dt1 dt2. shift_by variables(s) group , time shifts. default geo_value. 
However, also use, example, shift_by = c(geo_value, age_group), assuming x column age_group, perform time shifts per geo value age group. omit grouping entirely, use shift_by = NULL. Note grouping always undone correlation computations. cor_by variable(s) group , correlation computations. geo_value, default, correlations computed geo value, time; time_value, correlations computed time, geo values. grouping can also specified using number columns x; example, can use cor_by = c(geo_value, age_group), assuming x column age_group, order compute correlations pair geo value age group. omit grouping entirely, use cor_by = NULL. Note grouping always done time shifts. use, method Arguments pass cor(), \"na.or.complete\" default use (different cor()) \"pearson\" default method (cor()).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"tibble grouping columns first (geo_value, time_value, possibly others), column cor, gives correlation.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"","code":"# linear association of case and death rates on any given day epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = \"time_value\" ) #> Warning: There were 3 warnings in `dplyr::summarize()`. #> The first warning was: #> ℹ In argument: `cor = cor(x = .data$var1, y = .data$var2, use = use, method = #> method)`. #> ℹ In group 1: `time_value = 2020-03-01`. #> Caused by warning in `cor()`: #> ! the standard deviation is zero #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings. 
#> # A tibble: 671 × 2 #> time_value cor #> #> 1 2020-03-01 NA #> 2 2020-03-02 NA #> 3 2020-03-03 NA #> 4 2020-03-04 0.746 #> 5 2020-03-05 0.549 #> 6 2020-03-06 0.692 #> 7 2020-03-07 0.277 #> 8 2020-03-08 -0.226 #> 9 2020-03-09 -0.195 #> 10 2020-03-10 -0.227 #> # ℹ 661 more rows # correlation of death rates and lagged case rates epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = time_value, dt1 = -2 ) #> Warning: There was 1 warning in `dplyr::summarize()`. #> ℹ In argument: `cor = cor(x = .data$var1, y = .data$var2, use = use, method = #> method)`. #> ℹ In group 3: `time_value = 2020-03-03`. #> Caused by warning in `cor()`: #> ! the standard deviation is zero #> # A tibble: 671 × 2 #> time_value cor #> #> 1 2020-03-01 NA #> 2 2020-03-02 NA #> 3 2020-03-03 NA #> 4 2020-03-04 0.989 #> 5 2020-03-05 0.907 #> 6 2020-03-06 0.746 #> 7 2020-03-07 0.549 #> 8 2020-03-08 -0.158 #> 9 2020-03-09 -0.126 #> 10 2020-03-10 -0.163 #> # ℹ 661 more rows # correlation grouped by location epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = geo_value ) #> # A tibble: 6 × 2 #> geo_value cor #> #> 1 ca 0.573 #> 2 fl 0.488 #> 3 ga 0.465 #> 4 ny 0.285 #> 5 pa 0.708 #> 6 tx 0.750 # correlation grouped by location and incorporates lagged cases rates epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = geo_value, dt1 = -2 ) #> # A tibble: 6 × 2 #> geo_value cor #> #> 1 ca 0.618 #> 2 fl 0.576 #> 3 ga 0.525 #> 4 ny 0.337 #> 5 pa 0.734 #> 6 tx 0.784"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"epi_df object — epi_df","title":"epi_df object — epi_df","text":"epi_df tibble certain minimal column structure metadata. 
can seen snapshot data set contains --date values signal variables interest, given time.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"epi_df object — epi_df","text":"epi_df tibble (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. columns can considered measured variables, also refer signal variables. epi_df object also metadata (least) following fields: geo_type: type geo values. time_type: type time values. as_of: time value given data available. Metadata epi_df object x can accessed (altered) via attributes(x)$metadata. first two fields list, geo_type time_type, can usually inferred geo_value time_value columns, respectively. currently used downstream functions epiprocess package, serve useful bits information convey data set hand. information coding given . last field list, as_of, one unique aspects epi_df object. brief, can think epi_df object single snapshot data set contains --date values signals variables, time specified as_of field. companion object epi_archive object, contains full version history given data set. Revisions common many types epidemiological data streams, paying attention data revisions can important sorts downstream data analysis modeling tasks. See documentation epi_archive details data versioning works epiprocess package (including generate epi_df objects, data snapshots, epi_archive object).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":"geo-types","dir":"Reference","previous_headings":"","what":"Geo Types","title":"epi_df object — epi_df","text":"following geo types recognized epi_df. \"county\": observation corresponds U.S. county; coded 5-digit FIPS code. \"hrr\": observation corresponds U.S. 
hospital referral region (designed represent regional healthcare markets); 306 HRRs U.S; coded number (nonconsecutive, 1 457). \"state\": observation corresponds U.S. state; coded 2-digit postal abbreviation (lowercase); note Puerto Rico \"pr\" Washington D.C. \"dc\". \"hhs\": observation corresponds U.S. HHS region; coded number (consecutive, 1 10). \"nation\": observation corresponds country; coded ISO 3166-1 alpha-2 country codes (lowercase). unrecognizable geo type labeled \"custom\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":"time-types","dir":"Reference","previous_headings":"","what":"Time Types","title":"epi_df object — epi_df","text":"following time types recognized epi_df. \"day-time\": observation corresponds time given day (measured second); coded POSIXct object, as.POSIXct(\"2022-01-31 18:45:40\"). \"day\": observation corresponds day; coded Date object, as.Date(\"2022-01-31\"). \"week\": observation corresponds week; alignment can arbitrary (whether week starts Monday, Tuesday); coded Date object, representing start date week. \"yearweek\": observation corresponds week; alignment can arbitrary; coded tsibble::yearweek object, alignment stored week_start field attributes. \"yearmonth\": observation corresponds month; coded tsibble::yearmonth object. \"yearquarter\": observation corresponds quarter; coded tsibble::yearquarter object. \"year\": observation corresponds year; coded integer greater equal 1582. unrecognizable time type labeled \"custom\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":null,"dir":"Reference","previous_headings":"","what":"Slide a function over variables in an epi_df object — epi_slide","title":"Slide a function over variables in an epi_df object — epi_slide","text":"Slides given function variables epi_df object. 
See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Slide a function over variables in an epi_df object — epi_slide","text":"","code":"epi_slide( x, f, ..., before, after, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Slide a function over variables in an epi_df object — epi_slide","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. f Function, formula, missing; together ... specifies computation slide. \"slide\" means apply computation within sliding (a.k.a. \"rolling\") time window data group. The window is determined by the before and after parameters described below. One time step typically one day one week; see details explanation. function, f must take data frame column names original object, minus grouping variables, containing time window data one group-ref_time_value combination; followed one-row tibble containing values grouping variables associated group; followed number named arguments. formula, f can operate directly columns accessed via .x$var or .$var, ~mean(.x$var) compute mean column var ref_time_value-group combination. group key can accessed via .y. f missing, ... specify computation. ... Additional arguments pass function formula specified via f. Alternatively, f missing, ... interpreted expression tidy evaluation; addition referring columns directly name, expression access .data .env pronouns dplyr verbs, can also refer .x, .group_key, .ref_time_value. See details. before, after How far before and after each ref_time_value should the sliding window extend? At least one of these two arguments must be provided; the other's default is 0. 
value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step(k) ref_time_value: either pass before=k by itself, or pass before=k, after=0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass before=k, after=k. leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass after=k by itself, or pass before=0, after=k. See \"Details:\" definition time step, (non)treatment missing rows within window, avoiding warnings about before & after settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name String indicating name new column contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column name overwrite column. as_list_col slide results held list column, unchopped/unnested? Default FALSE, case list object returned f unnested (using tidyr::unnest()), , slide computations output data frames, names resulting columns given prepending new_col_name names list elements. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. 
using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Slide a function over variables in an epi_df object — epi_slide","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Slide a function over variables in an epi_df object — epi_slide","text":"\"slide\" means apply function formula rolling window time steps data group, window centered reference time left right endpoints given by the before and after arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since as.Date(\"2022-01-01\") + 1 equals as.Date(\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide() still attempts perform computation anyway (require complete window). issue partial computations (run incomplete windows) therefore left user, either specified function formula f, post-processing. For a centrally-aligned slide of n time_values in a sliding window, set before = (n-1)/2 and after = (n-1)/2 when the number of time_values in the sliding window is odd, and before = n/2-1 and after = n/2 when n is even. Sometimes, want experiment various trailing leading window widths compare slide outputs. In the (uncommon) case that zero-width windows are considered, manually pass both the before and after arguments to prevent potential warnings. (E.g., before=k with k=0 and after missing may produce a warning. To avoid warnings, use before=k, after=0 instead; otherwise, it looks too much like a leading window was intended, but the after argument was forgotten or misspelled.) 
f missing, expression tidy evaluation can specified, example, : equivalent : Thus, clear, computation specified via expression tidy evaluation (first example, ), name new column inferred given expression overrides name passed explicitly new_col_name argument.","code":"epi_slide(x, cases_7dav = mean(cases), before = 6) epi_slide(x, function(x, g) mean(x$cases), before = 6, new_col_name = \"cases_7dav\")"},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Slide a function over variables in an epi_df object — epi_slide","text":"","code":"# slide a 7-day trailing average formula on cases # Simple sliding means and sums are much faster to do using # the `epi_slide_mean` and `epi_slide_sum` functions instead. jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6 #> 2 ca 2020-03-02 4 5 #> 3 ca 2020-03-03 6 5.33 #> 4 ca 2020-03-04 11 6.75 #> 5 ca 2020-03-05 10 7.4 #> 6 ca 2020-03-06 18 9.17 #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day leading average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), after = 6) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 11.6 #> 2 ca 2020-03-02 4 13.4 #> 3 ca 2020-03-03 6 16.1 #> 4 ca 2020-03-04 11 18.4 #> 5 ca 2020-03-05 10 20.4 #> 6 ca 2020-03-06 18 25.1 #> 7 ca 2020-03-07 26 30.1 #> 8 ca 2020-03-08 19 34.4 #> 9 ca 2020-03-09 23 37.3 #> 10 ca 2020-03-10 22 56.7 #> # ℹ 4,016 more rows # slide a 7-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 3, after = 3) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6.75 #> 2 ca 2020-03-02 4 7.4 #> 3 ca 2020-03-03 6 9.17 #> 4 ca 2020-03-04 11 11.6 #> 5 ca 2020-03-05 10 13.4 #> 6 ca 2020-03-06 18 16.1 #> 7 ca 2020-03-07 26 18.4 #> 8 ca 2020-03-08 19 20.4 #> 9 ca 2020-03-09 23 25.1 #> 10 ca 2020-03-10 22 30.1 #> # ℹ 4,016 more rows # slide a 14-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_14dav = mean(cases), before = 6, after = 7) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_14dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_14dav #> * #> 1 ca 2020-03-01 6 12.5 #> 2 ca 2020-03-02 4 13.7 #> 3 ca 2020-03-03 6 14.5 #> 4 ca 2020-03-04 11 15.5 #> 5 ca 2020-03-05 10 17.8 #> 6 ca 2020-03-06 18 20.5 #> 7 ca 2020-03-07 26 23 #> 8 ca 2020-03-08 19 25.4 #> 9 ca 2020-03-09 23 36.4 #> 10 ca 2020-03-10 22 42 #> # ℹ 4,016 more rows # nested new columns jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide( a = data.frame( cases_2dav = mean(cases), cases_2dma = mad(cases) ), before = 1, as_list_col = TRUE ) %>% ungroup() #> An `epi_df` object, 4,026 x 7 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 7 #> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av a #> * #> 1 ca 2020-03-01 6 1.29 0.00327 0 #> 2 ca 2020-03-02 4 1.71 0.00435 0 #> 3 ca 2020-03-03 6 2.43 0.00617 0 #> 4 ca 2020-03-04 11 3.86 0.00980 0.000363 #> 5 ca 2020-03-05 10 5.29 0.0134 0.000363 #> 6 ca 2020-03-06 18 7.86 0.0200 0.000363 #> 7 ca 2020-03-07 26 11.6 0.0294 0.000363 #> 8 ca 2020-03-08 19 13.4 0.0341 0.000363 #> 9 ca 2020-03-09 23 16.1 0.0410 0.000726 #> 10 ca 2020-03-10 22 18.4 0.0468 0.000726 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":null,"dir":"Reference","previous_headings":"","what":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"Slides n-timestep mean variables epi_df object. 
See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"","code":"epi_slide_mean( x, col_names, ..., before, after, ref_time_values, time_step, new_col_name = NULL, as_list_col = NULL, names_sep = NULL, all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. col_names unquoted column name(e.g., cases), multiple column names (e.g., c(cases, deaths)), tidy-select expression. Variable names can used positions data frame, expressions like x:y can used select range variables. desired column names stored vector vars, use col_names = all_of(vars). tidy-selection renaming interface supported, used provide output column names; want customize output column names, use dplyr::rename slide. ... Additional arguments pass data.table::frollmean, example, na.rm algo. data.table::frollmean automatically passed data x operate , window size n, alignment align. Providing args via ... cause error. , far ref_time_value sliding window extend? least one two arguments must provided; 's default 0. value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step (k) ref_time_value: either pass =k , pass =k, =0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass =k, =k. 
leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass pass =k , pass =0, =k. See \"Details:\" definition time step,(non)treatment missing rows within window, avoiding warnings &settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name Character vector indicating name(s) new column(s) contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column names overwrite columns. names_sep NULL, new_col_name must length col_names. as_list_col supported. Included match epi_slide interface. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. 
using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"Wrapper around epi_slide_opt f = data.table::frollmean. \"slide\" means apply function rolling window time steps data group, window centered reference time left right endpoints given arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since as.Date(\"2022-01-01\") + 1 equals as.Date(\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide_*() fail; requires complete window perform computation. centrally-aligned slide n time_values sliding window, set = (n-1)/2 = (n-1)/2 number time_values sliding window odd = n/2-1 = n/2 n even. Sometimes, want experiment various trailing leading window widths compare slide outputs. (uncommon) case zero-width windows considered, manually pass arguments order prevent potential warnings. (E.g., =k k=0 missing may produce warning. 
avoid warnings, use =k, =0 instead; otherwise, looks much like leading window intended, argument forgotten misspelled.)","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"","code":"# slide a 7-day trailing average formula on cases jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, before = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day trailing average formula on cases. Adjust `frollmean` settings for speed # and accuracy, and to allow partially-missing windows. 
jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean( cases, before = 6, # `frollmean` options na.rm = TRUE, algo = \"exact\", hasNA = TRUE ) %>% dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6 #> 2 ca 2020-03-02 4 5 #> 3 ca 2020-03-03 6 5.33 #> 4 ca 2020-03-04 11 6.75 #> 5 ca 2020-03-05 10 7.4 #> 6 ca 2020-03-06 18 9.17 #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day leading average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, after = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 11.6 #> 2 ca 2020-03-02 4 13.4 #> 3 ca 2020-03-03 6 16.1 #> 4 ca 2020-03-04 11 18.4 #> 5 ca 2020-03-05 10 20.4 #> 6 ca 2020-03-06 18 25.1 #> 7 ca 2020-03-07 26 30.1 #> 8 ca 2020-03-08 19 34.4 #> 9 ca 2020-03-09 23 37.3 #> 10 ca 2020-03-10 22 56.7 #> # ℹ 4,016 more rows # slide a 7-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, before = 3, after = 3) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 11.6 #> 5 ca 2020-03-05 10 13.4 #> 6 ca 2020-03-06 18 16.1 #> 7 ca 2020-03-07 26 18.4 #> 8 ca 2020-03-08 19 20.4 #> 9 ca 2020-03-09 23 25.1 #> 10 ca 2020-03-10 22 30.1 #> # ℹ 4,016 more rows # slide a 14-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, before = 6, after = 7) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_14dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_14dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 23 #> 8 ca 2020-03-08 19 25.4 #> 9 ca 2020-03-09 23 36.4 #> 10 ca 2020-03-10 22 42 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":null,"dir":"Reference","previous_headings":"","what":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"Slides n-timestep data.table::froll slider::summary-slide function variables epi_df object. 
See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"","code":"epi_slide_opt( x, col_names, f, ..., before, after, ref_time_values, time_step, new_col_name = NULL, as_list_col = NULL, names_sep = NULL, all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. col_names unquoted column name(e.g., cases), multiple column names (e.g., c(cases, deaths)), tidy-select expression. Variable names can used positions data frame, expressions like x:y can used select range variables. desired column names stored vector vars, use col_names = all_of(vars). tidy-selection renaming interface supported, used provide output column names; want customize output column names, use dplyr::rename slide. f Function; together ... specifies computation slide. f must one data.table's rolling functions (frollmean, frollsum, frollapply. See data.table::roll) one slider's specialized sliding functions (slide_mean, slide_sum, etc. See slider::summary-slide). \"slide\" means apply computation within sliding (.k.. \"rolling\") time window data group. window determined parameters described . One time step typically one day one week; see details explanation. optimized data.table slider functions directly passed computation function epi_slide without careful handling make sure computation group made n dates rather n points. 
epi_slide_opt (wrapper functions epi_slide_mean epi_slide_sum) take care window completion automatically prevent associated errors. ... Additional arguments pass slide computation f, example, na.rm algo f data.table function. f data.table function, automatically passed data x operate , window size n, alignment align. Providing args via ... cause error. f slider function, automatically passed data x operate , number points use computation. , far ref_time_value sliding window extend? least one two arguments must provided; 's default 0. value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step (k) ref_time_value: either pass =k , pass =k, =0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass =k, =k. leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass pass =k , pass =0, =k. See \"Details:\" definition time step,(non)treatment missing rows within window, avoiding warnings &settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name Character vector indicating name(s) new column(s) contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column names overwrite columns. names_sep NULL, new_col_name must length col_names. as_list_col supported. Included match epi_slide interface. 
names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"\"slide\" means apply function rolling window time steps data group, window centered reference time left right endpoints given arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since .Date(\"2022-01-01\") + 1 equals .Date (\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide_*() fail; requires complete window perform computation. 
centrally-aligned slide n time_values sliding window, set = (n-1)/2 = (n-1)/2 number time_values sliding window odd = n/2-1 = n/2 n even. Sometimes, want experiment various trailing leading window widths compare slide outputs. (uncommon) case zero-width windows considered, manually pass arguments order prevent potential warnings. (E.g., =k k=0 missing may produce warning. avoid warnings, use =k, =0 instead; otherwise, looks much like leading window intended, argument forgotten misspelled.)","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"","code":"# slide a 7-day trailing average formula on cases. This can also be done with `epi_slide_mean` jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = data.table::frollmean, before = 6 ) %>% # Remove a nonessential var. to ensure new col is printed, and rename new col dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day trailing average formula on cases. Adjust `frollmean` settings for speed # and accuracy, and to allow partially-missing windows. 
jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = data.table::frollmean, before = 6, # `frollmean` options na.rm = TRUE, algo = \"exact\", hasNA = TRUE ) %>% dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6 #> 2 ca 2020-03-02 4 5 #> 3 ca 2020-03-03 6 5.33 #> 4 ca 2020-03-04 11 6.75 #> 5 ca 2020-03-05 10 7.4 #> 6 ca 2020-03-06 18 9.17 #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day leading average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = slider::slide_mean, after = 6 ) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 11.6 #> 2 ca 2020-03-02 4 13.4 #> 3 ca 2020-03-03 6 16.1 #> 4 ca 2020-03-04 11 18.4 #> 5 ca 2020-03-05 10 20.4 #> 6 ca 2020-03-06 18 25.1 #> 7 ca 2020-03-07 26 30.1 #> 8 ca 2020-03-08 19 34.4 #> 9 ca 2020-03-09 23 37.3 #> 10 ca 2020-03-10 22 56.7 #> # ℹ 4,016 more rows # slide a 7-day centre-aligned sum. This can also be done with `epi_slide_sum` jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = data.table::frollsum, before = 3, after = 3 ) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dsum = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dsum #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 81 #> 5 ca 2020-03-05 10 94 #> 6 ca 2020-03-06 18 113 #> 7 ca 2020-03-07 26 129 #> 8 ca 2020-03-08 19 143 #> 9 ca 2020-03-09 23 176 #> 10 ca 2020-03-10 22 211 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":null,"dir":"Reference","previous_headings":"","what":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"Slides n-timestep sum variables epi_df object. See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"","code":"epi_slide_sum( x, col_names, ..., before, after, ref_time_values, time_step, new_col_name = NULL, as_list_col = NULL, names_sep = NULL, all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. col_names unquoted column name (e.g., cases), multiple column names (e.g., c(cases, deaths)), tidy-select expression. 
Variable names can used positions data frame, expressions like x:y can used select range variables. desired column names stored vector vars, use col_names = all_of(vars). tidy-selection renaming interface supported, used provide output column names; want customize output column names, use dplyr::rename slide. ... Additional arguments pass data.table::frollsum, example, na.rm algo. data.table::frollsum automatically passed data x operate , window size n, alignment align. Providing args via ... cause error. , far ref_time_value sliding window extend? least one two arguments must provided; 's default 0. value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step (k) ref_time_value: either pass =k , pass =k, =0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass =k, =k. leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass pass =k , pass =0, =k. See \"Details:\" definition time step,(non)treatment missing rows within window, avoiding warnings &settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name Character vector indicating name(s) new column(s) contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column names overwrite columns. names_sep NULL, new_col_name must length col_names. as_list_col supported. 
Included match epi_slide interface. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"Wrapper around epi_slide_opt f = data.table::frollsum. \"slide\" means apply function rolling window time steps data group, window centered reference time left right endpoints given arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since as.Date(\"2022-01-01\") + 1 equals as.Date(\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide_*() fail; requires complete window perform computation. 
centrally-aligned slide n time_values sliding window, set = (n-1)/2 = (n-1)/2 number time_values sliding window odd = n/2-1 = n/2 n even. Sometimes, want experiment various trailing leading window widths compare slide outputs. (uncommon) case zero-width windows considered, manually pass arguments order prevent potential warnings. (E.g., =k k=0 missing may produce warning. avoid warnings, use =k, =0 instead; otherwise, looks much like leading window intended, argument forgotten misspelled.)","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"","code":"# slide a 7-day trailing sum formula on cases jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_sum(cases, before = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dsum = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dsum #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 81 #> 8 ca 2020-03-08 19 94 #> 9 ca 2020-03-09 23 113 #> 10 ca 2020-03-10 22 129 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epiprocess.html","id":null,"dir":"Reference","previous_headings":"","what":"epiprocess: Tools for basic signal processing in epidemiology — epiprocess","title":"epiprocess: Tools for basic signal processing in epidemiology — epiprocess","text":"package introduces common data structure epidemiological data sets measured space time, offers associated utilities perform basic 
signal processing tasks.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epiprocess.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"epiprocess: Tools for basic signal processing in epidemiology — epiprocess","text":"Maintainer: Logan Brooks lcbrooks@andrew.cmu.edu Authors: Daniel McDonald Evan Ray Ryan Tibshirani contributors: Jacob Bien [contributor] Rafael Catoia [contributor] Nat DeFries [contributor] Rachel Lobay [contributor] Ken Mawer [contributor] Chloe [contributor] Quang Nguyen [contributor] Dmitry Shemetov [contributor] Lionel Henry (Author included rlang fragments) [contributor] Hadley Wickham (Author included rlang fragments) [contributor] Posit (Copyright holder included rlang fragments) [copyright holder]","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate a snapshot from an epi_archive object — epix_as_of","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"Generates snapshot epi_df format epi_archive object, given version. See archive vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"","code":"epix_as_of(x, max_version, min_time_value = -Inf, all_versions = FALSE)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"x epi_archive object max_version Time value specifying max version permit snapshot. , snapshot comprise unique rows current archive data represent --date signal values, specified max_version (whose time values least min_time_value.) 
min_time_value Time value specifying min time value permit snapshot. Default -Inf, effectively means minimum considered. all_versions all_versions = TRUE, output epi_archive format, contain rows specified time_value range version <= max_version. resulting object cover potentially narrower version time_value range x, depending user-provided arguments. Otherwise, one row output max_version time_value. Default FALSE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"epi_df object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"","code":"epix_as_of( archive_cases_dv_subset, max_version = max(archive_cases_dv_subset$DT$version) ) #> An `epi_df` object, 2,192 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2021-12-01 #> #> # A tibble: 2,192 × 4 #> geo_value time_value percent_cli case_rate_7d_av #> * #> 1 ca 2020-06-01 2.75 6.84 #> 2 ca 2020-06-02 2.57 6.82 #> 3 ca 2020-06-03 2.48 6.66 #> 4 ca 2020-06-04 2.41 6.98 #> 5 ca 2020-06-05 2.57 6.97 #> 6 ca 2020-06-06 2.63 6.66 #> 7 ca 2020-06-07 2.73 6.74 #> 8 ca 2020-06-08 3.04 6.67 #> 9 ca 2020-06-09 2.97 6.81 #> 10 ca 2020-06-10 2.99 7.13 #> # ℹ 2,182 more rows range(archive_cases_dv_subset$DT$version) # 2020-06-02 -- 2021-12-01 #> [1] \"2020-06-02\" \"2021-12-01\" epix_as_of(archive_cases_dv_subset, as.Date(\"2020-06-12\")) #> An `epi_df` object, 44 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2020-06-12 #> #> # A tibble: 44 × 4 #> geo_value time_value percent_cli case_rate_7d_av #> * #> 1 ca 2020-06-01 2.23 6.63 #> 2 ca 2020-06-02 2.06 6.45 #> 3 ca 2020-06-03 1.90 6.62 #> 4 ca 
2020-06-04 1.79 6.64 #> 5 ca 2020-06-05 1.83 6.91 #> 6 ca 2020-06-06 1.86 6.76 #> 7 ca 2020-06-07 1.78 6.75 #> 8 ca 2020-06-08 1.90 6.90 #> 9 ca 2020-06-09 NA 7.02 #> 10 ca 2020-06-10 NA 7.36 #> # ℹ 34 more rows # --- Advanced: --- # When requesting recent versions of a data set, there can be some # reproducibility issues. For example, requesting data as of the current date # may return different values based on whether today's data is available yet # or not. Other factors include the time it takes between data becoming # available and when you download the data, and whether the data provider # will overwrite (\"clobber\") version data rather than just publishing new # versions. You can include information about these factors by setting the # `clobberable_versions_start` and `versions_end` of an `epi_archive`, in # which case you will get warnings about potential reproducibility issues: archive_cases_dv_subset2 <- as_epi_archive( archive_cases_dv_subset$DT, # Suppose last version with an update could potentially be rewritten # (a.k.a. \"hotfixed\", \"clobbered\", etc.): clobberable_versions_start = max(archive_cases_dv_subset$DT$version), # Suppose today is the following day, and there are no updates out yet: versions_end = max(archive_cases_dv_subset$DT$version) + 1L, compactify = TRUE ) epix_as_of(archive_cases_dv_subset2, max(archive_cases_dv_subset$DT$version)) #> Warning: Getting data as of some recent version which could still be overwritten (under #> routine circumstances) without assigning a new version number (a.k.a. #> \"clobbered\"). Thus, the snapshot that we produce here should not be expected #> to be reproducible later. See `?epi_archive` for more info and `?epix_as_of` on #> how to muffle. 
#> An `epi_df` object, 2,192 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2021-12-01 #> #> # A tibble: 2,192 × 4 #> geo_value time_value percent_cli case_rate_7d_av #> * #> 1 ca 2020-06-01 2.75 6.84 #> 2 ca 2020-06-02 2.57 6.82 #> 3 ca 2020-06-03 2.48 6.66 #> 4 ca 2020-06-04 2.41 6.98 #> 5 ca 2020-06-05 2.57 6.97 #> 6 ca 2020-06-06 2.63 6.66 #> 7 ca 2020-06-07 2.73 6.74 #> 8 ca 2020-06-08 3.04 6.67 #> 9 ca 2020-06-09 2.97 6.81 #> 10 ca 2020-06-10 2.99 7.13 #> # ℹ 2,182 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":null,"dir":"Reference","previous_headings":"","what":"Fill epi_archive unobserved history — epix_fill_through_version","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"Sometimes, due upstream data pipeline issues, work version history completely date, functions expect archives completely date, equally --date another archive. function provides one way approach mismatches: pretend \"observed\" additional versions, filling versions NAs extrapolated values.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"","code":"epix_fill_through_version(x, fill_versions_end, how = c(\"na\", \"locf\"))"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"x epi_archive fill_versions_end Length-1, class&type x$version: version fill missing version history; result's $versions_end unless already later $versions_end. 
Optional; \"na\" \"locf\": \"na\" fill missing required version history NAs, inserting (necessary) update immediately current $versions_end revises existing measurements NA (supported version classes next_after implementation); \"locf\" fill missing version history last version observation carried forward (LOCF), leaving update $DT alone (epi_archive methods based LOCF). Default \"na\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":null,"dir":"Reference","previous_headings":"","what":"Merge two epi_archive objects — epix_merge","title":"Merge two epi_archive objects — epix_merge","text":"Merges two epi_archives share common geo_value, time_value, set key columns. also share common versions_end, using epix_as_of result using epix_as_of x y individually, performing full join DTs non-version key columns (potentially consolidating multiple warnings clobberable versions). versions_end values differ, sync parameter controls done.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Merge two epi_archive objects — epix_merge","text":"","code":"epix_merge( x, y, sync = c(\"forbid\", \"na\", \"locf\", \"truncate\"), compactify = TRUE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Merge two epi_archive objects — epix_merge","text":"x, y Two epi_archive objects join together. 
sync Optional; \"forbid\", \"na\", \"locf\", \"truncate\"; case x$versions_end match y$versions_end, ?: \"forbid\": emit error; \"na\": use max(x$versions_end, y$versions_end) result's versions_end, ensure , request snapshot version min(x$versions_end, y$versions_end), observation columns less --date archive NAs (.e., imagine update immediately versions_end revised observations NA); \"locf\": use max(x$versions_end, y$versions_end) result's versions_end, allowing last version observation carried forward extrapolate unavailable versions less --date input archive (.e., imagining less --date archive's data set remained unchanged actual versions_end archive's versions_end); \"truncate\": use min(x$versions_end, y$versions_end) result's versions_end, discard rows containing update rows later versions. compactify Optional; TRUE, FALSE, NULL; result compactified? See as_epi_archive() explanation means. Default TRUE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Merge two epi_archive objects — epix_merge","text":"resulting epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Merge two epi_archive objects — epix_merge","text":"cases, additional_metadata empty list, clobberable_versions_start set earliest version clobbered either input archive.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Merge two epi_archive objects — epix_merge","text":"","code":"# create two example epi_archive datasets x <- archive_cases_dv_subset$DT %>% dplyr::select(geo_value, time_value, version, case_rate_7d_av) %>% as_epi_archive(compactify = TRUE) y <- archive_cases_dv_subset$DT %>% dplyr::select(geo_value, 
time_value, version, percent_cli) %>% as_epi_archive(compactify = TRUE) # merge results stored in a third object: xy <- epix_merge(x, y)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":null,"dir":"Reference","previous_headings":"","what":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"Slides given function variables epi_archive object. behaves similarly epi_slide(), key exception version-aware: sliding computation given reference time t performed data available t. See archive vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"","code":"epix_slide( x, f, ..., before, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_versions = FALSE ) # S3 method for epi_archive epix_slide( x, f, ..., before, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_versions = FALSE ) # S3 method for grouped_epi_archive epix_slide( x, f, ..., before, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_versions = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"x epi_archive grouped_epi_archive object. ungrouped, data x treated part single data group. f Function, formula, missing; together ... specifies computation slide. \"slide\" means apply computation sliding (.k.. \"rolling\") time window data group. 
window determined parameter described . One time step typically one day one week; see epi_slide details explanation. function, f must take epi_df column names archive's DT, minus version column; followed one-row tibble containing values grouping variables associated group; followed reference time value, usually Date object; followed number named arguments. formula, f can operate directly columns accessed via .x$var .$var, ~ mean (.x$var) compute mean column var group-ref_time_value combination. group key can accessed via .y .group_key, reference time value can accessed via .z .ref_time_value. f missing, ... specify computation. ... Additional arguments pass function formula specified via f. Alternatively, f missing, ... interpreted expression tidy evaluation; addition referring columns directly name, expression access .data .env pronouns dplyr verbs, can also refer .group_key .ref_time_value. See details epi_slide. far ref_time_value sliding window extend? provided, single, non-NA, integer-compatible number time steps. window endpoint inclusive. example, = 7, one time step one day, produce value ref_time_value January 8, apply given function formula data (group present) time_values January 1 onward, reported January 8. typical disease surveillance sources, include data time_value January 8, , depending amount reporting latency, may include January 7 even earlier time_values. (instead archive hold nowcasts instead regular surveillance data, indeed expect data time_value January 8. hold forecasts, expect data time_values January 8, sliding window extend far ref_time_value needed include time_values.) ref_time_values Reference time values / versions sliding computations; element vector serves anchor point time_value window computation max_version epix_as_of fetch data window. missing, set regularly-spaced sequence values set cover range versions DT plus versions_end; spacing values guessed (using GCD skips values). 
time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take positive integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name String indicating name new column contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column name overwrite column. as_list_col slide results held list column, unchopped/unnested? Default FALSE, case list object returned f unnested (using tidyr::unnest()), , slide computations output data frames, names resulting columns given prepending new_col_name names list elements. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_versions (all_rows parameter epi_slide.) all_versions = TRUE, f passed version history (version <= ref_time_value) rows time_value ref_time_value - ref_time_value. Otherwise, f passed recent version every unique time_value. 
Default FALSE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"tibble whose columns : grouping variables, time_value, containing reference time values slide computation, column named according new_col_name argument, containing slide values.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"key distinctions current function epi_slide(): f functions epix_slide, one assume input data contain rows time_value matching computation's ref_time_value (accessible via attributes()$metadata$as_of); typical epidemiological surveillance data, observations pertaining particular time period (time_value) first reported as_of instant time period ended. epix_slide() accept argument; windows extend time steps given ref_time_value last time_value available version ref_time_value (typically, include ref_time_value , observations particular time interval (e.g., day) published time interval ends); epi_slide windows extend time steps ref_time_value time steps ref_time_value. input class columns similar different: epix_slide (default all_versions=FALSE) keeps columns epi_df-ness first argument computation; epi_slide provides grouping variables second input, convert first input regular tibble grouping variables include essential geo_value column. (all_versions=TRUE, epix_slide provide epi_archive rather epi-df computation.) output class columns similar different: epix_slide() returns tibble containing grouping variables, time_value, new column(s) slide computations, whereas epi_slide() returns epi_df original variables plus new columns slide computations. 
(mirror grouping ungroupedness input, one exception: epi_archives can trivial (zero-variable) groupings, dropped epix_slide results supported tibbles.) size stability checks element/row recycling maintain size stability epix_slide, unlike epi_slide. (epix_slide roughly analogous dplyr::group_modify, epi_slide roughly analogous dplyr::mutate followed dplyr::arrange) detailed \"advanced\" vignette. all_rows supported epix_slide; since slide computations allowed flexibility outputs epi_slide, guess good representation missing computations excluded group-ref_time_value pairs. ref_time_values default epix_slide based making evenly-spaced sequence versions DT plus versions_end, rather time_values. Apart distinctions, interfaces epix_slide() epi_slide() . Furthermore, current function can considerably slower epi_slide(), two reasons: (1) must repeatedly fetch properly-versioned snapshots data archive (via epix_as_of()), (2) performs \"manual\" sliding sorts, benefit highly efficient slider package. 
reason, never used place epi_slide(), used version-aware sliding necessary (purpose).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"","code":"library(dplyr) #> #> Attaching package: ‘dplyr’ #> The following objects are masked from ‘package:stats’: #> #> filter, lag #> The following objects are masked from ‘package:base’: #> #> intersect, setdiff, setequal, union # Reference time points for which we want to compute slide values: ref_time_values <- seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-15\"), by = \"1 day\" ) # A simple (but not very useful) example (see the archive vignette for a more # realistic one): archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( f = ~ mean(.x$case_rate_7d_av), before = 2, ref_time_values = ref_time_values, new_col_name = \"case_rate_7d_av_recent_av\" ) %>% ungroup() #> # A tibble: 57 × 3 #> geo_value time_value case_rate_7d_av_recent_av #> #> 1 NA 2020-06-01 NaN #> 2 ca 2020-06-02 6.63 #> 3 fl 2020-06-02 3.38 #> 4 ny 2020-06-02 6.57 #> 5 tx 2020-06-02 4.52 #> 6 ca 2020-06-03 6.54 #> 7 fl 2020-06-03 3.42 #> 8 ny 2020-06-03 6.66 #> 9 tx 2020-06-03 4.75 #> 10 ca 2020-06-04 6.53 #> # ℹ 47 more rows # We requested time windows that started 2 days before the corresponding time # values. The actual number of `time_value`s in each computation depends on # the reporting latency of the signal and `time_value` range covered by the # archive (2020-06-01 -- 2021-11-30 in this example). 
In this case, we have # * 0 `time_value`s, for ref time 2020-06-01 --> the result is automatically # discarded # * 1 `time_value`, for ref time 2020-06-02 # * 2 `time_value`s, for the rest of the results # * never the 3 `time_value`s we would get from `epi_slide`, since, because # of data latency, we'll never have an observation # `time_value == ref_time_value` as of `ref_time_value`. # The example below shows this type of behavior in more detail. # Examining characteristics of the data passed to each computation with # `all_versions=FALSE`. archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( function(x, gk, rtv) { tibble( time_range = if (nrow(x) == 0L) { \"0 `time_value`s\" } else { sprintf(\"%s -- %s\", min(x$time_value), max(x$time_value)) }, n = nrow(x), class1 = class(x)[[1L]] ) }, before = 5, all_versions = FALSE, ref_time_values = ref_time_values, names_sep = NULL ) %>% ungroup() %>% arrange(geo_value, time_value) #> # A tibble: 57 × 5 #> geo_value time_value time_range n class1 #> #> 1 ca 2020-06-02 2020-06-01 -- 2020-06-01 1 epi_df #> 2 ca 2020-06-03 2020-06-01 -- 2020-06-02 2 epi_df #> 3 ca 2020-06-04 2020-06-01 -- 2020-06-03 3 epi_df #> 4 ca 2020-06-05 2020-06-01 -- 2020-06-04 4 epi_df #> 5 ca 2020-06-06 2020-06-01 -- 2020-06-05 5 epi_df #> 6 ca 2020-06-07 2020-06-02 -- 2020-06-06 5 epi_df #> 7 ca 2020-06-08 2020-06-03 -- 2020-06-07 5 epi_df #> 8 ca 2020-06-09 2020-06-04 -- 2020-06-08 5 epi_df #> 9 ca 2020-06-10 2020-06-05 -- 2020-06-09 5 epi_df #> 10 ca 2020-06-11 2020-06-06 -- 2020-06-10 5 epi_df #> # ℹ 47 more rows # --- Advanced: --- # `epix_slide` with `all_versions=FALSE` (the default) applies a # version-unaware computation to several versions of the data. We can also # use `all_versions=TRUE` to apply a version-*aware* computation to several # versions of the data, again looking at characteristics of the data passed # to each computation. 
In this case, each computation should expect an # `epi_archive` containing the relevant version data: archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( function(x, gk, rtv) { tibble( versions_start = if (nrow(x$DT) == 0L) { \"NA (0 rows)\" } else { toString(min(x$DT$version)) }, versions_end = x$versions_end, time_range = if (nrow(x$DT) == 0L) { \"0 `time_value`s\" } else { sprintf(\"%s -- %s\", min(x$DT$time_value), max(x$DT$time_value)) }, n = nrow(x$DT), class1 = class(x)[[1L]] ) }, before = 5, all_versions = TRUE, ref_time_values = ref_time_values, names_sep = NULL ) %>% ungroup() %>% # Focus on one geo_value so we can better see the columns above: filter(geo_value == \"ca\") %>% select(-geo_value) #> # A tibble: 14 × 6 #> time_value versions_start versions_end time_range n class1 #> #> 1 2020-06-02 2020-06-02 2020-06-02 2020-06-01 -- 2020-06-01 1 epi_ar… #> 2 2020-06-03 2020-06-02 2020-06-03 2020-06-01 -- 2020-06-02 2 epi_ar… #> 3 2020-06-04 2020-06-02 2020-06-04 2020-06-01 -- 2020-06-03 3 epi_ar… #> 4 2020-06-05 2020-06-02 2020-06-05 2020-06-01 -- 2020-06-04 4 epi_ar… #> 5 2020-06-06 2020-06-02 2020-06-06 2020-06-01 -- 2020-06-05 8 epi_ar… #> 6 2020-06-07 2020-06-03 2020-06-07 2020-06-02 -- 2020-06-06 9 epi_ar… #> 7 2020-06-08 2020-06-04 2020-06-08 2020-06-03 -- 2020-06-07 9 epi_ar… #> 8 2020-06-09 2020-06-05 2020-06-09 2020-06-04 -- 2020-06-08 8 epi_ar… #> 9 2020-06-10 2020-06-06 2020-06-10 2020-06-05 -- 2020-06-09 8 epi_ar… #> 10 2020-06-11 2020-06-07 2020-06-11 2020-06-06 -- 2020-06-10 8 epi_ar… #> 11 2020-06-12 2020-06-08 2020-06-12 2020-06-07 -- 2020-06-11 8 epi_ar… #> 12 2020-06-13 2020-06-09 2020-06-13 2020-06-08 -- 2020-06-12 8 epi_ar… #> 13 2020-06-14 2020-06-10 2020-06-14 2020-06-09 -- 2020-06-13 8 epi_ar… #> 14 2020-06-15 2020-06-11 2020-06-15 2020-06-10 -- 2020-06-14 8 
epi_ar…"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":null,"dir":"Reference","previous_headings":"","what":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"Generates filtered epi_archive epi_archive object, keeping rows version falling specified date.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"","code":"epix_truncate_versions_after(x, max_version) # S3 method for epi_archive epix_truncate_versions_after(x, max_version) # S3 method for grouped_epi_archive epix_truncate_versions_after(x, max_version)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"x epi_archive object. 
max_version latest version include archive.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"epi_archive object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":null,"dir":"Reference","previous_headings":"","what":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"group_by related methods epi_archive, grouped_epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"","code":"# S3 method for epi_archive group_by(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data)) # S3 method for grouped_epi_archive group_by(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data)) # S3 method for grouped_epi_archive group_by_drop_default(.tbl) # S3 method for grouped_epi_archive groups(x) # S3 method for grouped_epi_archive ungroup(x, ...) is_grouped_epi_archive(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":".data epi_archive grouped_epi_archive ... Similar dplyr::group_by (see \"Details:\" edge cases); group_by: unquoted variable name(s) \"data masking\" expression(s). 
possible use dplyr::mutate-like syntax calculate new columns perform grouping, note , regrouping already-grouped .data object, calculations carried ignoring grouping (dplyr). ungroup: either empty, order remove grouping output epi_archive; variable name(s) \"tidy-select\" expression(s), order remove matching variables list grouping variables, output another grouped_epi_archive. .add Boolean. FALSE, default, output grouped variable selection ... ; TRUE, output grouped current grouping variables plus variable selection .... .drop described dplyr::group_by; determines treatment factor columns. .tbl grouped_epi_archive object. x groups ungroup: grouped_epi_archive; is_grouped_epi_archive: object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"match dplyr, group_by allows \"data masking\" (also referred \"tidy evaluation\") expressions ..., just column names, way similar mutate. Note replacing removing key columns expressions disabled. archive %>% group_by() expressions group regroup zero columns (indicating rows treated part one large group) output grouped_epi_archive, order enable use grouped_epi_archive methods result. slight contrast operations tibbles grouped tibbles, output grouped_df circumstances. Using group_by .add=FALSE override existing grouping disabled; instead, ungroup first group_by. 
group_by_drop_default (ungrouped) epi_archives expected dispatch group_by_drop_default.default (dedicated method grouped_epi_archives).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"","code":"grouped_archive <- archive_cases_dv_subset %>% group_by(geo_value) # `print` for metadata and method listing: grouped_archive %>% print() #> A `grouped_epi_archive` object: #> * Groups: geo_value #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Min/max time values: 2020-06-01 / 2021-11-30 #> ℹ First/last version with update: 2020-06-02 / 2021-12-01 #> ℹ Versions end: 2021-12-01 #> ℹ A preview of the table (129638 rows x 5 columns): #> Key: #> geo_value time_value version percent_cli case_rate_7d_av #> #> 1: ca 2020-06-01 2020-06-02 NA 6.628329 #> 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 #> 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 #> 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 #> 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 #> --- #> 129634: tx 2021-11-26 2021-11-29 1.858596 7.957657 #> 129635: tx 2021-11-27 2021-11-28 NA 7.174299 #> 129636: tx 2021-11-28 2021-11-29 NA 6.834681 #> 129637: tx 2021-11-29 2021-11-30 NA 8.841247 #> 129638: tx 2021-11-30 2021-12-01 NA 9.566218 # The primary use for grouping is to perform a grouped `epix_slide`: archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( f = ~ mean(.x$case_rate_7d_av), before = 2, ref_time_values = as.Date(\"2020-06-11\") + 0:2, new_col_name = \"case_rate_3d_av\" ) %>% ungroup() #> # A tibble: 12 × 3 #> geo_value time_value case_rate_3d_av #> #> 1 ca 2020-06-11 7.19 #> 2 fl 2020-06-11 5.71 #> 3 ny 2020-06-11 4.59 #> 4 tx 2020-06-11 5.62 #> 5 ca 2020-06-12 7.52 #> 6 fl 2020-06-12 5.82 #> 7 ny 2020-06-12 4.34 #> 8 tx 2020-06-12 5.91 #> 9 ca 2020-06-13 
7.62 #> 10 fl 2020-06-13 6.11 #> 11 ny 2020-06-13 4.14 #> 12 tx 2020-06-13 6.03 # ----------------------------------------------------------------- # Advanced: some other features of dplyr grouping are implemented: library(dplyr) toy_archive <- tribble( ~geo_value, ~age_group, ~time_value, ~version, ~value, \"us\", \"adult\", \"2000-01-01\", \"2000-01-02\", 121, \"us\", \"pediatric\", \"2000-01-02\", \"2000-01-03\", 5, # (addition) \"us\", \"adult\", \"2000-01-01\", \"2000-01-03\", 125, # (revision) \"us\", \"adult\", \"2000-01-02\", \"2000-01-03\", 130 # (addition) ) %>% mutate( age_group = ordered(age_group, c(\"pediatric\", \"adult\")), time_value = as.Date(time_value), version = as.Date(version) ) %>% as_epi_archive(other_keys = \"age_group\") # The following are equivalent: toy_archive %>% group_by(geo_value, age_group) #> A `grouped_epi_archive` object: #> * Groups: geo_value, age_group #> * Drops groups formed by factor levels that don't appear in the data #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 toy_archive %>% group_by(geo_value) %>% group_by(age_group, .add = TRUE) #> A `grouped_epi_archive` object: #> * Groups: geo_value, age_group #> * Drops groups formed by factor levels that don't appear in the data #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> 
geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 grouping_cols <- c(\"geo_value\", \"age_group\") toy_archive %>% group_by(across(all_of(grouping_cols))) #> A `grouped_epi_archive` object: #> * Groups: geo_value, age_group #> * Drops groups formed by factor levels that don't appear in the data #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 # And these are equivalent: toy_archive %>% group_by(geo_value) #> A `grouped_epi_archive` object: #> * Groups: geo_value #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 toy_archive %>% group_by(geo_value, age_group) %>% ungroup(age_group) #> A `grouped_epi_archive` object: #> * Groups: geo_value #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 
columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 # To get the grouping variable names as a `list` of `name`s (a.k.a. symbols): toy_archive %>% group_by(geo_value) %>% groups() #> [[1]] #> geo_value #> toy_archive %>% group_by(geo_value, age_group, .drop = FALSE) %>% epix_slide(f = ~ sum(.x$value), before = 20) %>% ungroup() #> # A tibble: 4 × 4 #> geo_value age_group time_value slide_value #> #> 1 us pediatric 2000-01-02 0 #> 2 us adult 2000-01-02 121 #> 3 us pediatric 2000-01-03 5 #> 4 us adult 2000-01-03 255"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate growth rate — growth_rate","title":"Estimate growth rate — growth_rate","text":"Estimates growth rate signal given points along underlying sequence. Several methodologies available; see growth rate vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate growth rate — growth_rate","text":"","code":"growth_rate( x = seq_along(y), y, x0 = x, method = c(\"rel_change\", \"linear_reg\", \"smooth_spline\", \"trend_filter\"), h = 7, log_scale = FALSE, dup_rm = FALSE, na_rm = FALSE, ... )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate growth rate — growth_rate","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. x0 Points estimate growth rate. Must subset x (extrapolation allowed). Default x. 
method Either \"rel_change\", \"linear_reg\", \"smooth_spline\", \"trend_filter\", indicating method use growth rate calculation. first two local methods: run sliding fashion sequence (order estimate derivatives hence growth rates); latter two global methods: run entire sequence. See details explanation. h Bandwidth sliding window, method \"rel_change\" \"linear_reg\". See details explanation. log_scale growth rates estimated using parametrization log scale? See details explanation. Default FALSE. dup_rm check remove duplicates x (corresponding elements y) computation? methods might handle duplicate x values gracefully, whereas others might fail (either quietly loudly). Default FALSE. na_rm missing values removed computation? Default FALSE. ... Additional arguments pass method used estimate derivative.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate growth rate — growth_rate","text":"Vector growth rate estimates specified points x0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Estimate growth rate — growth_rate","text":"growth rate function f defined continuously-valued parameter t defined f'(t) / f(t), f'(t) derivative f t. estimate growth rate signal discrete-time (can thought evaluations discretizations underlying function continuous-time), can therefore estimate derivative divide signal value (possibly smoothed version signal value). following methods available estimating growth rate: \"rel_change\": uses (B/A - 1) / h, B average y second half sliding window bandwidth h centered reference point x0, A average first half. can seen using first-difference approximation derivative. 
\"linear_reg\": uses slope linear regression y x sliding window centered reference point x0, divided fitted value linear regression x0. \"smooth_spline\": uses estimated derivative x0 smoothing spline fit x y, via stats::smooth.spline(), divided fitted value spline x0. \"trend_filter\": uses estimated derivative x0 polynomial trend filtering (discrete spline) fit x y, via genlasso::trendfilter(), divided fitted value discrete spline x0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"log-scale","dir":"Reference","previous_headings":"","what":"Log Scale","title":"Estimate growth rate — growth_rate","text":"alternative view growth rate function f general given defining g(t) = log(f(t)), observing g'(t) = f'(t) / f(t). Therefore, method estimates derivative can simply applied log signal interest, light, method (\"rel_change\", \"linear_reg\", \"smooth_spline\", \"trend_filter\") log scale analog, can used setting log_scale = TRUE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"sliding-windows","dir":"Reference","previous_headings":"","what":"Sliding Windows","title":"Estimate growth rate — growth_rate","text":"local methods, \"rel_change\" \"linear_reg\", use sliding window centered reference point bandiwidth h. words, sliding window consists points x whose distance reference point h. Note unit distance implicitly defined x variable; example, x vector Date objects, h = 7, reference point January 7, sliding window contains data January 1 14 (matching behavior epi_slide() = h - 1 = h).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"additional-arguments","dir":"Reference","previous_headings":"","what":"Additional Arguments","title":"Estimate growth rate — growth_rate","text":"global methods, \"smooth_spline\" \"trend_filter\", additional arguments can specified via ... underlying estimation function. 
smoothing spline case, additional arguments passed directly stats::smooth.spline() (defaults exactly function). trend filtering case works bit differently: , custom set arguments allowed (distributed internally genlasso::trendfilter() genlasso::cv.trendfilter()): ord: order piecewise polynomial trend filtering fit. Default 3. maxsteps: maximum number steps take solution path terminating. Default 1000. cv: cross-validation used choose effective degrees freedom fit? Default TRUE. k: number folds cross-validation used. Default 3. df: desired effective degrees freedom trend filtering fit. cv = FALSE, df must positive integer; cv = TRUE, df must one \"min\" \"1se\" indicating selection rule use based cross-validation error curve: minimum 1-standard-error rule, respectively. Default \"min\" (going along default cv = TRUE). Note cv = FALSE, require df set user.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate growth rate — growth_rate","text":"","code":"# COVID cases growth rate by state using default method relative change jhu_csse_daily_subset %>% group_by(geo_value) %>% mutate(cases_gr = growth_rate(x = time_value, y = cases)) #> An `epi_df` object, 4,026 x 7 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 7 #> # Groups: geo_value [6] #> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av #> * #> 1 ca 2020-03-01 6 1.29 0.00327 0 #> 2 ca 2020-03-02 4 1.71 0.00435 0 #> 3 ca 2020-03-03 6 2.43 0.00617 0 #> 4 ca 2020-03-04 11 3.86 0.00980 0.000363 #> 5 ca 2020-03-05 10 5.29 0.0134 0.000363 #> 6 ca 2020-03-06 18 7.86 0.0200 0.000363 #> 7 ca 2020-03-07 26 11.6 0.0294 0.000363 #> 8 ca 2020-03-08 19 13.4 0.0341 0.000363 #> 9 ca 2020-03-09 23 16.1 0.0410 0.000726 #> 10 ca 2020-03-10 22 18.4 0.0468 0.000726 #> # ℹ 4,016 more rows #> # ℹ 1 more 
variable: cases_gr # Log scale, degree 4 polynomial and 6-fold cross validation jhu_csse_daily_subset %>% group_by(geo_value) %>% mutate(gr_poly = growth_rate(x = time_value, y = cases, log_scale = TRUE, ord = 4, k = 6)) #> Warning: There were 3 warnings in `mutate()`. #> The first warning was: #> ℹ In argument: `gr_poly = growth_rate(...)`. #> ℹ In group 1: `geo_value = \"ca\"`. #> Caused by warning in `log()`: #> ! NaNs produced #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings. #> An `epi_df` object, 4,026 x 7 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 7 #> # Groups: geo_value [6] #> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av #> * #> 1 ca 2020-03-01 6 1.29 0.00327 0 #> 2 ca 2020-03-02 4 1.71 0.00435 0 #> 3 ca 2020-03-03 6 2.43 0.00617 0 #> 4 ca 2020-03-04 11 3.86 0.00980 0.000363 #> 5 ca 2020-03-05 10 5.29 0.0134 0.000363 #> 6 ca 2020-03-06 18 7.86 0.0200 0.000363 #> 7 ca 2020-03-07 26 11.6 0.0294 0.000363 #> 8 ca 2020-03-08 19 13.4 0.0341 0.000363 #> 9 ca 2020-03-09 23 16.1 0.0410 0.000726 #> 10 ca 2020-03-10 22 18.4 0.0468 0.000726 #> # ℹ 4,016 more rows #> # ℹ 1 more variable: gr_poly "},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":null,"dir":"Reference","previous_headings":"","what":"Use max valid period as guess for period of ref_time_values — guess_period","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":"Use max valid period guess period ref_time_values","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":"","code":"guess_period( ref_time_values, ref_time_values_arg = rlang::caller_arg(ref_time_values) 
)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":"ref_time_values Vector containing time-interval-like time-like data, least two distinct values, diff-able (e.g., time_value version column), sensible result adding .numeric versions diff result (via .integer typeof \"integer\", otherwise via .numeric). ref_time_values_arg Optional, string; name give ref_time_values error messages. Defaults quoting expression caller fed ref_time_values argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":".numeric, length 1; attempts match typeof(ref_time_values)","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"data source confirmed COVID-19 cases based reports made available Center Systems Science Engineering Johns Hopkins University. 
example data snapshot Oct 28, 2021 captures cases June 1, 2020 May 31, 2021 limited California Florida.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"","code":"incidence_num_outlier_example"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"tibble 730 rows 3 variables: geo_value geographic value associated row measurements. time_value time value associated row measurements. cases Number new confirmed COVID-19 cases, daily","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes. 
Furthermore, data limited small number rows, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Test for epi_df format — is_epi_df","title":"Test for epi_df format — is_epi_df","text":"Test epi_df format","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test for epi_df format — is_epi_df","text":"","code":"is_epi_df(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Test for epi_df format — is_epi_df","text":"x object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Test for epi_df format — is_epi_df","text":"TRUE object inherits epi_df.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"data source confirmed COVID-19 cases deaths based reports made available Center Systems Science Engineering Johns Hopkins University. 
example data ranges Mar 1, 2020 Dec 31, 2021, limited Massachusetts Vermont.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"","code":"jhu_csse_county_level_subset"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"tibble 16,212 rows 5 variables: geo_value geographic value associated row measurements. time_value time value associated row measurements. cases Number new confirmed COVID-19 cases, daily county_name name county state_name full name state","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes. 7-day average signals computed Delphi calculating moving averages preceding 7 days, signal June 7 average underlying data June 1 7, inclusive. 
Furthermore, data limited small number rows, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"data source confirmed COVID-19 cases deaths based reports made available Center Systems Science Engineering Johns Hopkins University. example data ranges Mar 1, 2020 Dec 31, 2021, limited California, Florida, Texas, New York, Georgia, Pennsylvania.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"","code":"jhu_csse_daily_subset"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"tibble 4026 rows 6 variables: geo_value geographic value associated row measurements. time_value time value associated row measurements. 
case_rate_7d_av 7-day average signal number new confirmed COVID-19 cases per 100,000 population, daily death_rate_7d_av 7-day average signal number new confirmed deaths due COVID-19 per 100,000 population, daily cases Number new confirmed COVID-19 cases, daily cases_7d_av 7-day average signal number new confirmed COVID-19 cases, daily","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Epidata API: case signal taken directly JHU CSSE COVID-19 GitHub repository. rate signals computed Delphi using Census population data. 7-day average signals computed Delphi calculating moving averages preceding 7 days, signal June 7 average underlying data June 1 7, inclusive. 
Furthermore, data limited small number rows, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Grab any keys associated to an epi_df — key_colnames","title":"Grab any keys associated to an epi_df — key_colnames","text":"Grab keys associated epi_df","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Grab any keys associated to an epi_df — key_colnames","text":"","code":"key_colnames(x, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Grab any keys associated to an epi_df — key_colnames","text":"x data.frame, tibble, epi_df ... additional arguments passed methods","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Grab any keys associated to an epi_df — key_colnames","text":"epi_df, returns \"keys\". 
Otherwise NULL","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":null,"dir":"Reference","previous_headings":"","what":"max(x$version), with error if x has 0 rows — max_version_with_row_in","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"Exported make defaults easily copyable.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"","code":"max_version_with_row_in(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"x x argument as_epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"max(x$version) rows; raises error 0 rows NA version value","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates an epi_df object — new_epi_df","title":"Creates an epi_df object — new_epi_df","text":"Creates new epi_df object. default, builds empty tibble correct metadata epi_df object (ie. geo_type, time_type, as_of). Refer info. 
arguments details.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates an epi_df object — new_epi_df","text":"","code":"new_epi_df( x = tibble::tibble(), geo_type, time_type, as_of, additional_metadata = list(), ... )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates an epi_df object — new_epi_df","text":"x data.frame, tibble::tibble, tsibble::tsibble converted geo_type Type geo values. missing, function attempt infer geo values present; fails, set \"custom\". time_type Type time values. missing, function attempt infer time values present; fails, set \"custom\". as_of Time value representing time given data available. example, as_of January 31, 2022, epi_df object created represent up-to-date version data available January 31, 2022. as_of argument missing, current day-time used. additional_metadata List additional metadata attach epi_df object. metadata geo_type, time_type, as_of fields; named entries passed list included well. tibble additional keys, sure specify character vector other_keys component additional_metadata. ... 
Additional arguments passed methods.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Creates an epi_df object — new_epi_df","text":"epi_df object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the next possible value greater than x of the same type — next_after","title":"Get the next possible value greater than x of the same type — next_after","text":"Get next possible value greater x type","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the next possible value greater than x of the same type — next_after","text":"","code":"next_after(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the next possible value greater than x of the same type — next_after","text":"x starting \"value\"(s)","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the next possible value greater than x of the same type — next_after","text":"class, typeof, length x","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe operator — %>%","title":"Pipe operator — %>%","text":"See magrittr::%>% details.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/pipe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pipe operator — %>%","text":"","code":"lhs %>% 
rhs"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_archive.html","id":null,"dir":"Reference","previous_headings":"","what":"Print information about an epi_archive object — print.epi_archive","title":"Print information about an epi_archive object — print.epi_archive","text":"Print information epi_archive object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_archive.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print information about an epi_archive object — print.epi_archive","text":"","code":"# S3 method for epi_archive print(x, ..., class = TRUE, methods = TRUE)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_archive.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print information about an epi_archive object — print.epi_archive","text":"x epi_archive object. ... empty, satisfy S3 generic. class Boolean; whether print class label header methods Boolean; whether print available methods archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Base S3 methods for an epi_df object — print.epi_df","title":"Base S3 methods for an epi_df object — print.epi_df","text":"Print summary functions epi_df object. Prints variety summary statistics epi_df object, time range included geographic coverage.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Base S3 methods for an epi_df object — print.epi_df","text":"","code":"# S3 method for epi_df print(x, ...) # S3 method for epi_df summary(object, ...) # S3 method for epi_df group_by(.data, ...) # S3 method for epi_df ungroup(x, ...) 
# S3 method for epi_df group_modify(.data, .f, ..., .keep = FALSE) # S3 method for epi_df unnest(data, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Base S3 methods for an epi_df object — print.epi_df","text":"x epi_df ... Additional arguments, compatibility summary(). Currently unused. object epi_df .data epi_df .f function formula; see dplyr::group_modify .keep Boolean; see dplyr::group_modify data epi_df","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. dplyr arrange, filter, group_by, group_modify, mutate, relocate, rename, slice, ungroup ggplot2 autoplot tidyr unnest tsibble as_tsibble","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-8","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.8","text":"epi_slide computations now 2-4 times faster changing reference time values, made accessible within sliding functions, calculated (#397). Add new epi_slide_mean function allow much (~30x) faster rolling average computations cases (#400). Add new epi_slide_sum function allow much faster rolling sum computations cases (#433). Add new epi_slide_opt function allow much faster rolling computations cases, using data.table slider optimized rolling functions (#433). Add tidyselect interface epi_slide_opt derivatives (#452). regenerated jhu_csse_daily_subset dataset latest versions data API changed approach versioning, see DEVELOPMENT.md details select grouped epi_dfs now drops epi_dfness makes sense; PR #390 Minor documentation updates; PR #393 Improved epi_archive print method. 
Compactified metadata shows snippet underlying DT (#341). Added autoplot method epi_df objects, creates ggplot2 plot epi_df (#382). Refactored internals use cli warnings/errors checkmate argument checking (#413). Fix logic auto-assign epi_df time_type week (#416) year (#441). Clarified “Get started” example getting Ebola line list data epi_df format. Improved documentation web site landing page’s introduction.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-7-0","dir":"Changelog","previous_headings":"","what":"Breaking changes:","title":"epiprocess 0.7.0","text":"Switched epi_df’s other_keys default NULL character(0); PR #390 Refactored epi_archive use S3 instead R6 object model. functionality stay , break member function interface. migration, can usually just convert epi_archive$merge(...) epi_archive <- epi_archive %>% epix_merge(...) (fill_through_version truncate_after_version) epi_archive$slide(...) epi_archive %>% epix_slide(...) (as_of, group_by, slide, etc.) (#340). limited situations, helper function calls epi_archive$merge etc. 
one arguments, may need carefully refactor .","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-7-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.7.0","text":"Updated vignettes compatibility epidatr 1.0.0 PR #377.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-7-0-1","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.7.0","text":"make existing slide computations work, add third argument f function accept new input: e.g., change f = function(x, g, ) { } f = function(x, g, rt, ) { }.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-7-0","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.7.0","text":"f formula, can now access reference time value via .z .ref_time_value. f missing, tidy evaluation expression ... can now refer window data epi_df tibble .x, group key .group_key, reference time value .ref_time_value. usual .data .env pronouns also work, but pick() cur_data() ; work .x instead. keep old behavior, manually perform row recycling within f computations, /left_join data frame representing desired output structure current epix_slide() result obtain desired repetitions completions expected all_rows = TRUE. keep old behavior, convert output epix_slide() epi_df desired set metadata appropriately.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-7-0-1","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.7.0","text":"epi_slide epix_slide now support as_list_col = TRUE slide computations output atomic vectors, output list column “chopped” format (see tidyr::chop). epi_slide now works properly slide computations output just Date vector, rather converting slide_value numeric column. 
Fix ?archive_cases_dv_subset information regarding modifications upstream data @brookslogan (#299). Update use updated epidatr (fetch_tbl -> fetch) @brookslogan (#319).","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-6-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.6.0","text":"epi_slide’s time windows now extend time steps time steps corresponding ref_time_values. See ?epi_slide details matching old alignments. epix_slide’s time windows now extend time steps corresponding ref_time_values way latest data available corresponding ref_time_values. obtain old behavior, dplyr::ungroup slide results immediately. using as_list_col = TRUE together ref_time_values all_rows=TRUE, marker excluded computations now NULL entry list column, rather NA; using tidyr::unnest() afterward want keep missing data markers, need replace NULL entries NAs. Skipped computations now uniformly detectable using vctrs methods. x %>% epix_slide(, group_by=c(col1, col2)) x %>% epix_slide(, group_by=all_of(colname_vector)) x %>% group_by(col1, col2) %>% epix_slide() x %>% group_by(across(all_of(colname_vector))) %>% epix_slide() obtain old behavior, precede epix_slide call lacking group_by argument appropriate group_by call. epix_slide now guesses ref_time_values regularly spaced sequence covering DT$version values version_end, rather distinct DT$time_values. obtain old behavior, pass ref_time_values = unique($DT$time_value). epi_archive’s clobberable_versions_start’s default now NA, warnings default potential nonreproducibility. 
obtain old behavior, pass clobberable_versions_start = max_version_with_row_in(x).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"potentially-breaking-changes-0-6-0","dir":"Changelog","previous_headings":"","what":"Potentially-breaking changes","title":"epiprocess 0.6.0","text":"Fixed [ grouped epi_dfs maintain grouping possible dropping epi_df class (e.g., removing time_value column). Fixed epi_df operations consistent decaying non-epi_dfs result operation doesn’t make sense epi_df (e.g., removing time_value column). Changed bind_rows grouped epi_dfs drop epi_df class. Like ungrouped epi_dfs, metadata result still simply taken first result, may inappropriate (#242). epi_slide epix_slide now raise error rather silently filtering ref_time_values don’t meet expectations.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-6-0","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.6.0","text":"epix_slide, $slide new parameter all_versions. all_versions=TRUE, epix_slide pass filtered epi_archive computation rather epi_df snapshot. enables, e.g., performing pseudoprospective forecasts revision-aware forecaster using nested epix_slide operations.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-6-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.6.0","text":"Added dplyr::group_by dplyr::ungroup S3 methods epi_archive objects, plus corresponding $group_by $ungroup R6 methods. group_by implementation supports .add .drop arguments, ungroup supports partial ungrouping .... 
as_epi_archive, epi_archive$new now perform checks key uniqueness requirement (part #154).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-6-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.6.0","text":"Added NEWS.md file track changes package. Implemented ?dplyr::dplyr_extending epi_dfs (#223). Fixed various small documentation issues (#217).","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"potentially-breaking-changes-0-5-0","dir":"Changelog","previous_headings":"","what":"Potentially-breaking changes","title":"epiprocess 0.5.0","text":"epix_slide, $slide now feed f epi_df rather converting tibble/tbl_df first, allowing use epi_df methods metadata, often yielding epi_dfs slide result. obtain old behavior, convert tibble within f.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-5-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.5.0","text":"Fixed epix_merge, $merge always raising error sync=\"truncate\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-5-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.5.0","text":"Added Remotes: entry genlasso, removed CRAN. Added as_epi_archive tests. Added missing epix_merge test sync=\"truncate\".","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"potentially-breaking-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Potentially-breaking changes","title":"epiprocess 0.4.0","text":"Fixed [.epi_df reorder columns, incompatible downstream packages. Changed [.epi_df decay--tibble logic coherent epi_dfs current tolerance nonunique keys: stopped decaying tibble cases unique key wouldn’t preserved, since don’t enforce unique key elsewhere. 
Fixed [.epi_df adjust \"other_keys\" metadata corresponding columns selected . Fixed [.epi_df raise error resulting column names nonunique. Fixed [.epi_df drop metadata decaying tibble (due removal essential columns).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-4-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.4.0","text":"Added check epi_df additional_metadata list. Fixed incorrect as_epi_df examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-4-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.4.0","text":"Applied rename upstream package examples: delphi.epidata -> epidatr. Rounded [.epi_df tests.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-3-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.3.0","text":"Compactification (see ) default may change results working directly epi_archive’s DT field; disable, pass compactify=FALSE. epix_ mutate input epi_archives, may alias alias fields (worry user sticks epix_* functions “regular” R functions copy--write-like behavior, avoiding mutating functions [.data.table). x$ may mutate x; mutates x, return x invisibly (makes sense), , fields, may either mutate object refers reseat reference (); x$ mutate x, result may contain aliases x fields. Removed ..., locf, nan parameters. Changed default behavior, now corresponds using =key(x$DT) (demanding set column names key(y$DT)), =TRUE, locf=TRUE, nan=NaN (post-filling step fixed apply gaps, longer fill NAs originating x$DT y$DT). x y longer allowed share names non-columns. epix_merge longer mutates x argument ($merge continues ). Removed (undocumented) capability passing data.table y. 
Removed inappropriate/misleading n=7 default argument (due reporting latency, n=7 yield 7 days data typical daily-reporting surveillance data source, one might assumed).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-3-0","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.3.0","text":"New compactify parameter allows removal rows redundant purposes epi_archive’s methods, use last version observation carried forward. New clobberable_versions_start field allows marking range versions “clobbered” (rewritten without assigning new version tags); previously, hard-coded max($DT$version). New versions_end field allows marking range versions beyond max($DT$version) observed, contained changes. New sync parameter controls x y aren’t equally date (.e., x$versions_end y$versions_end different). New function epix_fill_through_version, method $fill_through_version: non-mutating & mutating way ensure archive contains versions least fill_versions_end, extrapolating according necessary. Example archive data object now constructed demand underlying data, based user’s version epi_archive rather outdated R6 implementation whenever data object generated.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.2.0","text":"Removed default n=7 argument epix_slide.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-2-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.2.0","text":"Ignore NAs printing time_value range epi_archive. Fixed misleading column naming epix_slide example. Trimmed epi_slide examples. 
Synced --date docs.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-2-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.2.0","text":"Removed dependency epi_archive tests example archive. object, made understandable reading without running. Fixed epi_df tests relying S3 method epi_df implemented externally epiprocess. Added tests epi_archive methods wrapper functions. Removed dead code. Made .{Rbuild,git}ignore files comprehensive.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-1-2","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.1.2","text":"treats x optional, constructing empty epi_df default.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-1-2","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.1.2","text":"Fixed geo_type guessing alphabetical strings 2 characters yield \"custom\", US \"nation\". Fixed time_type guessing actually detect Date-class time_values regularly spaced 7 days apart \"week\"-type intended. Improved printing epi_dfs, epi_archives. Fixed as_of cut (forecast-like) data time_value > max_version. Expanded epi_df docs include conversion tsibble/tbl_ts objects, usage other_keys, pre-processing objects following geo_value, time_value naming scheme. Expanded epi_slide examples show use f argument named parameters. Updated examples print relevant columns given common 80-column terminal width. Added growth rate examples. Improved as_epi_archive epi_archive$new/$initialize documentation, including constructing toy archive.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-1-2","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.1.2","text":"Added tests epi_slide, epi_cor, internal utility functions. 
Fixed currently-unused internal utility functions MiddleL, MiddleR yield correct results odd-length vectors.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-1-1","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.1.1","text":"New example data objects allow one quickly experiment epi_dfs epi_archives without relying/waiting API fetch data.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-1-1","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.1.1","text":"Improved epi_slide error messaging. Fixed description appropriate parameters f argument epi_slide; previous description give incorrect behavior f named parameters receive values epi_slide’s .... Added examples throughout package. Using example data objects vignettes also speeds vignette compilation.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-1-1","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.1.1","text":"Set gh-actions CI. Added tests epi_dfs.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"implemented-core-functionality-vignettes-0-1-0","dir":"Changelog","previous_headings":"","what":"Implemented core functionality, vignettes","title":"epiprocess 0.1.0","text":"as_epi_df converts epi_df, guessing geo_type, time_type, other_keys, as_of specified. as_epi_df.tbl_ts as_tsibble.epi_df automatically set other_keys key&index, respectively. epi_slide applies user-supplied computation sliding/rolling time window user-specified groups, adding results new columns, recycling/broadcasting results keep result size stable. Allows computation provided function, purrr-style formula, tidyeval dots. Uses slider underneath efficiency. 
epi_cor calculates Pearson, Kendall, Spearman correlations two (optionally time-shifted) variables epi_df within user-specified groups. Convenience function: is_epi_df. as_epi_archive: prepares epi_archive object data frame containing snapshots /patch data every available version data set. as_of: extracts snapshot data set requested version, epi_df format. epix_slide, $slide: similar epi_slide, epi_archives; requested ref_time_value group, applies time window user-specified computation snapshot data ref_time_value. epix_merge, $merge: like merge epi_archives, allowing last version observation carried forward fill gaps x y. Convenience function: is_epi_archive. growth_rate: estimates growth rate time series using one built-methods based relative change, linear regression, smoothing splines, trend filtering. detect_outlr: applies one outlier detection methods given signal variable, optionally aggregates outputs create consensus result. detect_outlr_rm: outlier detection function based rolling-median-based outlier detection function; one methods included detect_outlr. 
detect_outlr_stl: outlier detection function based seasonal-trend decomposition using LOESS (STL); one methods included detect_outlr.","code":""}] +[{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"setting-up-the-development-environment","dir":"","previous_headings":"","what":"Setting up the development environment","title":"NA","text":"","code":"install.packages(c('devtools', 'pkgdown', 'styler', 'lintr')) # install dev dependencies devtools::install_deps(dependencies = TRUE) # install package dependencies devtools::document() # generate package meta data and man files devtools::build() # build package"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"validating-the-package","dir":"","previous_headings":"","what":"Validating the package","title":"NA","text":"","code":"styler::style_pkg() # format code lintr::lint_package() # lint code devtools::test() # test package devtools::check() # check package for errors"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"developing-the-documentation-site","dir":"","previous_headings":"","what":"Developing the documentation site","title":"NA","text":"CI builds two version documentation: https://cmu-delphi.github.io/epiprocess/ main branch https://cmu-delphi.github.io/epiprocess/dev dev branch. 
documentation site can previewed locally running R: open browser, can try using Python server command line:","code":"# Should automatically open a browser pkgdown::build_site(preview=TRUE) R -e 'devtools::document()' R -e 'pkgdown::build_site()' python -m http.server -d docs"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"versioning","dir":"","previous_headings":"","what":"Versioning","title":"NA","text":"Please follow guidelines PR template document.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/DEVELOPMENT.html","id":"planned-cran-release-process","dir":"","previous_headings":"","what":"Planned CRAN release process","title":"NA","text":"Open release issue copy follow checklist issue (modified checklist generated usethis::use_release_issue(version = \"1.0.2\")): git pull dev branch. Make sure changes committed pushed. Check current CRAN check results. Aim 10/10, notes. check works well enough, merge main. Otherwise open PR fix . guidelines. git checkout main git pull may choke MIT license url, ’s ok. devtools::build_readme() devtools::check_win_devel() maintainer (“cre” description) check email problems. may choke, sensitive binary versions packages given system. Either bypass ask someone else run ’re concerned. Update cran-comments.md PR changes (go list ) dev run list . Submit CRAN: devtools::submit_cran(). Maintainer approves email. Wait CRAN… accepted 🎉, move next steps. rejected, fix resubmit. Open merge PR containing updates made main back dev. usethis::use_github_release(publish = FALSE) (publish , otherwise won’t push) create draft release based commit hash CRAN-SUBMISSION push tag GitHub repo. 
Go repo, verify release notes, publish ready.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022 epiprocess authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"recycling-outputs","dir":"Articles","previous_headings":"","what":"Recycling outputs","title":"Advanced sliding with nonstandard outputs","text":"computation returns single atomic value, epi_slide() internally try recycle output size stable (sense described ). can use advantage, example, order compute trailing average marginally geo values, demonstrate simple synthetic example. slide computation returns atomic vector (rather single value) epi_slide() checks whether return length ensures size stability, , uses fill new column. example, next computation gives result last one. However, output atomic vector (rather single value) size stable, epi_slide() throws error. 
example, trying return 2 things 3 states.","code":"library(epiprocess) library(dplyr) set.seed(123) edf <- tibble( geo_value = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), time_value = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(geo_value)), x = seq_along(geo_value) + 0.01 * rnorm(length(geo_value)), ) %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) # 2-day trailing average, per geo value edf %>% group_by(geo_value) %>% epi_slide(x_2dav = mean(x), before = 1) %>% ungroup() ## An `epi_df` object, 9 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 4 ## geo_value time_value x x_2dav ## * ## 1 ca 2020-06-01 0.994 0.994 ## 2 ca 2020-06-02 2.00 1.50 ## 3 ca 2020-06-03 3.02 2.51 ## 4 fl 2020-06-01 4.00 4.00 ## 5 fl 2020-06-02 5.00 4.50 ## 6 fl 2020-06-03 6.02 5.51 ## 7 pa 2020-06-01 7.00 7.00 ## 8 pa 2020-06-02 7.99 7.50 ## 9 pa 2020-06-03 8.99 8.49 # 2-day trailing average, marginally edf %>% epi_slide(x_2dav = mean(x), before = 1) ## An `epi_df` object, 9 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 4 ## geo_value time_value x x_2dav ## * ## 1 ca 2020-06-01 0.994 4.00 ## 2 fl 2020-06-01 4.00 4.00 ## 3 pa 2020-06-01 7.00 4.00 ## 4 ca 2020-06-02 2.00 4.50 ## 5 fl 2020-06-02 5.00 4.50 ## 6 pa 2020-06-02 7.99 4.50 ## 7 ca 2020-06-03 3.02 5.50 ## 8 fl 2020-06-03 6.02 5.50 ## 9 pa 2020-06-03 8.99 5.50 edf %>% epi_slide(y_2dav = rep(mean(x), 3), before = 1) ## An `epi_df` object, 9 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 4 ## geo_value time_value x y_2dav ## * ## 1 ca 2020-06-01 0.994 4.00 ## 2 fl 2020-06-01 4.00 4.00 ## 3 pa 2020-06-01 7.00 4.00 ## 4 ca 2020-06-02 2.00 4.50 ## 5 fl 2020-06-02 5.00 4.50 ## 6 pa 2020-06-02 7.99 4.50 ## 7 ca 2020-06-03 3.02 5.50 ## 8 fl 2020-06-03 6.02 5.50 ## 9 pa 2020-06-03 8.99 5.50 edf %>% epi_slide(x_2dav = 
rep(mean(x), 2), before = 1) ## Error in `.f()`: ## ! The slide computations must either (a) output a single element/row ## each, or (b) one element/row per appearance of the reference time value in ## the local window."},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"multi-column-outputs","dir":"Articles","previous_headings":"","what":"Multi-column outputs","title":"Advanced sliding with nonstandard outputs","text":"Now move outputs data frames single row multiple columns. Working type output structure fact already demonstrated slide vignette. set as_list_col = TRUE call epi_slide(), resulting epi_df object returned epi_slide() list column containing slide values. use as_list_col = FALSE (default epi_slide()), function unnests (sense tidyr::unnest()) list column , resulting epi_df multiple new columns containing slide values. default name unnested columns prefixing name assigned list column () onto column names output data frame slide computation (x_2dav x_2dma) separated “_“. can use names_sep = NULL (gets passed tidyr::unnest()) drop prefix associated list column name, naming unnested columns. 
Furthermore, epi_slide() recycle single row data frame needed order make result size stable, just like case atomic values.","code":"edf2 <- edf %>% group_by(geo_value) %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = TRUE ) %>% ungroup() class(edf2$a) ## [1] \"list\" length(edf2$a) ## [1] 9 edf2$a[[2]] ## x_2dav x_2dma ## 1 1.496047 0.7437485 edf %>% group_by(geo_value) %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = FALSE ) %>% ungroup() ## An `epi_df` object, 9 x 5 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 5 ## geo_value time_value x a_x_2dav a_x_2dma ## * ## 1 ca 2020-06-01 0.994 0.994 0 ## 2 ca 2020-06-02 2.00 1.50 0.744 ## 3 ca 2020-06-03 3.02 2.51 0.755 ## 4 fl 2020-06-01 4.00 4.00 0 ## 5 fl 2020-06-02 5.00 4.50 0.742 ## 6 fl 2020-06-03 6.02 5.51 0.753 ## 7 pa 2020-06-01 7.00 7.00 0 ## 8 pa 2020-06-02 7.99 7.50 0.729 ## 9 pa 2020-06-03 8.99 8.49 0.746 edf %>% group_by(geo_value) %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = FALSE, names_sep = NULL ) %>% ungroup() ## An `epi_df` object, 9 x 5 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 5 ## geo_value time_value x x_2dav x_2dma ## * ## 1 ca 2020-06-01 0.994 0.994 0 ## 2 ca 2020-06-02 2.00 1.50 0.744 ## 3 ca 2020-06-03 3.02 2.51 0.755 ## 4 fl 2020-06-01 4.00 4.00 0 ## 5 fl 2020-06-02 5.00 4.50 0.742 ## 6 fl 2020-06-03 6.02 5.51 0.753 ## 7 pa 2020-06-01 7.00 7.00 0 ## 8 pa 2020-06-02 7.99 7.50 0.729 ## 9 pa 2020-06-03 8.99 8.49 0.746 edf %>% epi_slide( a = data.frame(x_2dav = mean(x), x_2dma = mad(x)), before = 1, as_list_col = FALSE, names_sep = NULL ) ## An `epi_df` object, 9 x 5 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 5 ## geo_value time_value x x_2dav x_2dma ## * ## 1 ca 2020-06-01 0.994 4.00 
4.45 ## 2 fl 2020-06-01 4.00 4.00 4.45 ## 3 pa 2020-06-01 7.00 4.00 4.45 ## 4 ca 2020-06-02 2.00 4.50 3.71 ## 5 fl 2020-06-02 5.00 4.50 3.71 ## 6 pa 2020-06-02 7.99 4.50 3.71 ## 7 ca 2020-06-03 3.02 5.50 3.69 ## 8 fl 2020-06-03 6.02 5.50 3.69 ## 9 pa 2020-06-03 8.99 5.50 3.69"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"multi-row-outputs","dir":"Articles","previous_headings":"","what":"Multi-row outputs","title":"Advanced sliding with nonstandard outputs","text":"slide computation outputs data frame one row, behavior analogous slide computation outputs atomic vector. Meaning, epi_slide() check result size stable, , fill new column(s) resulting epi_df object appropriately. can convenient modeling following sense: can, example, fit sliding, data-versioning-unaware nowcasting forecasting model pooling data different locations, return separate forecasts common model location. use synthetic example demonstrate idea abstractly simply forecasting (actually, nowcasting) y x fitting time-windowed linear model pooling data across locations. example focused simplicity show work multi-row outputs. Note however, following issues example: lm fitting data includes testing instances, training-test split performed. Adding simple training-test split factor reporting latency properly. Data revisions taken account. three factors contribute unrealistic retrospective forecasts overly optimistic retrospective performance evaluations. Instead, one favor epix_slide realistic “pseudoprospective” forecasts. Using epix_slide also makes easier express certain types forecasts; epi_slide, forecasts additional aheads quantile levels need expressed additional columns, nested inside list columns, epix_slide perform size stability checks recycling, allowing computations output number rows.","code":"edf$y <- 2 * edf$x + 0.05 * rnorm(length(edf$x)) edf %>% epi_slide(function(d, ...) 
{ obj <- lm(y ~ x, data = d) return( as.data.frame( predict(obj, newdata = d %>% group_by(geo_value) %>% filter(time_value == max(time_value)), interval = \"prediction\", level = 0.9 ) ) ) }, before = 1, new_col_name = \"fc\", names_sep = NULL) ## Warning: `f` might not have enough positional arguments before its `...`; in the current ## `epi[x]_slide` call, the group key and reference time value will be included in ## `f`'s `...`; if `f` doesn't expect those arguments, it may produce confusing ## error messages ## An `epi_df` object, 9 x 7 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 9 × 7 ## geo_value time_value x y fit lwr upr ## * ## 1 ca 2020-06-01 0.994 1.97 1.96 1.87 2.06 ## 2 fl 2020-06-01 4.00 8.02 8.03 7.95 8.11 ## 3 pa 2020-06-01 7.00 14.1 14.1 14.0 14.2 ## 4 ca 2020-06-02 2.00 4.06 4.01 3.91 4.11 ## 5 fl 2020-06-02 5.00 10.0 10.0 9.94 10.1 ## 6 pa 2020-06-02 7.99 16.0 16.0 15.9 16.1 ## 7 ca 2020-06-03 3.02 6.05 6.07 5.96 6.17 ## 8 fl 2020-06-03 6.02 12.0 12.0 11.9 12.1 ## 9 pa 2020-06-03 8.99 17.9 17.9 17.8 18.0"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"version-aware-forecasting-revisited","dir":"Articles","previous_headings":"","what":"Version-aware forecasting, revisited","title":"Advanced sliding with nonstandard outputs","text":"revisit COVID-19 forecasting example archive vignette order demonstrate preceding points regarding forecast evaluation realistic setting. First, fetch versioned data build archive. Next, extend ARX function handle multiple geo values, since present case, grouping geo value slide computation run multiple geo values . Note , epix_slide() returns grouping variables, time_value, slide computations eventual returned tibble, need include geo_value column output data frame ARX computation. now make forecasts archive compare forecasts latest data. 
can see forecasts, come training ARX model jointly CA FL, exhibit generally less variability wider prediction bands compared ones archive vignette, come training separate ARX model state. archive vignette, can see difference version-aware (right column) -unaware (left column) forecasting, well.","code":"library(epidatr) library(data.table) library(ggplot2) theme_set(theme_bw()) y1 <- pub_covidcast( source = \"doctor-visits\", signals = \"smoothed_adj_cli\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl\", time_value = epirange(20200601, 20211201), issues = epirange(20200601, 20211201) ) y2 <- pub_covidcast( source = \"jhu-csse\", signal = \"confirmed_7dav_incidence_prop\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl\", time_value = epirange(20200601, 20211201), issues = epirange(20200601, 20211201) ) x <- y1 %>% select(geo_value, time_value, version = issue, percent_cli = value ) %>% as_epi_archive(compactify = FALSE) # mutating merge operation: x <- epix_merge( x, y2 %>% select(geo_value, time_value, version = issue, case_rate_7d_av = value ) %>% as_epi_archive(compactify = FALSE), sync = \"locf\", compactify = FALSE ) library(tidyr) library(purrr) ## ## Attaching package: 'purrr' ## The following object is masked from 'package:data.table': ## ## transpose prob_arx_args <- function(lags = c(0, 7, 14), ahead = 7, min_train_window = 20, lower_level = 0.05, upper_level = 0.95, symmetrize = TRUE, intercept = FALSE, nonneg = TRUE) { return(list( lags = lags, ahead = ahead, min_train_window = min_train_window, lower_level = lower_level, upper_level = upper_level, symmetrize = symmetrize, intercept = intercept, nonneg = nonneg )) } prob_arx <- function(x, y, geo_value, time_value, args = prob_arx_args()) { # Return NA if insufficient training data if (length(y) < args$min_train_window + max(args$lags) + args$ahead) { return(data.frame( geo_value = unique(geo_value), # Return geo value! 
point = NA, lower = NA, upper = NA )) } # Set up x, y, lags list if (!missing(x)) { x <- data.frame(x, y) } else { x <- data.frame(y) } if (!is.list(args$lags)) args$lags <- list(args$lags) args$lags <- rep(args$lags, length.out = ncol(x)) # Build features and response for the AR model, and then fit it dat <- tibble(i = seq_len(ncol(x)), lag = args$lags) %>% unnest(lag) %>% mutate(name = paste0(\"x\", seq_len(nrow(.)))) %>% # nolint: object_usage_linter # One list element for each lagged feature pmap(function(i, lag, name) { tibble( geo_value = geo_value, time_value = time_value + lag, # Shift back !!name := x[, i] ) }) %>% # One list element for the response vector c(list( tibble( geo_value = geo_value, time_value = time_value - args$ahead, # Shift forward y = y ) )) %>% # Combine them together into one data frame reduce(full_join, by = c(\"geo_value\", \"time_value\")) %>% arrange(time_value) if (args$intercept) dat$x0 <- rep(1, nrow(dat)) obj <- lm(y ~ . + 0, data = select(dat, -geo_value, -time_value)) # Use LOCF to fill NAs in the latest feature values (do this by geo value) setDT(dat) # Convert to a data.table object by reference cols <- setdiff(names(dat), c(\"geo_value\", \"time_value\")) dat[, (cols) := nafill(.SD, type = \"locf\"), .SDcols = cols, by = \"geo_value\"] # Make predictions test_time_value <- max(time_value) point <- predict( obj, newdata = dat %>% dplyr::group_by(geo_value) %>% dplyr::filter(time_value == test_time_value) ) # Compute bands r <- residuals(obj) s <- ifelse(args$symmetrize, -1, NA) # Should the residuals be symmetrized? q <- quantile(c(r, s * r), probs = c(args$lower, args$upper), na.rm = TRUE) lower <- point + q[1] upper <- point + q[2] # Clip at zero if we need to, then return if (args$nonneg) { point <- pmax(point, 0) lower <- pmax(lower, 0) upper <- pmax(upper, 0) } return(data.frame( geo_value = unique(geo_value), # Return geo value! 
point = point, lower = lower, upper = upper )) } # Latest snapshot of data, and forecast dates x_latest <- epix_as_of(x, max_version = max(x$DT$version)) fc_time_values <- seq(as.Date(\"2020-08-01\"), as.Date(\"2021-11-30\"), by = \"1 month\" ) # Simple function to produce forecasts k weeks ahead k_week_ahead <- function(x, ahead = 7, as_of = TRUE) { if (as_of) { x %>% epix_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, .data$geo_value, .data$time_value, args = prob_arx_args(ahead = ahead) ), before = 119, ref_time_values = fc_time_values ) %>% mutate( target_date = .data$time_value + ahead, as_of = TRUE, geo_value = .data$fc_geo_value ) } else { x_latest %>% epi_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, .data$geo_value, .data$time_value, args = prob_arx_args(ahead = ahead) ), before = 119, ref_time_values = fc_time_values ) %>% mutate(target_date = .data$time_value + ahead, as_of = FALSE) } } # Generate the forecasts, and bind them together fc <- bind_rows( k_week_ahead(x, ahead = 7, as_of = TRUE), k_week_ahead(x, ahead = 14, as_of = TRUE), k_week_ahead(x, ahead = 21, as_of = TRUE), k_week_ahead(x, ahead = 28, as_of = TRUE), k_week_ahead(x, ahead = 7, as_of = FALSE), k_week_ahead(x, ahead = 14, as_of = FALSE), k_week_ahead(x, ahead = 21, as_of = FALSE), k_week_ahead(x, ahead = 28, as_of = FALSE) ) # Plot them, on top of latest COVID-19 case rates ggplot(fc, aes(x = target_date, group = time_value, fill = as_of)) + geom_ribbon(aes(ymin = fc_lower, ymax = fc_upper), alpha = 0.4) + geom_line( data = x_latest, aes(x = time_value, y = case_rate_7d_av), inherit.aes = FALSE, color = \"gray50\" ) + geom_line(aes(y = fc_point)) + geom_point(aes(y = fc_point), size = 0.5) + geom_vline(aes(xintercept = time_value), linetype = 2, alpha = 0.5) + facet_grid(vars(geo_value), vars(as_of), scales = \"free\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 case rates\") + 
theme(legend.position = \"none\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/advanced.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Advanced sliding with nonstandard outputs","text":"case_rate_7d_av data used document modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. percent_cli data modified part COVIDcast Epidata API Doctor Visits data. dataset licensed terms Creative Commons Attribution 4.0 International license. Copyright Delphi Research Group Carnegie Mellon University 2020.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"converting-to-tsibble-format","dir":"Articles","previous_headings":"","what":"Converting to tsibble format","title":"Aggregate signals over space and time","text":"manipulating wrangling time series data, tsibble already provides whole bunch useful tools. tsibble object (formerly, class tbl_ts) basically tibble (data frame) two specially-marked columns: index column representing time variable (defining order past present), key column identifying unique observational unit time point. fact, key can made number columns, just single one. epi_df object, index variable time_value, key variable typically geo_value (though need always case: example, age group variable another column, serve second key variable). epiprocess package thus provides implementation as_tsibble() epi_df objects, sets variables according defaults. can also set key variable(s) directly call as_tsibble(). 
Similar SQL keys, key uniquely identify time point (, key index together uniquely identify row), as_tsibble() throws error: can see, duplicate county names Massachusetts Vermont, caused error. Keying county name state name, however, work:","code":"library(tsibble) xt <- as_tsibble(x) head(xt) ## # A tsibble: 6 x 5 [1D] ## # Key: geo_value [1] ## geo_value time_value cases county_name state_name ## ## 1 25001 2020-06-01 4 Barnstable County Massachusetts ## 2 25001 2020-06-02 6 Barnstable County Massachusetts ## 3 25001 2020-06-03 5 Barnstable County Massachusetts ## 4 25001 2020-06-04 8 Barnstable County Massachusetts ## 5 25001 2020-06-05 3 Barnstable County Massachusetts ## 6 25001 2020-06-06 4 Barnstable County Massachusetts key(xt) ## [[1]] ## geo_value index(xt) ## time_value interval(xt) ## ## [1] 1D head(as_tsibble(x, key = \"county_name\")) ## Error in `validate_tsibble()`: ## ! A valid tsibble must have distinct rows identified by key and index. ## ℹ Please use `duplicates()` to check the duplicated rows. 
head(duplicates(x, key = \"county_name\")) ## # A tibble: 6 × 5 ## geo_value time_value cases county_name state_name ## ## 1 25009 2020-06-01 63 Essex County Massachusetts ## 2 25011 2020-06-01 0 Franklin County Massachusetts ## 3 50009 2020-06-01 0 Essex County Vermont ## 4 50011 2020-06-01 0 Franklin County Vermont ## 5 25009 2020-06-02 74 Essex County Massachusetts ## 6 25011 2020-06-02 0 Franklin County Massachusetts head(as_tsibble(x, key = c(\"county_name\", \"state_name\"))) ## # A tsibble: 6 x 5 [1D] ## # Key: county_name, state_name [1] ## geo_value time_value cases county_name state_name ## ## 1 50001 2020-06-01 0 Addison County Vermont ## 2 50001 2020-06-02 0 Addison County Vermont ## 3 50001 2020-06-03 0 Addison County Vermont ## 4 50001 2020-06-04 0 Addison County Vermont ## 5 50001 2020-06-05 0 Addison County Vermont ## 6 50001 2020-06-06 1 Addison County Vermont"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"detecting-and-filling-time-gaps","dir":"Articles","previous_headings":"","what":"Detecting and filling time gaps","title":"Aggregate signals over space and time","text":"One major advantages tsibble package ability handle implicit gaps time series data. words, can infer time scale ’re interested (say, daily data), detect apparent gaps (say, values reported January 1 3 January 2). can subsequently use functionality make missing entries explicit, generally help avoid bugs downstream data processing tasks. Let’s first remove certain dates data set create gaps: functions has_gaps(), scan_gaps(), count_gaps() tsibble package provide useful summaries, slightly different formats. can also visualize patterns missingness: Using fill_gaps() function tsibble, can replace gaps explicit value. default NA, current case, missingness random rather represents small value censored (hypothetical COVID-19 reports, certainly real phenomenon occurs signals), better replace zero, . 
(approaches, LOCF: last observation carried forward time, accomplished first filling NA values following second call tidyr::fill().) Note time series Addison, VT starts August 27, 2020, even though original (uncensored) data set drawn period went back June 6, 2020. setting .full = TRUE, can zero-fill entire span observed (censored) data. Explicit imputation missingness (zero-filling case) can important protecting bugs sorts downstream tasks. example, even something simple 7-day trailing average complicated missingness. function epi_slide() looks rows within window 7 days anchored right reference time point (= 6). days given week missing censored small case counts, taking average observed case counts can misleading unintentionally biased upwards. Meanwhile, running epi_slide() zero-filled data brings trailing averages (appropriately) downwards, can see inspecting Plymouth, MA around July 1, 2021.","code":"# First make geo value more readable for tables, plots, etc. x <- x %>% mutate(geo_value = paste( substr(county_name, 1, nchar(county_name) - 7), name_to_abbr(state_name), sep = \", \" )) %>% select(geo_value, time_value, cases) xt <- as_tsibble(x) %>% filter(cases >= 3) head(has_gaps(xt)) ## # A tibble: 6 × 2 ## geo_value .gaps ## ## 1 Addison, VT TRUE ## 2 Barnstable, MA TRUE ## 3 Bennington, VT TRUE ## 4 Berkshire, MA TRUE ## 5 Bristol, MA TRUE ## 6 Caledonia, VT TRUE head(scan_gaps(xt)) ## # A tsibble: 6 x 2 [1D] ## # Key: geo_value [1] ## geo_value time_value ## ## 1 Addison, VT 2020-08-28 ## 2 Addison, VT 2020-08-29 ## 3 Addison, VT 2020-08-30 ## 4 Addison, VT 2020-08-31 ## 5 Addison, VT 2020-09-01 ## 6 Addison, VT 2020-09-02 head(count_gaps(xt)) ## # A tibble: 6 × 4 ## geo_value .from .to .n ## ## 1 Addison, VT 2020-08-28 2020-10-04 38 ## 2 Addison, VT 2020-10-06 2020-10-23 18 ## 3 Addison, VT 2020-10-25 2020-11-04 11 ## 4 Addison, VT 2020-11-06 2020-11-10 5 ## 5 Addison, VT 2020-11-14 2020-11-18 5 ## 6 Addison, VT 2020-11-20 2020-11-20 1 library(ggplot2) 
theme_set(theme_bw()) ggplot( count_gaps(xt), aes( x = reorder(geo_value, desc(geo_value)), color = geo_value ) ) + geom_linerange(aes(ymin = .from, ymax = .to)) + geom_point(aes(y = .from)) + geom_point(aes(y = .to)) + coord_flip() + labs(x = \"County\", y = \"Date\") + theme(legend.position = \"none\") fill_gaps(xt, cases = 0) %>% head() ## # A tsibble: 6 x 3 [1D] ## # Key: geo_value [1] ## geo_value time_value cases ## ## 1 Addison, VT 2020-08-27 3 ## 2 Addison, VT 2020-08-28 0 ## 3 Addison, VT 2020-08-29 0 ## 4 Addison, VT 2020-08-30 0 ## 5 Addison, VT 2020-08-31 0 ## 6 Addison, VT 2020-09-01 0 xt_filled <- fill_gaps(xt, cases = 0, .full = TRUE) head(xt_filled) ## # A tsibble: 6 x 3 [1D] ## # Key: geo_value [1] ## geo_value time_value cases ## ## 1 Addison, VT 2020-06-01 0 ## 2 Addison, VT 2020-06-02 0 ## 3 Addison, VT 2020-06-03 0 ## 4 Addison, VT 2020-06-04 0 ## 5 Addison, VT 2020-06-05 0 ## 6 Addison, VT 2020-06-06 0 xt %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% ungroup() %>% filter( geo_value == \"Plymouth, MA\", abs(time_value - as.Date(\"2021-07-01\")) <= 3 ) %>% print(n = 7) ## An `epi_df` object, 4 x 4 with metadata: ## * geo_type = custom ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 4 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 Plymouth, MA 2021-06-28 3 4.25 ## 2 Plymouth, MA 2021-06-30 7 5 ## 3 Plymouth, MA 2021-07-01 6 5 ## 4 Plymouth, MA 2021-07-02 6 5.2 xt_filled %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% ungroup() %>% filter( geo_value == \"Plymouth, MA\", abs(time_value - as.Date(\"2021-07-01\")) <= 3 ) %>% print(n = 7) ## An `epi_df` object, 7 x 4 with metadata: ## * geo_type = custom ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 7 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 Plymouth, MA 2021-06-28 3 2.43 ## 2 Plymouth, MA 
2021-06-29 0 2.43 ## 3 Plymouth, MA 2021-06-30 7 2.86 ## 4 Plymouth, MA 2021-07-01 6 2.86 ## 5 Plymouth, MA 2021-07-02 6 3.71 ## 6 Plymouth, MA 2021-07-03 0 3.71 ## 7 Plymouth, MA 2021-07-04 0 3.14"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"aggregate-to-different-time-scales","dir":"Articles","previous_headings":"","what":"Aggregate to different time scales","title":"Aggregate signals over space and time","text":"Continuing useful tsibble functionality, can aggregate different time scales using index_by() tsibble, modifies index variable given object applying suitable time-coarsening transformation (say, moving days weeks, weeks months, ). common use case follow call dplyr verb like summarize() order perform kind aggregation measured variables new index variable. , use functions yearweek() yearmonth() provided tsibble package order aggregate weekly monthly resolutions. former call, set week_start = 7 coincide CDC definition epiweek (epidemiological week).","code":"# Aggregate to weekly xt_filled_week <- xt_filled %>% index_by(epiweek = ~ yearweek(., week_start = 7)) %>% group_by(geo_value) %>% summarize(cases = sum(cases, na.rm = TRUE)) head(xt_filled_week) ## # A tsibble: 6 x 3 [1W] ## # Key: geo_value [1] ## geo_value epiweek cases ## ## 1 Addison, VT 2020 W23 0 ## 2 Addison, VT 2020 W24 0 ## 3 Addison, VT 2020 W25 0 ## 4 Addison, VT 2020 W26 0 ## 5 Addison, VT 2020 W27 0 ## 6 Addison, VT 2020 W28 0 # Aggregate to monthly xt_filled_month <- xt_filled_week %>% index_by(month = ~ yearmonth(.)) %>% group_by(geo_value) %>% summarize(cases = sum(cases, na.rm = TRUE)) head(xt_filled_month) ## # A tsibble: 6 x 3 [1M] ## # Key: geo_value [1] ## geo_value month cases ## ## 1 Addison, VT 2020 May 0 ## 2 Addison, VT 2020 Jun 0 ## 3 Addison, VT 2020 Jul 0 ## 4 Addison, VT 2020 Aug 3 ## 5 Addison, VT 2020 Sep 0 ## 6 Addison, VT 2020 Oct 
29"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"geographic-aggregation","dir":"Articles","previous_headings":"","what":"Geographic aggregation","title":"Aggregate signals over space and time","text":"TODO","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/aggregation.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Aggregate signals over space and time","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"getting-data-into-epi_archive-format","dir":"Articles","previous_headings":"","what":"Getting data into epi_archive format","title":"Work with archive objects and data revisions","text":"epi_archive() object can constructed data frame, data table, tibble, provided (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. version: time value specifying version row measurements. example, given row version January 15, 2022 time_value January 14, 2022, row contains measurements data January 14, 2022 available one day later. can see , data frame returned epidatr::pub_covidcast() columns required epi_archive format, issue playing role version. can now use as_epi_archive() bring epi_archive format. removal redundant version updates as_epi_archive using compactify, please refer compactify vignette. 
epi_archive consists primary field DT, data table (data.table package) columns geo_value, time_value, version (possibly additional ones), metadata fields, geo_type time_type. variables geo_value, time_value, version serve key variables data table, well specified metadata (described ). can single row per unique combination key variables, therefore key variables critical figuring generate snapshot data archive, given version (also described ). general, last version observation carried forward (LOCF) fill data recorded versions.","code":"x <- dv %>% select(geo_value, time_value, version = issue, percent_cli = value) %>% as_epi_archive(compactify = TRUE) class(x) print(x) ## [1] \"epi_archive\" ## → An `epi_archive` object, with metadata: ## ℹ Min/max time values: 2020-06-01 / 2021-11-30 ## ℹ First/last version with update: 2020-06-02 / 2021-12-01 ## ℹ Versions end: 2021-12-01 ## ℹ A preview of the table (119316 rows x 4 columns): ## Key: ## geo_value time_value version percent_cli ## ## 1: ca 2020-06-01 2020-06-02 NA ## 2: ca 2020-06-01 2020-06-06 2.140116 ## 3: ca 2020-06-01 2020-06-08 2.140379 ## 4: ca 2020-06-01 2020-06-09 2.114430 ## 5: ca 2020-06-01 2020-06-10 2.133677 ## --- ## 119312: tx 2021-11-26 2021-11-29 1.858596 ## 119313: tx 2021-11-27 2021-11-28 NA ## 119314: tx 2021-11-28 2021-11-29 NA ## 119315: tx 2021-11-29 2021-11-30 NA ## 119316: tx 2021-11-30 2021-12-01 NA class(x$DT) ## [1] \"data.table\" \"data.frame\" head(x$DT) ## Key: ## geo_value time_value version percent_cli ## ## 1: ca 2020-06-01 2020-06-02 NA ## 2: ca 2020-06-01 2020-06-06 2.140116 ## 3: ca 2020-06-01 2020-06-08 2.140379 ## 4: ca 2020-06-01 2020-06-09 2.114430 ## 5: ca 2020-06-01 2020-06-10 2.133677 ## 6: ca 2020-06-01 2020-06-11 2.197207 key(x$DT) ## [1] \"geo_value\" \"time_value\" \"version\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"some-details-on-metadata","dir":"Articles","previous_headings":"","what":"Some details on 
metadata","title":"Work with archive objects and data revisions","text":"following pieces metadata included fields epi_archive object: geo_type: type geo values. time_type: type time values. additional_metadata: list additional metadata data archive. Metadata epi_archive object x can accessed (altered) directly, x$geo_type x$time_type, etc. Just like as_epi_df(), function as_epi_archive() attempts guess metadata fields epi_archive object instantiated, explicitly specified function call (case ).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"producing-snapshots-in-epi_df-form","dir":"Articles","previous_headings":"","what":"Producing snapshots in epi_df form","title":"Work with archive objects and data revisions","text":"key method epi_archive class epix_as_of(), generates snapshot archive epi_df format. represents --date values signal variables given version. can see max time value epi_df object x_snapshot generated archive May 29, 2021, even though specified version date June 1, 2021. can infer doctor’s visits signal 2 days latent June 1. Also, can see metadata epi_df object version date recorded as_of field. default, using maximum version column underlying data table epi_archive object generates snapshot latest values signal variables entire archive. epix_as_of() function issues warning case, since updates current version may still come later point time, due various reasons, synchronization issues. , pull several snapshots archive, spaced one month apart. overlay corresponding signal curves colored lines, version dates marked dotted vertical lines, draw latest curve black (latest snapshot x_latest archive can provide). can see interesting highly nontrivial revision behavior: points time provisional data snapshots grossly underestimate latest curve (look particular Florida close end 2021), others overestimate (states towards beginning 2021), though quite dramatically. 
Modeling revision process, often called backfill modeling, important statistical problem .","code":"x_snapshot <- epix_as_of(x, max_version = as.Date(\"2021-06-01\")) class(x_snapshot) ## [1] \"epi_df\" \"tbl_df\" \"tbl\" \"data.frame\" head(x_snapshot) ## An `epi_df` object, 6 x 3 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2021-06-01 ## ## # A tibble: 6 × 3 ## geo_value time_value percent_cli ## * ## 1 ca 2020-06-01 2.75 ## 2 ca 2020-06-02 2.57 ## 3 ca 2020-06-03 2.48 ## 4 ca 2020-06-04 2.41 ## 5 ca 2020-06-05 2.57 ## 6 ca 2020-06-06 2.63 max(x_snapshot$time_value) ## [1] \"2021-05-31\" attributes(x_snapshot)$metadata$as_of ## [1] \"2021-06-01\" x_latest <- epix_as_of(x, max_version = max(x$DT$version)) theme_set(theme_bw()) self_max <- max(x$DT$version) versions <- seq(as.Date(\"2020-06-01\"), self_max - 1, by = \"1 month\") snapshots <- map_dfr(versions, function(v) { epix_as_of(x, max_version = v) %>% mutate(version = v) }) %>% bind_rows( x_latest %>% mutate(version = self_max) ) %>% mutate(latest = version == self_max) ggplot( snapshots %>% filter(!latest), aes(x = time_value, y = percent_cli) ) + geom_line(aes(color = factor(version)), na.rm = TRUE) + geom_vline(aes(color = factor(version), xintercept = version), lty = 2) + facet_wrap(~geo_value, scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"% of doctor's visits with CLI\") + theme(legend.position = \"none\") + geom_line( data = snapshots %>% filter(latest), aes(x = time_value, y = percent_cli), inherit.aes = FALSE, color = \"black\", na.rm = TRUE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"merging-epi_archive-objects","dir":"Articles","previous_headings":"","what":"Merging epi_archive objects","title":"Work with archive objects and data revisions","text":"Now demonstrate merge two epi_archive objects together, e.g., grabbing data multiple sources particular 
version can performed single epix_as_of call. function epix_merge() made purpose. merge working epi_archive versioned percentage CLI outpatient visits another one versioned COVID-19 case reporting data, fetch COVIDcast API, rate scale (counts per 100,000 people population). merging archives, unless archives identical data release patterns, NAs can introduced non-key variables reasons: - represent “value” observation initial release (need pair additional observations archive released) - represent “value” observation recorded versions (sort situation) - requested via sync=\"na\", represent potential update data yet access (e.g., due encountering issues attempting download currently available version data one archives, ).","code":"y <- pub_covidcast( source = \"jhu-csse\", signals = \"confirmed_7dav_incidence_prop\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl,ny,tx\", time_values = epirange(20200601, 20211201), issues = epirange(20200601, 20211201) ) %>% select(geo_value, time_value, version = issue, case_rate_7d_av = value) %>% as_epi_archive(compactify = TRUE) x <- epix_merge(x, y, sync = \"locf\", compactify = TRUE) print(x) head(x$DT) ## → An `epi_archive` object, with metadata: ## ℹ Min/max time values: 2020-06-01 / 2021-11-30 ## ℹ First/last version with update: 2020-06-02 / 2021-12-01 ## ℹ Versions end: 2021-12-01 ## ℹ A preview of the table (129638 rows x 5 columns): ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## --- ## 129634: tx 2021-11-26 2021-11-29 1.858596 7.957657 ## 129635: tx 2021-11-27 2021-11-28 NA 7.174299 ## 129636: tx 2021-11-28 2021-11-29 NA 6.834681 ## 129637: tx 2021-11-29 2021-11-30 NA 8.841247 ## 129638: tx 2021-11-30 2021-12-01 NA 9.566218 ## Key: ## geo_value 
time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## 6: ca 2020-06-01 2020-06-10 2.133677 6.628329"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"sliding-version-aware-computations","dir":"Articles","previous_headings":"","what":"Sliding version-aware computations","title":"Work with archive objects and data revisions","text":"Lastly, demonstrate another key method archives, epix_slide(). works just like epi_slide() epi_df object, one key difference: performs version-aware computations. , computation given reference time t, uses data available t. demonstration, ’ll revisit forecasting example slide vignette, now ’ll build forecaster uses properly-versioned data (available real-time) forecast future COVID-19 case rates current past COVID-19 case rates, well current past values outpatient CLI signal medical claims. ’ll extend prob_ar() function slide vignette accommodate exogenous variables autoregressive model, often referred ARX model. Next slide forecaster working epi_archive object, order forecast COVID-19 case rates 7 days future. get back tibble z grouping variables (geo value), time values, three columns fc_point, fc_lower, fc_upper produced slide computation correspond point forecast, lower upper endpoints 95% prediction band, respectively. (instead set as_list_col = TRUE call epix_slide(), gotten list column fc, element fc data frame named columns point, lower, upper.) whole, epix_slide() works similarly epi_slide(), though notable differences, even apart version-aware aspect. can read documentation epix_slide() details. finish comparing version-aware -unaware forecasts various points time forecast horizons. 
former comes using epix_slide() epi_archive object x, latter applying epi_slide() latest snapshot data x_latest. row displays forecasts different location (CA, FL, NY, TX), column corresponds whether properly-versioned data used (FALSE means , TRUE means yes). can see properly-versioned forecaster , points time, problematic; example, massively overpredicts peak locations winter wave 2020. However, performance pretty poor across board , whether properly-versioned data used. Similar saw slide vignette, ARX forecasts can volatile, overconfident, . volatility can attenuated training ARX model jointly locations; advanced sliding vignette gives demonstration . really, epipredict package, builds data structures functionality current package, place look robust forecasting methodology. forecasters appear vignettes current package meant demo slide functionality basic forecasting methodology possible.","code":"prob_arx <- function(x, y, lags = c(0, 7, 14), ahead = 7, min_train_window = 20, lower_level = 0.05, upper_level = 0.95, symmetrize = TRUE, intercept = FALSE, nonneg = TRUE) { # Return NA if insufficient training data if (length(y) < min_train_window + max(lags) + ahead) { return(data.frame(point = NA, lower = NA, upper = NA)) } # Useful transformations if (!missing(x)) { x <- data.frame(x, y) } else { x <- data.frame(y) } if (!is.list(lags)) lags <- list(lags) lags <- rep(lags, length.out = ncol(x)) # Build features and response for the AR model, and then fit it dat <- do.call( data.frame, unlist( # Below we loop through and build the lagged features purrr::map(seq_len(ncol(x)), function(i) { purrr::map(lags[[i]], function(j) lag(x[, i], n = j)) }), recursive = FALSE ) ) names(dat) <- paste0(\"x\", seq_len(ncol(dat))) if (intercept) dat$x0 <- rep(1, nrow(dat)) dat$y <- lead(y, n = ahead) obj <- lm(y ~ . 
+ 0, data = dat) # Use LOCF to fill NAs in the latest feature values, make a prediction setDT(dat) setnafill(dat, type = \"locf\") point <- predict(obj, newdata = tail(dat, 1)) # Compute a band r <- residuals(obj) s <- ifelse(symmetrize, -1, NA) # Should the residuals be symmetrized? q <- quantile(c(r, s * r), probs = c(lower_level, upper_level), na.rm = TRUE) lower <- point + q[1] upper <- point + q[2] # Clip at zero if we need to, then return if (nonneg) { point <- max(point, 0) lower <- max(lower, 0) upper <- max(upper, 0) } return(data.frame(point = point, lower = lower, upper = upper)) } fc_time_values <- seq(as.Date(\"2020-08-01\"), as.Date(\"2021-11-30\"), by = \"1 month\" ) z <- x %>% group_by(geo_value) %>% epix_slide( fc = prob_arx(x = percent_cli, y = case_rate_7d_av), before = 119, ref_time_values = fc_time_values ) %>% ungroup() head(z, 10) ## # A tibble: 10 × 5 ## geo_value time_value fc_point fc_lower fc_upper ## ## 1 ca 2020-08-01 21.0 19.1 23.0 ## 2 fl 2020-08-01 44.5 38.9 50.0 ## 3 ny 2020-08-01 3.10 2.89 3.31 ## 4 tx 2020-08-01 35.5 33.6 37.4 ## 5 ca 2020-09-01 22.9 20.1 25.8 ## 6 fl 2020-09-01 15.5 10.5 20.6 ## 7 ny 2020-09-01 3.16 2.93 3.39 ## 8 tx 2020-09-01 17.5 14.3 20.7 ## 9 ca 2020-10-01 12.8 9.21 16.5 ## 10 fl 2020-10-01 14.7 8.72 20.6 x_latest <- epix_as_of(x, max_version = max(x$DT$version)) # Simple function to produce forecasts k weeks ahead k_week_ahead <- function(x, ahead = 7, as_of = TRUE) { if (as_of) { x %>% group_by(.data$geo_value) %>% epix_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, ahead = ahead), before = 119, ref_time_values = fc_time_values ) %>% mutate(target_date = .data$time_value + ahead, as_of = TRUE) %>% ungroup() } else { x_latest %>% group_by(.data$geo_value) %>% epi_slide( fc = prob_arx(.data$percent_cli, .data$case_rate_7d_av, ahead = ahead), before = 119, ref_time_values = fc_time_values ) %>% mutate(target_date = .data$time_value + ahead, as_of = FALSE) %>% ungroup() } } # Generate the 
forecasts, and bind them together fc <- bind_rows( k_week_ahead(x, ahead = 7, as_of = TRUE), k_week_ahead(x, ahead = 14, as_of = TRUE), k_week_ahead(x, ahead = 21, as_of = TRUE), k_week_ahead(x, ahead = 28, as_of = TRUE), k_week_ahead(x, ahead = 7, as_of = FALSE), k_week_ahead(x, ahead = 14, as_of = FALSE), k_week_ahead(x, ahead = 21, as_of = FALSE), k_week_ahead(x, ahead = 28, as_of = FALSE) ) # Plot them, on top of latest COVID-19 case rates ggplot(fc, aes(x = target_date, group = time_value, fill = as_of)) + geom_ribbon(aes(ymin = fc_lower, ymax = fc_upper), alpha = 0.4) + geom_line( data = x_latest, aes(x = time_value, y = case_rate_7d_av), inherit.aes = FALSE, color = \"gray50\" ) + geom_line(aes(y = fc_point)) + geom_point(aes(y = fc_point), size = 0.5) + geom_vline(aes(xintercept = time_value), linetype = 2, alpha = 0.5) + facet_grid(vars(geo_value), vars(as_of), scales = \"free\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 case rates\") + theme(legend.position = \"none\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/archive.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Work with archive objects and data revisions","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. percent_cli data modified part COVIDcast Epidata API Doctor Visits data. dataset licensed terms Creative Commons Attribution 4.0 International license. 
Copyright Delphi Research Group Carnegie Mellon University 2020.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/compactify.html","id":"removing-redundant-update-data-to-save-space","dir":"Articles","previous_headings":"","what":"Removing redundant update data to save space","title":"Compactify to remove redundant archive data","text":"need store version update rows look like last version corresponding observations carried forward (LOCF) use epiprocess’s epi_archive-related functions, apply LOCF fill data explicit updates. default, even detect remove LOCF-redundant rows save space; impact results long directly work archive’s DT field way expects rows remain. three different values can assigned compactify: argument: LOCF-redundant rows, removes issues warning information rows removed TRUE: removes LOCF-redundant rows without warning feedback FALSE: keeps LOCF-redundant rows without warning feedback example, one chart using LOCF values, another doesn’t use illustrate LOCF. Notice head first dataset differs second third value included. LOCF-redundant values can mar performance dataset operations. column case_rate_7d_av many LOCF-redundant values percent_cli, omit percent_cli column comparing performance. example, huge proportion original version update data LOCF-redundant, compactifying saves large amount space. proportion data LOCF-redundant can vary widely data sets, won’t always lucky. expect, performing 1000 iterations dplyr::filter faster LOCF values omitted. also like measure speed epi_archive methods. 
detailed performance comparison:","code":"library(epiprocess) library(dplyr) dt <- archive_cases_dv_subset$DT locf_omitted <- as_epi_archive(dt) ## Warning: Found rows that appear redundant based on last (version of each) observation carried forward; these rows have been removed to 'compactify' and save space: ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 2: ca 2020-06-01 2020-06-23 2.498918 6.628329 ## 3: ca 2020-06-01 2020-07-23 2.698157 6.603020 ## --- ## 4793: tx 2021-10-18 2021-10-22 NA 23.819450 ## 4794: tx 2021-10-19 2021-10-22 NA 24.705959 ## 4795: tx 2021-10-20 2021-10-22 NA 16.464639 ## Built-in `epi_archive` functionality should be unaffected, but results may change if you work directly with its fields (such as `DT`). See `?as_epi_archive` for details. To silence this warning but keep compactification, you can pass `compactify=TRUE` when constructing the archive. locf_included <- as_epi_archive(dt, compactify = FALSE) head(locf_omitted$DT) ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 4: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## 5: ca 2020-06-01 2020-06-10 2.133677 6.628329 ## 6: ca 2020-06-01 2020-06-11 2.197207 6.628329 head(locf_included$DT) ## Key: ## geo_value time_value version percent_cli case_rate_7d_av ## ## 1: ca 2020-06-01 2020-06-02 NA 6.628329 ## 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 ## 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 ## 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 ## 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 ## 6: ca 2020-06-01 2020-06-10 2.133677 6.628329 dt2 <- select(dt, -percent_cli) locf_included_2 <- as_epi_archive(dt2, compactify = FALSE) locf_omitted_2 <- as_epi_archive(dt2, compactify = TRUE) nrow(locf_included_2$DT) ## [1] 129638 nrow(locf_omitted_2$DT) ## 
[1] 9355 # Performance of filtering iterate_filter <- function(my_ea) { for (i in 1:1000) { filter(my_ea$DT, version >= as.Date(\"2020-01-01\") + i) } } elapsed_time <- function(fx) c(system.time(fx))[[3]] speed_test <- function(f, name) { data.frame( operation = name, locf = elapsed_time(f(locf_included_2)), no_locf = elapsed_time(f(locf_omitted_2)) ) } speeds <- speed_test(iterate_filter, \"filter_1000x\") # Performance of as_of iterated 1000 times iterate_as_of <- function(my_ea) { for (i in 1:1000) { my_ea %>% epix_as_of(min(my_ea$DT$time_value) + i - 1000) } } speeds <- rbind(speeds, speed_test(iterate_as_of, \"as_of_1000x\")) # Performance of slide slide_median <- function(my_ea) { my_ea %>% epix_slide(median = median(.data$case_rate_7d_av), before = 7) } speeds <- rbind(speeds, speed_test(slide_median, \"slide_median\")) speeds_tidy <- tidyr::gather(speeds, key = \"is_locf\", value = \"time_in_s\", locf, no_locf) library(ggplot2) ggplot(speeds_tidy) + geom_bar(aes(x = is_locf, y = time_in_s, fill = operation), stat = \"identity\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"correlations-grouped-by-time","dir":"Articles","previous_headings":"","what":"Correlations grouped by time","title":"Correlate signals over space and time","text":"epi_cor() function operates epi_df object, requires specification variables correlate, next two arguments (var1 var2). general, can specify grouping variable (combination variables) correlation computations call epi_cor(), via cor_by argument. potentially leads many ways compute correlations. always least two ways compute correlations epi_df: grouping time value, geo value. former obtained via cor_by = time_value. plot addresses question: “given day, case death rates linearly associated, across U.S. states?”. might interested broadening question, instead asking: “given day, higher case rates tend associate higher death rates?”, removing dependence linear relationship. 
latter can addressed using Spearman correlation, accomplished setting method = \"spearman\" call epi_cor(). Spearman correlation highly robust invariant monotone transformations.","code":"library(ggplot2) theme_set(theme_bw()) z1 <- epi_cor(x, case_rate, death_rate, cor_by = \"time_value\") ggplot(z1, aes(x = time_value, y = cor)) + geom_line() + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Correlation\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"lagged-correlations","dir":"Articles","previous_headings":"","what":"Lagged correlations","title":"Correlate signals over space and time","text":"might also interested case rates associate death rates future. Using dt1 parameter epi_cor(), can lag case rates back number days want, calculating correlations. , set dt1 = -10. means var1 = case_rate lagged 10 days, case rates June 1st correlated death rates June 11th. (might also help think way: death rates certain day correlated case rates offset -10 days.) Note epi_cor() takes argument shift_by determines grouping use time shifts. default geo_value, makes sense problem hand (another setting, may want group geo value another variable—say, age—time shifting). 
can see , generally, lagging case rates back 10 days improves correlations, confirming case rates better correlated death rates 10 days now.","code":"z2 <- epi_cor(x, case_rate, death_rate, cor_by = time_value, dt1 = -10) z <- rbind( z1 %>% mutate(lag = 0), z2 %>% mutate(lag = 10) ) %>% mutate(lag = as.factor(lag)) ggplot(z, aes(x = time_value, y = cor)) + geom_line(aes(color = lag)) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Correlation\", col = \"Lag\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"correlations-grouped-by-state","dir":"Articles","previous_headings":"","what":"Correlations grouped by state","title":"Correlate signals over space and time","text":"second option group geo value, obtained setting cor_by = geo_value. ’ll look correlations 0- 10-day lagged case rates. can see , generally speaking, lagging case rates back 10 days improves correlations.","code":"z1 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value) z2 <- epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -10) z <- rbind( z1 %>% mutate(lag = 0), z2 %>% mutate(lag = 10) ) %>% mutate(lag = as.factor(lag)) ggplot(z, aes(cor)) + geom_density(aes(fill = lag, col = lag), alpha = 0.5) + labs(x = \"Correlation\", y = \"Density\", fill = \"Lag\", col = \"Lag\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"more-systematic-lag-analysis","dir":"Articles","previous_headings":"","what":"More systematic lag analysis","title":"Correlate signals over space and time","text":"Next perform systematic investigation correlations broad range lag values. can see pretty clear curvature mean correlation case death rates (correlations come grouping geo value) function lag. 
maximum occurs lag somewhere around 17 days.","code":"library(purrr) lags <- 0:35 z <- map_dfr(lags, function(lag) { epi_cor(x, case_rate, death_rate, cor_by = geo_value, dt1 = -lag) %>% mutate(lag = .env$lag) }) z %>% group_by(lag) %>% summarize(mean = mean(cor, na.rm = TRUE)) %>% ggplot(aes(x = lag, y = mean)) + geom_line() + geom_point() + labs(x = \"Lag\", y = \"Mean correlation\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/correlation.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Correlate signals over space and time","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"motivation","dir":"Articles","previous_headings":"","what":"Motivation","title":"Get started with `epiprocess`","text":"{epiprocess} {epipredict} designed lower barrier entry implementation cost epidemiological time series analysis forecasting. Epidemiologists forecasting groups repeatedly separately rush implement type functionality much ad hoc manner; trying save effort future providing well-documented, tested, general packages can called many common tasks instead. {epiprocess} also provides tools help avoid particularly common pitfall analysis forecasting: ignoring reporting latency revisions data set. 
can, example, lead one retrospectively analyzing surveillance signal forecasting model concluding much accurate actually real time, producing always-decreasing forecasts data sets initial surveillance estimates systematically revised upward. Storing working version history can help avoid issues.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"intended-audience","dir":"Articles","previous_headings":"","what":"Intended audience","title":"Get started with `epiprocess`","text":"expect users proficient R, familiar {dplyr} {tidyr} packages.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"installing","dir":"Articles","previous_headings":"","what":"Installing","title":"Get started with `epiprocess`","text":"package CRAN yet, can installed using {devtools} package: Building vignettes, getting started guide, takes significant amount time. included package default. want include vignettes, use modified command:","code":"devtools::install_github(\"cmu-delphi/epiprocess\", ref = \"main\") devtools::install_github(\"cmu-delphi/epiprocess\", ref = \"main\", build_vignettes = TRUE, dependencies = TRUE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"getting-data-into-epi_df-format","dir":"Articles","previous_headings":"","what":"Getting data into epi_df format","title":"Get started with `epiprocess`","text":"’ll start showing get data epi_df format, just tibble bit special structure, format assumed functions epiprocess package. epi_df object (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. can number columns can serve measured variables, also broadly refer signal variables. documentation gives details data format. data frame tibble geo_value time_value columns can converted epi_df object, using function as_epi_df(). 
example, ’ll work daily cumulative COVID-19 cases four U.S. states: CA, FL, NY, TX, time span mid 2020 early 2022, ’ll use epidatr package fetch data COVIDcast API. can see, data frame returned epidatr::pub_covidcast() columns required epi_df object (along many others). can use as_epi_df(), specification relevant metadata, bring data frame epi_df format.","code":"library(epidatr) library(epiprocess) library(dplyr) library(tidyr) library(withr) cases <- pub_covidcast( source = \"jhu-csse\", signals = \"confirmed_cumulative_num\", geo_type = \"state\", time_type = \"day\", geo_values = \"ca,fl,ny,tx\", time_values = epirange(20200301, 20220131), ) colnames(cases) ## [1] \"geo_value\" \"signal\" \"source\" ## [4] \"geo_type\" \"time_type\" \"time_value\" ## [7] \"direction\" \"issue\" \"lag\" ## [10] \"missing_value\" \"missing_stderr\" \"missing_sample_size\" ## [13] \"value\" \"stderr\" \"sample_size\" x <- as_epi_df(cases, geo_type = \"state\", time_type = \"day\", as_of = max(cases$issue) ) %>% select(geo_value, time_value, total_cases = value) class(x) ## [1] \"epi_df\" \"tbl_df\" \"tbl\" \"data.frame\" summary(x) ## An `epi_df` x, with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2023-03-10 ## ---------- ## * min time value = 2020-03-01 ## * max time value = 2022-01-31 ## * average rows per time value = 4 head(x) ## An `epi_df` object, 6 x 3 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2023-03-10 ## ## # A tibble: 6 × 3 ## geo_value time_value total_cases ## * ## 1 ca 2020-03-01 19 ## 2 fl 2020-03-01 0 ## 3 ny 2020-03-01 0 ## 4 tx 2020-03-01 0 ## 5 ca 2020-03-02 23 ## 6 fl 2020-03-02 1 attributes(x)$metadata ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2023-03-10\" ## ## $other_keys ## character(0)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"some-details-on-metadata","dir":"Articles","previous_headings":"","what":"Some details on 
metadata","title":"Get started with `epiprocess`","text":"general, epi_df object following fields metadata: geo_type: type geo values. time_type: type time values. as_of: time value given data available. Metadata epi_df object x can accessed (altered) via attributes(x)$metadata. first two fields , geo_type time_type, currently used downstream functions epiprocess package, serve useful bits information convey data set hand. last field , as_of, one unique aspects epi_df object. brief, can think epi_df object single snapshot data set contains up-to-date values signals interest, time specified as_of. example, as_of January 31, 2022, epi_df object up-to-date version data available January 31, 2022. epiprocess package also provides companion data structure called epi_archive, stores full version history given data set. See archive vignette . geo_type, time_type, as_of arguments missing call as_epi_df(), function try infer passed object. Usually, geo_type time_type can inferred geo_value time_value columns, respectively, inferring as_of field easy. 
See documentation as_epi_df() details.","code":"x <- as_epi_df(cases, as_of = as.Date(\"2024-03-20\")) %>% select(geo_value, time_value, total_cases = value) attributes(x)$metadata ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2024-03-20\" ## ## $other_keys ## character(0)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"using-additional-key-columns-in-epi_df","dir":"Articles","previous_headings":"","what":"Using additional key columns in epi_df","title":"Get started with `epiprocess`","text":"following examples show create epi_df additional keys.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"converting-a-tsibble-that-has-county-code-as-an-extra-key","dir":"Articles","previous_headings":"Using additional key columns in epi_df","what":"Converting a tsibble that has county code as an extra key","title":"Get started with `epiprocess`","text":"metadata now includes county_code extra key.","code":"ex1 <- tibble( geo_value = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), county_code = c( \"06059\", \"06061\", \"06067\", \"12111\", \"12113\", \"12117\", \"42101\", \"42103\", \"42105\" ), time_value = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(geo_value)), value = seq_along(geo_value) + 0.01 * withr::with_rng_version(\"3.0.0\", withr::with_seed(42, length(geo_value))) ) %>% as_tsibble(index = time_value, key = c(geo_value, county_code)) ex1 <- as_epi_df(x = ex1, geo_type = \"state\", time_type = \"day\", as_of = \"2020-06-03\") attr(ex1, \"metadata\") ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2020-06-03\" ## ## $other_keys ## [1] \"county_code\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"dealing-with-misspecified-column-names","dir":"Articles","previous_headings":"Using additional key columns in epi_df","what":"Dealing 
with misspecified column names","title":"Get started with `epiprocess`","text":"epi_df requires columns geo_value time_value, exist as_epi_df() throws error. columns can renamed match epi_df format. example , notice also additional key pol.","code":"data.frame( # misnamed state = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), # extra key pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 3), # misnamed reported_date = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(geo_value)), value = seq_along(geo_value) + 0.01 * withr::with_rng_version(\"3.0.0\", withr::with_seed(42, length(geo_value))) ) %>% as_epi_df(as_of = as.Date(\"2024-03-20\")) ## Error in eval(expr, envir, enclos): object 'geo_value' not found ex2 <- tibble( # misnamed state = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), # extra key pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 3), # misnamed reported_date = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\"), length.out = length(state)), value = seq_along(state) + 0.01 * withr::with_rng_version(\"3.0.0\", withr::with_seed(42, length(state))) ) %>% data.frame() head(ex2) ## state pol reported_date value ## 1 ca blue 2020-06-01 1.09 ## 2 ca blue 2020-06-02 2.09 ## 3 ca blue 2020-06-03 3.09 ## 4 fl swing 2020-06-01 4.09 ## 5 fl swing 2020-06-02 5.09 ## 6 fl swing 2020-06-03 6.09 ex2 <- ex2 %>% rename(geo_value = state, time_value = reported_date) %>% as_epi_df( geo_type = \"state\", as_of = \"2020-06-03\", additional_metadata = list(other_keys = \"pol\") ) attr(ex2, \"metadata\") ## $geo_type ## [1] \"state\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2020-06-03\" ## ## $other_keys ## [1] \"pol\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"adding-additional-keys-to-an-epi_df-object","dir":"Articles","previous_headings":"Using additional key columns in epi_df","what":"Adding additional keys to an epi_df object","title":"Get started with 
`epiprocess`","text":"examples, keys added objects epi_df objects. illustrate add keys epi_df object. use toy data set included epiprocess prepared using covidcast library filtering single state simplicity. Now add state (MA) pol new columns data new keys metadata. Reminder lower case state name abbreviations expect geo_value column. Note two additional keys added, state pol, specified character vector other_keys component additional_metadata list. must specified manner downstream actions epi_df, like model fitting prediction, can recognize use keys. Currently other_keys metadata epi_df doesn’t impact epi_slide(), contrary other_keys as_epi_archive affects update data interpreted.","code":"ex3 <- jhu_csse_county_level_subset %>% filter(time_value > \"2021-12-01\", state_name == \"Massachusetts\") %>% slice_tail(n = 6) attr(ex3, \"metadata\") # geo_type is county currently ## $geo_type ## [1] \"county\" ## ## $time_type ## [1] \"day\" ## ## $as_of ## [1] \"2022-05-23 21:35:45 UTC\" ex3 <- ex3 %>% as_tibble() %>% # needed to add the additional metadata mutate( state = rep(tolower(\"MA\"), 6), pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 2) ) %>% as_epi_df(additional_metadata = list(other_keys = c(\"state\", \"pol\")), as_of = as.Date(\"2024-03-20\")) attr(ex3, \"metadata\") ## $geo_type ## [1] \"county\" ## ## $time_type ## [1] \"week\" ## ## $as_of ## [1] \"2024-03-20\" ## ## $other_keys ## [1] \"state\" \"pol\""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"working-with-epi_df-objects-downstream","dir":"Articles","previous_headings":"","what":"Working with epi_df objects downstream","title":"Get started with `epiprocess`","text":"Data epi_df format easy work downstream, since standard tabular data format; vignettes, ’ll walk basic signal processing tasks using functions provided epiprocess package. course, can also write custom code downstream uses, like plotting, pretty easy ggplot2. 
last couple examples, ’ll look data sets just show might get epi_df format. Data daily new (cumulative) SARS cases Canada 2003, outbreaks package: Get confirmed cases Ebola Sierra Leone 2014 2015 province date onset, prepared line list data package:","code":"library(ggplot2) theme_set(theme_bw()) ggplot(x, aes(x = time_value, y = total_cases, color = geo_value)) + geom_line() + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Cumulative COVID-19 cases\", color = \"State\") x <- outbreaks::sars_canada_2003 %>% mutate(geo_value = \"ca\") %>% select(geo_value, time_value = date, starts_with(\"cases\")) %>% as_epi_df(geo_type = \"nation\", as_of = as.Date(\"2024-03-20\")) head(x) ## An `epi_df` object, 6 x 6 with metadata: ## * geo_type = nation ## * time_type = day ## * as_of = 2024-03-20 ## ## # A tibble: 6 × 6 ## geo_value time_value cases_travel cases_household cases_healthcare cases_other ## * ## 1 ca 2003-02-23 1 0 0 0 ## 2 ca 2003-02-24 0 0 0 0 ## 3 ca 2003-02-25 0 0 0 0 ## 4 ca 2003-02-26 0 1 0 0 ## 5 ca 2003-02-27 0 0 0 0 ## 6 ca 2003-02-28 1 0 0 0 library(tidyr) x <- x %>% pivot_longer(starts_with(\"cases\"), names_to = \"type\") %>% mutate(type = substring(type, 7)) yrange <- range( x %>% group_by(time_value) %>% summarize(value = sum(value)) %>% pull(value) ) ggplot(x, aes(x = time_value, y = value)) + geom_col(aes(fill = type)) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + scale_y_continuous(breaks = yrange[1]:yrange[2]) + labs(x = \"Date\", y = \"SARS cases in Canada\", fill = \"Type\") x <- outbreaks::ebola_sierraleone_2014 %>% select(district, date_of_onset, status) %>% mutate(province = case_when( district %in% c(\"Kailahun\", \"Kenema\", \"Kono\") ~ \"Eastern\", district %in% c( \"Bombali\", \"Kambia\", \"Koinadugu\", \"Port Loko\", \"Tonkolili\" ) ~ \"Northern\", district %in% c(\"Bo\", \"Bonthe\", \"Moyamba\", \"Pujehun\") ~ \"Southern\", district %in% c(\"Western Rural\", \"Western 
Urban\") ~ \"Western\" )) %>% group_by(geo_value = province, time_value = date_of_onset) %>% summarise(cases = sum(status == \"confirmed\"), .groups = \"drop\") %>% complete(geo_value, time_value = full_seq(time_value, period = 1), fill = list(cases = 0) ) %>% as_epi_df(geo_type = \"province\", as_of = as.Date(\"2024-03-20\")) ggplot(x, aes(x = time_value, y = cases)) + geom_col(aes(fill = geo_value), show.legend = FALSE) + facet_wrap(~geo_value, scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Confirmed cases of Ebola in Sierra Leone\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/epiprocess.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Get started with `epiprocess`","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"growth-rate-basics","dir":"Articles","previous_headings":"","what":"Growth rate basics","title":"Estimate growth rates in signals","text":"growth rate function \\(f\\) defined continuously-valued parameter \\(t\\) defined \\(f'(t)/f(t)\\), \\(f'(t)\\) derivative \\(f\\) \\(t\\). estimate growth rate signal discrete-time (can thought evaluations discretizations underlying function continuous-time), can estimate derivative divide signal value (possibly smoothed version signal value). 
growth_rate() function takes sequence underlying design points x corresponding sequence y signal values, allows us choose following methods estimating growth rate given reference point x0, setting method argument: “rel_change”: uses \\((\\bar B/\\bar A - 1) / h\\), \\(\\bar B\\) average y second half sliding window bandwidth h centered reference point x0, \\(\\bar A\\) average first half. can seen using first-difference approximation derivative. “linear_reg”: uses slope linear regression y x sliding window centered reference point x0, divided fitted value linear regression x0. “smooth_spline”: uses estimated derivative x0 smoothing spline fit x y, via stats::smooth.spline(), divided fitted value spline x0. “trend_filter”: uses estimated derivative x0 polynomial trend filtering (discrete spline) fit x y, via genlasso::trendfilter(), divided fitted value discrete spline x0. default growth_rate() x0 = x, returns estimate growth rate underlying design point.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"relative-change","dir":"Articles","previous_headings":"","what":"Relative change","title":"Estimate growth rates in signals","text":"default method “rel_change”, simplest way estimate growth rates. default bandwidth h = 7, daily data, considers relative change signal adjacent weeks. can wrap growth_rate() call dplyr::mutate() append new column epi_df object computed growth rates. can visualize growth rate estimates plotting signal values highlighting periods time relative change 1% (red) -1% (blue), faceting geo value. direct visualization, plot estimated growth rates , overlaying curves two states one plot. can see estimated growth rates relative change method somewhat volatile, appears bias towards right boundary time span—look estimated growth rate Georgia late December 2021, takes potentially suspicious dip. 
general, estimation derivatives difficult near boundary, relative changes can suffer particularly noticeable boundary bias based difference averages two halves local window, simplistic approach, one halves truncated near boundary.","code":"x <- x %>% group_by(geo_value) %>% mutate(cases_gr1 = growth_rate(time_value, cases)) head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## # Groups: geo_value [1] ## geo_value time_value cases cases_gr1 ## * ## 1 ga 2020-06-01 643. 0.00601 ## 2 ga 2020-06-02 603. 0.0185 ## 3 ga 2020-06-03 608 0.0240 ## 4 ga 2020-06-04 656. 0.0218 ## 5 ga 2020-06-05 677. 0.0193 ## 6 ga 2020-06-06 718. 0.0163 ## 7 ga 2020-06-07 691. 0.0180 ## 8 ga 2020-06-08 656. 0.0234 ## 9 ga 2020-06-09 720. 0.0227 ## 10 ga 2020-06-10 727. 0.0227 library(ggplot2) theme_set(theme_bw()) upper <- 0.01 lower <- -0.01 ggplot(x, aes(x = time_value, y = cases)) + geom_tile( data = x %>% filter(cases_gr1 >= upper), aes(x = time_value, y = 0, width = 7, height = Inf), fill = 2, alpha = 0.08 ) + geom_tile( data = x %>% filter(cases_gr1 <= lower), aes(x = time_value, y = 0, width = 7, height = Inf), fill = 4, alpha = 0.08 ) + geom_line() + facet_wrap(vars(geo_value), scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 cases\") ggplot(x, aes(x = time_value, y = cases_gr1)) + geom_line(aes(col = geo_value)) + geom_hline(yintercept = upper, linetype = 2, col = 2) + geom_hline(yintercept = lower, linetype = 2, col = 4) + scale_color_manual(values = c(3, 6)) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"State\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"linear-regression","dir":"Articles","previous_headings":"","what":"Linear regression","title":"Estimate growth rates in 
signals","text":"second simplest method available “linear_reg”, whose default bandwidth h = 7. Compared “rel_change”, appears behave similarly overall, thankfully avoids troublesome spikes:","code":"x <- x %>% group_by(geo_value) %>% mutate(cases_gr2 = growth_rate(time_value, cases, method = \"linear_reg\")) x %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr1 = \"rel_change\", cases_gr2 = \"linear_reg\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(2, 4)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"Method\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"nonparametric-estimation","dir":"Articles","previous_headings":"","what":"Nonparametric estimation","title":"Estimate growth rates in signals","text":"can also use nonparametric method estimate derivative, “smooth_spline” “trend_filter”. latter going generally computationally expensive, also able adapt better local level smoothness. (apparent efficiency actually compounded particular implementations default settings methods: “trend_filter” based full solution path algorithm provided genlasso package, performs cross-validation default order pick level regularization; read documentation growth_rate() details.) particular example, trend filtering estimates growth rate appear much stable smoothing spline, also much stable estimates local relative changes linear regressions. smoothing spline growth rate estimates based default settings stats::smooth.spline(), appear severely under-regularized . arguments stats::smooth.spline() can customized passing additional arguments ... 
call growth_rate(); similarly, can also use additional arguments customize settings underlying trend filtering functions genlasso::trendfilter(), genlasso::cv.trendfilter(), documentation growth_rate() gives full details.","code":"x <- x %>% group_by(geo_value) %>% mutate( cases_gr3 = growth_rate(time_value, cases, method = \"smooth_spline\"), cases_gr4 = growth_rate(time_value, cases, method = \"trend_filter\") ) x %>% select(geo_value, time_value, cases_gr3, cases_gr4) %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr3 = \"smooth_spline\", cases_gr4 = \"trend_filter\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(3, 6)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"Method\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"log-scale-estimation","dir":"Articles","previous_headings":"","what":"Log scale estimation","title":"Estimate growth rates in signals","text":"general, alternative view growth rate function \\(f\\) given defining \\(g(t) = \\log(f(t))\\), observing \\(g'(t) = f'(t)/f(t)\\). Therefore, method estimates derivative can simply applied log signal interest, light, method (“rel_change”, “linear_reg”, “smooth_spline”, “trend_filter”) log scale analog, can used setting argument log_scale = TRUE call growth_rate(). Comparing rel_change_log curves rel_change counterparts (shown earlier figures), see former curves appear less volatile match linear regression estimates much closely. particular, rel_change upward spikes, rel_change_log less pronounced spikes. occur? estimate \\(g'(t)\\) can expressed \\(\\mathbb E[\\log(B)-\\log(A)]/h = \\mathbb E[\\log(1+hR)]/h\\), \\(R = ((B-A)/h) / A\\), expectation refers averaging \\(h\\) observations window. 
Consider following two relevant inequalities, due concavity logarithm function: \\[ \\mathbb E[\\log(1+hR)]/h \\leq \\log(1+h\\mathbb E[R])/h \\leq \\mathbb E[R]. \\] first inequality Jensen’s; second inequality tangent line concave function lies . Finally, observe \\(\\mathbb E[R] \\approx ((\\bar B-\\bar A)/h) / \\bar A\\), rel_change estimate. explains rel_change_log curve often lies rel_change curve.","code":"x <- x %>% group_by(geo_value) %>% mutate( cases_gr5 = growth_rate(time_value, cases, method = \"rel_change\", log_scale = TRUE ), cases_gr6 = growth_rate(time_value, cases, method = \"linear_reg\", log_scale = TRUE ), cases_gr7 = growth_rate(time_value, cases, method = \"smooth_spline\", log_scale = TRUE ), cases_gr8 = growth_rate(time_value, cases, method = \"trend_filter\", log_scale = TRUE ) ) x %>% select(geo_value, time_value, cases_gr5, cases_gr6) %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr5 = \"rel_change_log\", cases_gr6 = \"linear_reg_log\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(2, 4)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = \"Method\") x %>% select(geo_value, time_value, cases_gr7, cases_gr8) %>% pivot_longer( cols = starts_with(\"cases_gr\"), names_to = \"method\", values_to = \"gr\" ) %>% mutate(method = recode(method, cases_gr7 = \"smooth_spline_log\", cases_gr8 = \"trend_filter_log\" )) %>% ggplot(aes(x = time_value, y = gr)) + geom_line(aes(col = method)) + scale_color_manual(values = c(3, 6)) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Growth rate\", col = 
\"Method\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/growth_rate.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Estimate growth rates in signals","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/outliers.html","id":"outlier-detection","dir":"Articles","previous_headings":"","what":"Outlier detection","title":"Detect and correct outliers in signals","text":"detect_outlr() function allows us run multiple outlier detection methods given signal, (optionally) combine results methods. , ’ll investigate outlier detection results following methods. Detection based rolling median, using detect_outlr_rm(), computes rolling median default window size n time points centered time point consideration, computes thresholds based multiplier times rolling IQR computed residuals. Detection based seasonal-trend decomposition using LOESS (STL), using detect_outlr_stl(), similar rolling median method replaces rolling median fitted values STL. Detection based STL decomposition, without seasonality term, amounts smoothing using LOESS. outlier detection methods specified using tibble passed detect_outlr(), one row per method, whose columns specify outlier detection function, input arguments (nondefault values need supplied), abbreviated name method used tracking results. Abbreviations “rm” “stl” can used built-detection functions detect_outlr_rm() detect_outlr_stl(), respectively. 
Additionally, ’ll form combined lower upper thresholds, calculated median lower upper thresholds methods time point. Note using combined median threshold equivalent using majority vote across base methods determine whether value outlier. visualize results, first define convenience function plotting. Now produce plots state time, faceting detection method.","code":"detection_methods <- bind_rows( tibble( method = \"rm\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5 )), abbr = \"rm\" ), tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = 7 )), abbr = \"stl_seasonal\" ), tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = NULL )), abbr = \"stl_nonseasonal\" ) ) detection_methods ## # A tibble: 3 × 3 ## method args abbr ## ## 1 rm rm ## 2 stl stl_seasonal ## 3 stl stl_nonseasonal x <- x %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr( x = time_value, y = cases, methods = detection_methods, combiner = \"median\" )) %>% ungroup() %>% unnest(outlier_info) head(x) ## An `epi_df` object, 6 x 15 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2022-05-21 22:17:14.962335 ## ## # A tibble: 6 × 15 ## geo_value time_value cases rm_lower rm_upper rm_replacement stl_seasonal_lower ## * ## 1 fl 2020-06-01 667 345 2195 667 0 ## 2 nj 2020-06-01 486 64.4 926. 486 221. ## 3 fl 2020-06-02 617 406. 2169. 617 0 ## 4 nj 2020-06-02 658 140. 841. 658 245. ## 5 fl 2020-06-03 1317 468. 2142. 1317 0 ## 6 nj 2020-06-03 541 216 756 541 227. 
## # ℹ 8 more variables: stl_seasonal_upper , stl_seasonal_replacement , ## # stl_nonseasonal_lower , stl_nonseasonal_upper , ## # stl_nonseasonal_replacement , combined_lower , ## # combined_upper , combined_replacement # Plot outlier detection bands and/or points identified as outliers plot_outlr <- function(x, signal, method_abbr, bands = TRUE, points = TRUE, facet_vars = vars(.data$geo_value), nrow = NULL, ncol = NULL, scales = \"fixed\") { # Convert outlier detection results to long format signal <- rlang::enquo(signal) x_long <- x %>% pivot_longer( cols = starts_with(method_abbr), names_to = c(\"method\", \".value\"), names_pattern = \"(.+)_(.+)\" ) # Start of plot with observed data p <- ggplot() + geom_line(data = x, mapping = aes(x = .data$time_value, y = !!signal)) # If requested, add bands if (bands) { p <- p + geom_ribbon( data = x_long, aes( x = .data$time_value, ymin = .data$lower, ymax = .data$upper, color = .data$method ), fill = NA ) } # If requested, add points if (points) { x_detected <- x_long %>% filter((!!signal < .data$lower) | (!!signal > .data$upper)) p <- p + geom_point( data = x_detected, aes( x = .data$time_value, y = !!signal, color = .data$method, shape = .data$method ) ) } # If requested, add faceting if (!is.null(facet_vars)) { p <- p + facet_wrap(facet_vars, nrow = nrow, ncol = ncol, scales = scales) } return(p) } method_abbr <- c(detection_methods$abbr, \"combined\") plot_outlr(x %>% filter(geo_value == \"fl\"), cases, method_abbr, facet_vars = vars(method), scales = \"free_y\", ncol = 1 ) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs( x = \"Date\", y = \"Reported COVID-19 counts\", color = \"Method\", shape = \"Method\" ) plot_outlr(x %>% filter(geo_value == \"nj\"), cases, method_abbr, facet_vars = vars(method), scales = \"free_y\", ncol = 1 ) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs( x = \"Date\", y = \"Reported COVID-19 counts\", color = \"Method\", shape = \"Method\" 
)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/outliers.html","id":"outlier-correction","dir":"Articles","previous_headings":"","what":"Outlier correction","title":"Detect and correct outliers in signals","text":"Finally, order correct outliers, can use posited replacement values returned outlier detection method. use replacement value combined method, defined median replacement values base methods time point. advanced correction functionality coming point future.","code":"y <- x %>% mutate(cases_corrected = combined_replacement) %>% select(geo_value, time_value, cases, cases_corrected) y %>% filter(cases != cases_corrected) ## An `epi_df` object, 22 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2022-05-21 22:17:14.962335 ## ## # A tibble: 22 × 4 ## geo_value time_value cases cases_corrected ## * ## 1 fl 2020-07-12 15300 10181 ## 2 nj 2020-07-19 -8 320. ## 3 nj 2020-08-13 694 404. ## 4 nj 2020-08-14 619 397. ## 5 nj 2020-08-16 40 366 ## 6 nj 2020-08-22 555 360 ## 7 fl 2020-09-01 7569 2861. ## 8 nj 2020-10-08 1415 873. ## 9 fl 2020-10-10 0 2660 ## 10 fl 2020-10-11 5570 2660 ## # ℹ 12 more rows ggplot(y, aes(x = time_value)) + geom_line(aes(y = cases), linetype = 2) + geom_line(aes(y = cases_corrected), col = 2) + geom_hline(yintercept = 0, linetype = 3) + facet_wrap(vars(geo_value), scales = \"free_y\", ncol = 1) + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 counts\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/outliers.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Detect and correct outliers in signals","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. 
data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"optimized-rolling-mean","dir":"Articles","previous_headings":"","what":"Optimized rolling mean","title":"Slide a computation over signal values","text":"first demonstrate apply 7-day trailing average daily cases order smooth signal, passing name column(s) want average first argument epi_slide_mean(). epi_slide_mean() can used averaging. computation per state, first call group_by(). calculation done using data.table::frollmean, whose behavior can adjusted passing relevant arguments via ....","code":"x %>% group_by(geo_value) %>% epi_slide_mean(\"cases\", before = 6) %>% ungroup() %>% head(10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases slide_value_cases ## * ## 1 ca 2020-03-01 6 NA ## 2 ca 2020-03-02 4 NA ## 3 ca 2020-03-03 6 NA ## 4 ca 2020-03-04 11 NA ## 5 ca 2020-03-05 10 NA ## 6 ca 2020-03-06 18 NA ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"slide-with-a-formula","dir":"Articles","previous_headings":"","what":"Slide with a formula","title":"Slide a computation over signal values","text":"previous computation can also performed using epi_slide(), flexible quite bit slower epi_slide_mean(). recommended use epi_slide_mean() possible. 7-day trailing average daily cases can computed passing formula first argument epi_slide(). per state, first call group_by(). 
formula specified access non-grouping columns present original epi_df object (must refer prefix .x$). can see, function epi_slide() returns epi_df object new column appended contains results (sliding), named slide_value default. can course change post hoc, can instead specify new name front using new_col_name argument: information available additional variables: .group_key one-row tibble containing values grouping variables associated group .ref_time_value reference time value time window based Like group_modify(), alternative names variables well: . can used instead .x, .y instead .group_key, .z instead .ref_time_value.","code":"x %>% group_by(geo_value) %>% epi_slide(~ mean(.x$cases), before = 6) %>% ungroup() %>% head(10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases slide_value ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4 x <- x %>% group_by(geo_value) %>% epi_slide(~ mean(.x$cases), before = 6, new_col_name = \"cases_7dav\") %>% ungroup() head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"slide-with-a-function","dir":"Articles","previous_headings":"","what":"Slide with a function","title":"Slide a computation over 
signal values","text":"can also pass function first argument epi_slide(). case, passed function f must accept following arguments: data frame column names original object, minus grouping variables, containing time window data one group-ref_time_value combination; followed one-row tibble containing values grouping variables associated group; followed associated ref_time_value. can accept additional arguments; epi_slide() forward ... args receives f. Recreating last example 7-day trailing average:","code":"x <- x %>% group_by(geo_value) %>% epi_slide(function(x, gk, rtv) mean(x$cases), before = 6, new_col_name = \"cases_7dav\") %>% ungroup() head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"slide-the-tidy-way","dir":"Articles","previous_headings":"","what":"Slide the tidy way","title":"Slide a computation over signal values","text":"Perhaps convenient way setup computation epi_slide() pass expression tidy evaluation. case, can simply define name new column directly part expression, setting equal computation can access columns x name, just call dplyr::mutate(), dplyr verbs. example: addition referring individual columns name, can refer time window data epi_df tibble using .x. Similarly, arguments function format available magic names .group_key .ref_time_value, tidyverse “pronouns” .data .env can also used. 
simple sanity check, visualize 7-day trailing averages computed top original counts: can see top right panel, looks like Texas moved weekly reporting COVID-19 cases summer 2021.","code":"x <- x %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% ungroup() head(x, 10) ## An `epi_df` object, 10 x 4 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 4 ## geo_value time_value cases cases_7dav ## * ## 1 ca 2020-03-01 6 6 ## 2 ca 2020-03-02 4 5 ## 3 ca 2020-03-03 6 5.33 ## 4 ca 2020-03-04 11 6.75 ## 5 ca 2020-03-05 10 7.4 ## 6 ca 2020-03-06 18 9.17 ## 7 ca 2020-03-07 26 11.6 ## 8 ca 2020-03-08 19 13.4 ## 9 ca 2020-03-09 23 16.1 ## 10 ca 2020-03-10 22 18.4 library(ggplot2) theme_set(theme_bw()) ggplot(x, aes(x = time_value)) + geom_col(aes(y = cases, fill = geo_value), alpha = 0.5, show.legend = FALSE) + geom_line(aes(y = cases_7dav, col = geo_value), show.legend = FALSE) + facet_wrap(~geo_value, scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 cases\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"running-a-local-forecaster","dir":"Articles","previous_headings":"","what":"Running a local forecaster","title":"Slide a computation over signal values","text":"complex example, create forecaster based local (time) autoregression AR model. AR models can fit numerous ways (using base R functions various packages), define “hand” provides advanced example sliding function epi_df object, allows us bit flexible defining probabilistic forecaster: one outputs just point prediction, notion uncertainty around . particular, forecaster output point prediction along 90% uncertainty band, represented predictive quantiles 5% 95% levels (lower upper endpoints uncertainty band). function defined , prob_ar(), probabilistic AR forecaster. 
lags argument indicates lags use model, ahead indicates far ahead future make forecasts (encoded terms units time_value column; , days, working epi_df considered vignette). go ahead slide AR forecaster working epi_df COVID-19 cases. Note actually model cases_7dav column, operate scale smoothed COVID-19 cases. clearly equivalent, constant, modeling weekly sums COVID-19 cases. Note utilized argument ref_time_values perform sliding computation (, compute forecast) specific subset reference time values. get three columns fc_point, fc_lower, fc_upper correspond point forecast, lower upper endpoints 90% prediction band, respectively. (instead set as_list_col = TRUE call epi_slide(), gotten list column fc, element fc data frame named columns point, lower, upper.) finish , plot forecasts times (spaced months) last year, multiple horizons: 7, 14, 21, 28 days ahead. , encapsulate process generating forecasts simple function, can call times. Two points worth making. First, AR model’s performance pretty spotty. various points time, can see forecasts volatile (point predictions place), overconfident (bands narrow), time. meant simple demo entirely unexpected given way AR model set . epipredict package, companion package epiprocess, offers suite predictive modeling tools can improve shortcomings simple AR model. Second, AR forecaster using finalized data, meaning, uses latest versions signal values (reported COVID-19 cases) available, training models making predictions historically. However, reflective provisional nature data must cope true forecast task. Training making predictions finalized data can lead overly optimistic sense accuracy; see, example, McDonald et al. (2021), references therein. Fortunately, epiprocess package provides data structure called epi_archive can used store data revisions, furthermore, epi_archive object knows slide computations correct version-aware sense (computation reference time \\(t\\), uses data available \\(t\\)). 
revisit example archive vignette.","code":"prob_ar <- function(y, lags = c(0, 7, 14), ahead = 6, min_train_window = 20, lower_level = 0.05, upper_level = 0.95, symmetrize = TRUE, intercept = FALSE, nonneg = TRUE) { # Return NA if insufficient training data if (length(y) < min_train_window + max(lags) + ahead) { return(data.frame(point = NA, lower = NA, upper = NA)) } # Build features and response for the AR model dat <- do.call( data.frame, purrr::map(lags, function(j) lag(y, n = j)) ) names(dat) <- paste0(\"x\", seq_len(ncol(dat))) if (intercept) dat$x0 <- rep(1, nrow(dat)) dat$y <- lead(y, n = ahead) # Now fit the AR model and make a prediction obj <- lm(y ~ . + 0, data = dat) point <- predict(obj, newdata = tail(dat, 1)) # Compute a band r <- residuals(obj) s <- ifelse(symmetrize, -1, NA) # Should the residuals be symmetrized? q <- quantile(c(r, s * r), probs = c(lower_level, upper_level), na.rm = TRUE) lower <- point + q[1] upper <- point + q[2] # Clip at zero if we need to, then return if (nonneg) { point <- max(point, 0) lower <- max(lower, 0) upper <- max(upper, 0) } return(data.frame(point = point, lower = lower, upper = upper)) } fc_time_values <- seq(as.Date(\"2020-06-01\"), as.Date(\"2021-12-01\"), by = \"1 months\" ) x %>% group_by(geo_value) %>% epi_slide( fc = prob_ar(cases_7dav), before = 119, ref_time_values = fc_time_values ) %>% ungroup() %>% head(10) ## An `epi_df` object, 10 x 7 with metadata: ## * geo_type = state ## * time_type = day ## * as_of = 2024-01-26 17:27:32.755949 ## ## # A tibble: 10 × 7 ## geo_value time_value cases cases_7dav fc_point fc_lower fc_upper ## * ## 1 ca 2020-06-01 2437 2694 2973. 2566. 3380. ## 2 ca 2020-07-01 7346 6722 7892. 7321. 8462. ## 3 ca 2020-08-01 8616 8284. 7188. 6153. 8223. ## 4 ca 2020-09-01 4248 4707. 4133. 2329. 5937. ## 5 ca 2020-10-01 3504 3360. 3257. 1449. 5064. ## 6 ca 2020-11-01 4210 4441. 3840. 2258. 5422. ## 7 ca 2020-12-01 23626 15690 17699. 16082. 19316. ## 8 ca 2021-01-01 50251 41097. 45534. 
38417. 52650. ## 9 ca 2021-02-01 13098 17952. 15266. 6725. 23808. ## 10 ca 2021-03-01 3031 5209 4482. 0 12982. # Note the use of all_rows = TRUE (keeps all original rows in the output) k_week_ahead <- function(x, ahead = 7) { x %>% group_by(.data$geo_value) %>% epi_slide( fc = prob_ar(.data$cases_7dav, ahead = ahead), before = 119, ref_time_values = fc_time_values, all_rows = TRUE ) %>% ungroup() %>% mutate(target_date = .data$time_value + ahead) } # First generate the forecasts, and bind them together z <- bind_rows( k_week_ahead(x, ahead = 7), k_week_ahead(x, ahead = 14), k_week_ahead(x, ahead = 21), k_week_ahead(x, ahead = 28) ) # Now plot them, on top of actual COVID-19 case counts ggplot(z) + geom_line(aes(x = time_value, y = cases_7dav), color = \"gray50\") + geom_ribbon(aes( x = target_date, ymin = fc_lower, ymax = fc_upper, group = time_value ), fill = 6, alpha = 0.4) + geom_line(aes(x = target_date, y = fc_point, group = time_value)) + geom_point(aes(x = target_date, y = fc_point, group = time_value), size = 0.5 ) + geom_vline( data = tibble(x = fc_time_values), aes(xintercept = x), linetype = 2, alpha = 0.5 ) + facet_wrap(vars(geo_value), scales = \"free_y\") + scale_x_date(minor_breaks = \"month\", date_labels = \"%b %y\") + labs(x = \"Date\", y = \"Reported COVID-19 cases\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/articles/slide.html","id":"attribution","dir":"Articles","previous_headings":"","what":"Attribution","title":"Slide a computation over signal values","text":"document contains dataset modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. 
COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Jacob Bien. Contributor. Logan Brooks. Author, maintainer. Rafael Catoia. Contributor. Nat DeFries. Contributor. Daniel McDonald. Author. Rachel Lobay. Contributor. Ken Mawer. Contributor. Chloe . Contributor. Quang Nguyen. Contributor. Evan Ray. Author. Dmitry Shemetov. Contributor. Ryan Tibshirani. Author. Lionel Henry. Contributor. Author included rlang fragments Hadley Wickham. Contributor. Author included rlang fragments Posit. Copyright holder. Copyright holder included rlang fragments","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Brooks L, McDonald D, Ray E, Tibshirani R (2024). epiprocess: Tools basic signal processing epidemiology. R package version 0.7.11, https://cmu-delphi.github.io/epiprocess/.","code":"@Manual{, title = {epiprocess: Tools for basic signal processing in epidemiology}, author = {Logan Brooks and Daniel McDonald and Evan Ray and Ryan Tibshirani}, year = {2024}, note = {R package version 0.7.11}, url = {https://cmu-delphi.github.io/epiprocess/}, }"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"epiprocess","dir":"","previous_headings":"","what":"Tools for basic signal processing in epidemiology","title":"Tools for basic signal processing in epidemiology","text":"package introduces common data structure epidemiological data sets measured space time, offers associated utilities perform basic signal processing tasks. 
See getting started guide vignettes examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Tools for basic signal processing in epidemiology","text":"install (unless ’re making changes package, use stable version):","code":"# Stable version pak::pkg_install(\"cmu-delphi/epiprocess@main\") # Dev version pak::pkg_install(\"cmu-delphi/epiprocess@dev\")"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"epi_df-snapshot-of-a-data-set","dir":"","previous_headings":"","what":"epi_df: snapshot of a data set","title":"Tools for basic signal processing in epidemiology","text":"first main data structure epiprocess package called epi_df. simply tibble couple required columns, geo_value time_value. can number columns, can seen measured variables, also call signal variables. brief, epi_df object represents snapshot data set contains --date values signals variables, given time. convention, functions epiprocess package operate epi_df objects begin epi. example: epi_slide(), iteratively applying custom computation variable epi_df object sliding windows time; epi_cor(), computing lagged correlations variables epi_df object, (allowing grouping geo value, time value, variables). Functions package operate directly given variables begin epi. example: growth_rate(), estimating growth rate given signal given time values, using various methodologies; detect_outlr(), detecting outliers given signal time, using either built-custom methodologies.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/index.html","id":"epi_archive-full-version-history-of-a-data-set","dir":"","previous_headings":"","what":"epi_archive: full version history of a data set","title":"Tools for basic signal processing in epidemiology","text":"second main data structure package called epi_archive. 
special class (R6 format) wrapped around data table stores archive (version history) signal variables interest. convention, functions epiprocess package operate epi_archive objects begin epix (“x” meant remind “archive”). just wrapper functions around public methods epi_archive R6 class. example: epix_as_of(), generating snapshot epi_df format data archive, represents --date values signal variables, specified version; epix_fill_through_version(), filling fake version data following simple rules, use downstream methods expect archive --date (e.g., forecasting deadline date one data sources accessed provide latest versions data) epix_merge(), merging two data archives , support various approaches handling one archives --date version-wise ; epix_slide(), sliding custom computation data archive local windows time, much like epi_slide epi_df object, one key difference: sliding computation given reference time t performed data available t.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"data source based information outpatient visits, provided us health system partners, also contains confirmed COVID-19 cases based reports made available Center Systems Science Engineering Johns Hopkins University. 
example data ranges June 1, 2020 Dec 1, 2021, also limited California, Florida, Texas, New York.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"","code":"archive_cases_dv_subset"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"epi_archive data format. data table DT 129,638 rows 5 columns: geo_value geographic value associated row measurements. time_value time value associated row measurements. version time value specifying version row measurements. percent_cli percentage doctor’s visits CLI (COVID-like illness) computed medical insurance claims case_rate_7d_av 7-day average signal number new confirmed cases due COVID-19 per 100,000 population, daily","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/archive_cases_dv_subset.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of daily doctor visits and cases in archive format — archive_cases_dv_subset","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Doctor Visits API: signal percent_cli taken directly API without changes. 
COVIDcast Epidata API: case_rate_7d_av signal computed Delphi original JHU-CSSE data calculating moving averages preceding 7 days, signal June 7 average underlying data June 1 7, inclusive. Furthermore, data subset full dataset, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert to epi_df format — as_epi_df","title":"Convert to epi_df format — as_epi_df","text":"Converts data frame tibble epi_df object. See getting started guide examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert to epi_df format — as_epi_df","text":"","code":"as_epi_df(x, ...) # S3 method for epi_df as_epi_df(x, ...) # S3 method for tbl_df as_epi_df(x, geo_type, time_type, as_of, additional_metadata = list(), ...) # S3 method for data.frame as_epi_df(x, geo_type, time_type, as_of, additional_metadata = list(), ...) # S3 method for tbl_ts as_epi_df(x, geo_type, time_type, as_of, additional_metadata = list(), ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert to epi_df format — as_epi_df","text":"x data.frame, tibble::tibble, tsibble::tsibble converted ... Additional arguments passed methods. geo_type Type geo values. missing, function attempt infer geo values present; fails, set \"custom\". time_type Type time values. missing, function attempt infer time values present; fails, set \"custom\". as_of Time value representing time given data available. example, as_of January 31, 2022, epi_df object created represent --date version data available January 31, 2022. as_of argument missing, current day-time used. additional_metadata List additional metadata attach epi_df object. 
metadata geo_type, time_type, as_of fields; named entries passed list included well. tibble additional keys, sure specify character vector other_keys component additional_metadata.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert to epi_df format — as_epi_df","text":"epi_df object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"methods-by-class-","dir":"Reference","previous_headings":"","what":"Methods (by class)","title":"Convert to epi_df format — as_epi_df","text":"as_epi_df(epi_df): Simply returns epi_df object unchanged. as_epi_df(tbl_df): input tibble x must contain columns geo_value time_value. columns preserved , treated measured variables. as_of missing, function try guess as_of, issue, version column x (present), as_of field metadata (stored attributes); fails, current day-time used. as_epi_df(data.frame): Works analogously as_epi_df.tbl_df(). 
as_epi_df(tbl_ts): Works analogously as_epi_df.tbl_df(), except tbl_ts class dropped, key variables (\"geo_value\") added metadata returned object, other_keys field.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_epi_df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert to epi_df format — as_epi_df","text":"","code":"# Convert a `tsibble` that has county code as an extra key # Notice that county code should be a character string to preserve any leading zeroes ex1_input <- tibble::tibble( geo_value = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), county_code = c( \"06059\", \"06061\", \"06067\", \"12111\", \"12113\", \"12117\", \"42101\", \"42103\", \"42105\" ), time_value = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\" ), length.out = length(geo_value)), value = 1:length(geo_value) + 0.01 * rnorm(length(geo_value)) ) %>% tsibble::as_tsibble(index = time_value, key = c(geo_value, county_code)) # The `other_keys` metadata (`\"county_code\"` in this case) is automatically # inferred from the `tsibble`'s `key`: ex1 <- as_epi_df(x = ex1_input, geo_type = \"state\", time_type = \"day\", as_of = \"2020-06-03\") attr(ex1, \"metadata\")[[\"other_keys\"]] #> [1] \"county_code\" # Dealing with misspecified column names: # Geographical and temporal information must be provided in columns named # `geo_value` and `time_value`; if we start from a data frame with a # different format, it must be converted to use `geo_value` and `time_value` # before calling `as_epi_df`. 
ex2_input <- tibble::tibble( state = rep(c(\"ca\", \"fl\", \"pa\"), each = 3), # misnamed pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 3), # extra key reported_date = rep(seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-03\"), by = \"day\" ), length.out = length(state)), # misnamed value = 1:length(state) + 0.01 * rnorm(length(state)) ) print(ex2_input) #> # A tibble: 9 × 4 #> state pol reported_date value #> #> 1 ca blue 2020-06-01 0.997 #> 2 ca blue 2020-06-02 1.99 #> 3 ca blue 2020-06-03 3.01 #> 4 fl swing 2020-06-01 4.02 #> 5 fl swing 2020-06-02 4.98 #> 6 fl swing 2020-06-03 6.01 #> 7 pa swing 2020-06-01 6.98 #> 8 pa swing 2020-06-02 7.99 #> 9 pa swing 2020-06-03 9.00 ex2 <- ex2_input %>% dplyr::rename(geo_value = state, time_value = reported_date) %>% as_epi_df( geo_type = \"state\", as_of = \"2020-06-03\", additional_metadata = list(other_keys = \"pol\") ) attr(ex2, \"metadata\") #> $geo_type #> [1] \"state\" #> #> $time_type #> [1] \"day\" #> #> $as_of #> [1] \"2020-06-03\" #> #> $other_keys #> [1] \"pol\" #> # Adding additional keys to an `epi_df` object ex3_input <- jhu_csse_county_level_subset %>% dplyr::filter(time_value > \"2021-12-01\", state_name == \"Massachusetts\") %>% dplyr::slice_tail(n = 6) ex3 <- ex3_input %>% tsibble::as_tsibble() %>% # needed to add the additional metadata # add 2 extra keys dplyr::mutate( state = rep(\"MA\", 6), pol = rep(c(\"blue\", \"swing\", \"swing\"), each = 2) ) %>% # the 2 extra keys we added have to be specified in the other_keys # component of additional_metadata. 
as_epi_df(additional_metadata = list(other_keys = c(\"state\", \"pol\"))) attr(ex3, \"metadata\") #> $geo_type #> [1] \"county\" #> #> $time_type #> [1] \"week\" #> #> $as_of #> [1] \"2024-06-21 11:34:26 UTC\" #> #> $other_keys #> [1] \"state\" \"pol\" #>"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tibble.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert to tibble — as_tibble.epi_df","title":"Convert to tibble — as_tibble.epi_df","text":"Converts epi_df object tibble, dropping metadata grouping.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tibble.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert to tibble — as_tibble.epi_df","text":"","code":"# S3 method for epi_df as_tibble(x, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tibble.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert to tibble — as_tibble.epi_df","text":"x epi_df ... 
additional arguments forward NextMethod()","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tsibble.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert to tsibble format — as_tsibble.epi_df","title":"Convert to tsibble format — as_tsibble.epi_df","text":"Converts epi_df object tsibble, index taken time_value, key variables taken geo_value along others other_keys field metadata, else explicitly set.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tsibble.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert to tsibble format — as_tsibble.epi_df","text":"","code":"# S3 method for epi_df as_tsibble(x, key, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/as_tsibble.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert to tsibble format — as_tsibble.epi_df","text":"x epi_df key Optional. additional keys (geo_value) add tsibble. ... 
additional arguments passed tsibble::as_tsibble()","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Automatically plot an epi_df — autoplot.epi_df","title":"Automatically plot an epi_df — autoplot.epi_df","text":"Automatically plot epi_df","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Automatically plot an epi_df — autoplot.epi_df","text":"","code":"# S3 method for epi_df autoplot( object, ..., .color_by = c(\"all_keys\", \"geo_value\", \"other_keys\", \".response\", \"all\", \"none\"), .facet_by = c(\".response\", \"other_keys\", \"all_keys\", \"geo_value\", \"all\", \"none\"), .base_color = \"#3A448F\", .max_facets = Inf )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Automatically plot an epi_df — autoplot.epi_df","text":"object epi_df ... One unquoted expressions separated commas. Variable names can used positions data frame, expressions like x:y can used select range variables. .color_by variables determine color(s) used plot lines. Options include: all_keys - default uses interaction key variables including geo_value geo_value - geo_value other_keys - available keys geo_value .response - numeric variables (y-axis) - uses interaction keys numeric variables none - coloring aesthetic applied .facet_by Similar .color_by except default display numeric variable separate facet .base_color Lines shown color. example, single numeric variable faceting geo_value, locations share color line. .max_facets Cut number facets displayed. 
Especially useful testing many geo_value's keys.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Automatically plot an epi_df — autoplot.epi_df","text":"ggplot object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/autoplot.epi_df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Automatically plot an epi_df — autoplot.epi_df","text":"","code":"autoplot(jhu_csse_daily_subset, cases, death_rate_7d_av) autoplot(jhu_csse_daily_subset, case_rate_7d_av, .facet_by = \"geo_value\") autoplot(jhu_csse_daily_subset, case_rate_7d_av, .color_by = \"none\", .facet_by = \"geo_value\" ) autoplot(jhu_csse_daily_subset, case_rate_7d_av, .color_by = \"none\", .base_color = \"red\", .facet_by = \"geo_value\" ) # .base_color specification won't have any effect due .color_by default autoplot(jhu_csse_daily_subset, case_rate_7d_av, .base_color = \"red\", .facet_by = \"geo_value\" )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/clone.html","id":null,"dir":"Reference","previous_headings":"","what":"Clone an epi_archive object. — clone","title":"Clone an epi_archive object. — clone","text":"Clone epi_archive object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/clone.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clone an epi_archive object. — clone","text":"","code":"clone(x) # S3 method for epi_archive clone(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/clone.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Clone an epi_archive object. 
— clone","text":"x epi_archive object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/compactify.html","id":null,"dir":"Reference","previous_headings":"","what":"Compactify — compactify","title":"Compactify — compactify","text":"section describes internals compactification works epi_archive(). Compactification can potentially improve code speed memory usage, depending data.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/compactify.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compactify — compactify","text":"general, last version observation carried forward (LOCF) fill data recorded versions, last recorded update versions_end. One consequence DT contain full snapshot every version (although generally works), can instead contain rows new changed previous version (see compactify, automatically). Currently, deletions must represented revising row special state (e.g., making entries NA including special column flags data removed performing kind post-processing), archive unaware state . Note NAs can introduced epi_archive methods reasons, e.g., epix_fill_through_version epix_merge, requested, represent potential update data yet access ; epix_merge represent \"value\" observation version first released, version observation appears archive data .","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":null,"dir":"Reference","previous_headings":"","what":"Detect outliers — detect_outlr","title":"Detect outliers — detect_outlr","text":"Applies one outlier detection methods given signal variable, optionally aggregates outputs create consensus result. 
See outliers vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Detect outliers — detect_outlr","text":"","code":"detect_outlr( x = seq_along(y), y, methods = tibble::tibble(method = \"rm\", args = list(list()), abbr = \"rm\"), combiner = c(\"median\", \"mean\", \"none\") )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Detect outliers — detect_outlr","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. methods tibble specifying method(s) use outlier detection, one row per method, following columns: method: Either \"rm\" \"stl\", custom function outlier detection; see details explanation. args: Named list arguments passed detection method. abbr: Abbreviation use naming output columns results method. combiner String, one \"median\", \"mean\", \"none\", specifying combine results different outlier detection methods thresholds determining whether particular observation classified outlier, well replacement value outliers. \"none\", summarized results calculated. 
Note number methods (number rows) odd, \"median\" equivalent majority vote purposes determining whether given observation outlier.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Detect outliers — detect_outlr","text":"tibble number rows equal length(y) columns giving outlier detection thresholds (lower upper) replacement values detection method (replacement).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Detect outliers — detect_outlr","text":"outlier detection method, one per row passed methods tibble, function must take first two arguments x y, number additional arguments. function must return tibble number rows equal length(y), columns lower, upper, replacement, representing lower upper bounds considered outlier, posited replacement value, respectively. 
convenience, outlier detection method can specified (method column methods) string \"rm\", shorthand detect_outlr_rm(), detects outliers via rolling median; \"stl\", shorthand detect_outlr_stl(), detects outliers via STL decomposition.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Detect outliers — detect_outlr","text":"","code":"detection_methods <- dplyr::bind_rows( dplyr::tibble( method = \"rm\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5 )), abbr = \"rm\" ), dplyr::tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = 7 )), abbr = \"stl_seasonal\" ), dplyr::tibble( method = \"stl\", args = list(list( detect_negatives = TRUE, detection_multiplier = 2.5, seasonal_period = NULL )), abbr = \"stl_nonseasonal\" ) ) x <- incidence_num_outlier_example %>% dplyr::select(geo_value, time_value, cases) %>% as_epi_df() %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr( x = time_value, y = cases, methods = detection_methods, combiner = \"median\" )) %>% unnest(outlier_info)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":null,"dir":"Reference","previous_headings":"","what":"Detect outliers based on a rolling median — detect_outlr_rm","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"Detects outliers based distance rolling median specified terms multiples rolling interquartile range (IQR).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"","code":"detect_outlr_rm( x = seq_along(y), y, n = 21, log_transform = FALSE, detect_negatives = FALSE, detection_multiplier = 2, min_radius 
= 0, replacement_multiplier = 0 )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. n Number time steps use rolling window. Default 21. value centrally aligned. n odd number, rolling window extends (n-1)/2 time steps design point (n-1)/2 time steps . n even, rolling range extends n/2-1 time steps n/2 time steps . log_transform log transform applied running outlier detection? Default FALSE. TRUE, zeros present, log transform padded 1. detect_negatives negative values automatically count outliers? Default FALSE. detection_multiplier Value determining far outlier detection thresholds rolling median, calculated (rolling median) +/- (detection multiplier) * (rolling IQR). Default 2. min_radius Minimum distance rolling median threshold, transformed scale. Default 0. replacement_multiplier Value determining far replacement values rolling median. replacement original value within detection thresholds, otherwise rounded nearest (rolling median) +/- (replacement multiplier) * (rolling IQR). 
Default 0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"tibble number rows equal length(y) columns giving outlier detection thresholds (lower upper) replacement values detection method (replacement).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_rm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Detect outliers based on a rolling median — detect_outlr_rm","text":"","code":"# Detect outliers based on a rolling median incidence_num_outlier_example %>% dplyr::select(geo_value, time_value, cases) %>% as_epi_df() %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr_rm( x = time_value, y = cases )) %>% unnest(outlier_info) #> An `epi_df` object, 730 x 6 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2022-05-21 22:17:14.962335 #> #> # A tibble: 730 × 6 #> # Groups: geo_value [2] #> geo_value time_value cases lower upper replacement #> * #> 1 fl 2020-06-01 667 530 2010 667 #> 2 nj 2020-06-01 486 150. 840. 486 #> 3 fl 2020-06-02 617 582. 1992. 617 #> 4 nj 2020-06-02 658 210. 771. 658 #> 5 fl 2020-06-03 1317 635 1975 1317 #> 6 nj 2020-06-03 541 270 702 541 #> 7 fl 2020-06-04 1419 713 1909 1419 #> 8 nj 2020-06-04 478 174. 790. 478 #> 9 fl 2020-06-05 1305 553 2081 1305 #> 10 nj 2020-06-05 825 118. 838. 
825 #> # ℹ 720 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":null,"dir":"Reference","previous_headings":"","what":"Detect outliers based on an STL decomposition — detect_outlr_stl","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"Detects outliers based seasonal-trend decomposition using LOESS (STL).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"","code":"detect_outlr_stl( x = seq_along(y), y, n_trend = 21, n_seasonal = 21, n_threshold = 21, seasonal_period = NULL, log_transform = FALSE, detect_negatives = FALSE, detection_multiplier = 2, min_radius = 0, replacement_multiplier = 0 )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. n_trend Number time steps use rolling window trend. Default 21. n_seasonal Number time steps use rolling window seasonality. Default 21. n_threshold Number time steps use rolling window IQR outlier thresholds. seasonal_period Integer specifying period seasonality. example, daily data, period 7 means weekly seasonality. default NULL, meaning seasonal term included STL decomposition. log_transform log transform applied running outlier detection? Default FALSE. TRUE, zeros present, log transform padded 1. detect_negatives negative values automatically count outliers? Default FALSE. 
detection_multiplier Value determining far outlier detection thresholds rolling median, calculated (rolling median) +/- (detection multiplier) * (rolling IQR). Default 2. min_radius Minimum distance rolling median threshold, transformed scale. Default 0. replacement_multiplier Value determining far replacement values rolling median. replacement original value within detection thresholds, otherwise rounded nearest (rolling median) +/- (replacement multiplier) * (rolling IQR). Default 0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"tibble number rows equal length(y) columns giving outlier detection thresholds (lower upper) replacement values detection method (replacement).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"STL decomposition computed using feasts package. computed, outlier detection method analogous rolling median method detect_outlr_rm(), except fitted values residuals STL decomposition taking place rolling median residuals rolling median, respectively. 
last set arguments, log_transform replacement_multiplier, exactly detect_outlr_rm().","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/detect_outlr_stl.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Detect outliers based on an STL decomposition — detect_outlr_stl","text":"","code":"# Detects outliers based on a seasonal-trend decomposition using LOESS incidence_num_outlier_example %>% dplyr::select(geo_value, time_value, cases) %>% as_epi_df() %>% group_by(geo_value) %>% mutate(outlier_info = detect_outlr_stl( x = time_value, y = cases, seasonal_period = 7 )) %>% # weekly seasonality for daily data unnest(outlier_info) #> An `epi_df` object, 730 x 6 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2022-05-21 22:17:14.962335 #> #> # A tibble: 730 × 6 #> # Groups: geo_value [2] #> geo_value time_value cases lower upper replacement #> * #> 1 fl 2020-06-01 667 -1193. 1233. 667 #> 2 nj 2020-06-01 486 281. 762. 486 #> 3 fl 2020-06-02 617 -691. 1890. 617 #> 4 nj 2020-06-02 658 317. 891. 658 #> 5 fl 2020-06-03 1317 -144. 2396. 1317 #> 6 nj 2020-06-03 541 292. 809. 541 #> 7 fl 2020-06-04 1419 260. 2696. 1419 #> 8 nj 2020-06-04 478 315. 792. 478 #> 9 fl 2020-06-05 1305 548. 2950. 1305 #> 10 nj 2020-06-05 825 382. 835. 825 #> # ℹ 720 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":null,"dir":"Reference","previous_headings":"","what":"epi_archive object — epi_archive","title":"epi_archive object — epi_archive","text":"epi_archive S3 class contains data table along several relevant pieces metadata. 
data table can seen full archive (version history) signal variables interest.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"epi_archive object — epi_archive","text":"","code":"new_epi_archive( x, geo_type = NULL, time_type = NULL, other_keys = NULL, additional_metadata = NULL, compactify = NULL, clobberable_versions_start = NULL, versions_end = NULL ) validate_epi_archive( x, geo_type = NULL, time_type = NULL, other_keys = NULL, additional_metadata = NULL, compactify = NULL, clobberable_versions_start = NULL, versions_end = NULL ) as_epi_archive( x, geo_type = NULL, time_type = NULL, other_keys = NULL, additional_metadata = NULL, compactify = NULL, clobberable_versions_start = NULL, versions_end = NULL )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"epi_archive object — epi_archive","text":"x data.frame, data.table, tibble, columns geo_value, time_value, version, additional number columns. geo_type Type geo values. missing, function attempt infer geo values present; fails, set \"custom\". time_type Type time values. missing, function attempt infer time values present; fails, set \"custom\". other_keys Character vector specifying names variables x considered key variables (language data.table) apart \"geo_value\", \"time_value\", \"version\". additional_metadata List additional metadata attach epi_archive object. metadata geo_type time_type fields; named entries passed list included well. compactify Optional; Boolean NULL. TRUE remove redundant rows, FALSE , missing NULL remove redundant rows, issue warning. See information compactify. 
clobberable_versions_start Optional; length-1; either value class typeof x$version, NA class typeof: specifically, either () earliest version subject \"clobbering\" (overwritten different update data, using version tag old update data), (b) NA, indicate versions clobberable. variety reasons versions clobberable routine circumstances, () today's version one/columns published initially filled NA LOCF, (b) buggy version today's data published fixed republished later day, (c) data pipeline delays (e.g., publisher uploading, periodic scraping, database syncing, periodic fetching, etc.) make events () (b) reflected later day (even different day) expected; potential causes vary different data pipelines. default value NA, consider versions clobberable. Another setting may appropriate pipelines max_version_with_row_in(x). versions_end Optional; length-1, class typeof x$version: last version observed? default max_version_with_row_in(x), values greater also valid, indicate observed additional versions data beyond max(x$version), contained empty updates. (default value clobberable_versions_start fully trust empty updates, assumes version >= max(x$version) clobbered.) nrow(x) == 0, argument mandatory.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"epi_archive object — epi_archive","text":"epi_archive object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"epi_archive object — epi_archive","text":"Epi Archive epi_archive contains data table DT, class data.table data.table package, (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. version: time value specifying version row measurements. 
example, given row version January 15, 2022 time_value January 14, 2022, row contains measurements data January 14, 2022 available one day later. data table DT key variables geo_value, time_value, version, well others (can specified instantiating epi_archive object via other_keys argument, /set operating DT directly). Refer documentation as_epi_archive() information examples relevant parameter names epi_archive object. Note can single row per unique combination key variables, thus key variables critical figuring generate snapshot data archive, given version.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"metadata","dir":"Reference","previous_headings":"","what":"Metadata","title":"epi_archive object — epi_archive","text":"following pieces metadata included fields epi_archive object: geo_type: type geo values. time_type: type time values. additional_metadata: list additional metadata data archive. Unlike epi_df object, metadata epi_archive object x can accessed (altered) directly, x$geo_type x$time_type, etc. Like epi_df object, geo_type time_type fields metadata epi_archive object currently used downstream functions epiprocess package, serve useful bits information convey data set hand.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"generating-snapshots","dir":"Reference","previous_headings":"","what":"Generating Snapshots","title":"epi_archive object — epi_archive","text":"epi_archive object can used generate snapshot data epi_df format, represents --date values signal variables, specified version. accomplished calling epix_as_of().","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"sliding-computations","dir":"Reference","previous_headings":"","what":"Sliding Computations","title":"epi_archive object — epi_archive","text":"can run sliding computation epi_archive object, much like epi_slide() epi_df object. 
accomplished calling slide() method epi_archive object, works similarly way epi_slide() works epi_df object, one key difference: version-aware. , epi_archive object, sliding computation given reference time point t performed data available t.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_archive.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"epi_archive object — epi_archive","text":"","code":"# Simple ex. with necessary keys tib <- tibble::tibble( geo_value = rep(c(\"ca\", \"hi\"), each = 5), time_value = rep(seq(as.Date(\"2020-01-01\"), by = 1, length.out = 5 ), times = 2), version = rep(seq(as.Date(\"2020-01-02\"), by = 1, length.out = 5 ), times = 2), value = rnorm(10, mean = 2, sd = 1) ) toy_epi_archive <- tib %>% as_epi_archive( geo_type = \"state\", time_type = \"day\" ) toy_epi_archive #> → An `epi_archive` object, with metadata: #> ℹ Min/max time values: 2020-01-01 / 2020-01-05 #> ℹ First/last version with update: 2020-01-02 / 2020-01-06 #> ℹ Versions end: 2020-01-06 #> ℹ A preview of the table (10 rows x 4 columns): #> Key: #> geo_value time_value version value #> #> 1: ca 2020-01-01 2020-01-02 2.5429963 #> 2: ca 2020-01-02 2020-01-03 1.0859252 #> 3: ca 2020-01-03 2020-01-04 2.4681544 #> 4: ca 2020-01-04 2020-01-05 2.3629513 #> 5: ca 2020-01-05 2020-01-06 0.6954565 #> 6: hi 2020-01-01 2020-01-02 2.7377763 #> 7: hi 2020-01-02 2020-01-03 3.8885049 #> 8: hi 2020-01-03 2020-01-04 1.9025549 #> 9: hi 2020-01-04 2020-01-05 1.0641526 #> 10: hi 2020-01-05 2020-01-06 1.9840497 # Ex. 
with an additional key for county df <- data.frame( geo_value = c(replicate(2, \"ca\"), replicate(2, \"fl\")), county = c(1, 3, 2, 5), time_value = c( \"2020-06-01\", \"2020-06-02\", \"2020-06-01\", \"2020-06-02\" ), version = c( \"2020-06-02\", \"2020-06-03\", \"2020-06-02\", \"2020-06-03\" ), cases = c(1, 2, 3, 4), cases_rate = c(0.01, 0.02, 0.01, 0.05) ) x <- df %>% as_epi_archive( geo_type = \"state\", time_type = \"day\", other_keys = \"county\" )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute correlations between variables in an epi_df object — epi_cor","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"Computes correlations variables epi_df object, allowing grouping geo value, time value, variables. See correlation vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"","code":"epi_cor( x, var1, var2, dt1 = 0, dt2 = 0, shift_by = geo_value, cor_by = geo_value, use = \"na.or.complete\", method = c(\"pearson\", \"kendall\", \"spearman\") )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"x epi_df object consideration. var1, var2 variables x correlate. dt1, dt2 Time shifts consider two variables, respectively, computing correlations. Negative shifts translate lag value positive shifts lead value; example, dt = -1, new value June 2 original value June 1; dt = 1, new value June 2 original value June 3; dt = 0, values left . Default 0 dt1 dt2. shift_by variables(s) group , time shifts. default geo_value. 
However, also use, example, shift_by = c(geo_value, age_group), assuming x column age_group, perform time shifts per geo value age group. omit grouping entirely, use cor_by = NULL. Note grouping always undone correlation computations. cor_by variable(s) group , correlation computations. geo_value, default, correlations computed geo value, time; time_value, correlations computed time, geo values. grouping can also specified using number columns x; example, can use cor_by = c(geo_value, age_group), assuming x column age_group, order compute correlations pair geo value age group. omit grouping entirely, use cor_by = NULL. Note grouping always done time shifts. use, method Arguments pass cor(), \"na.or.complete\" default use (different cor()) \"pearson\" default method (cor()).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"tibble grouping columns first (geo_value, time_value, possibly others), column cor, gives correlation.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_cor.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute correlations between variables in an epi_df object — epi_cor","text":"","code":"# linear association of case and death rates on any given day epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = \"time_value\" ) #> Warning: There were 3 warnings in `dplyr::summarize()`. #> The first warning was: #> ℹ In argument: `cor = cor(x = .data$var1, y = .data$var2, use = use, method = #> method)`. #> ℹ In group 1: `time_value = 2020-03-01`. #> Caused by warning in `cor()`: #> ! the standard deviation is zero #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings. 
#> # A tibble: 671 × 2 #> time_value cor #> #> 1 2020-03-01 NA #> 2 2020-03-02 NA #> 3 2020-03-03 NA #> 4 2020-03-04 0.746 #> 5 2020-03-05 0.549 #> 6 2020-03-06 0.692 #> 7 2020-03-07 0.277 #> 8 2020-03-08 -0.226 #> 9 2020-03-09 -0.195 #> 10 2020-03-10 -0.227 #> # ℹ 661 more rows # correlation of death rates and lagged case rates epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = time_value, dt1 = -2 ) #> Warning: There was 1 warning in `dplyr::summarize()`. #> ℹ In argument: `cor = cor(x = .data$var1, y = .data$var2, use = use, method = #> method)`. #> ℹ In group 3: `time_value = 2020-03-03`. #> Caused by warning in `cor()`: #> ! the standard deviation is zero #> # A tibble: 671 × 2 #> time_value cor #> #> 1 2020-03-01 NA #> 2 2020-03-02 NA #> 3 2020-03-03 NA #> 4 2020-03-04 0.989 #> 5 2020-03-05 0.907 #> 6 2020-03-06 0.746 #> 7 2020-03-07 0.549 #> 8 2020-03-08 -0.158 #> 9 2020-03-09 -0.126 #> 10 2020-03-10 -0.163 #> # ℹ 661 more rows # correlation grouped by location epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = geo_value ) #> # A tibble: 6 × 2 #> geo_value cor #> #> 1 ca 0.573 #> 2 fl 0.488 #> 3 ga 0.465 #> 4 ny 0.285 #> 5 pa 0.708 #> 6 tx 0.750 # correlation grouped by location and incorporates lagged cases rates epi_cor( x = jhu_csse_daily_subset, var1 = case_rate_7d_av, var2 = death_rate_7d_av, cor_by = geo_value, dt1 = -2 ) #> # A tibble: 6 × 2 #> geo_value cor #> #> 1 ca 0.618 #> 2 fl 0.576 #> 3 ga 0.525 #> 4 ny 0.337 #> 5 pa 0.734 #> 6 tx 0.784"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"epi_df object — epi_df","title":"epi_df object — epi_df","text":"epi_df tibble certain minimal column structure metadata. 
can seen snapshot data set contains --date values signal variables interest, given time.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"epi_df object — epi_df","text":"epi_df tibble (least) following columns: geo_value: geographic value associated row measurements. time_value: time value associated row measurements. columns can considered measured variables, also refer signal variables. epi_df object also metadata (least) following fields: geo_type: type geo values. time_type: type time values. as_of: time value given data available. Metadata epi_df object x can accessed (altered) via attributes(x)$metadata. first two fields list, geo_type time_type, can usually inferred geo_value time_value columns, respectively. currently used downstream functions epiprocess package, serve useful bits information convey data set hand. information coding given . last field list, as_of, one unique aspects epi_df object. brief, can think epi_df object single snapshot data set contains --date values signals variables, time specified as_of field. companion object epi_archive object, contains full version history given data set. Revisions common many types epidemiological data streams, paying attention data revisions can important sorts downstream data analysis modeling tasks. See documentation epi_archive details data versioning works epiprocess package (including generate epi_df objects, data snapshots, epi_archive object).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":"geo-types","dir":"Reference","previous_headings":"","what":"Geo Types","title":"epi_df object — epi_df","text":"following geo types recognized epi_df. \"county\": observation corresponds U.S. county; coded 5-digit FIPS code. \"hrr\": observation corresponds U.S. 
hospital referral region (designed represent regional healthcare markets); 306 HRRs U.S; coded number (nonconsecutive, 1 457). \"state\": observation corresponds U.S. state; coded 2-digit postal abbreviation (lowercase); note Puerto Rico \"pr\" Washington D.C. \"dc\". \"hhs\": observation corresponds U.S. HHS region; coded number (consecutive, 1 10). \"nation\": observation corresponds country; coded ISO 3166-1 alpha-2 country codes (lowercase). unrecognizable geo type labeled \"custom\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_df.html","id":"time-types","dir":"Reference","previous_headings":"","what":"Time Types","title":"epi_df object — epi_df","text":"following time types recognized epi_df. \"day-time\": observation corresponds time given day (measured second); coded POSIXct object, .POSIXct(\"2022-01-31 18:45:40\"). \"day\": observation corresponds day; coded Date object, .Date(\"2022-01-31\"). \"week\": observation corresponds week; alignment can arbitrary (whether week starts Monday, Tuesday); coded Date object, representing start date week. \"yearweek\": observation corresponds week; alignment can arbitrary; coded tsibble::yearweek object, alignment stored week_start field attributes. \"yearmonth\": observation corresponds month; coded tsibble::yearmonth object. \"yearquarter\": observation corresponds quarter; coded tsibble::yearquarter object. \"year\": observation corresponds year; coded integer greater equal 1582. unrecognizable time type labeled \"custom\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":null,"dir":"Reference","previous_headings":"","what":"Slide a function over variables in an epi_df object — epi_slide","title":"Slide a function over variables in an epi_df object — epi_slide","text":"Slides given function variables epi_df object. 
See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Slide a function over variables in an epi_df object — epi_slide","text":"","code":"epi_slide( x, f, ..., before, after, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Slide a function over variables in an epi_df object — epi_slide","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. f Function, formula, missing; together ... specifies computation slide. \"slide\" means apply computation within sliding (.k.. \"rolling\") time window data group. window determined parameters described . One time step typically one day one week; see details explanation. function, f must take data frame column names original object, minus grouping variables, containing time window data one group-ref_time_value combination; followed one-row tibble containing values grouping variables associated group; followed number named arguments. formula, f can operate directly columns accessed via .x$var .$var, ~mean(.x$var) compute mean column var ref_time_value-group combination. group key can accessed via .y. f missing, ... specify computation. ... Additional arguments pass function formula specified via f. Alternatively, f missing, ... interpreted expression tidy evaluation; addition referring columns directly name, expression access .data .env pronouns dplyr verbs, can also refer .x, .group_key, .ref_time_value. See details. , far ref_time_value sliding window extend? least one two arguments must provided; 's default 0. 
value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step (k) ref_time_value: either pass =k , pass =k, =0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass =k, =k. leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass pass =k , pass =0, =k. See \"Details:\" definition time step,(non)treatment missing rows within window, avoiding warnings &settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name String indicating name new column contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column name overwrite column. as_list_col slide results held list column, unchopped/unnested? Default FALSE, case list object returned f unnested (using tidyr::unnest()), , slide computations output data frames, names resulting columns given prepending new_col_name names list elements. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. 
using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Slide a function over variables in an epi_df object — epi_slide","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Slide a function over variables in an epi_df object — epi_slide","text":"\"slide\" means apply function formula rolling window time steps data group, window centered reference time left right endpoints given arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since .Date(\"2022-01-01\") + 1 equals .Date(\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide() still attempts perform computation anyway (require complete window). issue partial computations (run incomplete windows) therefore left user, either specified function formula f, post-processing. centrally-aligned slide n time_values sliding window, set = (n-1)/2 = (n-1)/2 number time_values sliding window odd = n/2-1 = n/2 n even. Sometimes, want experiment various trailing leading window widths compare slide outputs. (uncommon) case zero-width windows considered, manually pass arguments order prevent potential warnings. (E.g., =k k=0 missing may produce warning. avoid warnings, use =k, =0 instead; otherwise, looks much like leading window intended, argument forgotten misspelled.) 
f missing, expression tidy evaluation can specified, example, : equivalent : Thus, clear, computation specified via expression tidy evaluation (first example, ), name new column inferred given expression overrides name passed explicitly new_col_name argument.","code":"epi_slide(x, cases_7dav = mean(cases), before = 6) epi_slide(x, function(x, g) mean(x$cases), before = 6, new_col_name = \"cases_7dav\")"},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Slide a function over variables in an epi_df object — epi_slide","text":"","code":"# slide a 7-day trailing average formula on cases # Simple sliding means and sums are much faster to do using # the `epi_slide_mean` and `epi_slide_sum` functions instead. jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6 #> 2 ca 2020-03-02 4 5 #> 3 ca 2020-03-03 6 5.33 #> 4 ca 2020-03-04 11 6.75 #> 5 ca 2020-03-05 10 7.4 #> 6 ca 2020-03-06 18 9.17 #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day leading average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), after = 6) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 11.6 #> 2 ca 2020-03-02 4 13.4 #> 3 ca 2020-03-03 6 16.1 #> 4 ca 2020-03-04 11 18.4 #> 5 ca 2020-03-05 10 20.4 #> 6 ca 2020-03-06 18 25.1 #> 7 ca 2020-03-07 26 30.1 #> 8 ca 2020-03-08 19 34.4 #> 9 ca 2020-03-09 23 37.3 #> 10 ca 2020-03-10 22 56.7 #> # ℹ 4,016 more rows # slide a 7-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_7dav = mean(cases), before = 3, after = 3) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6.75 #> 2 ca 2020-03-02 4 7.4 #> 3 ca 2020-03-03 6 9.17 #> 4 ca 2020-03-04 11 11.6 #> 5 ca 2020-03-05 10 13.4 #> 6 ca 2020-03-06 18 16.1 #> 7 ca 2020-03-07 26 18.4 #> 8 ca 2020-03-08 19 20.4 #> 9 ca 2020-03-09 23 25.1 #> 10 ca 2020-03-10 22 30.1 #> # ℹ 4,016 more rows # slide a 14-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide(cases_14dav = mean(cases), before = 6, after = 7) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_14dav) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_14dav #> * #> 1 ca 2020-03-01 6 12.5 #> 2 ca 2020-03-02 4 13.7 #> 3 ca 2020-03-03 6 14.5 #> 4 ca 2020-03-04 11 15.5 #> 5 ca 2020-03-05 10 17.8 #> 6 ca 2020-03-06 18 20.5 #> 7 ca 2020-03-07 26 23 #> 8 ca 2020-03-08 19 25.4 #> 9 ca 2020-03-09 23 36.4 #> 10 ca 2020-03-10 22 42 #> # ℹ 4,016 more rows # nested new columns jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide( a = data.frame( cases_2dav = mean(cases), cases_2dma = mad(cases) ), before = 1, as_list_col = TRUE ) %>% ungroup() #> An `epi_df` object, 4,026 x 7 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 7 #> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av a #> * #> 1 ca 2020-03-01 6 1.29 0.00327 0 #> 2 ca 2020-03-02 4 1.71 0.00435 0 #> 3 ca 2020-03-03 6 2.43 0.00617 0 #> 4 ca 2020-03-04 11 3.86 0.00980 0.000363 #> 5 ca 2020-03-05 10 5.29 0.0134 0.000363 #> 6 ca 2020-03-06 18 7.86 0.0200 0.000363 #> 7 ca 2020-03-07 26 11.6 0.0294 0.000363 #> 8 ca 2020-03-08 19 13.4 0.0341 0.000363 #> 9 ca 2020-03-09 23 16.1 0.0410 0.000726 #> 10 ca 2020-03-10 22 18.4 0.0468 0.000726 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":null,"dir":"Reference","previous_headings":"","what":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"Slides n-timestep mean variables epi_df object. 
See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"","code":"epi_slide_mean( x, col_names, ..., before, after, ref_time_values, time_step, new_col_name = NULL, as_list_col = NULL, names_sep = NULL, all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. col_names unquoted column name(e.g., cases), multiple column names (e.g., c(cases, deaths)), tidy-select expression. Variable names can used positions data frame, expressions like x:y can used select range variables. desired column names stored vector vars, use col_names = all_of(vars). tidy-selection renaming interface supported, used provide output column names; want customize output column names, use dplyr::rename slide. ... Additional arguments pass data.table::frollmean, example, na.rm algo. data.table::frollmean automatically passed data x operate , window size n, alignment align. Providing args via ... cause error. , far ref_time_value sliding window extend? least one two arguments must provided; 's default 0. value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step (k) ref_time_value: either pass =k , pass =k, =0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass =k, =k. 
leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass pass =k , pass =0, =k. See \"Details:\" definition time step,(non)treatment missing rows within window, avoiding warnings &settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name Character vector indicating name(s) new column(s) contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column names overwrite columns. names_sep NULL, new_col_name must length col_names. as_list_col supported. Included match epi_slide interface. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. 
using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"Wrapper around epi_slide_opt f = data.table::frollmean. \"slide\" means apply function rolling window time steps data group, window centered reference time left right endpoints given arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since .Date(\"2022-01-01\") + 1 equals .Date(\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide_*() fail; requires complete window perform computation. centrally-aligned slide n time_values sliding window, set = (n-1)/2 = (n-1)/2 number time_values sliding window odd = n/2-1 = n/2 n even. Sometimes, want experiment various trailing leading window widths compare slide outputs. (uncommon) case zero-width windows considered, manually pass arguments order prevent potential warnings. (E.g., =k k=0 missing may produce warning. 
avoid warnings, use =k, =0 instead; otherwise, looks much like leading window intended, argument forgotten misspelled.)","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_mean.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Optimized slide function for performing rolling averages on an epi_df object — epi_slide_mean","text":"","code":"# slide a 7-day trailing average formula on cases jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, before = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day trailing average formula on cases. Adjust `frollmean` settings for speed # and accuracy, and to allow partially-missing windows. 
jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean( cases, before = 6, # `frollmean` options na.rm = TRUE, algo = \"exact\", hasNA = TRUE ) %>% dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6 #> 2 ca 2020-03-02 4 5 #> 3 ca 2020-03-03 6 5.33 #> 4 ca 2020-03-04 11 6.75 #> 5 ca 2020-03-05 10 7.4 #> 6 ca 2020-03-06 18 9.17 #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day leading average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, after = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 11.6 #> 2 ca 2020-03-02 4 13.4 #> 3 ca 2020-03-03 6 16.1 #> 4 ca 2020-03-04 11 18.4 #> 5 ca 2020-03-05 10 20.4 #> 6 ca 2020-03-06 18 25.1 #> 7 ca 2020-03-07 26 30.1 #> 8 ca 2020-03-08 19 34.4 #> 9 ca 2020-03-09 23 37.3 #> 10 ca 2020-03-10 22 56.7 #> # ℹ 4,016 more rows # slide a 7-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, before = 3, after = 3) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 11.6 #> 5 ca 2020-03-05 10 13.4 #> 6 ca 2020-03-06 18 16.1 #> 7 ca 2020-03-07 26 18.4 #> 8 ca 2020-03-08 19 20.4 #> 9 ca 2020-03-09 23 25.1 #> 10 ca 2020-03-10 22 30.1 #> # ℹ 4,016 more rows # slide a 14-day centre-aligned average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_mean(cases, before = 6, after = 7) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_14dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_14dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 23 #> 8 ca 2020-03-08 19 25.4 #> 9 ca 2020-03-09 23 36.4 #> 10 ca 2020-03-10 22 42 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":null,"dir":"Reference","previous_headings":"","what":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"Slides n-timestep data.table::froll slider::summary-slide function variables epi_df object. 
See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"","code":"epi_slide_opt( x, col_names, f, ..., before, after, ref_time_values, time_step, new_col_name = NULL, as_list_col = NULL, names_sep = NULL, all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. col_names unquoted column name(e.g., cases), multiple column names (e.g., c(cases, deaths)), tidy-select expression. Variable names can used positions data frame, expressions like x:y can used select range variables. desired column names stored vector vars, use col_names = all_of(vars). tidy-selection renaming interface supported, used provide output column names; want customize output column names, use dplyr::rename slide. f Function; together ... specifies computation slide. f must one data.table's rolling functions (frollmean, frollsum, frollapply. See data.table::roll) one slider's specialized sliding functions (slide_mean, slide_sum, etc. See slider::summary-slide). \"slide\" means apply computation within sliding (.k.. \"rolling\") time window data group. window determined parameters described . One time step typically one day one week; see details explanation. optimized data.table slider functions directly passed computation function epi_slide without careful handling make sure computation group made n dates rather n points. 
epi_slide_opt (wrapper functions epi_slide_mean epi_slide_sum) take care window completion automatically prevent associated errors. ... Additional arguments pass slide computation f, example, na.rm algo f data.table function. f data.table function, automatically passed data x operate , window size n, alignment align. Providing args via ... cause error. f slider function, automatically passed data x operate , number points use computation. , far ref_time_value sliding window extend? least one two arguments must provided; 's default 0. value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step (k) ref_time_value: either pass =k , pass =k, =0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass =k, =k. leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass pass =k , pass =0, =k. See \"Details:\" definition time step,(non)treatment missing rows within window, avoiding warnings &settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name Character vector indicating name(s) new column(s) contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column names overwrite columns. names_sep NULL, new_col_name must length col_names. as_list_col supported. Included match epi_slide interface. 
names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"\"slide\" means apply function rolling window time steps data group, window centered reference time left right endpoints given arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since .Date(\"2022-01-01\") + 1 equals .Date (\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide_*() fail; requires complete window perform computation. 
centrally-aligned slide n time_values sliding window, set = (n-1)/2 = (n-1)/2 number time_values sliding window odd = n/2-1 = n/2 n even. Sometimes, want experiment various trailing leading window widths compare slide outputs. (uncommon) case zero-width windows considered, manually pass arguments order prevent potential warnings. (E.g., =k k=0 missing may produce warning. avoid warnings, use =k, =0 instead; otherwise, looks much like leading window intended, argument forgotten misspelled.)","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_opt.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Optimized slide function for performing common rolling computations on an epi_df object — epi_slide_opt","text":"","code":"# slide a 7-day trailing average formula on cases. This can also be done with `epi_slide_mean` jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = data.table::frollmean, before = 6 ) %>% # Remove a nonessential var. to ensure new col is printed, and rename new col dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day trailing average formula on cases. Adjust `frollmean` settings for speed # and accuracy, and to allow partially-missing windows. 
jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = data.table::frollmean, before = 6, # `frollmean` options na.rm = TRUE, algo = \"exact\", hasNA = TRUE ) %>% dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 6 #> 2 ca 2020-03-02 4 5 #> 3 ca 2020-03-03 6 5.33 #> 4 ca 2020-03-04 11 6.75 #> 5 ca 2020-03-05 10 7.4 #> 6 ca 2020-03-06 18 9.17 #> 7 ca 2020-03-07 26 11.6 #> 8 ca 2020-03-08 19 13.4 #> 9 ca 2020-03-09 23 16.1 #> 10 ca 2020-03-10 22 18.4 #> # ℹ 4,016 more rows # slide a 7-day leading average jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = slider::slide_mean, after = 6 ) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 11.6 #> 2 ca 2020-03-02 4 13.4 #> 3 ca 2020-03-03 6 16.1 #> 4 ca 2020-03-04 11 18.4 #> 5 ca 2020-03-05 10 20.4 #> 6 ca 2020-03-06 18 25.1 #> 7 ca 2020-03-07 26 30.1 #> 8 ca 2020-03-08 19 34.4 #> 9 ca 2020-03-09 23 37.3 #> 10 ca 2020-03-10 22 56.7 #> # ℹ 4,016 more rows # slide a 7-day centre-aligned sum. This can also be done with `epi_slide_sum` jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_opt( cases, f = data.table::frollsum, before = 3, after = 3 ) %>% # Remove a nonessential var. 
to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dav = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dav #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 81 #> 5 ca 2020-03-05 10 94 #> 6 ca 2020-03-06 18 113 #> 7 ca 2020-03-07 26 129 #> 8 ca 2020-03-08 19 143 #> 9 ca 2020-03-09 23 176 #> 10 ca 2020-03-10 22 211 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":null,"dir":"Reference","previous_headings":"","what":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"Slides n-timestep sum variables epi_df object. See slide vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"","code":"epi_slide_sum( x, col_names, ..., before, after, ref_time_values, time_step, new_col_name = NULL, as_list_col = NULL, names_sep = NULL, all_rows = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"x epi_df object consideration, grouped ungrouped. ungrouped, data x treated part single data group. col_names unquoted column name(e.g., cases), multiple column names (e.g., c(cases, deaths)), tidy-select expression. 
Variable names can used positions data frame, expressions like x:y can used select range variables. desired column names stored vector vars, use col_names = all_of(vars). tidy-selection renaming interface supported, used provide output column names; want customize output column names, use dplyr::rename slide. ... Additional arguments pass data.table::frollsum, example, na.rm algo. data.table::frollsum automatically passed data x operate , window size n, alignment align. Providing args via ... cause error. , far ref_time_value sliding window extend? least one two arguments must provided; 's default 0. value provided either argument must single, non-NA, non-negative, integer-compatible number time steps. Endpoints window inclusive. Common settings: trailing/right-aligned windows ref_time_value - time_step (k) ref_time_value: either pass =k , pass =k, =0. center-aligned windows ref_time_value - time_step(k) ref_time_value + time_step(k): pass =k, =k. leading/left-aligned windows ref_time_value ref_time_value + time_step(k): either pass pass =k , pass =0, =k. See \"Details:\" definition time step,(non)treatment missing rows within window, avoiding warnings &settings certain uncommon use case. ref_time_values Time values sliding computations, meaning, element vector serves reference time point one sliding window. missing, set unique time values underlying data table, default. time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take non-negative integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name Character vector indicating name(s) new column(s) contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column names overwrite columns. names_sep NULL, new_col_name must length col_names. as_list_col supported. 
Included match epi_slide interface. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_rows all_rows = TRUE, rows x kept output even ref_time_values provided, type missing value marker slide computation output column(s) time_values outside ref_time_values; otherwise, one row row x time_value ref_time_values. Default FALSE. missing value marker result vctrs::vec_casting NA type slide computation output. using as_list_col = TRUE, note missing marker NULL entry list column; certain operations, might want replace NULL entries different NA marker.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"epi_df object given appending one new columns x, named according new_col_name argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"Wrapper around epi_slide_opt f = data.table::frollsum. \"slide\" means apply function rolling window time steps data group, window centered reference time left right endpoints given arguments. unit (meaning one time step) implicitly defined way time_value column treats addition subtraction; example, time values coded Date objects, one time step one day, since .Date(\"2022-01-01\") + 1 equals .Date (\"2022-01-02\"). Alternatively, time step can set explicitly using time_step argument (specified override default choice based time_value column). enough time steps available complete window given reference time, epi_slide_*() fail; requires complete window perform computation. 
centrally-aligned slide n time_values sliding window, set = (n-1)/2 = (n-1)/2 number time_values sliding window odd = n/2-1 = n/2 n even. Sometimes, want experiment various trailing leading window widths compare slide outputs. (uncommon) case zero-width windows considered, manually pass arguments order prevent potential warnings. (E.g., =k k=0 missing may produce warning. avoid warnings, use =k, =0 instead; otherwise, looks much like leading window intended, argument forgotten misspelled.)","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epi_slide_sum.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Optimized slide function for performing rolling sums on an epi_df object — epi_slide_sum","text":"","code":"# slide a 7-day trailing sum formula on cases jhu_csse_daily_subset %>% group_by(geo_value) %>% epi_slide_sum(cases, before = 6) %>% # Remove a nonessential var. to ensure new col is printed dplyr::select(geo_value, time_value, cases, cases_7dsum = slide_value_cases) %>% ungroup() #> An `epi_df` object, 4,026 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 4 #> geo_value time_value cases cases_7dsum #> * #> 1 ca 2020-03-01 6 NA #> 2 ca 2020-03-02 4 NA #> 3 ca 2020-03-03 6 NA #> 4 ca 2020-03-04 11 NA #> 5 ca 2020-03-05 10 NA #> 6 ca 2020-03-06 18 NA #> 7 ca 2020-03-07 26 81 #> 8 ca 2020-03-08 19 94 #> 9 ca 2020-03-09 23 113 #> 10 ca 2020-03-10 22 129 #> # ℹ 4,016 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epiprocess.html","id":null,"dir":"Reference","previous_headings":"","what":"epiprocess: Tools for basic signal processing in epidemiology — epiprocess","title":"epiprocess: Tools for basic signal processing in epidemiology — epiprocess","text":"package introduces common data structure epidemiological data sets measured space time, offers associated utilities perform basic 
signal processing tasks.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epiprocess.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"epiprocess: Tools for basic signal processing in epidemiology — epiprocess","text":"Maintainer: Logan Brooks lcbrooks@andrew.cmu.edu Authors: Daniel McDonald Evan Ray Ryan Tibshirani contributors: Jacob Bien [contributor] Rafael Catoia [contributor] Nat DeFries [contributor] Rachel Lobay [contributor] Ken Mawer [contributor] Chloe [contributor] Quang Nguyen [contributor] Dmitry Shemetov [contributor] Lionel Henry (Author included rlang fragments) [contributor] Hadley Wickham (Author included rlang fragments) [contributor] Posit (Copyright holder included rlang fragments) [copyright holder]","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate a snapshot from an epi_archive object — epix_as_of","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"Generates snapshot epi_df format epi_archive object, given version. See archive vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"","code":"epix_as_of(x, max_version, min_time_value = -Inf, all_versions = FALSE)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"x epi_archive object max_version Time value specifying max version permit snapshot. , snapshot comprise unique rows current archive data represent --date signal values, specified max_version (whose time values least min_time_value.) 
min_time_value Time value specifying min time value permit snapshot. Default -Inf, effectively means minimum considered. all_versions all_versions = TRUE, output epi_archive format, contain rows specified time_value range version <= max_version. resulting object cover potentially narrower version time_value range x, depending user-provided arguments. Otherwise, one row output max_version time_value. Default FALSE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"epi_df object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_as_of.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate a snapshot from an epi_archive object — epix_as_of","text":"","code":"epix_as_of( archive_cases_dv_subset, max_version = max(archive_cases_dv_subset$DT$version) ) #> An `epi_df` object, 2,192 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2021-12-01 #> #> # A tibble: 2,192 × 4 #> geo_value time_value percent_cli case_rate_7d_av #> * #> 1 ca 2020-06-01 2.75 6.84 #> 2 ca 2020-06-02 2.57 6.82 #> 3 ca 2020-06-03 2.48 6.66 #> 4 ca 2020-06-04 2.41 6.98 #> 5 ca 2020-06-05 2.57 6.97 #> 6 ca 2020-06-06 2.63 6.66 #> 7 ca 2020-06-07 2.73 6.74 #> 8 ca 2020-06-08 3.04 6.67 #> 9 ca 2020-06-09 2.97 6.81 #> 10 ca 2020-06-10 2.99 7.13 #> # ℹ 2,182 more rows range(archive_cases_dv_subset$DT$version) # 2020-06-02 -- 2021-12-01 #> [1] \"2020-06-02\" \"2021-12-01\" epix_as_of(archive_cases_dv_subset, as.Date(\"2020-06-12\")) #> An `epi_df` object, 44 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2020-06-12 #> #> # A tibble: 44 × 4 #> geo_value time_value percent_cli case_rate_7d_av #> * #> 1 ca 2020-06-01 2.23 6.63 #> 2 ca 2020-06-02 2.06 6.45 #> 3 ca 2020-06-03 1.90 6.62 #> 4 ca 
2020-06-04 1.79 6.64 #> 5 ca 2020-06-05 1.83 6.91 #> 6 ca 2020-06-06 1.86 6.76 #> 7 ca 2020-06-07 1.78 6.75 #> 8 ca 2020-06-08 1.90 6.90 #> 9 ca 2020-06-09 NA 7.02 #> 10 ca 2020-06-10 NA 7.36 #> # ℹ 34 more rows # --- Advanced: --- # When requesting recent versions of a data set, there can be some # reproducibility issues. For example, requesting data as of the current date # may return different values based on whether today's data is available yet # or not. Other factors include the time it takes between data becoming # available and when you download the data, and whether the data provider # will overwrite (\"clobber\") version data rather than just publishing new # versions. You can include information about these factors by setting the # `clobberable_versions_start` and `versions_end` of an `epi_archive`, in # which case you will get warnings about potential reproducibility issues: archive_cases_dv_subset2 <- as_epi_archive( archive_cases_dv_subset$DT, # Suppose last version with an update could potentially be rewritten # (a.k.a. \"hotfixed\", \"clobbered\", etc.): clobberable_versions_start = max(archive_cases_dv_subset$DT$version), # Suppose today is the following day, and there are no updates out yet: versions_end = max(archive_cases_dv_subset$DT$version) + 1L, compactify = TRUE ) epix_as_of(archive_cases_dv_subset2, max(archive_cases_dv_subset$DT$version)) #> Warning: Getting data as of some recent version which could still be overwritten (under #> routine circumstances) without assigning a new version number (a.k.a. #> \"clobbered\"). Thus, the snapshot that we produce here should not be expected #> to be reproducible later. See `?epi_archive` for more info and `?epix_as_of` on #> how to muffle. 
#> An `epi_df` object, 2,192 x 4 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2021-12-01 #> #> # A tibble: 2,192 × 4 #> geo_value time_value percent_cli case_rate_7d_av #> * #> 1 ca 2020-06-01 2.75 6.84 #> 2 ca 2020-06-02 2.57 6.82 #> 3 ca 2020-06-03 2.48 6.66 #> 4 ca 2020-06-04 2.41 6.98 #> 5 ca 2020-06-05 2.57 6.97 #> 6 ca 2020-06-06 2.63 6.66 #> 7 ca 2020-06-07 2.73 6.74 #> 8 ca 2020-06-08 3.04 6.67 #> 9 ca 2020-06-09 2.97 6.81 #> 10 ca 2020-06-10 2.99 7.13 #> # ℹ 2,182 more rows"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":null,"dir":"Reference","previous_headings":"","what":"Fill epi_archive unobserved history — epix_fill_through_version","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"Sometimes, due upstream data pipeline issues, work version history completely date, functions expect archives completely date, equally --date another archive. function provides one way approach mismatches: pretend \"observed\" additional versions, filling versions NAs extrapolated values.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"","code":"epix_fill_through_version(x, fill_versions_end, how = c(\"na\", \"locf\"))"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"x epi_archive fill_versions_end Length-1, class&type x$version: version fill missing version history; result's $versions_end unless already later $versions_end. 
Optional; \"na\" \"locf\": \"na\" fill missing required version history NAs, inserting (necessary) update immediately current $versions_end revises existing measurements NA (supported version classes next_after implementation); \"locf\" fill missing version history last version observation carried forward (LOCF), leaving update $DT alone (epi_archive methods based LOCF). Default \"na\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_fill_through_version.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fill epi_archive unobserved history — epix_fill_through_version","text":"epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":null,"dir":"Reference","previous_headings":"","what":"Merge two epi_archive objects — epix_merge","title":"Merge two epi_archive objects — epix_merge","text":"Merges two epi_archives share common geo_value, time_value, set key columns. also share common versions_end, using epix_as_of result using epix_as_of x y individually, performing full join DTs non-version key columns (potentially consolidating multiple warnings clobberable versions). versions_end values differ, sync parameter controls done.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Merge two epi_archive objects — epix_merge","text":"","code":"epix_merge( x, y, sync = c(\"forbid\", \"na\", \"locf\", \"truncate\"), compactify = TRUE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Merge two epi_archive objects — epix_merge","text":"x, y Two epi_archive objects join together. 
sync Optional; \"forbid\", \"na\", \"locf\", \"truncate\"; case x$versions_end match y$versions_end, ?: \"forbid\": emit error; \"na\": use max(x$versions_end, y$versions_end) result's versions_end, ensure , request snapshot version min(x$versions_end, y$versions_end), observation columns less --date archive NAs (.e., imagine update immediately versions_end revised observations NA); \"locf\": use max(x$versions_end, y$versions_end) result's versions_end, allowing last version observation carried forward extrapolate unavailable versions less --date input archive (.e., imagining less --date archive's data set remained unchanged actual versions_end archive's versions_end); \"truncate\": use min(x$versions_end, y$versions_end) result's versions_end, discard rows containing update rows later versions. compactify Optional; TRUE, FALSE, NULL; result compactified? See as_epi_archive() explanation means. Default TRUE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Merge two epi_archive objects — epix_merge","text":"resulting epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Merge two epi_archive objects — epix_merge","text":"cases, additional_metadata empty list, clobberable_versions_start set earliest version clobbered either input archive.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_merge.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Merge two epi_archive objects — epix_merge","text":"","code":"# create two example epi_archive datasets x <- archive_cases_dv_subset$DT %>% dplyr::select(geo_value, time_value, version, case_rate_7d_av) %>% as_epi_archive(compactify = TRUE) y <- archive_cases_dv_subset$DT %>% dplyr::select(geo_value, 
time_value, version, percent_cli) %>% as_epi_archive(compactify = TRUE) # merge results stored in a third object: xy <- epix_merge(x, y)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":null,"dir":"Reference","previous_headings":"","what":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"Slides given function variables epi_archive object. behaves similarly epi_slide(), key exception version-aware: sliding computation given reference time t performed data available t. See archive vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"","code":"epix_slide( x, f, ..., before, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_versions = FALSE ) # S3 method for epi_archive epix_slide( x, f, ..., before, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_versions = FALSE ) # S3 method for grouped_epi_archive epix_slide( x, f, ..., before, ref_time_values, time_step, new_col_name = \"slide_value\", as_list_col = FALSE, names_sep = \"_\", all_versions = FALSE )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"x epi_archive grouped_epi_archive object. ungrouped, data x treated part single data group. f Function, formula, missing; together ... specifies computation slide. \"slide\" means apply computation sliding (.k.. \"rolling\") time window data group. 
window determined parameter described . One time step typically one day one week; see epi_slide details explanation. function, f must take epi_df column names archive's DT, minus version column; followed one-row tibble containing values grouping variables associated group; followed reference time value, usually Date object; followed number named arguments. formula, f can operate directly columns accessed via .x$var .$var, ~ mean (.x$var) compute mean column var group-ref_time_value combination. group key can accessed via .y .group_key, reference time value can accessed via .z .ref_time_value. f missing, ... specify computation. ... Additional arguments pass function formula specified via f. Alternatively, f missing, ... interpreted expression tidy evaluation; addition referring columns directly name, expression access .data .env pronouns dplyr verbs, can also refer .group_key .ref_time_value. See details epi_slide. far ref_time_value sliding window extend? provided, single, non-NA, integer-compatible number time steps. window endpoint inclusive. example, = 7, one time step one day, produce value ref_time_value January 8, apply given function formula data (group present) time_values January 1 onward, reported January 8. typical disease surveillance sources, include data time_value January 8, , depending amount reporting latency, may include January 7 even earlier time_values. (instead archive hold nowcasts instead regular surveillance data, indeed expect data time_value January 8. hold forecasts, expect data time_values January 8, sliding window extend far ref_time_value needed include time_values.) ref_time_values Reference time values / versions sliding computations; element vector serves anchor point time_value window computation max_version epix_as_of fetch data window. missing, set regularly-spaced sequence values set cover range versions DT plus versions_end; spacing values guessed (using GCD skips values). 
time_step Optional function used define meaning one time step, specified, overrides default choice based time_value column. function must take positive integer return object class lubridate::period. example, can use time_step = lubridate::hours order set time step one hour (meaningful time_value class POSIXct). new_col_name String indicating name new column contain derivative values. Default \"slide_value\"; note setting new_col_name equal existing column name overwrite column. as_list_col slide results held list column, unchopped/unnested? Default FALSE, case list object returned f unnested (using tidyr::unnest()), , slide computations output data frames, names resulting columns given prepending new_col_name names list elements. names_sep String specifying separator use tidyr::unnest() as_list_col = FALSE. Default \"_\". Using NULL drops prefix new_col_name entirely. all_versions (all_rows parameter epi_slide.) all_versions = TRUE, f passed version history (version <= ref_time_value) rows time_value ref_time_value - ref_time_value. Otherwise, f passed recent version every unique time_value. 
Default FALSE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"tibble whose columns : grouping variables, time_value, containing reference time values slide computation, column named according new_col_name argument, containing slide values.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"key distinctions current function epi_slide(): f functions epix_slide, one assume input data contain rows time_value matching computation's ref_time_value (accessible via attributes()$metadata$as_of); typical epidemiological surveillance data, observations pertaining particular time period (time_value) first reported as_of instant time period ended. epix_slide() accept argument; windows extend time steps given ref_time_value last time_value available version ref_time_value (typically, include ref_time_value , observations particular time interval (e.g., day) published time interval ends); epi_slide windows extend time steps ref_time_value time steps ref_time_value. input class columns similar different: epix_slide (default all_versions=FALSE) keeps columns epi_df-ness first argument computation; epi_slide provides grouping variables second input, convert first input regular tibble grouping variables include essential geo_value column. (all_versions=TRUE, epix_slide provide epi_archive rather epi-df computation.) output class columns similar different: epix_slide() returns tibble containing grouping variables, time_value, new column(s) slide computations, whereas epi_slide() returns epi_df original variables plus new columns slide computations. 
(mirror grouping ungroupedness input, one exception: epi_archives can trivial (zero-variable) groupings, dropped epix_slide results supported tibbles.) size stability checks element/row recycling maintain size stability epix_slide, unlike epi_slide. (epix_slide roughly analogous dplyr::group_modify, epi_slide roughly analogous dplyr::mutate followed dplyr::arrange) detailed \"advanced\" vignette. all_rows supported epix_slide; since slide computations allowed flexibility outputs epi_slide, guess good representation missing computations excluded group-ref_time_value pairs. ref_time_values default epix_slide based making evenly-spaced sequence versions DT plus versions_end, rather time_values. Apart distinctions, interfaces epix_slide() epi_slide() . Furthermore, current function can considerably slower epi_slide(), two reasons: (1) must repeatedly fetch properly-versioned snapshots data archive (via epix_as_of()), (2) performs \"manual\" sliding sorts, benefit highly efficient slider package. 
reason, never used place epi_slide(), used version-aware sliding necessary (purpose).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_slide.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Slide a function over variables in an epi_archive or grouped_epi_archive — epix_slide","text":"","code":"library(dplyr) #> #> Attaching package: ‘dplyr’ #> The following objects are masked from ‘package:stats’: #> #> filter, lag #> The following objects are masked from ‘package:base’: #> #> intersect, setdiff, setequal, union # Reference time points for which we want to compute slide values: ref_time_values <- seq(as.Date(\"2020-06-01\"), as.Date(\"2020-06-15\"), by = \"1 day\" ) # A simple (but not very useful) example (see the archive vignette for a more # realistic one): archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( f = ~ mean(.x$case_rate_7d_av), before = 2, ref_time_values = ref_time_values, new_col_name = \"case_rate_7d_av_recent_av\" ) %>% ungroup() #> # A tibble: 57 × 3 #> geo_value time_value case_rate_7d_av_recent_av #> #> 1 NA 2020-06-01 NaN #> 2 ca 2020-06-02 6.63 #> 3 fl 2020-06-02 3.38 #> 4 ny 2020-06-02 6.57 #> 5 tx 2020-06-02 4.52 #> 6 ca 2020-06-03 6.54 #> 7 fl 2020-06-03 3.42 #> 8 ny 2020-06-03 6.66 #> 9 tx 2020-06-03 4.75 #> 10 ca 2020-06-04 6.53 #> # ℹ 47 more rows # We requested time windows that started 2 days before the corresponding time # values. The actual number of `time_value`s in each computation depends on # the reporting latency of the signal and `time_value` range covered by the # archive (2020-06-01 -- 2021-11-30 in this example). 
In this case, we have # * 0 `time_value`s, for ref time 2020-06-01 --> the result is automatically # discarded # * 1 `time_value`, for ref time 2020-06-02 # * 2 `time_value`s, for the rest of the results # * never the 3 `time_value`s we would get from `epi_slide`, since, because # of data latency, we'll never have an observation # `time_value == ref_time_value` as of `ref_time_value`. # The example below shows this type of behavior in more detail. # Examining characteristics of the data passed to each computation with # `all_versions=FALSE`. archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( function(x, gk, rtv) { tibble( time_range = if (nrow(x) == 0L) { \"0 `time_value`s\" } else { sprintf(\"%s -- %s\", min(x$time_value), max(x$time_value)) }, n = nrow(x), class1 = class(x)[[1L]] ) }, before = 5, all_versions = FALSE, ref_time_values = ref_time_values, names_sep = NULL ) %>% ungroup() %>% arrange(geo_value, time_value) #> # A tibble: 57 × 5 #> geo_value time_value time_range n class1 #> #> 1 ca 2020-06-02 2020-06-01 -- 2020-06-01 1 epi_df #> 2 ca 2020-06-03 2020-06-01 -- 2020-06-02 2 epi_df #> 3 ca 2020-06-04 2020-06-01 -- 2020-06-03 3 epi_df #> 4 ca 2020-06-05 2020-06-01 -- 2020-06-04 4 epi_df #> 5 ca 2020-06-06 2020-06-01 -- 2020-06-05 5 epi_df #> 6 ca 2020-06-07 2020-06-02 -- 2020-06-06 5 epi_df #> 7 ca 2020-06-08 2020-06-03 -- 2020-06-07 5 epi_df #> 8 ca 2020-06-09 2020-06-04 -- 2020-06-08 5 epi_df #> 9 ca 2020-06-10 2020-06-05 -- 2020-06-09 5 epi_df #> 10 ca 2020-06-11 2020-06-06 -- 2020-06-10 5 epi_df #> # ℹ 47 more rows # --- Advanced: --- # `epix_slide` with `all_versions=FALSE` (the default) applies a # version-unaware computation to several versions of the data. We can also # use `all_versions=TRUE` to apply a version-*aware* computation to several # versions of the data, again looking at characteristics of the data passed # to each computation. 
In this case, each computation should expect an # `epi_archive` containing the relevant version data: archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( function(x, gk, rtv) { tibble( versions_start = if (nrow(x$DT) == 0L) { \"NA (0 rows)\" } else { toString(min(x$DT$version)) }, versions_end = x$versions_end, time_range = if (nrow(x$DT) == 0L) { \"0 `time_value`s\" } else { sprintf(\"%s -- %s\", min(x$DT$time_value), max(x$DT$time_value)) }, n = nrow(x$DT), class1 = class(x)[[1L]] ) }, before = 5, all_versions = TRUE, ref_time_values = ref_time_values, names_sep = NULL ) %>% ungroup() %>% # Focus on one geo_value so we can better see the columns above: filter(geo_value == \"ca\") %>% select(-geo_value) #> # A tibble: 14 × 6 #> time_value versions_start versions_end time_range n class1 #> #> 1 2020-06-02 2020-06-02 2020-06-02 2020-06-01 -- 2020-06-01 1 epi_ar… #> 2 2020-06-03 2020-06-02 2020-06-03 2020-06-01 -- 2020-06-02 2 epi_ar… #> 3 2020-06-04 2020-06-02 2020-06-04 2020-06-01 -- 2020-06-03 3 epi_ar… #> 4 2020-06-05 2020-06-02 2020-06-05 2020-06-01 -- 2020-06-04 4 epi_ar… #> 5 2020-06-06 2020-06-02 2020-06-06 2020-06-01 -- 2020-06-05 8 epi_ar… #> 6 2020-06-07 2020-06-03 2020-06-07 2020-06-02 -- 2020-06-06 9 epi_ar… #> 7 2020-06-08 2020-06-04 2020-06-08 2020-06-03 -- 2020-06-07 9 epi_ar… #> 8 2020-06-09 2020-06-05 2020-06-09 2020-06-04 -- 2020-06-08 8 epi_ar… #> 9 2020-06-10 2020-06-06 2020-06-10 2020-06-05 -- 2020-06-09 8 epi_ar… #> 10 2020-06-11 2020-06-07 2020-06-11 2020-06-06 -- 2020-06-10 8 epi_ar… #> 11 2020-06-12 2020-06-08 2020-06-12 2020-06-07 -- 2020-06-11 8 epi_ar… #> 12 2020-06-13 2020-06-09 2020-06-13 2020-06-08 -- 2020-06-12 8 epi_ar… #> 13 2020-06-14 2020-06-10 2020-06-14 2020-06-09 -- 2020-06-13 8 epi_ar… #> 14 2020-06-15 2020-06-11 2020-06-15 2020-06-10 -- 2020-06-14 8 
epi_ar…"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":null,"dir":"Reference","previous_headings":"","what":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"Generates filtered epi_archive epi_archive object, keeping rows version falling specified date.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"","code":"epix_truncate_versions_after(x, max_version) # S3 method for epi_archive epix_truncate_versions_after(x, max_version) # S3 method for grouped_epi_archive epix_truncate_versions_after(x, max_version)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"x epi_archive object. 
max_version latest version include archive.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/epix_truncate_versions_after.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Filter an epi_archive object to keep only older versions — epix_truncate_versions_after","text":"epi_archive object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":null,"dir":"Reference","previous_headings":"","what":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"group_by related methods epi_archive, grouped_epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"","code":"# S3 method for epi_archive group_by(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data)) # S3 method for grouped_epi_archive group_by(.data, ..., .add = FALSE, .drop = dplyr::group_by_drop_default(.data)) # S3 method for grouped_epi_archive group_by_drop_default(.tbl) # S3 method for grouped_epi_archive groups(x) # S3 method for grouped_epi_archive ungroup(x, ...) is_grouped_epi_archive(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":".data epi_archive grouped_epi_archive ... Similar dplyr::group_by (see \"Details:\" edge cases); group_by: unquoted variable name(s) \"data masking\" expression(s). 
possible use dplyr::mutate-like syntax calculate new columns perform grouping, note , regrouping already-grouped .data object, calculations carried ignoring grouping (dplyr). ungroup: either empty, order remove grouping output epi_archive; variable name(s) \"tidy-select\" expression(s), order remove matching variables list grouping variables, output another grouped_epi_archive. .add Boolean. FALSE, default, output grouped variable selection ... ; TRUE, output grouped current grouping variables plus variable selection .... .drop described dplyr::group_by; determines treatment factor columns. .tbl grouped_epi_archive object. x groups ungroup: grouped_epi_archive; is_grouped_epi_archive: object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"match dplyr, group_by allows \"data masking\" (also referred \"tidy evaluation\") expressions ..., just column names, way similar mutate. Note replacing removing key columns expressions disabled. archive %>% group_by() expressions group regroup zero columns (indicating rows treated part one large group) output grouped_epi_archive, order enable use grouped_epi_archive methods result. slight contrast operations tibbles grouped tibbles, output grouped_df circumstances. Using group_by .add=FALSE override existing grouping disabled; instead, ungroup first group_by. 
group_by_drop_default (ungrouped) epi_archives expected dispatch group_by_drop_default.default (dedicated method grouped_epi_archives).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/group_by.epi_archive.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"group_by and related methods for epi_archive, grouped_epi_archive — group_by.epi_archive","text":"","code":"grouped_archive <- archive_cases_dv_subset %>% group_by(geo_value) # `print` for metadata and method listing: grouped_archive %>% print() #> A `grouped_epi_archive` object: #> * Groups: geo_value #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Min/max time values: 2020-06-01 / 2021-11-30 #> ℹ First/last version with update: 2020-06-02 / 2021-12-01 #> ℹ Versions end: 2021-12-01 #> ℹ A preview of the table (129638 rows x 5 columns): #> Key: #> geo_value time_value version percent_cli case_rate_7d_av #> #> 1: ca 2020-06-01 2020-06-02 NA 6.628329 #> 2: ca 2020-06-01 2020-06-06 2.140116 6.628329 #> 3: ca 2020-06-01 2020-06-07 2.140116 6.628329 #> 4: ca 2020-06-01 2020-06-08 2.140379 6.628329 #> 5: ca 2020-06-01 2020-06-09 2.114430 6.628329 #> --- #> 129634: tx 2021-11-26 2021-11-29 1.858596 7.957657 #> 129635: tx 2021-11-27 2021-11-28 NA 7.174299 #> 129636: tx 2021-11-28 2021-11-29 NA 6.834681 #> 129637: tx 2021-11-29 2021-11-30 NA 8.841247 #> 129638: tx 2021-11-30 2021-12-01 NA 9.566218 # The primary use for grouping is to perform a grouped `epix_slide`: archive_cases_dv_subset %>% group_by(geo_value) %>% epix_slide( f = ~ mean(.x$case_rate_7d_av), before = 2, ref_time_values = as.Date(\"2020-06-11\") + 0:2, new_col_name = \"case_rate_3d_av\" ) %>% ungroup() #> # A tibble: 12 × 3 #> geo_value time_value case_rate_3d_av #> #> 1 ca 2020-06-11 7.19 #> 2 fl 2020-06-11 5.71 #> 3 ny 2020-06-11 4.59 #> 4 tx 2020-06-11 5.62 #> 5 ca 2020-06-12 7.52 #> 6 fl 2020-06-12 5.82 #> 7 ny 2020-06-12 4.34 #> 8 tx 2020-06-12 5.91 #> 9 ca 2020-06-13 
7.62 #> 10 fl 2020-06-13 6.11 #> 11 ny 2020-06-13 4.14 #> 12 tx 2020-06-13 6.03 # ----------------------------------------------------------------- # Advanced: some other features of dplyr grouping are implemented: library(dplyr) toy_archive <- tribble( ~geo_value, ~age_group, ~time_value, ~version, ~value, \"us\", \"adult\", \"2000-01-01\", \"2000-01-02\", 121, \"us\", \"pediatric\", \"2000-01-02\", \"2000-01-03\", 5, # (addition) \"us\", \"adult\", \"2000-01-01\", \"2000-01-03\", 125, # (revision) \"us\", \"adult\", \"2000-01-02\", \"2000-01-03\", 130 # (addition) ) %>% mutate( age_group = ordered(age_group, c(\"pediatric\", \"adult\")), time_value = as.Date(time_value), version = as.Date(version) ) %>% as_epi_archive(other_keys = \"age_group\") # The following are equivalent: toy_archive %>% group_by(geo_value, age_group) #> A `grouped_epi_archive` object: #> * Groups: geo_value, age_group #> * Drops groups formed by factor levels that don't appear in the data #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 toy_archive %>% group_by(geo_value) %>% group_by(age_group, .add = TRUE) #> A `grouped_epi_archive` object: #> * Groups: geo_value, age_group #> * Drops groups formed by factor levels that don't appear in the data #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> 
geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 grouping_cols <- c(\"geo_value\", \"age_group\") toy_archive %>% group_by(across(all_of(grouping_cols))) #> A `grouped_epi_archive` object: #> * Groups: geo_value, age_group #> * Drops groups formed by factor levels that don't appear in the data #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 # And these are equivalent: toy_archive %>% group_by(geo_value) #> A `grouped_epi_archive` object: #> * Groups: geo_value #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 toy_archive %>% group_by(geo_value, age_group) %>% ungroup(age_group) #> A `grouped_epi_archive` object: #> * Groups: geo_value #> It wraps an ungrouped `epi_archive`, with metadata: #> ℹ Non-standard DT keys: age_group #> ℹ Min/max time values: 2000-01-01 / 2000-01-02 #> ℹ First/last version with update: 2000-01-02 / 2000-01-03 #> ℹ Versions end: 2000-01-03 #> ℹ A preview of the table (4 rows x 5 
columns): #> Key: #> geo_value age_group time_value version value #> #> 1: us adult 2000-01-01 2000-01-02 121 #> 2: us adult 2000-01-01 2000-01-03 125 #> 3: us pediatric 2000-01-02 2000-01-03 5 #> 4: us adult 2000-01-02 2000-01-03 130 # To get the grouping variable names as a `list` of `name`s (a.k.a. symbols): toy_archive %>% group_by(geo_value) %>% groups() #> [[1]] #> geo_value #> toy_archive %>% group_by(geo_value, age_group, .drop = FALSE) %>% epix_slide(f = ~ sum(.x$value), before = 20) %>% ungroup() #> # A tibble: 4 × 4 #> geo_value age_group time_value slide_value #> #> 1 us pediatric 2000-01-02 0 #> 2 us adult 2000-01-02 121 #> 3 us pediatric 2000-01-03 5 #> 4 us adult 2000-01-03 255"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":null,"dir":"Reference","previous_headings":"","what":"Estimate growth rate — growth_rate","title":"Estimate growth rate — growth_rate","text":"Estimates growth rate signal given points along underlying sequence. Several methodologies available; see growth rate vignette examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Estimate growth rate — growth_rate","text":"","code":"growth_rate( x = seq_along(y), y, x0 = x, method = c(\"rel_change\", \"linear_reg\", \"smooth_spline\", \"trend_filter\"), h = 7, log_scale = FALSE, dup_rm = FALSE, na_rm = FALSE, ... )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Estimate growth rate — growth_rate","text":"x Design points corresponding signal values y. Default seq_along(y) (, equally-spaced points 1 length y). y Signal values. x0 Points estimate growth rate. Must subset x (extrapolation allowed). Default x. 
method Either \"rel_change\", \"linear_reg\", \"smooth_spline\", \"trend_filter\", indicating method use growth rate calculation. first two local methods: run sliding fashion sequence (order estimate derivatives hence growth rates); latter two global methods: run entire sequence. See details explanation. h Bandwidth sliding window, method \"rel_change\" \"linear_reg\". See details explanation. log_scale growth rates estimated using parametrization log scale? See details explanation. Default FALSE. dup_rm check remove duplicates x (corresponding elements y) computation? methods might handle duplicate x values gracefully, whereas others might fail (either quietly loudly). Default FALSE. na_rm missing values removed computation? Default FALSE. ... Additional arguments pass method used estimate derivative.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Estimate growth rate — growth_rate","text":"Vector growth rate estimates specified points x0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Estimate growth rate — growth_rate","text":"growth rate function f defined continuously-valued parameter t defined f'(t) / f(t), f'(t) derivative f t. estimate growth rate signal discrete-time (can thought evaluations discretizations underlying function continuous-time), can therefore estimate derivative divide signal value (possibly smoothed version signal value). following methods available estimating growth rate: \"rel_change\": uses (B/A - 1) / h, B average y second half sliding window bandwidth h centered reference point x0, A average first half. can seen using first-difference approximation derivative. 
\"linear_reg\": uses slope linear regression y x sliding window centered reference point x0, divided fitted value linear regression x0. \"smooth_spline\": uses estimated derivative x0 smoothing spline fit x y, via stats::smooth.spline(), divided fitted value spline x0. \"trend_filter\": uses estimated derivative x0 polynomial trend filtering (discrete spline) fit x y, via genlasso::trendfilter(), divided fitted value discrete spline x0.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"log-scale","dir":"Reference","previous_headings":"","what":"Log Scale","title":"Estimate growth rate — growth_rate","text":"alternative view growth rate function f general given defining g(t) = log(f(t)), observing g'(t) = f'(t) / f(t). Therefore, method estimates derivative can simply applied log signal interest, light, method (\"rel_change\", \"linear_reg\", \"smooth_spline\", \"trend_filter\") log scale analog, can used setting log_scale = TRUE.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"sliding-windows","dir":"Reference","previous_headings":"","what":"Sliding Windows","title":"Estimate growth rate — growth_rate","text":"local methods, \"rel_change\" \"linear_reg\", use sliding window centered reference point bandwidth h. words, sliding window consists points x whose distance reference point h. Note unit distance implicitly defined x variable; example, x vector Date objects, h = 7, reference point January 7, sliding window contains data January 1 14 (matching behavior epi_slide() before = h - 1, after = h).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"additional-arguments","dir":"Reference","previous_headings":"","what":"Additional Arguments","title":"Estimate growth rate — growth_rate","text":"global methods, \"smooth_spline\" \"trend_filter\", additional arguments can specified via ... underlying estimation function. 
smoothing spline case, additional arguments passed directly stats::smooth.spline() (defaults exactly function). trend filtering case works bit differently: , custom set arguments allowed (distributed internally genlasso::trendfilter() genlasso::cv.trendfilter()): ord: order piecewise polynomial trend filtering fit. Default 3. maxsteps: maximum number steps take solution path terminating. Default 1000. cv: cross-validation used choose effective degrees freedom fit? Default TRUE. k: number folds cross-validation used. Default 3. df: desired effective degrees freedom trend filtering fit. cv = FALSE, df must positive integer; cv = TRUE, df must one \"min\" \"1se\" indicating selection rule use based cross-validation error curve: minimum 1-standard-error rule, respectively. Default \"min\" (going along default cv = TRUE). Note cv = FALSE, require df set user.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/growth_rate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Estimate growth rate — growth_rate","text":"","code":"# COVID cases growth rate by state using default method relative change jhu_csse_daily_subset %>% group_by(geo_value) %>% mutate(cases_gr = growth_rate(x = time_value, y = cases)) #> An `epi_df` object, 4,026 x 7 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 7 #> # Groups: geo_value [6] #> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av #> * #> 1 ca 2020-03-01 6 1.29 0.00327 0 #> 2 ca 2020-03-02 4 1.71 0.00435 0 #> 3 ca 2020-03-03 6 2.43 0.00617 0 #> 4 ca 2020-03-04 11 3.86 0.00980 0.000363 #> 5 ca 2020-03-05 10 5.29 0.0134 0.000363 #> 6 ca 2020-03-06 18 7.86 0.0200 0.000363 #> 7 ca 2020-03-07 26 11.6 0.0294 0.000363 #> 8 ca 2020-03-08 19 13.4 0.0341 0.000363 #> 9 ca 2020-03-09 23 16.1 0.0410 0.000726 #> 10 ca 2020-03-10 22 18.4 0.0468 0.000726 #> # ℹ 4,016 more rows #> # ℹ 1 more 
variable: cases_gr # Log scale, degree 4 polynomial and 6-fold cross validation jhu_csse_daily_subset %>% group_by(geo_value) %>% mutate(gr_poly = growth_rate(x = time_value, y = cases, log_scale = TRUE, ord = 4, k = 6)) #> Warning: There were 3 warnings in `mutate()`. #> The first warning was: #> ℹ In argument: `gr_poly = growth_rate(...)`. #> ℹ In group 1: `geo_value = \"ca\"`. #> Caused by warning in `log()`: #> ! NaNs produced #> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings. #> An `epi_df` object, 4,026 x 7 with metadata: #> * geo_type = state #> * time_type = day #> * as_of = 2024-01-26 17:27:32.755949 #> #> # A tibble: 4,026 × 7 #> # Groups: geo_value [6] #> geo_value time_value cases cases_7d_av case_rate_7d_av death_rate_7d_av #> * #> 1 ca 2020-03-01 6 1.29 0.00327 0 #> 2 ca 2020-03-02 4 1.71 0.00435 0 #> 3 ca 2020-03-03 6 2.43 0.00617 0 #> 4 ca 2020-03-04 11 3.86 0.00980 0.000363 #> 5 ca 2020-03-05 10 5.29 0.0134 0.000363 #> 6 ca 2020-03-06 18 7.86 0.0200 0.000363 #> 7 ca 2020-03-07 26 11.6 0.0294 0.000363 #> 8 ca 2020-03-08 19 13.4 0.0341 0.000363 #> 9 ca 2020-03-09 23 16.1 0.0410 0.000726 #> 10 ca 2020-03-10 22 18.4 0.0468 0.000726 #> # ℹ 4,016 more rows #> # ℹ 1 more variable: gr_poly "},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":null,"dir":"Reference","previous_headings":"","what":"Use max valid period as guess for period of ref_time_values — guess_period","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":"Use max valid period guess period ref_time_values","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":"","code":"guess_period( ref_time_values, ref_time_values_arg = rlang::caller_arg(ref_time_values) 
)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":"ref_time_values Vector containing time-interval-like time-like data, least two distinct values, diff-able (e.g., time_value version column), sensible result adding .numeric versions diff result (via .integer typeof \"integer\", otherwise via .numeric). ref_time_values_arg Optional, string; name give ref_time_values error messages. Defaults quoting expression caller fed ref_time_values argument.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/guess_period.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Use max valid period as guess for period of ref_time_values — guess_period","text":".numeric, length 1; attempts match typeof(ref_time_values)","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"data source confirmed COVID-19 cases based reports made available Center Systems Science Engineering Johns Hopkins University. 
example data snapshot Oct 28, 2021 captures cases June 1, 2020 May 31, 2021 limited California Florida.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"","code":"incidence_num_outlier_example"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"tibble 730 rows 3 variables: geo_value geographic value associated row measurements. time_value time value associated row measurements. cases Number new confirmed COVID-19 cases, daily","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/incidence_num_outlier_example.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of JHU daily cases from California and Florida — incidence_num_outlier_example","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes. 
Furthermore, data limited small number rows, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Test for epi_df format — is_epi_df","title":"Test for epi_df format — is_epi_df","text":"Test epi_df format","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Test for epi_df format — is_epi_df","text":"","code":"is_epi_df(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Test for epi_df format — is_epi_df","text":"x object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/is_epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Test for epi_df format — is_epi_df","text":"TRUE object inherits epi_df.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"data source confirmed COVID-19 cases deaths based reports made available Center Systems Science Engineering Johns Hopkins University. 
example data ranges Mar 1, 2020 Dec 31, 2021, limited Massachusetts Vermont.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"","code":"jhu_csse_county_level_subset"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"tibble 16,212 rows 5 variables: geo_value geographic value associated row measurements. time_value time value associated row measurements. cases Number new confirmed COVID-19 cases, daily county_name name county state_name full name state","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_county_level_subset.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of JHU daily cases from counties in Massachusetts and Vermont — jhu_csse_county_level_subset","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Epidata API: signals taken directly JHU CSSE COVID-19 GitHub repository without changes. 7-day average signals computed Delphi calculating moving averages preceding 7 days, signal June 7 average underlying data June 1 7, inclusive. 
Furthermore, data limited small number rows, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"data source confirmed COVID-19 cases deaths based reports made available Center Systems Science Engineering Johns Hopkins University. example data ranges Mar 1, 2020 Dec 31, 2021, limited California, Florida, Texas, New York, Georgia, Pennsylvania.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"","code":"jhu_csse_daily_subset"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":"format","dir":"Reference","previous_headings":"","what":"Format","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"tibble 4026 rows 6 variables: geo_value geographic value associated row measurements. time_value time value associated row measurements. 
case_rate_7d_av 7-day average signal number new confirmed COVID-19 cases per 100,000 population, daily death_rate_7d_av 7-day average signal number new confirmed deaths due COVID-19 per 100,000 population, daily cases Number new confirmed COVID-19 cases, daily cases_7d_av 7-day average signal number new confirmed COVID-19 cases, daily","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/jhu_csse_daily_subset.html","id":"source","dir":"Reference","previous_headings":"","what":"Source","title":"Subset of JHU daily state cases and deaths — jhu_csse_daily_subset","text":"object contains modified part COVID-19 Data Repository Center Systems Science Engineering (CSSE) Johns Hopkins University republished COVIDcast Epidata API. data set licensed terms Creative Commons Attribution 4.0 International license Johns Hopkins University behalf Center Systems Science Engineering. Copyright Johns Hopkins University 2020. Modifications: COVIDcast Epidata API: case signal taken directly JHU CSSE COVID-19 GitHub repository. rate signals computed Delphi using Census population data. 7-day average signals computed Delphi calculating moving averages preceding 7 days, signal June 7 average underlying data June 1 7, inclusive. 
Furthermore, data limited small number rows, signal names slightly altered, formatted tibble.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":null,"dir":"Reference","previous_headings":"","what":"Grab any keys associated to an epi_df — key_colnames","title":"Grab any keys associated to an epi_df — key_colnames","text":"Grab keys associated epi_df","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Grab any keys associated to an epi_df — key_colnames","text":"","code":"key_colnames(x, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Grab any keys associated to an epi_df — key_colnames","text":"x data.frame, tibble, epi_df ... additional arguments passed methods","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/key_colnames.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Grab any keys associated to an epi_df — key_colnames","text":"epi_df, returns \"keys\". 
Otherwise NULL","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":null,"dir":"Reference","previous_headings":"","what":"max(x$version), with error if x has 0 rows — max_version_with_row_in","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"Exported make defaults easily copyable.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"","code":"max_version_with_row_in(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"x x argument as_epi_archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/max_version_with_row_in.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"max(x$version), with error if x has 0 rows — max_version_with_row_in","text":"max(x$version) rows; raises error 0 rows NA version value","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates an epi_df object — new_epi_df","title":"Creates an epi_df object — new_epi_df","text":"Creates new epi_df object. default, builds empty tibble correct metadata epi_df object (ie. geo_type, time_type, as_of). Refer info. 
arguments details.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates an epi_df object — new_epi_df","text":"","code":"new_epi_df( x = tibble::tibble(), geo_type, time_type, as_of, additional_metadata = list(), ... )"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates an epi_df object — new_epi_df","text":"x data.frame, tibble::tibble, tsibble::tsibble converted geo_type Type geo values. missing, function attempt infer geo values present; fails, set \"custom\". time_type Type time values. missing, function attempt infer time values present; fails, set \"custom\". as_of Time value representing time given data available. example, as_of January 31, 2022, epi_df object created represent --date version data available January 31, 2022. as_of argument missing, current day-time used. additional_metadata List additional metadata attach epi_df object. metadata geo_type, time_type, as_of fields; named entries passed list included well. tibble additional keys, sure specify character vector other_keys component additional_metadata. ... 
Additional arguments passed methods.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/new_epi_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Creates an epi_df object — new_epi_df","text":"epi_df object.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the next possible value greater than x of the same type — next_after","title":"Get the next possible value greater than x of the same type — next_after","text":"Get next possible value greater x type","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the next possible value greater than x of the same type — next_after","text":"","code":"next_after(x)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the next possible value greater than x of the same type — next_after","text":"x starting \"value\"(s)","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/next_after.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the next possible value greater than x of the same type — next_after","text":"class, typeof, length x","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/pipe.html","id":null,"dir":"Reference","previous_headings":"","what":"Pipe operator — %>%","title":"Pipe operator — %>%","text":"See magrittr::%>% details.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/pipe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pipe operator — %>%","text":"","code":"lhs %>% 
rhs"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_archive.html","id":null,"dir":"Reference","previous_headings":"","what":"Print information about an epi_archive object — print.epi_archive","title":"Print information about an epi_archive object — print.epi_archive","text":"Print information epi_archive object","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_archive.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print information about an epi_archive object — print.epi_archive","text":"","code":"# S3 method for epi_archive print(x, ..., class = TRUE, methods = TRUE)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_archive.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print information about an epi_archive object — print.epi_archive","text":"x epi_archive object. ... empty, satisfy S3 generic. class Boolean; whether print class label header methods Boolean; whether print available methods archive","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_df.html","id":null,"dir":"Reference","previous_headings":"","what":"Base S3 methods for an epi_df object — print.epi_df","title":"Base S3 methods for an epi_df object — print.epi_df","text":"Print summary functions epi_df object. Prints variety summary statistics epi_df object, time range included geographic coverage.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Base S3 methods for an epi_df object — print.epi_df","text":"","code":"# S3 method for epi_df print(x, ...) # S3 method for epi_df summary(object, ...) # S3 method for epi_df group_by(.data, ...) # S3 method for epi_df ungroup(x, ...) 
# S3 method for epi_df group_modify(.data, .f, ..., .keep = FALSE) # S3 method for epi_df unnest(data, ...)"},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/print.epi_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Base S3 methods for an epi_df object — print.epi_df","text":"x epi_df ... Additional arguments, compatibility summary(). Currently unused. object epi_df .data epi_df .f function formula; see dplyr::group_modify .keep Boolean; see dplyr::group_modify data epi_df","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. dplyr arrange, filter, group_by, group_modify, mutate, relocate, rename, slice, ungroup ggplot2 autoplot tidyr unnest tsibble as_tsibble","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-8","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.8","text":"epi_slide computations now 2-4 times faster changing reference time values, made accessible within sliding functions, calculated (#397). Add new epi_slide_mean function allow much (~30x) faster rolling average computations cases (#400). Add new epi_slide_sum function allow much faster rolling sum computations cases (#433). Add new epi_slide_opt function allow much faster rolling computations cases, using data.table slider optimized rolling functions (#433). Add tidyselect interface epi_slide_opt derivatives (#452). regenerated jhu_csse_daily_subset dataset latest versions data API changed approach versioning, see DEVELOPMENT.md details select grouped epi_dfs now drops epi_dfness makes sense; PR #390 Minor documentation updates; PR #393 Improved epi_archive print method. 
Compactified metadata shows snippet underlying DT (#341). Added autoplot method epi_df objects, creates ggplot2 plot epi_df (#382). Refactored internals use cli warnings/errors checkmate argument checking (#413). Fix logic auto-assign epi_df time_type week (#416) year (#441). Clarified “Get started” example getting Ebola line list data epi_df format. Improved documentation web site landing page’s introduction. Fixed documentation referring old epi_slide() interface (#466, thanks @XuedaShen!).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-8","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.8","text":"Resolved linting messages package checks (#468).","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-7-0","dir":"Changelog","previous_headings":"","what":"Breaking changes:","title":"epiprocess 0.7.0","text":"Switched epi_df’s other_keys default NULL character(0); PR #390 Refactored epi_archive use S3 instead R6 object model. functionality stay , break member function interface. migration, can usually just convert epi_archive$merge(...) epi_archive <- epi_archive %>% epix_merge(...) (fill_through_version truncate_after_version) epi_archive$slide(...) epi_archive %>% epix_slide(...) (as_of, group_by, slide, etc.) (#340). limited situations, helper function calls epi_archive$merge etc. 
one arguments, may need carefully refactor .","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-7-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.7.0","text":"Updated vignettes compatibility epidatr 1.0.0 PR #377.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-7-0-1","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.7.0","text":"make existing slide computations work, add third argument f function accept new input: e.g., change f = function(x, g, ) { } f = function(x, g, rt, ) { }.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-7-0","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.7.0","text":"f formula, can now access reference time value via .z .ref_time_value. f missing, tidy evaluation expression ... can now refer window data epi_df tibble .x, group key .group_key, reference time value .ref_time_value. usual .data .env pronouns also work, but pick() cur_data() ; work .x instead. keep old behavior, manually perform row recycling within f computations, /left_join data frame representing desired output structure current epix_slide() result obtain desired repetitions completions expected all_rows = TRUE. keep old behavior, convert output epix_slide() epi_df desired set metadata appropriately.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-7-0-1","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.7.0","text":"epi_slide epix_slide now support as_list_col = TRUE slide computations output atomic vectors, output list column “chopped” format (see tidyr::chop). epi_slide now works properly slide computations output just Date vector, rather converting slide_value numeric column. 
Fix ?archive_cases_dv_subset information regarding modifications upstream data @brookslogan (#299). Update use updated epidatr (fetch_tbl -> fetch) @brookslogan (#319).","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-6-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.6.0","text":"epi_slide’s time windows now extend time steps time steps corresponding ref_time_values. See ?epi_slide details matching old alignments. epix_slide’s time windows now extend time steps corresponding ref_time_values way latest data available corresponding ref_time_values. obtain old behavior, dplyr::ungroup slide results immediately. using as_list_col = TRUE together ref_time_values all_rows=TRUE, marker excluded computations now NULL entry list column, rather NA; using tidyr::unnest() afterward want keep missing data markers, need replace NULL entries NAs. Skipped computations now uniformly detectable using vctrs methods. x %>% epix_slide(, group_by=c(col1, col2)) x %>% epix_slide(, group_by=all_of(colname_vector)) x %>% group_by(col1, col2) %>% epix_slide() x %>% group_by(across(all_of(colname_vector))) %>% epix_slide() obtain old behavior, precede epix_slide call lacking group_by argument appropriate group_by call. epix_slide now guesses ref_time_values regularly spaced sequence covering DT$version values version_end, rather distinct DT$time_values. obtain old behavior, pass ref_time_values = unique($DT$time_value). epi_archive’s clobberable_versions_start’s default now NA, warnings default potential nonreproducibility. 
obtain old behavior, pass clobberable_versions_start = max_version_with_row_in(x).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"potentially-breaking-changes-0-6-0","dir":"Changelog","previous_headings":"","what":"Potentially-breaking changes","title":"epiprocess 0.6.0","text":"Fixed [ grouped epi_dfs maintain grouping possible dropping epi_df class (e.g., removing time_value column). Fixed epi_df operations consistent decaying non-epi_dfs result operation doesn’t make sense epi_df (e.g., removing time_value column). Changed bind_rows grouped epi_dfs drop epi_df class. Like ungrouped epi_dfs, metadata result still simply taken first result, may inappropriate (#242). epi_slide epix_slide now raise error rather silently filtering ref_time_values don’t meet expectations.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-6-0","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.6.0","text":"epix_slide, $slide new parameter all_versions. all_versions=TRUE, epix_slide pass filtered epi_archive computation rather epi_df snapshot. enables, e.g., performing pseudoprospective forecasts revision-aware forecaster using nested epix_slide operations.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-6-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.6.0","text":"Added dplyr::group_by dplyr::ungroup S3 methods epi_archive objects, plus corresponding $group_by $ungroup R6 methods. group_by implementation supports .add .drop arguments, ungroup supports partial ungrouping .... 
as_epi_archive, epi_archive$new now perform checks key uniqueness requirement (part #154).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-6-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.6.0","text":"Added NEWS.md file track changes package. Implemented ?dplyr::dplyr_extending epi_dfs (#223). Fixed various small documentation issues (#217).","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"potentially-breaking-changes-0-5-0","dir":"Changelog","previous_headings":"","what":"Potentially-breaking changes","title":"epiprocess 0.5.0","text":"epix_slide, $slide now feed f epi_df rather converting tibble/tbl_df first, allowing use epi_df methods metadata, often yielding epi_dfs slide result. obtain old behavior, convert tibble within f.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-5-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.5.0","text":"Fixed epix_merge, $merge always raising error sync=\"truncate\".","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-5-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.5.0","text":"Added Remotes: entry genlasso, removed CRAN. Added as_epi_archive tests. Added missing epix_merge test sync=\"truncate\".","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"potentially-breaking-changes-0-4-0","dir":"Changelog","previous_headings":"","what":"Potentially-breaking changes","title":"epiprocess 0.4.0","text":"Fixed [.epi_df reorder columns, incompatible downstream packages. Changed [.epi_df decay--tibble logic coherent epi_dfs current tolerance nonunique keys: stopped decaying tibble cases unique key wouldn’t preserved, since don’t enforce unique key elsewhere. 
Fixed [.epi_df adjust \"other_keys\" metadata corresponding columns selected . Fixed [.epi_df raise error resulting column names nonunique. Fixed [.epi_df drop metadata decaying tibble (due removal essential columns).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-4-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.4.0","text":"Added check epi_df additional_metadata list. Fixed incorrect as_epi_df examples.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-4-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.4.0","text":"Applied rename upstream package examples: delphi.epidata -> epidatr. Rounded [.epi_df tests.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-3-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.3.0","text":"Compactification (see ) default may change results working directly epi_archive’s DT field; disable, pass compactify=FALSE. epix_ mutate input epi_archives, may alias alias fields (worry user sticks epix_* functions “regular” R functions copy--write-like behavior, avoiding mutating functions [.data.table). x$ may mutate x; mutates x, return x invisibly (makes sense), , fields, may either mutate object refers reseat reference (); x$ mutate x, result may contain aliases x fields. Removed ..., locf, nan parameters. Changed default behavior, now corresponds using =key(x$DT) (demanding set column names key(y$DT)), =TRUE, locf=TRUE, nan=NaN (post-filling step fixed apply gaps, longer fill NAs originating x$DT y$DT). x y longer allowed share names non-columns. epix_merge longer mutates x argument ($merge continues ). Removed (undocumented) capability passing data.table y. 
Removed inappropriate/misleading n=7 default argument (due reporting latency, n=7 yield 7 days data typical daily-reporting surveillance data source, one might assumed).","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-3-0","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.3.0","text":"New compactify parameter allows removal rows redundant purposes epi_archive’s methods, use last version observation carried forward. New clobberable_versions_start field allows marking range versions “clobbered” (rewritten without assigning new version tags); previously, hard-coded max($DT$version). New versions_end field allows marking range versions beyond max($DT$version) observed, contained changes. New sync parameter controls x y aren’t equally date (.e., x$versions_end y$versions_end different). New function epix_fill_through_version, method $fill_through_version: non-mutating & mutating way ensure archive contains versions least fill_versions_end, extrapolating according necessary. Example archive data object now constructed demand underlying data, based user’s version epi_archive rather outdated R6 implementation whenever data object generated.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"breaking-changes-0-2-0","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"epiprocess 0.2.0","text":"Removed default n=7 argument epix_slide.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-2-0","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.2.0","text":"Ignore NAs printing time_value range epi_archive. Fixed misleading column naming epix_slide example. Trimmed epi_slide examples. 
Synced --date docs.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-2-0","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.2.0","text":"Removed dependency epi_archive tests example archive. object, made understandable reading without running. Fixed epi_df tests relying S3 method epi_df implemented externally epiprocess. Added tests epi_archive methods wrapper functions. Removed dead code. Made .{Rbuild,git}ignore files comprehensive.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-1-2","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.1.2","text":"treats x optional, constructing empty epi_df default.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-1-2","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.1.2","text":"Fixed geo_type guessing alphabetical strings 2 characters yield \"custom\", US \"nation\". Fixed time_type guessing actually detect Date-class time_values regularly spaced 7 days apart \"week\"-type intended. Improved printing epi_dfs, epi_archives. Fixed as_of cut (forecast-like) data time_value > max_version. Expanded epi_df docs include conversion tsibble/tbl_ts objects, usage other_keys, pre-processing objects following geo_value, time_value naming scheme. Expanded epi_slide examples show use f argument named parameters. Updated examples print relevant columns given common 80-column terminal width. Added growth rate examples. Improved as_epi_archive epi_archive$new/$initialize documentation, including constructing toy archive.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-1-2","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.1.2","text":"Added tests epi_slide, epi_cor, internal utility functions. 
Fixed currently-unused internal utility functions MiddleL, MiddleR yield correct results odd-length vectors.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"new-features-0-1-1","dir":"Changelog","previous_headings":"","what":"New features","title":"epiprocess 0.1.1","text":"New example data objects allow one quickly experiment epi_dfs epi_archives without relying/waiting API fetch data.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"improvements-0-1-1","dir":"Changelog","previous_headings":"","what":"Improvements","title":"epiprocess 0.1.1","text":"Improved epi_slide error messaging. Fixed description appropriate parameters f argument epi_slide; previous description give incorrect behavior f named parameters receive values epi_slide’s .... Added examples throughout package. Using example data objects vignettes also speeds vignette compilation.","code":""},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"cleanup-0-1-1","dir":"Changelog","previous_headings":"","what":"Cleanup","title":"epiprocess 0.1.1","text":"Set gh-actions CI. Added tests epi_dfs.","code":""},{"path":[]},{"path":"https://cmu-delphi.github.io/epiprocess/dev/news/index.html","id":"implemented-core-functionality-vignettes-0-1-0","dir":"Changelog","previous_headings":"","what":"Implemented core functionality, vignettes","title":"epiprocess 0.1.0","text":"as_epi_df converts epi_df, guessing geo_type, time_type, other_keys, as_of specified. as_epi_df.tbl_ts as_tsibble.epi_df automatically set other_keys key&index, respectively. epi_slide applies user-supplied computation sliding/rolling time window user-specified groups, adding results new columns, recycling/broadcasting results keep result size stable. Allows computation provided function, purrr-style formula, tidyeval dots. Uses slider underneath efficiency. 
epi_cor calculates Pearson, Kendall, Spearman correlations two (optionally time-shifted) variables epi_df within user-specified groups. Convenience function: is_epi_df. as_epi_archive: prepares epi_archive object data frame containing snapshots /patch data every available version data set. as_of: extracts snapshot data set requested version, epi_df format. epix_slide, $slide: similar epi_slide, epi_archives; requested ref_time_value group, applies time window user-specified computation snapshot data ref_time_value. epix_merge, $merge: like merge epi_archives, allowing last version observation carried forward fill gaps x y. Convenience function: is_epi_archive. growth_rate: estimates growth rate time series using one built-methods based relative change, linear regression, smoothing splines, trend filtering. detect_outlr: applies one outlier detection methods given signal variable, optionally aggregates outputs create consensus result. detect_outlr_rm: outlier detection function based rolling-median-based outlier detection function; one methods included detect_outlr. detect_outlr_stl: outlier detection function based seasonal-trend decomposition using LOESS (STL); one methods included detect_outlr.","code":""}]