Commit

Clarify time_near_latest -> lag_near_latest
So it is not misinterpreted as "the amount of time that it has been near the latest".
brookslogan committed Oct 10, 2024
1 parent 9ac813d commit 97fdc29
Showing 5 changed files with 59 additions and 53 deletions.
30 changes: 16 additions & 14 deletions R/revision_analysis.R
@@ -17,11 +17,11 @@
#' 8. `rel_spread`: `spread` divided by the largest value (so it will
#' always be less than 1). Note that this need not be the final value. It will
#' be `NA` whenever `spread` is 0.
-#' 9. `time_near_latest`: This gives the lag when the value is within
-#' `within_latest` (default 20%) of the value at the latest time. For example,
-#' consider the series (0, 20, 99, 150, 102, 100); then `time_near_latest` is
-#' the 5th index, since even though 99 is within 20%, it is outside the window
-#' afterwards at 150.
+#' 9. `lag_near_latest`: This gives the lag when the value is within and
+#' remains `within_latest` (default 20%) of the value at the latest time. For
+#' example, consider the series (0, 20, 99, 150, 102, 100); then
+#' `lag_near_latest` is the 5th index, since even though 99 is within 20%, it
+#' is outside the window afterwards at 150.
#' @param epi_arch an epi_archive to be analyzed
#' @param ... <[`tidyselect`][dplyr_tidy_select]>, used to choose the column to
#' summarize. If empty, it chooses the first. Currently only implemented for
@@ -39,7 +39,7 @@
#' final value for case counts as reported in the context of insurance. To
#' avoid this filtering, either set to `NULL` or 0.
#' @param within_latest double between 0 and 1. Determines the threshold
-#' used for the `time_to`
+#' used for the `lag_to`
#' @param quick_revision difftime or integer (integer is treated as days), for
#' the printed summary, the amount of time between the final revision and the
#' actual time_value to consider the revision quickly resolved. Default of 3
@@ -140,7 +140,7 @@ revision_summary <- function(epi_arch,
min_value = f_no_na(min, .data[[arg]]),
max_value = f_no_na(max, .data[[arg]]),
median_value = f_no_na(median, .data[[arg]]),
-time_to = time_within_x_latest(lag, .data[[arg]], prop = within_latest),
+lag_to = lag_within_x_latest(lag, .data[[arg]], prop = within_latest),
.groups = "drop"
) %>%
mutate(
@@ -149,12 +149,12 @@ revision_summary <- function(epi_arch,
# TODO the units here may be a problem
min_lag = as.difftime(min_lag, units = "days"), # nolint: object_usage_linter
max_lag = as.difftime(max_lag, units = "days"), # nolint: object_usage_linter
-time_near_latest = as.difftime(time_to, units = "days") # nolint: object_usage_linter
+lag_near_latest = as.difftime(lag_to, units = "days") # nolint: object_usage_linter
) %>%
-select(-time_to) %>%
+select(-lag_to) %>%
relocate(
time_value, geo_value, all_of(epikey_names), n_revisions, min_lag, max_lag, # nolint: object_usage_linter
-time_near_latest, spread, rel_spread, min_value, max_value, median_value # nolint: object_usage_linter
+lag_near_latest, spread, rel_spread, min_value, max_value, median_value # nolint: object_usage_linter
)
if (print_inform) {
cli_inform("Min lag (time to first version):")
@@ -205,15 +205,17 @@ revision_summary <- function(epi_arch,
cli_li(num_percent(abs_spread, n_real_revised, ""))

cli_inform("{units(quick_revision)} until within {within_latest*100}% of the latest value:")
-difftime_summary(revision_behavior[["time_near_latest"]]) %>% print()
+difftime_summary(revision_behavior[["lag_near_latest"]]) %>% print()
}
return(revision_behavior)
}

-#' pull the value from lags when values starts indefinitely being within prop of it's last value.
-#' @param values this should be a vector (e.g., a column). errors may occur otherwise
+#' pull the value from lags when values starts indefinitely being within prop of its latest value.
+#' @param lags vector of lags; should be sorted
+#' @param values this should be a vector (e.g., a column) with length matching that of `lags`
+#' @param prop optional length-1 double; proportion
#' @keywords internal
-time_within_x_latest <- function(lags, values, prop = .2) {
+lag_within_x_latest <- function(lags, values, prop = .2) {
latest_value <- values[[length(values)]]
close_enough <- abs(values - latest_value) < prop * latest_value
# we want to ignore any stretches where it's close, but goes farther away later
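
The body of `lag_within_x_latest` is cut off in this view. As a reading aid only, here is a minimal sketch of the behavior its documentation describes, using the example series from the roxygen text. The name `sketch_lag_near_latest`, the `NA` fallback, and everything after the `close_enough` line are assumptions for illustration and are not taken from the package; missing values are not handled.

```r
# Hypothetical helper (not the package's implementation): return the lag at
# which `values` comes within `prop` of its latest value and stays there.
sketch_lag_near_latest <- function(lags, values, prop = 0.2) {
  stopifnot(length(lags) == length(values), length(values) > 0)
  latest_value <- values[[length(values)]]
  # Same closeness test as the two lines visible in the diff above.
  close_enough <- abs(values - latest_value) < prop * latest_value
  # Assumption: the stable stretch starts right after the last "far" value.
  far <- which(!close_enough)
  first_stable <- if (length(far) == 0) 1L else max(far) + 1L
  # Out-of-range indexing yields NA (e.g. when the latest value is 0).
  lags[first_stable]
}

values <- c(0, 20, 99, 150, 102, 100)
# 99 is within 20% of 100, but 150 is not, so stability starts at index 5.
sketch_lag_near_latest(lags = 1:6, values = values)
sketch_lag_near_latest(lags = 0:5, values = values)
```

With `lags = 1:6` the result coincides with the index (5); with lags counted from 0 the same position corresponds to a lag of 4, which is why the rename from `time_` to `lag_` helps: the quantity is a lag value, not a duration spent near the latest value.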
19 changes: 19 additions & 0 deletions man/lag_within_x_latest.Rd

Some generated files are not rendered by default.

12 changes: 6 additions & 6 deletions man/revision_summary.Rd

Some generated files are not rendered by default.

15 changes: 0 additions & 15 deletions man/time_within_x_latest.Rd

This file was deleted.

36 changes: 18 additions & 18 deletions tests/testthat/_snaps/revision-latency-functions.md
@@ -25,15 +25,15 @@
min median mean max
0 days 3 days 6.9 days 19 days
# A tibble: 7 x 11
-time_value geo_value n_revisions min_lag max_lag time_near_latest spread
-<date> <chr> <dbl> <drtn> <drtn> <drtn> <dbl>
-1 2020-01-01 ak 4 2 days 19 days 19 days 101
-2 2020-01-02 ak 1 4 days 5 days 4 days 9
-3 2020-01-03 ak 0 3 days 3 days 3 days 0
-4 2020-01-01 al 1 0 days 19 days 19 days 99
-5 2020-01-02 al 0 0 days 0 days 0 days 0
-6 2020-01-03 al 1 1 days 2 days 2 days 3
-7 2020-01-04 al 0 1 days 1 days 1 days 0
+time_value geo_value n_revisions min_lag max_lag lag_near_latest spread
+<date> <chr> <dbl> <drtn> <drtn> <drtn> <dbl>
+1 2020-01-01 ak 4 2 days 19 days 19 days 101
+2 2020-01-02 ak 1 4 days 5 days 4 days 9
+3 2020-01-03 ak 0 3 days 3 days 3 days 0
+4 2020-01-01 al 1 0 days 19 days 19 days 99
+5 2020-01-02 al 0 0 days 0 days 0 days 0
+6 2020-01-03 al 1 1 days 2 days 2 days 3
+7 2020-01-04 al 0 1 days 1 days 1 days 0
rel_spread min_value max_value median_value
<dbl> <dbl> <dbl> <dbl>
1 0.990 1 102 6
@@ -73,15 +73,15 @@
min median mean max
0 days 3 days 6.9 days 19 days
# A tibble: 7 x 11
-time_value geo_value n_revisions min_lag max_lag time_near_latest spread
-<date> <chr> <dbl> <drtn> <drtn> <drtn> <dbl>
-1 2020-01-01 ak 6 2 days 19 days 19 days 101
-2 2020-01-02 ak 1 4 days 5 days 4 days 9
-3 2020-01-03 ak 0 3 days 3 days 3 days 0
-4 2020-01-01 al 1 0 days 19 days 19 days 99
-5 2020-01-02 al 0 0 days 0 days 0 days 0
-6 2020-01-03 al 1 1 days 2 days 2 days 3
-7 2020-01-04 al 1 0 days 1 days 1 days 0
+time_value geo_value n_revisions min_lag max_lag lag_near_latest spread
+<date> <chr> <dbl> <drtn> <drtn> <drtn> <dbl>
+1 2020-01-01 ak 6 2 days 19 days 19 days 101
+2 2020-01-02 ak 1 4 days 5 days 4 days 9
+3 2020-01-03 ak 0 3 days 3 days 3 days 0
+4 2020-01-01 al 1 0 days 19 days 19 days 99
+5 2020-01-02 al 0 0 days 0 days 0 days 0
+6 2020-01-03 al 1 1 days 2 days 2 days 3
+7 2020-01-04 al 1 0 days 1 days 1 days 0
rel_spread min_value max_value median_value
<dbl> <dbl> <dbl> <dbl>
1 0.990 1 102 5.5