From 0ae1942118148674c140b3214e47382fe864615d Mon Sep 17 00:00:00 2001 From: hfrick Date: Wed, 4 Sep 2024 17:18:50 +0000 Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20tidymode?= =?UTF-8?q?ls/rsample@5c8d38e64129a063029a45bdd13c431601202278=20?= =?UTF-8?q?=F0=9F=9A=80?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- dev/articles/Common_Patterns.html | 12 +++++++----- dev/news/index.html | 1 + dev/pkgdown.yml | 2 +- dev/reference/index.html | 2 +- dev/reference/rolling_origin.html | 23 +++++++++++++++++++---- dev/search.json | 2 +- 6 files changed, 30 insertions(+), 12 deletions(-) diff --git a/dev/articles/Common_Patterns.html b/dev/articles/Common_Patterns.html index 7086f1f8..bb9744e4 100644 --- a/dev/articles/Common_Patterns.html +++ b/dev/articles/Common_Patterns.html @@ -494,16 +494,18 @@

Time-Based Resamplingrolling_origin() -function:

+observations, you can use the sliding_window() function +with lookback = Inf:

-rolling_origin(Chicago) %>%
+sliding_window(Chicago, lookback = Inf) %>%
   head(2)
 #> # A tibble: 2 × 2
 #>   splits        id       
 #>   <list>        <chr>    
-#> 1 <split [5/1]> Slice0001
-#> 2 <split [6/1]> Slice0002
+#> 1 <split [1/1]> Slice0001 +#> 2 <split [2/1]> Slice0002 +

This is commonly referred to as “evaluation on a rolling forecasting +origin”, or more colloquially, “rolling origin cross-validation”.
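
A minimal sketch of the expanding-window behaviour shown in the output above. It assumes a small made-up data frame, `toy`, in place of the `Chicago` data used in the article; the helper name `count_rows` is also hypothetical. With `lookback = Inf` and the other arguments left at their defaults, each slice should keep every earlier row in the analysis set and hold out the next row for assessment.

```r
library(rsample)

# Hypothetical toy series standing in for the Chicago data (assumption)
toy <- data.frame(day = 1:6, y = c(3, 5, 4, 6, 8, 7))

# lookback = Inf requests an expanding window: the analysis set
# grows by one row per slice instead of sliding forward
expanding <- sliding_window(toy, lookback = Inf)

# Hypothetical helper: number of rows in one part of a split
count_rows <- function(split, part) nrow(part(split))

sizes <- data.frame(
  id         = expanding$id,
  analysis   = vapply(expanding$splits, count_rows, integer(1), part = analysis),
  assessment = vapply(expanding$splits, count_rows, integer(1), part = assessment)
)

sizes
```

The analysis column should grow by one row per slice while the assessment column stays at one, which mirrors the `[1/1]`, `[2/1]` pattern in the article output.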

Note that all of these time-based resampling functions are deterministic: unlike the rest of the package, running these functions repeatedly under different random seeds will always return the same diff --git a/dev/news/index.html b/dev/news/index.html index be3cb62e..bf022be7 100644 --- a/dev/news/index.html +++ b/dev/news/index.html @@ -50,6 +50,7 @@

inner_split() function and its methods for various resamples are for usage in tune to create an inner resample of the analysis set to fit the preprocessor and model on one part and the post-processor on the other part (#483, #488, #489).

  • Started moving error messages to cli (#499, #502). With contributions from @JamesHWade (#518).

  • Fixed example for nested_cv() (@seb09, #520).

  • +
  • rolling_origin() is now superseded by sliding_window(), sliding_index(), and sliding_period() which provide more flexibility and control (@nmercadeb, #524).

  • Removed trailing space in printing of mc_cv() objects (@ccani007, #464).

  • Improved documentation for initial_split() and friends (@laurabrianna, #519).

  • Formatting improvement: package names are now not in backticks anymore (@agmurray, #525).

  • diff --git a/dev/pkgdown.yml b/dev/pkgdown.yml index ee1ea865..30cd4784 100644 --- a/dev/pkgdown.yml +++ b/dev/pkgdown.yml @@ -7,7 +7,7 @@ articles: Applications/Recipes_and_rsample: Applications/Recipes_and_rsample.html rsample: rsample.html Working_with_rsets: Working_with_rsets.html -last_built: 2024-09-04T16:10Z +last_built: 2024-09-04T17:17Z urls: reference: https://rsample.tidymodels.org/reference article: https://rsample.tidymodels.org/articles diff --git a/dev/reference/index.html b/dev/reference/index.html index 89520063..5118590b 100644 --- a/dev/reference/index.html +++ b/dev/reference/index.html @@ -130,7 +130,7 @@

    Resampling methodsrolling_origin() - + superseded
    Rolling Origin Forecast Resampling
    diff --git a/dev/reference/rolling_origin.html b/dev/reference/rolling_origin.html index eaaa407e..d705893e 100644 --- a/dev/reference/rolling_origin.html +++ b/dev/reference/rolling_origin.html @@ -1,11 +1,21 @@ -Rolling Origin Forecast Resampling — rolling_origin • rsampleRolling Origin Forecast Resampling — rolling_origin • rsample +sorted in time order. +This function is superseded by sliding_window(), sliding_index(), and +sliding_period() which provide more flexibility and control. Superseded +functions will not go away, but active development will be focused on the new +functions."> Skip to content @@ -53,10 +63,15 @@
    -

    This resampling method is useful when the data set has a strong time +

    [Superseded]

    +

    This resampling method is useful when the data set has a strong time component. The resamples are not random and contain data points that are consecutive values. The function assumes that the original data set is sorted in time order.

    +

    This function is superseded by sliding_window(), sliding_index(), and +sliding_period() which provide more flexibility and control. Superseded +functions will not go away, but active development will be focused on the new +functions.
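
For readers migrating existing code, the following is a hedged sketch and is not taken from the reference page. It assumes that the default `rolling_origin()` call (an initial window of 5 rows, 1 assessment row, cumulative analysis sets) can be matched by `sliding_window()` with `lookback = Inf` and `skip = 4`, so that the first four shorter slices are dropped. The data frame `toy` and the helper `analysis_sizes` are made up for illustration.

```r
library(rsample)

# Hypothetical toy series, assumed to be sorted in time order
toy <- data.frame(day = 1:8, y = c(2, 4, 3, 5, 7, 6, 8, 9))

# Superseded interface, using its default arguments
old <- rolling_origin(toy)

# Assumed equivalent: expanding window, dropping the first 4 slices
# so the first analysis set has 5 rows
new <- sliding_window(toy, lookback = Inf, skip = 4)

# Hypothetical helper: analysis-set size for every slice
analysis_sizes <- function(rset) {
  vapply(rset$splits, function(s) nrow(analysis(s)), integer(1))
}

old_sizes <- analysis_sizes(old)
new_sizes <- analysis_sizes(new)

# Compare the two resampling schemes slice by slice
data.frame(old = old_sizes, new = new_sizes)
```

If the two columns differ for your data, check the `initial`, `assess`, and `skip` settings of the original `rolling_origin()` call before switching.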

    diff --git a/dev/search.json b/dev/search.json index 51be5251..74008602 100644 --- a/dev/search.json +++ b/dev/search.json @@ -1 +1 @@ -[{"path":[]},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying 
enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. 
Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. public apology may requested.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. 
Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to tidymodels","title":"Contributing to tidymodels","text":"detailed information contributing tidymodels packages, see development contributing guide.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":"documentation","dir":"","previous_headings":"","what":"Documentation","title":"Contributing to tidymodels","text":"Typos grammatical errors documentation may edited directly using GitHub web interface, long changes made source file. YES ✅: edit roxygen comment .R file R/ directory. 🚫: edit .Rd file man/ directory. use roxygen2, Markdown syntax, documentation.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":"code","dir":"","previous_headings":"","what":"Code","title":"Contributing to tidymodels","text":"submit 🎯 pull request tidymodels package, always file issue confirm tidymodels team agrees idea happy basic proposal. tidymodels packages work together. package contains unit tests, integration tests tests using packages contained extratests. pull requests, recommend create fork repo usethis::create_from_github(), initiate new branch usethis::pr_init(). Look build status making changes. 
README contains badges continuous integration services used package. New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. user-facing changes, add bullet top NEWS.md current development version header describing changes made followed GitHub username, links relevant issue(s)/PR(s). use testthat. Contributions test cases included easier accept. contribution spans use one package, consider building extratests changes check breakages /adding new tests . Let us know PR ran extra tests.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"Code","what":"Code of Conduct","title":"Contributing to tidymodels","text":"project released Contributor Code Conduct. contributing project, agree abide terms.","code":""},{"path":"https://rsample.tidymodels.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2021 rsample authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. 
EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"a-nonlinear-regression-example","dir":"Articles > Applications","previous_headings":"","what":"A nonlinear regression example","title":"Bootstrap confidence intervals","text":"demonstrate computations different types intervals, ’ll use nonlinear regression example Baty et al (2015). showed data monitored oxygen uptake patient rest exercise phases (data frame O2K). authors fit segmented regression model transition point known (time exercise commenced). model : broom::tidy() returns analysis object standardized way. column names shown used types objects allows us use results easily. rsample, ’ll rely tidy() method work bootstrap estimates need confidence intervals. ’s example end univariate statistic isn’t automatically formatted tidy(). run model different bootstraps, ’ll write function uses split object input produces tidy data frame: First, let’s create set resamples fit separate models . options apparent = TRUE set. creates final resample copy original (unsampled) data set. required interval methods. Let’s look data see outliers aberrant results: Now let’s create scatterplot matrix: One potential outlier right VO2peak ’ll leave . univariate distributions :","code":"library(tidymodels) library(nlstools) library(GGally) data(O2K) ggplot(O2K, aes(x = t, y = VO2)) + geom_point() nonlin_form <- as.formula( VO2 ~ (t <= 5.883) * VO2rest + (t > 5.883) * (VO2rest + (VO2peak - VO2rest) * (1 - exp(-(t - 5.883) / mu))) ) # Starting values from visual inspection start_vals <- list(VO2rest = 400, VO2peak = 1600, mu = 1) res <- nls(nonlin_form, start = start_vals, data = O2K) tidy(res) ## # A tibble: 3 × 5 ## term estimate std.error statistic p.value ## ## 1 VO2rest 357. 11.4 31.3 4.27e-26 ## 2 VO2peak 1631. 
21.5 75.9 1.29e-38 ## 3 mu 1.19 0.0766 15.5 1.08e-16 # Will be used to fit the models to different bootstrap data sets: fit_fun <- function(split, ...) { # We could check for convergence, make new parameters, etc. nls(nonlin_form, data = analysis(split), ...) %>% tidy() } set.seed(462) nlin_bt <- bootstraps(O2K, times = 2000, apparent = TRUE) %>% mutate(models = map(splits, ~ fit_fun(.x, start = start_vals))) nlin_bt ## # Bootstrap sampling with apparent sample ## # A tibble: 2,001 × 3 ## splits id models ## ## 1 Bootstrap0001 ## 2 Bootstrap0002 ## 3 Bootstrap0003 ## 4 Bootstrap0004 ## 5 Bootstrap0005 ## 6 Bootstrap0006 ## 7 Bootstrap0007 ## 8 Bootstrap0008 ## 9 Bootstrap0009 ## 10 Bootstrap0010 ## # ℹ 1,991 more rows nlin_bt$models[[1]] ## # A tibble: 3 × 5 ## term estimate std.error statistic p.value ## ## 1 VO2rest 359. 10.7 33.5 4.59e-27 ## 2 VO2peak 1656. 31.1 53.3 1.39e-33 ## 3 mu 1.23 0.113 10.9 2.01e-12 library(tidyr) nls_coef <- nlin_bt %>% dplyr::select(-splits) %>% # Turn it into a tibble by stacking the `models` col unnest(cols = models) %>% # Get rid of unneeded columns dplyr::select(id, term, estimate) head(nls_coef) ## # A tibble: 6 × 3 ## id term estimate ## ## 1 Bootstrap0001 VO2rest 359. ## 2 Bootstrap0001 VO2peak 1656. ## 3 Bootstrap0001 mu 1.23 ## 4 Bootstrap0002 VO2rest 358. ## 5 Bootstrap0002 VO2peak 1662. 
## 6 Bootstrap0002 mu 1.26 nls_coef %>% # Put different parameters in columns tidyr::pivot_wider(names_from = term, values_from = estimate) %>% # Keep only numeric columns dplyr::select(-id) %>% ggscatmat(alpha = .25) nls_coef %>% ggplot(aes(x = estimate)) + geom_histogram(bins = 20, col = \"white\") + facet_wrap(~ term, scales = \"free_x\")"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"percentile-intervals","dir":"Articles > Applications","previous_headings":"A nonlinear regression example","what":"Percentile intervals","title":"Bootstrap confidence intervals","text":"basic type interval uses percentiles resampling distribution. get percentile intervals, rset object passed first argument second argument list column tidy results: overlaid univariate distributions: intervals compare parametric asymptotic values? percentile intervals wider parametric intervals (assume asymptotic normality). estimates appear normally distributed? can look quantile-quantile plots:","code":"p_ints <- int_pctl(nlin_bt, models) p_ints ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1576. 1632. 1694. 0.05 percentile ## 2 VO2rest 344. 357. 370. 
0.05 percentile ## 3 mu 1.00 1.18 1.35 0.05 percentile nls_coef %>% ggplot(aes(x = estimate)) + geom_histogram(bins = 20, col = \"white\") + facet_wrap(~ term, scales = \"free_x\") + geom_vline(data = p_ints, aes(xintercept = .lower), col = \"red\") + geom_vline(data = p_ints, aes(xintercept = .upper), col = \"red\") parametric <- tidy(res, conf.int = TRUE) %>% dplyr::select( term, .lower = conf.low, .estimate = estimate, .upper = conf.high ) %>% mutate( .alpha = 0.05, .method = \"parametric\" ) intervals <- bind_rows(parametric, p_ints) %>% arrange(term, .method) intervals %>% split(intervals$term) ## $mu ## # A tibble: 2 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 mu 1.05 1.19 1.34 0.05 parametric ## 2 mu 1.00 1.18 1.35 0.05 percentile ## ## $VO2peak ## # A tibble: 2 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1590. 1631. 1675. 0.05 parametric ## 2 VO2peak 1576. 1632. 1694. 0.05 percentile ## ## $VO2rest ## # A tibble: 2 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2rest 334. 357. 380. 0.05 parametric ## 2 VO2rest 344. 357. 370. 0.05 percentile nls_coef %>% ggplot(aes(sample = estimate)) + stat_qq() + stat_qq_line(alpha = .25) + facet_wrap(~ term, scales = \"free\")"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"t-intervals","dir":"Articles > Applications","previous_headings":"A nonlinear regression example","what":"t-intervals","title":"Bootstrap confidence intervals","text":"Bootstrap t-intervals estimated computing intermediate statistics t-like structure. use , require estimated variance individual resampled estimate. example, comes along fitted model object. can extract standard errors parameters. Luckily, tidy() methods provide column named std.error. 
arguments intervals :","code":"t_stats <- int_t(nlin_bt, models) intervals <- bind_rows(intervals, t_stats) %>% arrange(term, .method) intervals %>% split(intervals$term) ## $mu ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 mu 1.05 1.19 1.34 0.05 parametric ## 2 mu 1.00 1.18 1.35 0.05 percentile ## 3 mu 1.00 1.18 1.35 0.05 student-t ## ## $VO2peak ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1590. 1631. 1675. 0.05 parametric ## 2 VO2peak 1576. 1632. 1694. 0.05 percentile ## 3 VO2peak 1568. 1632. 1691. 0.05 student-t ## ## $VO2rest ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2rest 334. 357. 380. 0.05 parametric ## 2 VO2rest 344. 357. 370. 0.05 percentile ## 3 VO2rest 342. 357. 370. 0.05 student-t"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"bias-corrected-and-accelerated-intervals","dir":"Articles > Applications","previous_headings":"A nonlinear regression example","what":"Bias-corrected and accelerated intervals","title":"Bootstrap confidence intervals","text":"bias-corrected accelerated (BCa) intervals, additional argument required. .fn argument function computes statistic interest. first argument rsplit object arguments can passed using ellipses. intervals use internal leave-one-resample compute Jackknife statistic recompute statistic every bootstrap resample. statistic expensive compute, may take time. calculations, use furrr package can computed parallel set parallel processing plan (see ?future::plan). 
user-facing function takes argument function ellipses.","code":"bias_corr <- int_bca(nlin_bt, models, .fn = fit_fun, start = start_vals) intervals <- bind_rows(intervals, bias_corr) %>% arrange(term, .method) intervals %>% split(intervals$term) ## $mu ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 mu 0.996 1.18 1.34 0.05 BCa ## 2 mu 1.05 1.19 1.34 0.05 parametric ## 3 mu 1.00 1.18 1.35 0.05 percentile ## 4 mu 1.00 1.18 1.35 0.05 student-t ## ## $VO2peak ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1561. 1632. 1680. 0.05 BCa ## 2 VO2peak 1590. 1631. 1675. 0.05 parametric ## 3 VO2peak 1576. 1632. 1694. 0.05 percentile ## 4 VO2peak 1568. 1632. 1691. 0.05 student-t ## ## $VO2rest ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2rest 343. 357. 368. 0.05 BCa ## 2 VO2rest 334. 357. 380. 0.05 parametric ## 3 VO2rest 344. 357. 370. 0.05 percentile ## 4 VO2rest 342. 357. 370. 0.05 student-t"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"no-existing-tidy-method","dir":"Articles > Applications","previous_headings":"","what":"No existing tidy method","title":"Bootstrap confidence intervals","text":"case, function can emulate minimum results: character column called term, numeric column called estimate, , optionally, numeric column called std.error. last column needed int_t(). Suppose just want estimate fold-increase outcome 90th 10th percentiles course experiment. function might look like: Everything else works :","code":"fold_incr <- function(split, ...) 
{ dat <- analysis(split) quants <- quantile(dat$VO2, probs = c(.1, .9)) tibble( term = \"fold increase\", estimate = unname(quants[2]/quants[1]), # We don't know the analytical formula for this std.error = NA_real_ ) } nlin_bt <- nlin_bt %>% mutate(folds = map(splits, fold_incr)) int_pctl(nlin_bt, folds) ## # A tibble: 1 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 fold increase 4.42 4.76 5.05 0.05 percentile int_bca(nlin_bt, folds, .fn = fold_incr) ## # A tibble: 1 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 fold increase 4.53 4.76 5.36 0.05 BCa"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"intervals-for-linearish-parametric-intervals","dir":"Articles > Applications","previous_headings":"","what":"Intervals for linear(ish) parametric intervals","title":"Bootstrap confidence intervals","text":"rsample also contains reg_intervals() function can used linear regression (via lm()), generalized linear models (glm()), log-linear survival models (survival::survreg() survival::coxph()). function makes easier get intervals models. simple example logistic regression using dementia data modeldata package: Let’s fit model predictors: Let’s use model student-t intervals: can also save resamples plotting: Now can unnest data use ggplot:","code":"data(ad_data, package = \"modeldata\") lr_mod <- glm(Class ~ male + age + Ab_42 + tau, data = ad_data, family = binomial) glance(lr_mod) ## # A tibble: 1 × 8 ## null.deviance df.null logLik AIC BIC deviance df.residual nobs ## ## 1 391. 332 -140. 289. 308. 279. 328 333 tidy(lr_mod) ## # A tibble: 5 × 5 ## term estimate std.error statistic p.value ## ## 1 (Intercept) 129. 112. 1.15 0.250 ## 2 male -0.744 0.307 -2.43 0.0152 ## 3 age -125. 114. 
-1.10 0.272 ## 4 Ab_42 0.534 0.104 5.14 0.000000282 ## 5 tau -1.78 0.309 -5.77 0.00000000807 set.seed(29832) lr_int <- reg_intervals(Class ~ male + age + Ab_42 + tau, data = ad_data, model_fn = \"glm\", family = binomial) lr_int ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 Ab_42 0.316 0.548 0.765 0.05 student-t ## 2 age -332. -133. 85.7 0.05 student-t ## 3 male -1.35 -0.755 -0.133 0.05 student-t ## 4 tau -2.38 -1.83 -1.17 0.05 student-t set.seed(29832) lr_int <- reg_intervals(Class ~ male + age + Ab_42 + tau, data = ad_data, keep_reps = TRUE, model_fn = \"glm\", family = binomial) lr_int ## # A tibble: 4 × 7 ## term .lower .estimate .upper .alpha .method .replicates ## > ## 1 Ab_42 0.316 0.548 0.765 0.05 student-t [1,001 × 2] ## 2 age -332. -133. 85.7 0.05 student-t [1,001 × 2] ## 3 male -1.35 -0.755 -0.133 0.05 student-t [1,001 × 2] ## 4 tau -2.38 -1.83 -1.17 0.05 student-t [1,001 × 2] lr_int %>% select(term, .replicates) %>% unnest(cols = .replicates) %>% ggplot(aes(x = estimate)) + geom_histogram(bins = 30) + facet_wrap(~ term, scales = \"free_x\") + geom_vline(data = lr_int, aes(xintercept = .lower), col = \"red\") + geom_vline(data = lr_int, aes(xintercept = .upper), col = \"red\") + geom_vline(xintercept = 0, col = \"green\")"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Recipes_and_rsample.html","id":"an-example-recipe","dir":"Articles > Applications","previous_headings":"","what":"An Example Recipe","title":"Recipes with rsample","text":"illustration, Ames housing data used. sale prices homes along various descriptors property: Suppose fit simple regression model formula: distribution lot size right-skewed: might benefit model estimate transformation data using Box-Cox procedure. Also, note frequencies neighborhoods can vary: resampled, neighborhoods included test set result column dummy variables zero entries. true House_Style variable. might want collapse rarely occurring values “” categories. 
define design matrix, initial recipe created: recreates work formula method traditionally uses additional steps. original data object ames used call, used define variables characteristics single recipe valid across resampled versions data. recipe can estimated analysis component resample. execute recipe entire data set: get values data, bake function can used: Note fewer dummy variables Neighborhood House_Style data. Also, code using prep() benefits default argument retain = TRUE, keeps processed version data set don’t reapply steps extract processed values. data used train recipe, used: next section explore recipes bootstrap resampling modeling:","code":"data(ames, package = \"modeldata\") log10(Sale_Price) ~ Neighborhood + House_Style + Year_Sold + Lot_Area library(ggplot2) theme_set(theme_bw()) ggplot(ames, aes(x = Lot_Area)) + geom_histogram(binwidth = 5000, col = \"red\", fill =\"red\", alpha = .5) ggplot(ames, aes(x = Neighborhood)) + geom_bar() + coord_flip() + xlab(\"\") library(recipes) # Apply log10 transformation outside the recipe # https://www.tmwr.org/recipes.html#skip-equals-true ames <- ames %>% mutate(Sale_Price = log10(Sale_Price)) rec <- recipe(Sale_Price ~ Neighborhood + House_Style + Year_Sold + Lot_Area, data = ames) %>% # Collapse rarely occurring jobs into \"other\" step_other(Neighborhood, House_Style, threshold = 0.05) %>% # Dummy variables on the qualitative predictors step_dummy(all_nominal()) %>% # Unskew a predictor step_BoxCox(Lot_Area) %>% # Normalize step_center(all_predictors()) %>% step_scale(all_predictors()) rec rec_training_set <- prep(rec, training = ames) rec_training_set ## ## ── Recipe ──────────────────────────────────────────────────────────────── ## ## ── Inputs ## Number of variables by role ## outcome: 1 ## predictor: 4 ## ## ── Training information ## Training data contained 2930 data points and no incomplete rows. 
## ## ── Operations ## • Collapsing factor levels for: Neighborhood and House_Style | Trained ## • Dummy variables from: Neighborhood and House_Style | Trained ## • Box-Cox transformation on: Lot_Area | Trained ## • Centering for: Year_Sold and Lot_Area, ... | Trained ## • Scaling for: Year_Sold and Lot_Area, ... | Trained # By default, the selector `everything()` is used to # return all the variables. Other selectors can be used too. bake(rec_training_set, new_data = head(ames)) ## # A tibble: 6 × 14 ## Year_Sold Lot_Area Sale_Price Neighborhood_College_Creek ## ## 1 1.68 2.70 5.33 -0.317 ## 2 1.68 0.506 5.02 -0.317 ## 3 1.68 0.930 5.24 -0.317 ## 4 1.68 0.423 5.39 -0.317 ## 5 1.68 0.865 5.28 -0.317 ## 6 1.68 0.197 5.29 -0.317 ## # ℹ 10 more variables: Neighborhood_Old_Town , ## # Neighborhood_Edwards , Neighborhood_Somerset , ## # Neighborhood_Northridge_Heights , Neighborhood_Gilbert , ## # Neighborhood_Sawyer , Neighborhood_other , ## # House_Style_One_Story , House_Style_Two_Story , ## # House_Style_other bake(rec_training_set, new_data = NULL) %>% head ## # A tibble: 6 × 14 ## Year_Sold Lot_Area Sale_Price Neighborhood_College_Creek ## ## 1 1.68 2.70 5.33 -0.317 ## 2 1.68 0.506 5.02 -0.317 ## 3 1.68 0.930 5.24 -0.317 ## 4 1.68 0.423 5.39 -0.317 ## 5 1.68 0.865 5.28 -0.317 ## 6 1.68 0.197 5.29 -0.317 ## # ℹ 10 more variables: Neighborhood_Old_Town , ## # Neighborhood_Edwards , Neighborhood_Somerset , ## # Neighborhood_Northridge_Heights , Neighborhood_Gilbert , ## # Neighborhood_Sawyer , Neighborhood_other , ## # House_Style_One_Story , House_Style_Two_Story , ## # House_Style_other library(rsample) set.seed(7712) bt_samples <- bootstraps(ames) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 2 ## splits id ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows bt_samples$splits[[1]] ## ## 
<2930/1095/2930>"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Recipes_and_rsample.html","id":"working-with-resamples","dir":"Articles > Applications","previous_headings":"","what":"Working with Resamples","title":"Recipes with rsample","text":"can add recipe column tibble. recipes convenience function called prepper() can used call prep() split object first argument (easier purrring): Now, fit model, fit function needs recipe input. code implicitly used retain = TRUE option prep(). Otherwise, split objects also needed bake() recipe (prediction function ). get predictions, function needs three arguments: splits (get assessment data), recipe (process ), model. iterate , function purrr::pmap() used: Calculating RMSE:","code":"library(purrr) bt_samples$recipes <- map(bt_samples$splits, prepper, recipe = rec) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 3 ## splits id recipes ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows bt_samples$recipes[[1]] ## ## ── Recipe ──────────────────────────────────────────────────────────────── ## ## ── Inputs ## Number of variables by role ## outcome: 1 ## predictor: 4 ## ## ── Training information ## Training data contained 2930 data points and no incomplete rows. ## ## ── Operations ## • Collapsing factor levels for: Neighborhood and House_Style | Trained ## • Dummy variables from: Neighborhood and House_Style | Trained ## • Box-Cox transformation on: Lot_Area | Trained ## • Centering for: Year_Sold and Lot_Area, ... | Trained ## • Scaling for: Year_Sold and Lot_Area, ... | Trained fit_lm <- function(rec_obj, ...) lm(..., data = bake(rec_obj, new_data = NULL, everything())) bt_samples$lm_mod <- map( bt_samples$recipes, fit_lm, Sale_Price ~ . 
) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 4 ## splits id recipes lm_mod ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows pred_lm <- function(split_obj, rec_obj, model_obj, ...) { mod_data <- bake( rec_obj, new_data = assessment(split_obj), all_predictors(), all_outcomes() ) out <- mod_data %>% select(Sale_Price) out$predicted <- predict(model_obj, newdata = mod_data %>% select(-Sale_Price)) out } bt_samples$pred <- pmap( lst( split_obj = bt_samples$splits, rec_obj = bt_samples$recipes, model_obj = bt_samples$lm_mod ), pred_lm ) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 5 ## splits id recipes lm_mod pred ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows library(yardstick) results <- map(bt_samples$pred, rmse, Sale_Price, predicted) %>% list_rbind() results ## # A tibble: 25 × 3 ## .metric .estimator .estimate ## ## 1 rmse standard 0.132 ## 2 rmse standard 0.128 ## 3 rmse standard 0.129 ## 4 rmse standard 0.123 ## 5 rmse standard 0.125 ## 6 rmse standard 0.140 ## 7 rmse standard 0.129 ## 8 rmse standard 0.130 ## 9 rmse standard 0.122 ## 10 rmse standard 0.127 ## # ℹ 15 more rows mean(results$.estimate) ## [1] 0.129"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"random-resampling","dir":"Articles","previous_headings":"","what":"Random Resampling","title":"Common Resampling Patterns","text":"far away, common use rsample generate simple random resamples data. 
rsample package includes number functions specifically purpose.","code":""},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"initial-splits","dir":"Articles","previous_headings":"Random Resampling","what":"Initial Splits","title":"Common Resampling Patterns","text":"split data two sets – often referred “training” “testing” sets – rsample provides initial_split() function: output rsplit object observation assigned one two sets. can control proportion data assigned “training” set prop argument: get actual data assigned either set, use training() testing() functions:","code":"initial_split(ames) #> #> <2197/733/2930> initial_split(ames, prop = 0.8) #> #> <2344/586/2930> resample <- initial_split(ames, prop = 0.6) head(training(resample), 2) #> # A tibble: 2 × 74 #> MS_SubClass MS_Zoning Lot_Frontage Lot_Area Street Alley Lot_Shape #> #> 1 One_Story_1946_a… Resident… 110 14333 Pave No_A… Regular #> 2 One_Story_1946_a… Resident… 65 8450 Pave No_A… Regular #> # ℹ 67 more variables: Land_Contour , Utilities , #> # Lot_Config , Land_Slope , Neighborhood , #> # Condition_1 , Condition_2 , Bldg_Type , #> # House_Style , Overall_Cond , Year_Built , #> # Year_Remod_Add , Roof_Style , Roof_Matl , #> # Exterior_1st , Exterior_2nd , Mas_Vnr_Type , #> # Mas_Vnr_Area , Exter_Cond , Foundation , … head(testing(resample), 2) #> # A tibble: 2 × 74 #> MS_SubClass MS_Zoning Lot_Frontage Lot_Area Street Alley Lot_Shape #> #> 1 One_Story_1946_a… Resident… 141 31770 Pave No_A… Slightly… #> 2 One_Story_1946_a… Resident… 80 11622 Pave No_A… Regular #> # ℹ 67 more variables: Land_Contour , Utilities , #> # Lot_Config , Land_Slope , Neighborhood , #> # Condition_1 , Condition_2 , Bldg_Type , #> # House_Style , Overall_Cond , Year_Built , #> # Year_Remod_Add , Roof_Style , Roof_Matl , #> # Exterior_1st , Exterior_2nd , Mas_Vnr_Type , #> # Mas_Vnr_Area , Exter_Cond , Foundation , 
…"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"v-fold-cross-validation","dir":"Articles","previous_headings":"Random Resampling","what":"V-Fold Cross-Validation","title":"Common Resampling Patterns","text":"evaluate models test set , ’ve completely finished tuning training models. estimate performance model candidates, typically split training data one part used model fitting one part used measuring performance. distinguish set training test set, refer analysis assessment set, respectively. Typically, split training data analysis assessment sets multiple times get stable estimates model performance. Perhaps common cross-validation method V-fold cross-validation. Also known “k-fold cross-validation”, method creates V resamples splitting data V groups (also known “folds”) roughly equal size. analysis set resample made V-1 folds, remaining fold used assessment set. way, observation data used exactly one assessment set. use V-fold cross-validation rsample, use vfold_cv() function: One downside V-fold cross validation tends produce “noisy”, high-variance, estimates compared resampling methods. try reduce variance, ’s often helpful perform ’s known repeated cross-validation, effectively running V-fold resampling procedure multiple times data. 
perform repeated V-fold cross-validation rsample, can use repeats argument inside vfold_cv():","code":"vfold_cv(ames, v = 2) #> # 2-fold cross-validation #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2 vfold_cv(ames, v = 2, repeats = 2) #> # 2-fold cross-validation repeated 2 times #> # A tibble: 4 × 3 #> splits id id2 #> #> 1 Repeat1 Fold1 #> 2 Repeat1 Fold2 #> 3 Repeat2 Fold1 #> 4 Repeat2 Fold2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"monte-carlo-cross-validation","dir":"Articles","previous_headings":"Random Resampling","what":"Monte-Carlo Cross-Validation","title":"Common Resampling Patterns","text":"alternative V-fold cross-validation Monte-Carlo cross-validation. V-fold assigns observation data one (exactly one) assessment set, Monte-Carlo cross-validation takes random subset data assessment set, meaning observation can used 0, 1, many assessment sets. analysis set made observations weren’t selected. assessment set sampled independently, can repeat many times want. use Monte-Carlo cross-validation rsample, use mc_cv() function: Similar initial_split(), can control proportion data assigned analysis fold using prop. can also control number resamples create using times argument. Monte-Carlo cross-validation tends produce biased estimates V-fold. , computationally feasible typically recommend using five repeats 10-fold cross-validation model assessment.","code":"mc_cv(ames, prop = 0.8, times = 2) #> # Monte Carlo cross-validation (0.8/0.2) with 2 resamples #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"bootstrap-resampling","dir":"Articles","previous_headings":"Random Resampling","what":"Bootstrap Resampling","title":"Common Resampling Patterns","text":"last primary technique rsample creating resamples training data bootstrap resampling. 
“bootstrap sample” sample data set, size data set, taken replacement single observation might sampled multiple times. assessment set made observations weren’t selected analysis set. Generally, bootstrap resampling produces pessimistic estimates model accuracy. can create bootstrap resamples rsample using bootstraps() function. can’t control proportion data set – assessment set bootstrap resample always size training data – function otherwise works exactly like mc_cv():","code":"bootstraps(ames, times = 2) #> # Bootstrap sampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"validation-set","dir":"Articles","previous_headings":"Random Resampling","what":"Validation Set","title":"Common Resampling Patterns","text":"data vast enough reliable performance estimate just one assessment set, can three-way split data training, validation test set right start. (validation set role single assessment set.) Instead using initial_split() create binary split, can use initial_validation_split() create three-way split: prop argument two elements, specifying proportion data assigned training validation set. 
create rset object tuning, validation_set() bundles together training validation set, read use tune) package.","code":"three_way_split <- initial_validation_split(ames, prop = c(0.6, 0.2)) three_way_split #> #> <1758/586/586/2930> validation_set(three_way_split) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"stratified-resampling","dir":"Articles","previous_headings":"","what":"Stratified Resampling","title":"Common Resampling Patterns","text":"data heavily imbalanced (, distribution important continuous variable skewed, classes categorical variable much common others), simple random resampling may accidentally skew data even allocating “rare” observations disproportionately analysis assessment fold. situations, can useful instead use stratified resampling ensure analysis assessment folds similar distribution overall data. functions discussed far support stratified resampling strata argument. argument takes single column identifier uses stratify resampling procedure: default, rsample cut continuous variables four bins, ensure bin proportionally represented set. desired, behavior can changed using breaks argument:","code":"vfold_cv(ames, v = 2, strata = Sale_Price) #> # 2-fold cross-validation using stratification #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2 vfold_cv(ames, v = 2, strata = Sale_Price, breaks = 100) #> # 2-fold cross-validation using stratification #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"grouped-resampling","dir":"Articles","previous_headings":"","what":"Grouped Resampling","title":"Common Resampling Patterns","text":"Often, observations data “related” probable random chance, instance represent repeated measurements subject collected single location. 
situations, often want assign related observations either analysis assessment fold group, avoid assessment data ’s closely related data used fit model. functions discussed far “grouped resampling” variation handle situations. functions start group_ prefix, use argument group specify column used group observations. respecting groups, functions work like ungrouped variants: ’s important note , functions like group_mc_cv() still let specify proportion data analysis set (group_bootstraps() still attempts create analysis sets size original data), rsample won’t “split” groups order exactly meet proportion. functions start assigning one group random set (, group_vfold_cv(), fold) assign remaining groups, random order, whichever set brings relative sizes set closest target proportion. means resamples randomized, can safely use repeated cross-validation just ungrouped resampling, also means can wind differently sized analysis assessment sets anticipated groups unbalanced: grouped resampling functions always focused balancing proportion data analysis set, default group_vfold_cv() attempt balance number groups assigned fold. instead ’d like balance number observations fold (meaning assessment sets similar sizes, smaller groups likely assigned folds happen random chance), can use argument balance = \"observations\": ’re working spatial data, observations often related neighbors rest data set; Tobler’s first law geography puts , “everything related everything else, near things related distant things.” However, often won’t pre-defined “location” variable can use group related observations. 
spatialsample package provides functions spatial cross-validation using rsample syntax classes, often useful situations.","code":"resample <- group_initial_split(Orange, group = Tree) unique(training(resample)$Tree) #> [1] 1 2 3 4 #> Levels: 3 < 1 < 5 < 2 < 4 unique(testing(resample)$Tree) #> [1] 5 #> Levels: 3 < 1 < 5 < 2 < 4 set.seed(1) group_bootstraps(ames, Neighborhood, times = 2) #> # Group bootstrap sampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 group_vfold_cv(ames, Neighborhood, balance = \"observations\", v = 2) #> # Group 2-fold cross-validation #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"time-based-resampling","dir":"Articles","previous_headings":"","what":"Time-Based Resampling","title":"Common Resampling Patterns","text":"working time-based data, usually doesn’t make sense randomly resample data: random resampling likely result analysis set observations later assessment set, isn’t realistic way assess model performance. , rsample provides different functions make sure data assessment sets analysis set. First , two variants initial_split() initial_validation_split(), initial_time_split() initial_validation_time_split(), assign first rows data training set (number rows assigned determined prop): also several functions rsample help construct multiple analysis assessment sets time-based data. instance, sliding_window() create “windows” data, moving rows data frame: want create sliding windows data based specific variable, can use sliding_index() function: want set size windows based units time, instance window contain year data, can use sliding_period(): functions produce analysis sets size, start end analysis set “sliding” data frame. 
’d rather analysis set get progressively larger, ’re predicting new data based upon growing set older observations, can use rolling_origin() function: Note time-based resampling functions deterministic: unlike rest package, running functions repeatedly different random seeds always return results.","code":"initial_time_split(Chicago) #> #> <4273/1425/5698> initial_validation_time_split(Chicago) #> #> <3418/1140/1140/5698> sliding_window(Chicago) %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice0001 #> 2 Slice0002 sliding_index(Chicago, date) %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice0001 #> 2 Slice0002 sliding_period(Chicago, date, \"year\") %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 rolling_origin(Chicago) %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice0001 #> 2 Slice0002"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Working with resampling sets","text":"rsample package can used create objects containing resamples original data. vignette contains demonstration objects can used data analysis. Let’s use attrition data set. documentation: data IBM Watson Analytics Lab. website describes data “Uncover factors lead employee attrition explore important questions ‘show breakdown distance home job role attrition’ ‘compare average monthly income education attrition’. fictional data set created IBM data scientists.” 1470 rows. 
data can accessed using","code":"library(rsample) data(\"attrition\", package = \"modeldata\") names(attrition) #> [1] \"Age\" \"Attrition\" \"BusinessTravel\" #> [4] \"DailyRate\" \"Department\" \"DistanceFromHome\" #> [7] \"Education\" \"EducationField\" \"EnvironmentSatisfaction\" #> [10] \"Gender\" \"HourlyRate\" \"JobInvolvement\" #> [13] \"JobLevel\" \"JobRole\" \"JobSatisfaction\" #> [16] \"MaritalStatus\" \"MonthlyIncome\" \"MonthlyRate\" #> [19] \"NumCompaniesWorked\" \"OverTime\" \"PercentSalaryHike\" #> [22] \"PerformanceRating\" \"RelationshipSatisfaction\" \"StockOptionLevel\" #> [25] \"TotalWorkingYears\" \"TrainingTimesLastYear\" \"WorkLifeBalance\" #> [28] \"YearsAtCompany\" \"YearsInCurrentRole\" \"YearsSinceLastPromotion\" #> [31] \"YearsWithCurrManager\" table(attrition$Attrition) #> #> No Yes #> 1233 237"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"model-assessment","dir":"Articles","previous_headings":"","what":"Model Assessment","title":"Working with resampling sets","text":"Let’s fit logistic regression model data model terms job satisfaction, gender, monthly income. fitting model entire data set, might model attrition using convenience, ’ll create formula object used later: evaluate model, use 10 repeats 10-fold cross-validation use 100 holdout samples evaluate overall accuracy model. First, let’s make splits data: Now let’s write function , resample: obtain analysis data set (.e. 90% used modeling) fit logistic regression model predict assessment data (10% used model) using broom package determine sample predicted correctly. function: example: model, .fitted value linear predictor log-odds units. compute data set 100 resamples, ’ll use map() function purrr package: Now can compute accuracy values assessment data sets: Keep mind baseline accuracy beat rate non-attrition, 0.839. 
great model far.","code":"glm(Attrition ~ JobSatisfaction + Gender + MonthlyIncome, data = attrition, family = binomial) mod_form <- as.formula(Attrition ~ JobSatisfaction + Gender + MonthlyIncome) library(rsample) set.seed(4622) rs_obj <- vfold_cv(attrition, v = 10, repeats = 10) rs_obj #> # 10-fold cross-validation repeated 10 times #> # A tibble: 100 × 3 #> splits id id2 #> #> 1 Repeat01 Fold01 #> 2 Repeat01 Fold02 #> 3 Repeat01 Fold03 #> 4 Repeat01 Fold04 #> 5 Repeat01 Fold05 #> 6 Repeat01 Fold06 #> 7 Repeat01 Fold07 #> 8 Repeat01 Fold08 #> 9 Repeat01 Fold09 #> 10 Repeat01 Fold10 #> # ℹ 90 more rows ## splits will be the `rsplit` object with the 90/10 partition holdout_results <- function(splits, ...) { # Fit the model to the 90% mod <- glm(..., data = analysis(splits), family = binomial) # Save the 10% holdout <- assessment(splits) # `augment` will save the predictions with the holdout data set res <- broom::augment(mod, newdata = holdout) # Class predictions on the assessment set from class probs lvls <- levels(holdout$Attrition) predictions <- factor(ifelse(res$.fitted > 0, lvls[2], lvls[1]), levels = lvls) # Calculate whether the prediction was correct res$correct <- predictions == holdout$Attrition # Return the assessment data set with the additional columns res } example <- holdout_results(rs_obj$splits[[1]], mod_form) dim(example) #> [1] 147 34 dim(assessment(rs_obj$splits[[1]])) #> [1] 147 31 ## newly added columns: example[1:10, setdiff(names(example), names(attrition))] #> # A tibble: 10 × 3 #> .rownames .fitted correct #> #> 1 11 -1.20 TRUE #> 2 24 -1.78 TRUE #> 3 30 -1.45 TRUE #> 4 39 -1.60 TRUE #> 5 53 -1.54 TRUE #> 6 72 -1.93 TRUE #> 7 73 -3.06 TRUE #> 8 80 -3.28 TRUE #> 9 83 -2.23 TRUE #> 10 90 -1.28 FALSE library(purrr) rs_obj$results <- map(rs_obj$splits, holdout_results, mod_form) rs_obj #> # 10-fold cross-validation repeated 10 times #> # A tibble: 100 × 4 #> splits id id2 results #> #> 1 Repeat01 Fold01 #> 2 Repeat01 Fold02 #> 3 Repeat01 
Fold03 #> 4 Repeat01 Fold04 #> 5 Repeat01 Fold05 #> 6 Repeat01 Fold06 #> 7 Repeat01 Fold07 #> 8 Repeat01 Fold08 #> 9 Repeat01 Fold09 #> 10 Repeat01 Fold10 #> # ℹ 90 more rows rs_obj$accuracy <- map_dbl(rs_obj$results, function(x) mean(x$correct)) summary(rs_obj$accuracy) #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 0.776 0.821 0.840 0.839 0.859 0.905"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"using-the-bootstrap-to-make-comparisons","dir":"Articles","previous_headings":"","what":"Using the Bootstrap to Make Comparisons","title":"Working with resampling sets","text":"Traditionally, bootstrap primarily used empirically determine sampling distribution test statistic. Given set samples replacement, statistic can calculated analysis set results can used make inferences (confidence intervals). example, differences median monthly income genders? wanted compare genders, conduct t-test rank-based test. Instead, let’s use bootstrap see difference median incomes two groups. need simple function compute statistic resample: Now create large number bootstrap samples (say 2000+). illustration, ’ll 500 document. function computed across resample: bootstrap distribution statistic slightly bimodal skewed distribution: variation considerable statistic. One method computing confidence interval take percentiles bootstrap distribution. 
95% confidence interval difference means : calculated 95% confidence interval contains zero, don’t evidence difference median income genders confidence level 95%.","code":"ggplot(attrition, aes(x = Gender, y = MonthlyIncome)) + geom_boxplot() + scale_y_log10() median_diff <- function(splits) { x <- analysis(splits) median(x$MonthlyIncome[x$Gender == \"Female\"]) - median(x$MonthlyIncome[x$Gender == \"Male\"]) } set.seed(353) bt_resamples <- bootstraps(attrition, times = 500) bt_resamples$wage_diff <- map_dbl(bt_resamples$splits, median_diff) ggplot(bt_resamples, aes(x = wage_diff)) + geom_line(stat = \"density\", adjust = 1.25) + xlab(\"Difference in Median Monthly Income (Female - Male)\") quantile(bt_resamples$wage_diff, probs = c(0.025, 0.975)) #> 2.5% 97.5% #> -189 615"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"bootstrap-estimates-of-model-coefficients","dir":"Articles","previous_headings":"","what":"Bootstrap Estimates of Model Coefficients","title":"Working with resampling sets","text":"Unless already column resample object contains fitted model, function can used fit model save model coefficients. broom package package tidy() function save coefficients data frame. Instead returning data frame row model term, save data frame single row columns model term. , purrr::map() can used estimate save values split.","code":"glm_coefs <- function(splits, ...) { ## use `analysis` or `as.data.frame` to get the analysis data mod <- glm(..., data = analysis(splits), family = binomial) as.data.frame(t(coef(mod))) } bt_resamples$betas <- map(.x = bt_resamples$splits, .f = glm_coefs, mod_form) bt_resamples #> # Bootstrap sampling #> # A tibble: 500 × 4 #> splits id wage_diff betas #> #> 1 Bootstrap001 136 #> 2 Bootstrap002 282. 
#> 3 Bootstrap003 470 #> 4 Bootstrap004 -213 #> 5 Bootstrap005 453 #> 6 Bootstrap006 684 #> 7 Bootstrap007 60 #> 8 Bootstrap008 286 #> 9 Bootstrap009 -30 #> 10 Bootstrap010 410 #> # ℹ 490 more rows bt_resamples$betas[[1]] #> (Intercept) JobSatisfaction.L JobSatisfaction.Q JobSatisfaction.C GenderMale #> 1 -0.939 -0.501 -0.272 0.0842 0.0989 #> MonthlyIncome #> 1 -0.000129"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"keeping-tidy","dir":"Articles","previous_headings":"","what":"Keeping Tidy","title":"Working with resampling sets","text":"previously mentioned, broom package contains class called tidy created representations objects can easily used analysis, plotting, etc. rsample contains tidy methods rset rsplit objects. example: ","code":"first_resample <- bt_resamples$splits[[1]] class(first_resample) #> [1] \"boot_split\" \"rsplit\" tidy(first_resample) #> # A tibble: 1,470 × 2 #> Row Data #> #> 1 2 Analysis #> 2 3 Analysis #> 3 4 Analysis #> 4 7 Analysis #> 5 9 Analysis #> 6 10 Analysis #> 7 11 Analysis #> 8 13 Analysis #> 9 18 Analysis #> 10 19 Analysis #> # ℹ 1,460 more rows class(bt_resamples) #> [1] \"bootstraps\" \"rset\" \"tbl_df\" \"tbl\" \"data.frame\" tidy(bt_resamples) #> # A tibble: 735,000 × 3 #> Row Data Resample #> #> 1 1 Analysis Bootstrap002 #> 2 1 Analysis Bootstrap004 #> 3 1 Analysis Bootstrap007 #> 4 1 Analysis Bootstrap008 #> 5 1 Analysis Bootstrap009 #> 6 1 Analysis Bootstrap010 #> 7 1 Analysis Bootstrap011 #> 8 1 Analysis Bootstrap013 #> 9 1 Analysis Bootstrap015 #> 10 1 Analysis Bootstrap016 #> # ℹ 734,990 more rows"},{"path":"https://rsample.tidymodels.org/dev/articles/rsample.html","id":"terminology","dir":"Articles","previous_headings":"","what":"Terminology","title":"Introduction to rsample","text":"define resample result two-way split data set. example, bootstrapping, one part resample sample replacement original data. part split contains instances contained bootstrap sample. 
Cross-validation another type resampling.","code":""},{"path":"https://rsample.tidymodels.org/dev/articles/rsample.html","id":"rset-objects-contain-many-resamples","dir":"Articles","previous_headings":"","what":"rset Objects Contain Many Resamples","title":"Introduction to rsample","text":"main class package (rset) set collection resamples. 10-fold cross-validation, set consist 10 different resamples original data. Like modelr, resamples stored data-frame-like tibble object. simple example, small set bootstraps mtcars data:","code":"library(rsample) set.seed(8584) bt_resamples <- bootstraps(mtcars, times = 3) bt_resamples #> # Bootstrap sampling #> # A tibble: 3 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Bootstrap3"},{"path":"https://rsample.tidymodels.org/dev/articles/rsample.html","id":"individual-resamples-are-rsplit-objects","dir":"Articles","previous_headings":"","what":"Individual Resamples are rsplit Objects","title":"Introduction to rsample","text":"resamples stored splits column object class rsplit. package use following terminology two partitions comprise resample: analysis data selected resample. bootstrap, sample replacement. 10-fold cross-validation, 90% data. data often used fit model calculate statistic traditional bootstrapping. assessment data usually section original data covered analysis set. , 10-fold CV, 10% held . data often used evaluate performance model fit analysis data. (Aside: might use term “training” “testing” data sets, avoid since labels often conflict data result initial partition data typically done resampling. training/test split can conducted using initial_split() function package.) Let’s look one rsplit objects indicates 32 data points analysis set, 14 instances assessment set, original data contained 32 data points. results can also determined using dim function rsplit object. obtain either data sets rsplit, .data.frame() function can used. 
default, analysis set returned data option can used return assessment data: Alternatively, can use shortcuts analysis(first_resample) assessment(first_resample).","code":"first_resample <- bt_resamples$splits[[1]] first_resample #> #> <32/14/32> head(as.data.frame(first_resample)) #> mpg cyl disp hp drat wt qsec vs am gear carb #> Fiat 128...1 32.4 4 78.7 66 4.08 2.20 19.5 1 1 4 1 #> Toyota Corolla...2 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1 #> Toyota Corolla...3 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1 #> AMC Javelin...4 15.2 8 304.0 150 3.15 3.44 17.3 0 0 3 2 #> Valiant...5 18.1 6 225.0 105 2.76 3.46 20.2 1 0 3 1 #> Merc 450SLC...6 15.2 8 275.8 180 3.07 3.78 18.0 0 0 3 3 as.data.frame(first_resample, data = \"assessment\") #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.88 17.0 0 1 4 4 #> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.21 19.4 1 0 3 1 #> Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2 #> Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2 #> Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4 #> Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4 #> Merc 450SE 16.4 8 275.8 180 3.07 4.07 17.4 0 0 3 3 #> Merc 450SL 17.3 8 275.8 180 3.07 3.73 17.6 0 0 3 3 #> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.25 18.0 0 0 3 4 #> Chrysler Imperial 14.7 8 440.0 230 3.23 5.34 17.4 0 0 3 4 #> Honda Civic 30.4 4 75.7 52 4.93 1.61 18.5 1 1 4 2 #> Fiat X1-9 27.3 4 79.0 66 4.08 1.94 18.9 1 1 4 1 #> Lotus Europa 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2 #> Volvo 142E 21.4 4 121.0 109 4.11 2.78 18.6 1 1 4 2"},{"path":"https://rsample.tidymodels.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Hannah Frick. Author, maintainer. Fanny Chow. Author. Max Kuhn. Author. Michael Mahoney. Author. Julia Silge. Author. Hadley Wickham. Author. . 
Copyright holder, funder.","code":""},{"path":"https://rsample.tidymodels.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Frick H, Chow F, Kuhn M, Mahoney M, Silge J, Wickham H (2024). rsample: General Resampling Infrastructure. R package version 1.2.1.9000, https://github.com/tidymodels/rsample, https://rsample.tidymodels.org.","code":"@Manual{, title = {rsample: General Resampling Infrastructure}, author = {Hannah Frick and Fanny Chow and Max Kuhn and Michael Mahoney and Julia Silge and Hadley Wickham}, year = {2024}, note = {R package version 1.2.1.9000, https://github.com/tidymodels/rsample}, url = {https://rsample.tidymodels.org}, }"},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"General Resampling Infrastructure","text":"rsample package provides functions create different types resamples corresponding classes analysis. goal modular set methods can used : resampling estimating sampling distribution statistic estimating model performance using holdout set scope rsample provide basic building blocks creating analyzing resamples data set, package include code modeling calculating statistics. Working Resample Sets vignette gives demonstration rsample tools can used building models. Note resampled data sets created rsample directly accessible resampling object contain much overhead memory. Since original data modified, R make automatic copy. 
example, creating 50 bootstraps data set create object 50-fold larger memory: Created 2022-02-28 reprex package (v2.0.1) memory usage 50 bootstrap samples less 3-fold original data set.","code":"library(rsample) library(mlbench) data(LetterRecognition) lobstr::obj_size(LetterRecognition) #> 2,644,640 B set.seed(35222) boots <- bootstraps(LetterRecognition, times = 50) lobstr::obj_size(boots) #> 6,686,776 B # Object size per resample lobstr::obj_size(boots)/nrow(boots) #> 133,735.5 B # Fold increase is <<< 50 as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition)) #> [1] 2.528426"},{"path":"https://rsample.tidymodels.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"General Resampling Infrastructure","text":"install , use: development version GitHub :","code":"install.packages(\"rsample\") # install.packages(\"pak\") pak::pak(\"rsample\")"},{"path":"https://rsample.tidymodels.org/dev/index.html","id":"contributing","dir":"","previous_headings":"","what":"Contributing","title":"General Resampling Infrastructure","text":"project released Contributor Code Conduct. contributing project, agree abide terms. questions discussions tidymodels packages, modeling, machine learning, please post Posit Community. think encountered bug, please submit issue. Either way, learn create share reprex (minimal, reproducible example), clearly communicate code. Check details contributing guidelines tidymodels packages get help.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":null,"dir":"Reference","previous_headings":"","what":"Augment a data set with resampling identifiers — add_resample_id","title":"Augment a data set with resampling identifiers — add_resample_id","text":"data set, add_resample_id() add least one new column identifies resample data came . 
cases, single column added resampling methods, two added.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Augment a data set with resampling identifiers — add_resample_id","text":"","code":"add_resample_id(.data, split, dots = FALSE)"},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Augment a data set with resampling identifiers — add_resample_id","text":".data data frame. split single rset object. dots single logical: id columns prefixed \".\" avoid name conflicts .data?","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Augment a data set with resampling identifiers — add_resample_id","text":"updated data frame.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Augment a data set with resampling identifiers — add_resample_id","text":"","code":"library(dplyr) #> #> Attaching package: ‘dplyr’ #> The following objects are masked from ‘package:stats’: #> #> filter, lag #> The following objects are masked from ‘package:base’: #> #> intersect, setdiff, setequal, union set.seed(363) car_folds <- vfold_cv(mtcars, repeats = 3) analysis(car_folds$splits[[1]]) %>% add_resample_id(car_folds$splits[[1]]) %>% head() #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 #> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 #> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 #> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 #> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 #> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 #> id 
id2 #> Mazda RX4 Repeat1 Fold01 #> Mazda RX4 Wag Repeat1 Fold01 #> Datsun 710 Repeat1 Fold01 #> Hornet 4 Drive Repeat1 Fold01 #> Hornet Sportabout Repeat1 Fold01 #> Valiant Repeat1 Fold01 car_bt <- bootstraps(mtcars) analysis(car_bt$splits[[1]]) %>% add_resample_id(car_bt$splits[[1]]) %>% head() #> mpg cyl disp hp drat wt qsec vs am gear carb #> Toyota Corona...1 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 #> Mazda RX4...2 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> Chrysler Imperial...3 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> Volvo 142E...4 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> Chrysler Imperial...5 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> Volvo 142E...6 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> id #> Toyota Corona...1 Bootstrap01 #> Mazda RX4...2 Bootstrap01 #> Chrysler Imperial...3 Bootstrap01 #> Volvo 142E...4 Bootstrap01 #> Chrysler Imperial...5 Bootstrap01 #> Volvo 142E...6 Bootstrap01"},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":null,"dir":"Reference","previous_headings":"","what":"Sampling for the Apparent Error Rate — apparent","title":"Sampling for the Apparent Error Rate — apparent","text":"building model data set re-predicting data, performance estimate predictions often called \"apparent\" performance model. estimate can wildly optimistic. \"Apparent sampling\" means analysis assessment samples . resamples sometimes used analysis bootstrap samples otherwise avoided like old sushi.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sampling for the Apparent Error Rate — apparent","text":"","code":"apparent(data, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sampling for the Apparent Error Rate — apparent","text":"data data frame. ... 
dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sampling for the Apparent Error Rate — apparent","text":"tibble single row classes apparent, rset, tbl_df, tbl, data.frame. results include column data split objects one column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Sampling for the Apparent Error Rate — apparent","text":"","code":"apparent(mtcars) #> # Apparent sampling #> # A tibble: 1 × 2 #> splits id #> #> 1 Apparent"},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert an rsplit object to a data frame — as.data.frame.rsplit","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"analysis assessment code can returned data frame (dictated data argument) using .data.frame.rsplit(). analysis() assessment() shortcuts.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"","code":"# S3 method for class 'rsplit' as.data.frame(x, row.names = NULL, optional = FALSE, data = \"analysis\", ...) analysis(x, ...) # Default S3 method analysis(x, ...) # S3 method for class 'rsplit' analysis(x, ...) assessment(x, ...) # Default S3 method assessment(x, ...) 
# S3 method for class 'rsplit' assessment(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"x rsplit object. row.names NULL character vector giving row names data frame. Missing values allowed. optional logical: column names data checked legality? data Either \"analysis\" \"assessment\" specify data returned. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"","code":"library(dplyr) set.seed(104) folds <- vfold_cv(mtcars) model_data_1 <- folds$splits[[1]] %>% analysis() holdout_data_1 <- folds$splits[[1]] %>% assessment()"},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":null,"dir":"Reference","previous_headings":"","what":"Bootstrap Sampling — bootstraps","title":"Bootstrap Sampling — bootstraps","text":"bootstrap sample sample size original data set made using replacement. results analysis samples multiple replicates original rows data. assessment set defined rows original data included bootstrap sample. often referred \"--bag\" (OOB) sample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Bootstrap Sampling — bootstraps","text":"","code":"bootstraps( data, times = 25, strata = NULL, breaks = 4, pool = 0.1, apparent = FALSE, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Bootstrap Sampling — bootstraps","text":"data data frame. times number bootstrap samples. 
strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. apparent logical. extra resample added analysis holdout subset entire data set. required estimators used summary() function require apparent error rate. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Bootstrap Sampling — bootstraps","text":"tibble classes bootstraps, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Bootstrap Sampling — bootstraps","text":"argument apparent enables option additional \"resample\" analysis assessment data sets original data set. can required types analysis bootstrap results. strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Bootstrap Sampling — bootstraps","text":"","code":"bootstraps(mtcars, times = 2) #> # Bootstrap sampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 bootstraps(mtcars, times = 2, apparent = TRUE) #> # Bootstrap sampling with apparent sample #> # A tibble: 3 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Apparent library(purrr) library(modeldata) data(wa_churn) set.seed(13) resample1 <- bootstraps(wa_churn, times = 3) map_dbl( resample1$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2798523 0.2639500 0.2648019 set.seed(13) resample2 <- bootstraps(wa_churn, strata = churn, times = 3) map_dbl( resample2$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2653699 0.2653699 0.2653699 set.seed(13) resample3 <- bootstraps(wa_churn, strata = tenure, breaks = 6, times = 3) map_dbl( resample3$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2625302 0.2659378 0.2696294"},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster Cross-Validation — clustering_cv","title":"Cluster Cross-Validation — clustering_cv","text":"Cluster cross-validation splits data V groups disjointed sets using k-means clustering variables. resample analysis data consists V-1 folds/clusters assessment set contains final fold/cluster. basic cross-validation (.e. 
repeats), number resamples equal V.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster Cross-Validation — clustering_cv","text":"","code":"clustering_cv( data, vars, v = 10, repeats = 1, distance_function = \"dist\", cluster_function = c(\"kmeans\", \"hclust\"), ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster Cross-Validation — clustering_cv","text":"data data frame. vars vector bare variable names use cluster data. v number partitions data set. repeats number times repeat clustered partitioning. distance_function function used distance calculations? Defaults stats::dist(). can also provide function; see Details. cluster_function function used clustering? Options either \"kmeans\" (use stats::kmeans()) \"hclust\" (use stats::hclust()). can also provide function; see Details. ... Extra arguments passed cluster_function.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster Cross-Validation — clustering_cv","text":"tibble classes rset, tbl_df, tbl, data.frame. results include column data split objects identification variable id.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster Cross-Validation — clustering_cv","text":"variables vars argument used k-means clustering data disjointed sets hierarchical clustering data. clusters used folds cross-validation. Depending data distributed, may equal number points fold. can optionally provide custom function distance_function. function take data frame (created via data[vars]) return stats::dist() object distances data points. 
can optionally provide custom function cluster_function. function must take three arguments: dists, stats::dist() object distances data points v, length-1 numeric number folds create ..., pass additional named arguments function function return vector cluster assignments length nrow(data), element vector corresponding matching row data frame.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Cluster Cross-Validation — clustering_cv","text":"","code":"data(ames, package = \"modeldata\") clustering_cv(ames, vars = c(Sale_Price, First_Flr_SF, Second_Flr_SF), v = 2) #> # 2-cluster cross-validation #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2"},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":null,"dir":"Reference","previous_headings":"","what":"Determine the Assessment Samples — complement","title":"Determine the Assessment Samples — complement","text":"method function help find data belong analysis assessment sets.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Determine the Assessment Samples — complement","text":"","code":"complement(x, ...) # S3 method for class 'rsplit' complement(x, ...) # S3 method for class 'rof_split' complement(x, ...) # S3 method for class 'sliding_window_split' complement(x, ...) # S3 method for class 'sliding_index_split' complement(x, ...) # S3 method for class 'sliding_period_split' complement(x, ...) # S3 method for class 'apparent_split' complement(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Determine the Assessment Samples — complement","text":"x rsplit object. ... 
currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Determine the Assessment Samples — complement","text":"integer vector.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Determine the Assessment Samples — complement","text":"Given rsplit object, complement() determine data rows contained assessment set. save space, many rsplit objects contain indices assessment split.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Determine the Assessment Samples — complement","text":"","code":"set.seed(28432) fold_rs <- vfold_cv(mtcars) head(fold_rs$splits[[1]]$in_id) #> [1] 2 3 4 5 6 7 fold_rs$splits[[1]]$out_id #> [1] NA complement(fold_rs$splits[[1]]) #> [1] 1 9 25 27"},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the split arguments from an rset — .get_split_args","title":"Get the split arguments from an rset — .get_split_args","text":"Get split arguments rset","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the split arguments from an rset — .get_split_args","text":"","code":".get_split_args(x, allow_strata_false = FALSE)"},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the split arguments from an rset — .get_split_args","text":"x rset initial_split object. allow_strata_false logical specify value use stratification specified. 
default use strata = NULL, alternative strata = FALSE.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the split arguments from an rset — .get_split_args","text":"list arguments used create rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract Predictor Names from Formula or Terms — form_pred","title":"Extract Predictor Names from Formula or Terms — form_pred","text":".vars() returns variables used formula, function returns variables explicitly used right-hand side (.e., resolve dots unless object terms data set specified).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"","code":"form_pred(object, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"object model formula stats::terms() object. ... 
Arguments pass .vars()","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"character vector names","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"","code":"form_pred(y ~ x + z) #> [1] \"x\" \"z\" form_pred(terms(y ~ x + z)) #> [1] \"x\" \"z\" form_pred(y ~ x + log(z)) #> [1] \"x\" \"z\" form_pred(log(y) ~ x + z) #> [1] \"x\" \"z\" form_pred(y1 + y2 ~ x + z) #> [1] \"x\" \"z\" form_pred(log(y1) + y2 ~ x + z) #> [1] \"x\" \"z\" # will fail: # form_pred(y ~ .) form_pred(terms(mpg ~ (.)^2, data = mtcars)) #> [1] \"cyl\" \"disp\" \"hp\" \"drat\" \"wt\" \"qsec\" \"vs\" \"am\" \"gear\" \"carb\" form_pred(terms(~ (.)^2, data = mtcars)) #> [1] \"mpg\" \"cyl\" \"disp\" \"hp\" \"drat\" \"wt\" \"qsec\" \"vs\" \"am\" \"gear\" #> [11] \"carb\""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":null,"dir":"Reference","previous_headings":"","what":"Obtain a identifier for the resamples — .get_fingerprint","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"function returns hash (NA) attribute created rset initially constructed. can used compare resampling objects see .","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"","code":".get_fingerprint(x, ...) # Default S3 method .get_fingerprint(x, ...) 
# S3 method for class 'rset' .get_fingerprint(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"x rset tune_results object. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"character value NA_character_ object created prior rsample version 0.1.0.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"","code":"set.seed(1) .get_fingerprint(vfold_cv(mtcars)) #> [1] \"10edc17b4467d256910fb9dc53c3599a\" set.seed(1) .get_fingerprint(vfold_cv(mtcars)) #> [1] \"10edc17b4467d256910fb9dc53c3599a\" set.seed(2) .get_fingerprint(vfold_cv(mtcars)) #> [1] \"9070fd5cd338c4757f525de2e2a7beaa\" set.seed(1) .get_fingerprint(vfold_cv(mtcars, repeats = 2)) #> [1] \"e2457324f2637e7f0f593755d1592d03\""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve individual rsplits objects from an rset — get_rsplit","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"Retrieve individual rsplits objects rset","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"","code":"get_rsplit(x, index, ...) # S3 method for class 'rset' get_rsplit(x, index, ...) 
# Default S3 method get_rsplit(x, index, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"x rset object retrieve rsplit . index integer indicating rsplit retrieve: 1 rsplit first row rset, 2 second, . ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"rsplit object row index rset","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"","code":"set.seed(123) (starting_splits <- group_vfold_cv(mtcars, cyl, v = 3)) #> # Group 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 get_rsplit(starting_splits, 1) #> #> <21/11/32>"},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":null,"dir":"Reference","previous_headings":"","what":"Group Bootstraps — group_bootstraps","title":"Group Bootstraps — group_bootstraps","text":"Group bootstrapping creates splits data based grouping variable (may single row associated ). common use kind resampling repeated measures subject. bootstrap sample sample size original data set made using replacement. results analysis samples multiple replicates original rows data. assessment set defined rows original data included bootstrap sample. 
often referred \"--bag\" (OOB) sample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Group Bootstraps — group_bootstraps","text":"","code":"group_bootstraps( data, group, times = 25, apparent = FALSE, ..., strata = NULL, pool = 0.1 )"},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Group Bootstraps — group_bootstraps","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. times number bootstrap samples. apparent logical. extra resample added analysis holdout subset entire data set. required estimators used summary() function require apparent error rate. ... dots future extensions must empty. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Group Bootstraps — group_bootstraps","text":"tibble classes group_bootstraps bootstraps, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Group Bootstraps — group_bootstraps","text":"argument apparent enables option additional \"resample\" analysis assessment data sets original data set. 
can required types analysis bootstrap results.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Group Bootstraps — group_bootstraps","text":"","code":"data(ames, package = \"modeldata\") set.seed(13) group_bootstraps(ames, Neighborhood, times = 3) #> # Group bootstrap sampling #> # A tibble: 3 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Bootstrap3 group_bootstraps(ames, Neighborhood, times = 3, apparent = TRUE) #> # Group bootstrap sampling with apparent sample #> # A tibble: 4 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Bootstrap3 #> 4 Apparent"},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Group Monte Carlo Cross-Validation — group_mc_cv","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"Group Monte Carlo cross-validation creates splits data based grouping variable (may single row associated ). One resample Monte Carlo cross-validation takes random sample (without replacement) groups original data set used analysis. data points added assessment set. common use kind resampling repeated measures subject.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"","code":"group_mc_cv( data, group, prop = 3/4, times = 25, ..., strata = NULL, pool = 0.1 )"},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. prop proportion data retained modeling/analysis. 
times number times repeat sampling. ... dots future extensions must empty. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"tibble classes group_mc_cv, rset, tbl_df, tbl, data.frame. results include column data split objects identification variable.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"","code":"data(ames, package = \"modeldata\") set.seed(123) group_mc_cv(ames, group = Neighborhood, times = 5) #> # Group Monte Carlo cross-validation (0.75/0.25) with 5 resamples #> # A tibble: 5 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5"},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Group V-Fold Cross-Validation — group_vfold_cv","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"Group V-fold cross-validation creates splits data based grouping variable (may single row associated ). function can create many splits unique values grouping variable can create smaller set splits one group left time. 
common use kind resampling repeated measures subject.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"","code":"group_vfold_cv( data, group = NULL, v = NULL, repeats = 1, balance = c(\"groups\", \"observations\"), ..., strata = NULL, pool = 0.1 )"},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. v number partitions data set. left NULL (default), v set number unique values grouping variable, creating \"leave-one-group-\" splits. repeats number times repeat V-fold partitioning. balance v less number unique groups, groups combined folds? one \"groups\", assign roughly number groups fold, \"observations\", assign roughly number observations fold. ... dots future extensions must empty. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"tibble classes group_vfold_cv, rset, tbl_df, tbl, data.frame. 
results include column data split objects identification variable.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"","code":"data(ames, package = \"modeldata\") set.seed(123) group_vfold_cv(ames, group = Neighborhood, v = 5) #> # Group 5-fold cross-validation #> # A tibble: 5 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5 group_vfold_cv( ames, group = Neighborhood, v = 5, balance = \"observations\" ) #> # Group 5-fold cross-validation #> # A tibble: 5 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5 group_vfold_cv(ames, group = Neighborhood, v = 5, repeats = 2) #> # Group 5-fold cross-validation #> # A tibble: 10 × 3 #> splits id id2 #> #> 1 Repeat1 Resample1 #> 2 Repeat1 Resample2 #> 3 Repeat1 Resample3 #> 4 Repeat1 Resample4 #> 5 Repeat1 Resample5 #> 6 Repeat2 Resample1 #> 7 Repeat2 Resample2 #> 8 Repeat2 Resample3 #> 9 Repeat2 Resample4 #> 10 Repeat2 Resample5 # Leave-one-group-out CV group_vfold_cv(ames, group = Neighborhood) #> # Group 28-fold cross-validation #> # A tibble: 28 × 2 #> splits id #> #> 1 Resample01 #> 2 Resample02 #> 3 Resample03 #> 4 Resample04 #> 5 Resample05 #> 6 Resample06 #> 7 Resample07 #> 8 Resample08 #> 9 Resample09 #> 10 Resample10 #> # ℹ 18 more rows library(dplyr) data(Sacramento, package = \"modeldata\") city_strata <- Sacramento %>% group_by(city) %>% summarize(strata = mean(price)) %>% summarize(city = city, strata = cut(strata, quantile(strata), include.lowest = TRUE)) #> Warning: Returning more (or less) than 1 row per `summarise()` group was #> deprecated in dplyr 1.1.0. #> ℹ Please use `reframe()` instead. #> ℹ When switching from `summarise()` to `reframe()`, remember that #> `reframe()` always returns an ungrouped data frame and adjust #> accordingly. 
sacramento_data <- Sacramento %>% full_join(city_strata, by = \"city\") group_vfold_cv(sacramento_data, city, strata = strata) #> Warning: Leaving `v = NULL` while using stratification will set `v` to the number of groups present in the least common stratum. #> ℹ Set `v` explicitly to override this warning. #> # Group 14-fold cross-validation #> # A tibble: 14 × 2 #> splits id #> #> 1 Resample01 #> 2 Resample02 #> 3 Resample03 #> 4 Resample04 #> 5 Resample05 #> 6 Resample06 #> 7 Resample07 #> 8 Resample08 #> 9 Resample09 #> 10 Resample10 #> 11 Resample11 #> 12 Resample12 #> 13 Resample13 #> 14 Resample14"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Simple Training/Test Set Splitting — initial_split","title":"Simple Training/Test Set Splitting — initial_split","text":"initial_split() creates single binary split data training set testing set. initial_time_split() , takes first prop samples training, instead random selection. group_initial_split() creates splits data based grouping variable, data \"group\" assigned split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simple Training/Test Set Splitting — initial_split","text":"","code":"initial_split(data, prop = 3/4, strata = NULL, breaks = 4, pool = 0.1, ...) initial_time_split(data, prop = 3/4, lag = 0, ...) training(x, ...) # Default S3 method training(x, ...) # S3 method for class 'rsplit' training(x, ...) testing(x, ...) # Default S3 method testing(x, ...) # S3 method for class 'rsplit' testing(x, ...) 
group_initial_split(data, group, prop = 3/4, ..., strata = NULL, pool = 0.1)"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simple Training/Test Set Splitting — initial_split","text":"data data frame. prop proportion data retained modeling/analysis. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty. lag value include lag assessment analysis set. useful lagged predictors used training testing. x rsplit object produced initial_split() initial_time_split(). group variable data (single character name) used grouping observations value either analysis assessment set within fold.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Simple Training/Test Set Splitting — initial_split","text":"rsplit object can used training() testing() functions extract data split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Simple Training/Test Set Splitting — initial_split","text":"training() testing() used extract resulting data. strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simple Training/Test Set Splitting — initial_split","text":"","code":"set.seed(1353) car_split <- initial_split(mtcars) train_data <- training(car_split) test_data <- testing(car_split) data(drinks, package = \"modeldata\") drinks_split <- initial_time_split(drinks) train_data <- training(drinks_split) test_data <- testing(drinks_split) c(max(train_data$date), min(test_data$date)) # no lag #> [1] \"2011-03-01\" \"2011-04-01\" # With 12 period lag drinks_lag_split <- initial_time_split(drinks, lag = 12) train_data <- training(drinks_lag_split) test_data <- testing(drinks_lag_split) c(max(train_data$date), min(test_data$date)) # 12 period lag #> [1] \"2011-03-01\" \"2010-04-01\" set.seed(1353) car_split <- group_initial_split(mtcars, cyl) train_data <- training(car_split) test_data <- testing(car_split)"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an Initial Train/Validation/Test Split — initial_validation_split","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"initial_validation_split() creates random three-way split data training set, validation set, testing set. initial_validation_time_split() , instead random selection training, validation, testing set order full data set, first observations put training set. 
group_initial_validation_split() creates similar random splits data based grouping variable, data \"group\" assigned partition.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"","code":"initial_validation_split( data, prop = c(0.6, 0.2), strata = NULL, breaks = 4, pool = 0.1, ... ) initial_validation_time_split(data, prop = c(0.6, 0.2), ...) group_initial_validation_split( data, group, prop = c(0.6, 0.2), ..., strata = NULL, pool = 0.1 ) # S3 method for class 'initial_validation_split' training(x, ...) # S3 method for class 'initial_validation_split' testing(x, ...) validation(x, ...) # Default S3 method validation(x, ...) # S3 method for class 'initial_validation_split' validation(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"data data frame. prop length-2 vector proportions data retained training validation data, respectively. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty. group variable data (single character name) used grouping observations value either analysis assessment set within fold. 
x object class initial_validation_split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"initial_validation_split object can used training(), validation(), testing() functions extract data split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"training(), validation(), testing() can used extract resulting data sets. Use validation_set() create rset object use functions tune package tune::tune_grid(). strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"","code":"set.seed(1353) car_split <- initial_validation_split(mtcars) train_data <- training(car_split) validation_data <- validation(car_split) test_data <- testing(car_split) data(drinks, package = \"modeldata\") drinks_split <- initial_validation_time_split(drinks) train_data <- training(drinks_split) validation_data <- validation(drinks_split) c(max(train_data$date), min(validation_data$date)) #> [1] \"2007-05-01\" \"2007-06-01\" data(ames, package = \"modeldata\") set.seed(1353) ames_split <- group_initial_validation_split(ames, group = Neighborhood) train_data <- training(ames_split) validation_data <- validation(ames_split) test_data <- testing(ames_split)"},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Inner split of the analysis set for fitting a post-processor — inner_split","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"Inner split analysis set fitting post-processor","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"","code":"inner_split(x, ...) # S3 method for class 'mc_split' inner_split(x, split_args, ...) # S3 method for class 'group_mc_split' inner_split(x, split_args, ...) # S3 method for class 'vfold_split' inner_split(x, split_args, ...) # S3 method for class 'group_vfold_split' inner_split(x, split_args, ...) # S3 method for class 'boot_split' inner_split(x, split_args, ...) 
# S3 method for class 'group_boot_split' inner_split(x, split_args, ...) # S3 method for class 'val_split' inner_split(x, split_args, ...) # S3 method for class 'group_val_split' inner_split(x, split_args, ...) # S3 method for class 'time_val_split' inner_split(x, split_args, ...) # S3 method for class 'clustering_split' inner_split(x, split_args, ...) # S3 method for class 'apparent_split' inner_split(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"x rsplit object. ... currently used. split_args list arguments used inner split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"rsplit object.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"rsplit objects live commonly inside rset object. split_args argument can output .get_split_args() corresponding rset object, even arguments used create rset object needed inner split. mc_split group_mc_split objects, inner_split() ignore split_args$times. vfold_split group_vfold_split objects, ignore split_args$times split_args$repeats. split_args$v used set split_args$prop 1 - 1/v prop already set otherwise ignored. method group_vfold_split always use split_args$balance = NULL. boot_split group_boot_split objects, ignore split_args$times. val_split, group_val_split, time_val_split objects, interpret length-2 split_args$prop ratio training validation sets split inner analysis inner assessment set ratio. 
split_args$prop single value, used proportion inner analysis set. clustering_split objects, ignore split_args$repeats.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":null,"dir":"Reference","previous_headings":"","what":"Bootstrap confidence intervals — int_pctl","title":"Bootstrap confidence intervals — int_pctl","text":"Calculate bootstrap confidence intervals using various methods.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Bootstrap confidence intervals — int_pctl","text":"","code":"int_pctl(.data, ...) # S3 method for class 'bootstraps' int_pctl(.data, statistics, alpha = 0.05, ...) int_t(.data, ...) # S3 method for class 'bootstraps' int_t(.data, statistics, alpha = 0.05, ...) int_bca(.data, ...) # S3 method for class 'bootstraps' int_bca(.data, statistics, alpha = 0.05, .fn, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Bootstrap confidence intervals — int_pctl","text":".data data frame containing bootstrap resamples created using bootstraps(). t- BCa-intervals, apparent argument set TRUE. Even apparent argument set TRUE percentile method, apparent data never used calculating percentile confidence interval. ... Arguments pass .fn (int_bca() ). statistics unquoted column name dplyr selector identifies single column data set containing individual bootstrap estimates. must list column tidy tibbles (columns term estimate). t-intervals, standard tidy column (usually called std.err) required. See examples . alpha Level significance. .fn function calculate statistic interest. function take rsplit first argument ... 
required.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Bootstrap confidence intervals — int_pctl","text":"function returns tibble columns .lower, .estimate, .upper, .alpha, .method, term. .method type interval (eg. \"percentile\", \"student-t\", \"BCa\"). term name estimate. Note .estimate returned int_pctl() mean estimates bootstrap resamples estimate apparent model.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Bootstrap confidence intervals — int_pctl","text":"Percentile intervals standard method obtaining confidence intervals require thousands resamples accurate. T-intervals may need fewer resamples require corresponding variance estimate. Bias-corrected accelerated intervals require original function used create statistics interest computationally taxing.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Bootstrap confidence intervals — int_pctl","text":"https://rsample.tidymodels.org/articles/Applications/Intervals.html Davison, ., & Hinkley, D. (1997). Bootstrap Methods Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Bootstrap confidence intervals — int_pctl","text":"","code":"# \\donttest{ library(broom) library(dplyr) library(purrr) library(tibble) lm_est <- function(split, ...) 
{ lm(mpg ~ disp + hp, data = analysis(split)) %>% tidy() } set.seed(52156) car_rs <- bootstraps(mtcars, 500, apparent = TRUE) %>% mutate(results = map(splits, lm_est)) int_pctl(car_rs, results) #> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`. #> # A tibble: 3 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 (Intercept) 27.5 30.7 33.6 0.05 percentile #> 2 disp -0.0440 -0.0300 -0.0162 0.05 percentile #> 3 hp -0.0572 -0.0260 -0.00840 0.05 percentile int_t(car_rs, results) #> # A tibble: 3 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 (Intercept) 28.1 30.7 34.6 0.05 student-t #> 2 disp -0.0446 -0.0300 -0.0170 0.05 student-t #> 3 hp -0.0449 -0.0260 -0.00337 0.05 student-t int_bca(car_rs, results, .fn = lm_est) #> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`. #> # A tibble: 3 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 (Intercept) 27.7 30.7 33.7 0.05 BCa #> 2 disp -0.0446 -0.0300 -0.0172 0.05 BCa #> 3 hp -0.0576 -0.0260 -0.00843 0.05 BCa # putting results into a tidy format rank_corr <- function(split) { dat <- analysis(split) tibble( term = \"corr\", estimate = cor(dat$sqft, dat$price, method = \"spearman\"), # don't know the analytical std.err so no t-intervals std.err = NA_real_ ) } set.seed(69325) data(Sacramento, package = \"modeldata\") bootstraps(Sacramento, 1000, apparent = TRUE) %>% mutate(correlations = map(splits, rank_corr)) %>% int_pctl(correlations) #> # A tibble: 1 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 corr 0.737 0.768 0.796 0.05 percentile # }"},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":null,"dir":"Reference","previous_headings":"","what":"Find Labels from rset Object — labels.rset","title":"Find Labels from rset Object — labels.rset","text":"Produce vector resampling labels (e.g. \"Fold1\") rset object. 
Currently, nested_cv() supported.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find Labels from rset Object — labels.rset","text":"","code":"# S3 method for class 'rset' labels(object, make_factor = FALSE, ...) # S3 method for class 'vfold_cv' labels(object, make_factor = FALSE, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find Labels from rset Object — labels.rset","text":"object rset object. make_factor logical whether results character factor. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find Labels from rset Object — labels.rset","text":"single character factor vector.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find Labels from rset Object — labels.rset","text":"","code":"labels(vfold_cv(mtcars)) #> [1] \"Fold01\" \"Fold02\" \"Fold03\" \"Fold04\" \"Fold05\" \"Fold06\" \"Fold07\" #> [8] \"Fold08\" \"Fold09\" \"Fold10\""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Find Labels from rsplit Object — labels.rsplit","title":"Find Labels from rsplit Object — labels.rsplit","text":"Produce tibble identification variables single splits can linked particular resample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find Labels from rsplit Object — labels.rsplit","text":"","code":"# S3 method for class 'rsplit' labels(object, 
...)"},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find Labels from rsplit Object — labels.rsplit","text":"object rsplit object ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find Labels from rsplit Object — labels.rsplit","text":"tibble.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find Labels from rsplit Object — labels.rsplit","text":"","code":"cv_splits <- vfold_cv(mtcars) labels(cv_splits$splits[[1]]) #> # A tibble: 1 × 1 #> id #> #> 1 Fold01"},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Leave-One-Out Cross-Validation — loo_cv","title":"Leave-One-Out Cross-Validation — loo_cv","text":"Leave-one-(LOO) cross-validation uses one data point original set assessment data data points analysis set. LOO resampling set many resamples rows original data set.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Leave-One-Out Cross-Validation — loo_cv","text":"","code":"loo_cv(data, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Leave-One-Out Cross-Validation — loo_cv","text":"data data frame. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Leave-One-Out Cross-Validation — loo_cv","text":"tibble classes loo_cv, rset, tbl_df, tbl, data.frame. 
results include column data split objects one column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Leave-One-Out Cross-Validation — loo_cv","text":"","code":"loo_cv(mtcars) #> # Leave-one-out cross-validation #> # A tibble: 32 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5 #> 6 Resample6 #> 7 Resample7 #> 8 Resample8 #> 9 Resample9 #> 10 Resample10 #> # ℹ 22 more rows"},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":null,"dir":"Reference","previous_headings":"","what":"Make groupings for grouped rsplits — make_groups","title":"Make groupings for grouped rsplits — make_groups","text":"function powers grouped resampling splitting data based upon grouping variable returning assessment set indices split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make groupings for grouped rsplits — make_groups","text":"","code":"make_groups( data, group, v, balance = c(\"groups\", \"observations\", \"prop\"), strata = NULL, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make groupings for grouped rsplits — make_groups","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. v number partitions data set. balance v less number unique groups, groups combined folds? one \"groups\", \"observations\", \"prop\". strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. ... 
Arguments passed balance functions.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Make groupings for grouped rsplits — make_groups","text":"balance options accepted – make sense – resampling functions. instance, balance = \"prop\" assigns groups folds random, meaning given observation guaranteed one (one) assessment set. means balance = \"prop\" used group_vfold_cv(), option available function. Similarly, group_mc_cv() derivatives assign data one (one) assessment set, rather allow observation assessment set zero--times. result, functions balance argument, hood always specify balance = \"prop\" call make_groups().","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":null,"dir":"Reference","previous_headings":"","what":"Constructors for split objects — make_splits","title":"Constructors for split objects — make_splits","text":"Constructors split objects","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Constructors for split objects — make_splits","text":"","code":"make_splits(x, ...) # Default S3 method make_splits(x, ...) # S3 method for class 'list' make_splits(x, data, class = NULL, ...) # S3 method for class 'data.frame' make_splits(x, assessment, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Constructors for split objects — make_splits","text":"x list integers names \"analysis\" \"assessment\", data frame analysis training data. ... currently used. data data frame. class optional class give object. 
assessment data frame assessment testing data, can empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Constructors for split objects — make_splits","text":"","code":"df <- data.frame( year = 1900:1999, value = 10 + 8*1900:1999 + runif(100L, 0, 100) ) split_from_indices <- make_splits( x = list(analysis = which(df$year <= 1980), assessment = which(df$year > 1980)), data = df ) split_from_data_frame <- make_splits( x = df[df$year <= 1980,], assessment = df[df$year > 1980,] ) identical(split_from_indices, split_from_data_frame) #> [1] TRUE"},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":null,"dir":"Reference","previous_headings":"","what":"Create or Modify Stratification Variables — make_strata","title":"Create or Modify Stratification Variables — make_strata","text":"function can create strata numeric data make non-numeric data conducive stratification.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create or Modify Stratification Variables — make_strata","text":"","code":"make_strata(x, breaks = 4, nunique = 5, pool = 0.1, depth = 20)"},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create or Modify Stratification Variables — make_strata","text":"x input vector. breaks single number giving number bins desired stratify numeric stratification variable. nunique integer number unique value threshold algorithm. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. depth integer used determine best number percentiles used. number bins based min(5, floor(n / depth)) n = length(x). 
x numeric, must least 40 rows data set (depth = 20) conduct stratified sampling.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create or Modify Stratification Variables — make_strata","text":"factor vector.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create or Modify Stratification Variables — make_strata","text":"numeric data, number unique levels less nunique, data treated categorical data. categorical inputs, function find levels x occur data percentage less pool. values groups randomly assigned remaining strata (data points missing values x). numeric data unique values nunique, data converted categorical based percentiles data. percentile groups 20 percent data group. , missing values x randomly assigned groups.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create or Modify Stratification Variables — make_strata","text":"","code":"set.seed(61) x1 <- rpois(100, lambda = 5) table(x1) #> x1 #> 1 2 3 4 5 6 7 8 9 10 11 #> 3 16 8 19 14 18 11 4 5 1 1 table(make_strata(x1)) #> #> [1,3] (3,5] (5,6] (6,11] #> 27 33 18 22 set.seed(554) x2 <- rpois(100, lambda = 1) table(x2) #> x2 #> 0 1 2 3 4 #> 36 34 19 6 5 table(make_strata(x2)) #> #> 0 1 2 #> 38 40 22 # small groups are randomly assigned x3 <- factor(x2) table(x3) #> x3 #> 0 1 2 3 4 #> 36 34 19 6 5 table(make_strata(x3)) #> #> 0 1 2 #> 41 35 24 x4 <- rep(LETTERS[1:7], c(37, 26, 3, 7, 11, 10, 2)) table(x4) #> x4 #> A B C D E F G #> 37 26 3 7 11 10 2 table(make_strata(x4)) #> #> A B E F #> 40 27 14 15 table(make_strata(x4, pool = 0.1)) #> #> A B E F #> 38 29 12 17 table(make_strata(x4, pool = 0.0)) #> Warning: Stratifying groups that make up 0% of the data may be 
statistically risky. #> • Consider increasing `pool` to at least 0.1 #> #> A B C D E F G #> 37 26 3 7 11 10 2 # not enough data to stratify x5 <- rnorm(20) table(make_strata(x5)) #> Warning: The number of observations in each quantile is below the recommended threshold of 20. #> • Stratification will use 1 breaks instead. #> Warning: Too little data to stratify. #> • Resampling will be unstratified. #> #> strata1 #> 20 set.seed(483) x6 <- rnorm(200) quantile(x6, probs = (0:10) / 10) #> 0% 10% 20% 30% 40% 50% #> -2.9114060 -1.4508635 -0.9513821 -0.6257852 -0.3286468 -0.0364388 #> 60% 70% 80% 90% 100% #> 0.2027140 0.4278573 0.7050643 1.2471852 2.6792505 table(make_strata(x6, breaks = 10)) #> #> [-2.91,-1.45] (-1.45,-0.951] (-0.951,-0.626] (-0.626,-0.329] #> 20 20 20 20 #> (-0.329,-0.0364] (-0.0364,0.203] (0.203,0.428] (0.428,0.705] #> 20 20 20 20 #> (0.705,1.25] (1.25,2.68] #> 20 20"},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":null,"dir":"Reference","previous_headings":"","what":"Manual resampling — manual_rset","title":"Manual resampling — manual_rset","text":"manual_rset() used constructing minimal rset possible. can useful custom rsplit objects built make_splits(), want create new rset splits contained within existing rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Manual resampling — manual_rset","text":"","code":"manual_rset(splits, ids)"},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Manual resampling — manual_rset","text":"splits list \"rsplit\" objects. easiest create using make_splits(). ids character vector ids. 
length ids must length splits.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Manual resampling — manual_rset","text":"","code":"df <- data.frame(x = c(1, 2, 3, 4, 5, 6)) # Create an rset from custom indices indices <- list( list(analysis = c(1L, 2L), assessment = 3L), list(analysis = c(4L, 5L), assessment = 6L) ) splits <- lapply(indices, make_splits, data = df) manual_rset(splits, c(\"Split 1\", \"Split 2\")) #> # Manual resampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Split 1 #> 2 Split 2 # You can also use this to create an rset from a subset of an # existing rset resamples <- vfold_cv(mtcars) best_split <- resamples[5, ] manual_rset(best_split$splits, best_split$id) #> # Manual resampling #> # A tibble: 1 × 2 #> splits id #> #> 1 Fold05"},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Monte Carlo Cross-Validation — mc_cv","title":"Monte Carlo Cross-Validation — mc_cv","text":"One resample Monte Carlo cross-validation takes random sample (without replacement) original data set used analysis. data points added assessment set.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Monte Carlo Cross-Validation — mc_cv","text":"","code":"mc_cv(data, prop = 3/4, times = 25, strata = NULL, breaks = 4, pool = 0.1, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Monte Carlo Cross-Validation — mc_cv","text":"data data frame. prop proportion data retained modeling/analysis. times number times repeat sampling. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. 
Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Monte Carlo Cross-Validation — mc_cv","text":"tibble classes mc_cv, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Monte Carlo Cross-Validation — mc_cv","text":"strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Monte Carlo Cross-Validation — mc_cv","text":"","code":"mc_cv(mtcars, times = 2) #> # Monte Carlo cross-validation (0.75/0.25) with 2 resamples #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 mc_cv(mtcars, prop = .5, times = 2) #> # Monte Carlo cross-validation (0.5/0.5) with 2 resamples #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 library(purrr) data(wa_churn, package = \"modeldata\") set.seed(13) resample1 <- mc_cv(wa_churn, times = 3, prop = .5) map_dbl( resample1$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2709458 0.2621414 0.2632775 set.seed(13) resample2 <- mc_cv(wa_churn, strata = churn, times = 3, prop = .5) map_dbl( resample2$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2652655 0.2652655 0.2652655 set.seed(13) resample3 <- mc_cv(wa_churn, strata = tenure, breaks = 6, times = 3, prop = .5) map_dbl( resample3$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2636364 0.2599432 0.2576705"},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Nested or Double Resampling — nested_cv","title":"Nested or Double Resampling — nested_cv","text":"nested_cv() can used take results one resampling procedure conduct resamples within split. 
type resampling used rsample can used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Nested or Double Resampling — nested_cv","text":"","code":"nested_cv(data, outside, inside)"},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Nested or Double Resampling — nested_cv","text":"data data frame. outside initial resampling specification. can already created object expression new object (see examples ). latter used, data argument need specified , given, ignored. inside expression type resampling conducted within initial procedure.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Nested or Double Resampling — nested_cv","text":"tibble nested_cv class classes outer resampling process normally contains. 
results include column outer data split objects, one id columns, column nested tibbles called inner_resamples additional resamples.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Nested or Double Resampling — nested_cv","text":"bad idea use bootstrapping outer resampling procedure (see example )","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Nested or Double Resampling — nested_cv","text":"","code":"## Using expressions for the resampling procedures: nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5)) #> # Nested resampling: #> # outer: 3-fold cross-validation #> # inner: Bootstrap sampling #> # A tibble: 3 × 3 #> splits id inner_resamples #> #> 1 Fold1 #> 2 Fold2 #> 3 Fold3 ## Using an existing object: folds <- vfold_cv(mtcars) nested_cv(mtcars, folds, inside = bootstraps(times = 5)) #> # Nested resampling: #> # outer: `folds` #> # inner: Bootstrap sampling #> # A tibble: 10 × 3 #> splits id inner_resamples #> #> 1 Fold01 #> 2 Fold02 #> 3 Fold03 #> 4 Fold04 #> 5 Fold05 #> 6 Fold06 #> 7 Fold07 #> 8 Fold08 #> 9 Fold09 #> 10 Fold10 ## The dangers of outer bootstraps: set.seed(2222) bad_idea <- nested_cv(mtcars, outside = bootstraps(times = 5), inside = vfold_cv(v = 3) ) #> Warning: Using bootstrapping as the outer resample is dangerous since the inner resample might have the same data point in both the analysis and assessment set. first_outer_split <- get_rsplit(bad_idea, 1) outer_analysis <- analysis(first_outer_split) sum(grepl(\"Camaro Z28\", rownames(outer_analysis))) #> [1] 3 ## For the 3-fold CV used inside of each bootstrap, how are the replicated ## `Camaro Z28` data partitioned? 
first_inner_split <- get_rsplit(bad_idea$inner_resamples[[1]], 1) inner_analysis <- analysis(first_inner_split) inner_assess <- assessment(first_inner_split) sum(grepl(\"Camaro Z28\", rownames(inner_analysis))) #> [1] 1 sum(grepl(\"Camaro Z28\", rownames(inner_assess))) #> [1] 2"},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":null,"dir":"Reference","previous_headings":"","what":"Constructor for new rset objects — new_rset","title":"Constructor for new rset objects — new_rset","text":"Constructor new rset objects","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Constructor for new rset objects — new_rset","text":"","code":"new_rset(splits, ids, attrib = NULL, subclass = character())"},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Constructor for new rset objects — new_rset","text":"splits list column rsplits tibble single column called \"splits\" list column rsplits. ids character vector tibble one columns begin \"id\". attrib optional named list attributes add object. subclass character vector subclasses add.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Constructor for new rset objects — new_rset","text":"rset object.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Constructor for new rset objects — new_rset","text":"new rset constructed, additional attribute called \"fingerprint\" added hash rset. 
can used make sure objects exact resamples.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":null,"dir":"Reference","previous_headings":"","what":"Permutation sampling — permutations","title":"Permutation sampling — permutations","text":"permutation sample size original data set made permuting/shuffling one columns. results analysis samples columns original order columns permuted random order. Unlike sampling functions rsample, assessment set calling assessment() permutation split throw error.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Permutation sampling — permutations","text":"","code":"permutations(data, permute = NULL, times = 25, apparent = FALSE, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Permutation sampling — permutations","text":"data data frame. permute One columns shuffle. argument supports tidyselect selectors. Multiple expressions can combined c(). Variable names can used positions data frame, expressions like x:y can used select range variables. See language details. times number permutation samples. apparent logical. extra resample added analysis standard data set. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Permutation sampling — permutations","text":"tibble classes permutations, rset, tbl_df, tbl, data.frame. 
results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Permutation sampling — permutations","text":"argument apparent enables option additional \"resample\" analysis data set original data set. Permutation-based resampling can especially helpful computing statistic null hypothesis (e.g. t-statistic). forms basis permutation test, computes test statistic possible permutations data.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Permutation sampling — permutations","text":"","code":"permutations(mtcars, mpg, times = 2) #> # Permutation sampling #> # Permuted columns: [mpg] #> # A tibble: 2 × 2 #> splits id #> #> 1 Permutations1 #> 2 Permutations2 permutations(mtcars, mpg, times = 2, apparent = TRUE) #> # Permutation sampling with apparent sample #> # Permuted columns: [mpg] #> # A tibble: 3 × 2 #> splits id #> #> 1 Permutations1 #> 2 Permutations2 #> 3 Apparent library(purrr) resample1 <- permutations(mtcars, starts_with(\"c\"), times = 1) resample1$splits[[1]] %>% analysis() #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 1 #> Mazda RX4 Wag 21.0 4 160.0 110 3.90 2.875 17.02 0 1 4 2 #> Datsun 710 22.8 8 108.0 93 3.85 2.320 18.61 1 1 4 4 #> Hornet 4 Drive 21.4 8 258.0 110 3.08 3.215 19.44 1 0 3 4 #> Hornet Sportabout 18.7 4 360.0 175 3.15 3.440 17.02 0 0 3 1 #> Valiant 18.1 8 225.0 105 2.76 3.460 20.22 1 0 3 4 #> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 2 #> Merc 240D 24.4 8 146.7 62 3.69 3.190 20.00 1 0 4 3 #> Merc 230 22.8 8 140.8 95 3.92 3.150 22.90 1 0 4 4 #> Merc 280 19.2 8 167.6 123 3.92 3.440 18.30 1 0 4 2 #> Merc 280C 17.8 8 167.6 123 3.92 3.440 18.90 1 0 4 2 #> Merc 
450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 4 #> Merc 450SL 17.3 6 275.8 180 3.07 3.730 17.60 0 0 3 4 #> Merc 450SLC 15.2 6 275.8 180 3.07 3.780 18.00 0 0 3 4 #> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 8 #> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 2 #> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 3 #> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 2 #> Honda Civic 30.4 6 75.7 52 4.93 1.615 18.52 1 1 4 1 #> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 2 #> Toyota Corona 21.5 6 120.1 97 3.70 2.465 20.01 1 0 3 4 #> Dodge Challenger 15.5 4 318.0 150 2.76 3.520 16.87 0 0 3 1 #> AMC Javelin 15.2 4 304.0 150 3.15 3.435 17.30 0 0 3 2 #> Camaro Z28 13.3 4 350.0 245 3.73 3.840 15.41 0 0 3 2 #> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 3 #> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 #> Porsche 914-2 26.0 6 120.3 91 4.43 2.140 16.70 0 1 5 4 #> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 1 #> Ford Pantera L 15.8 4 351.0 264 4.22 3.170 14.50 0 1 5 1 #> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 #> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 4 #> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 resample2 <- permutations(mtcars, hp, times = 10, apparent = TRUE) map_dbl(resample2$splits, function(x) { t.test(hp ~ vs, data = analysis(x))$statistic }) #> [1] 1.831884490 0.360219662 -1.271345514 -1.086517310 0.884050160 #> [6] 1.130681222 0.369342268 -2.595445455 0.007920257 0.562836352 #> [11] 6.290837794"},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":null,"dir":"Reference","previous_headings":"","what":"Add Assessment Indices — populate","title":"Add Assessment Indices — populate","text":"Many rsplit rset objects contain indicators assessment samples. 
populate() can used fill slot appropriate indices.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add Assessment Indices — populate","text":"","code":"populate(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add Assessment Indices — populate","text":"x rsplit rset object. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add Assessment Indices — populate","text":"object kind integer indices.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add Assessment Indices — populate","text":"","code":"set.seed(28432) fold_rs <- vfold_cv(mtcars) fold_rs$splits[[1]]$out_id #> [1] NA complement(fold_rs$splits[[1]]) #> [1] 1 9 25 27 populate(fold_rs$splits[[1]])$out_id #> [1] 1 9 25 27 fold_rs_all <- populate(fold_rs) fold_rs_all$splits[[1]]$out_id #> [1] 1 9 25 27"},{"path":"https://rsample.tidymodels.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. 
generics tidy tidyselect all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, starts_with","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":null,"dir":"Reference","previous_headings":"","what":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"convenience function confidence intervals linear-ish parametric models","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"","code":"reg_intervals( formula, data, model_fn = \"lm\", type = \"student-t\", times = NULL, alpha = 0.05, filter = term != \"(Intercept)\", keep_reps = FALSE, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"formula R model formula one outcome least one predictor. data data frame. model_fn model fit. Allowable values \"lm\", \"glm\", \"survreg\", \"coxph\". latter two require survival package installed. type type bootstrap confidence interval. Values \"student-t\" \"percentile\" allowed. times single integer number bootstrap samples. left NULL, 1,001 used t-intervals 2,001 percentile intervals. alpha Level significance. filter logical expression used remove rows final result, NULL keep rows. keep_reps individual parameter estimates bootstrap sample retained? ... 
Options pass model function (family stats::glm()).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"tibble columns \"term\", \".lower\", \".estimate\", \".upper\", \".alpha\", \".method\". keep_reps = TRUE, additional list column called \".replicates\" also returned.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"Davison, ., & Hinkley, D. (1997). Bootstrap Methods Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843 Bootstrap Confidence Intervals, https://rsample.tidymodels.org/articles/Applications/Intervals.html","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"","code":"# \\donttest{ set.seed(1) reg_intervals(mpg ~ I(1 / sqrt(disp)), data = mtcars) #> # A tibble: 1 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 I(1/sqrt(disp)) 207. 249. 290. 0.05 student-t set.seed(1) reg_intervals(mpg ~ I(1 / sqrt(disp)), data = mtcars, keep_reps = TRUE) #> # A tibble: 1 × 7 #> term .lower .estimate .upper .alpha .method .replicates #> #> 1 I(1/sqrt(disp)) 207. 249. 290. 
0.05 student-t [1,001 × 2] # }"},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":null,"dir":"Reference","previous_headings":"","what":"","title":"","text":"function re-generates rset object, using arguments used generate original.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"","text":"","code":"reshuffle_rset(rset)"},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"","text":"rset rset object reshuffled","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"","text":"rset class rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"","text":"","code":"set.seed(123) (starting_splits <- group_vfold_cv(mtcars, cyl, v = 3)) #> # Group 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 reshuffle_rset(starting_splits) #> # Group 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3"},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":null,"dir":"Reference","previous_headings":"","what":"Reverse the analysis and assessment sets — reverse_splits","title":"Reverse the analysis and assessment sets — reverse_splits","text":"functions \"swaps\" analysis assessment sets either single rsplit rsplits splits column rset object.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Reverse the analysis and assessment sets — 
reverse_splits","text":"","code":"reverse_splits(x, ...) # Default S3 method reverse_splits(x, ...) # S3 method for class 'permutations' reverse_splits(x, ...) # S3 method for class 'perm_split' reverse_splits(x, ...) # S3 method for class 'rsplit' reverse_splits(x, ...) # S3 method for class 'rset' reverse_splits(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Reverse the analysis and assessment sets — reverse_splits","text":"x rset rsplit object. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Reverse the analysis and assessment sets — reverse_splits","text":"object class x","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Reverse the analysis and assessment sets — reverse_splits","text":"","code":"set.seed(123) starting_splits <- vfold_cv(mtcars, v = 3) reverse_splits(starting_splits) #> # 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2 #> 3 Fold3 reverse_splits(starting_splits$splits[[1]]) #> #> <11/21/32>"},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":null,"dir":"Reference","previous_headings":"","what":"Rolling Origin Forecast Resampling — rolling_origin","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"resampling method useful data set strong time component. resamples random contain data points consecutive values. 
function assumes original data set sorted time order.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"","code":"rolling_origin( data, initial = 5, assess = 1, cumulative = TRUE, skip = 0, lag = 0, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"data data frame. initial number samples used analysis/modeling initial resample. assess number samples used assessment resample. cumulative logical. analysis resample grow beyond size specified initial resample?. skip integer indicating many () additional resamples skip thin total amount data points analysis resample. See example . lag value include lag assessment analysis set. useful lagged predictors used training testing. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"tibble classes rolling_origin, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"main options, initial assess, control number data points original data analysis assessment set, respectively. cumulative = TRUE, analysis set grow resampling continues assessment set size always remain static. skip enables function use every data point resamples. 
skip = 0, resampling data sets increment one position. Suppose rows data set consecutive days. Using skip = 6 make analysis data set operate weeks instead days. assessment set size affected option.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"","code":"set.seed(1131) ex_data <- data.frame(row = 1:20, some_var = rnorm(20)) dim(rolling_origin(ex_data)) #> [1] 15 2 dim(rolling_origin(ex_data, skip = 2)) #> [1] 5 2 dim(rolling_origin(ex_data, skip = 2, cumulative = FALSE)) #> [1] 5 2 # You can also roll over calendar periods by first nesting by that period, # which is especially useful for irregular series where a fixed window # is not useful. This example slides over 5 years at a time. library(dplyr) library(tidyr) data(drinks, package = \"modeldata\") drinks_annual <- drinks %>% mutate(year = as.POSIXlt(date)$year + 1900) %>% nest(data = c(-year)) multi_year_roll <- rolling_origin(drinks_annual, cumulative = FALSE) analysis(multi_year_roll$splits[[1]]) #> # A tibble: 5 × 2 #> year data #> #> 1 1992 #> 2 1993 #> 3 1994 #> 4 1995 #> 5 1996 assessment(multi_year_roll$splits[[1]]) #> # A tibble: 1 × 2 #> year data #> #> 1 1997 "},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":null,"dir":"Reference","previous_headings":"","what":"Compatibility with dplyr — rsample-dplyr","title":"Compatibility with dplyr — rsample-dplyr","text":"page lays compatibility rsample dplyr. rset objects rsample specific subclass tibbles, hence standard dplyr operations like joins well row column modifications work. However, whether operation returns rset tibble depends details operation. overarching principle operation leaves specific characteristics rset intact return rset. 
operation modifies following characteristics, result tibble rather rset: Rows: number rows needs remain unchanged retain rset property. example, 10-fold CV object without 10 rows. order rows can changed though object remains rset. Columns: splits column id column(s) required rset need remain untouched. dropped, renamed, modified result remain rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":"joins","dir":"Reference","previous_headings":"","what":"Joins","title":"Compatibility with dplyr — rsample-dplyr","text":"following affect dplyr joins, left_join(), right_join(), full_join(), inner_join(). resulting object rset number rows unaffected. Rows can reordered added removed, otherwise resulting object tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":"row-operations","dir":"Reference","previous_headings":"","what":"Row Operations","title":"Compatibility with dplyr — rsample-dplyr","text":"resulting object rset number rows unaffected. Rows can reordered added removed, otherwise resulting object tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":"column-operations","dir":"Reference","previous_headings":"","what":"Column Operations","title":"Compatibility with dplyr — rsample-dplyr","text":"resulting object rset required splits id columns remain unaltered. Otherwise resulting object tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-package.html","id":null,"dir":"Reference","previous_headings":"","what":"rsample: General Resampling Infrastructure — rsample-package","title":"rsample: General Resampling Infrastructure — rsample-package","text":"Classes functions create summarize different types resampling objects (e.g. 
bootstrap, cross-validation).","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"rsample: General Resampling Infrastructure — rsample-package","text":"Maintainer: Hannah Frick hannah@posit.co (ORCID) Authors: Fanny Chow fannybchow@gmail.com Max Kuhn max@posit.co Michael Mahoney mike.mahoney.218@gmail.com (ORCID) Julia Silge julia.silge@posit.co (ORCID) Hadley Wickham hadley@posit.co contributors: Posit Software, PBC [copyright holder, funder]","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert Resampling Objects to Other Formats — rsample2caret","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"functions can convert resampling objects rsample caret.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"","code":"rsample2caret(object, data = c(\"analysis\", \"assessment\")) caret2rsample(ctrl, data = NULL)"},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"object rset object. Currently, nested_cv() supported. data data originally used produce ctrl object. ctrl object produced caret::trainControl() index indexOut elements populated integers. 
One method getting extract control objects object produced train.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"rsample2caret() returns list mimics index indexOut elements trainControl object. caret2rsample() returns rset object appropriate class.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":null,"dir":"Reference","previous_headings":"","what":"Extending rsample with new rset subclasses — rset_reconstruct","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"rset_reconstruct() encapsulates logic allowing new rset subclasses work properly vctrs (vctrs::vec_restore()) dplyr (dplyr::dplyr_reconstruct()). intended developer tool, required normal usage rsample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"","code":"rset_reconstruct(x, to)"},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"x data frame restore rset subclass. 
rset subclass restore .","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"x restored rset subclass .","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"rset objects considered \"reconstructable\" vctrs/dplyr operation : x identical column named \"splits\" (column row order matter). x identical columns prefixed \"id\" (column row order matter).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"","code":"to <- bootstraps(mtcars, times = 25) # Imitate a vctrs/dplyr operation, # where the class might be lost along the way x <- tibble::as_tibble(to) # Say we added a new column to `x`. Here we mock a `mutate()`. x$foo <- \"bar\" # This is still reconstructable to `to` rset_reconstruct(x, to) #> # Bootstrap sampling #> # A tibble: 25 × 3 #> splits id foo #> #> 1 Bootstrap01 bar #> 2 Bootstrap02 bar #> 3 Bootstrap03 bar #> 4 Bootstrap04 bar #> 5 Bootstrap05 bar #> 6 Bootstrap06 bar #> 7 Bootstrap07 bar #> 8 Bootstrap08 bar #> 9 Bootstrap09 bar #> 10 Bootstrap10 bar #> # ℹ 15 more rows # Say we lose the first row x <- x[-1, ] # This is no longer reconstructable to `to`, as `x` is no longer an rset # bootstraps object with 25 bootstraps if one is lost! 
rset_reconstruct(x, to) #> # A tibble: 24 × 3 #> splits id foo #> #> 1 Bootstrap02 bar #> 2 Bootstrap03 bar #> 3 Bootstrap04 bar #> 4 Bootstrap05 bar #> 5 Bootstrap06 bar #> 6 Bootstrap07 bar #> 7 Bootstrap08 bar #> 8 Bootstrap09 bar #> 9 Bootstrap10 bar #> 10 Bootstrap11 bar #> # ℹ 14 more rows"},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":null,"dir":"Reference","previous_headings":"","what":"Time-based Resampling — slide-resampling","title":"Time-based Resampling — slide-resampling","text":"resampling functions focused various forms time series resampling. sliding_window() uses row number computing resampling indices. independent time index, useful completely regular series. sliding_index() computes resampling indices relative index column. often Date POSIXct column, . useful resampling irregular series, using irregular lookback periods lookback = lubridate::years(1) daily data (number days year may vary). sliding_period() first breaks index less granular groups based period, uses construct resampling indices. 
extremely useful constructing rolling monthly yearly windows daily data.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Time-based Resampling — slide-resampling","text":"","code":"sliding_window( data, ..., lookback = 0L, assess_start = 1L, assess_stop = 1L, complete = TRUE, step = 1L, skip = 0L ) sliding_index( data, index, ..., lookback = 0L, assess_start = 1L, assess_stop = 1L, complete = TRUE, step = 1L, skip = 0L ) sliding_period( data, index, period, ..., lookback = 0L, assess_start = 1L, assess_stop = 1L, complete = TRUE, step = 1L, skip = 0L, every = 1L, origin = NULL )"},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Time-based Resampling — slide-resampling","text":"data data frame. ... dots future extensions must empty. lookback number elements look back current element computing resampling indices analysis set. current row always included analysis set. sliding_window(), single integer defining number rows look back current row. sliding_index(), single object subtracted index index - lookback define boundary start searching rows include current resample. often integer value corresponding number days look back, lubridate Period object. sliding_period(), single integer defining number groups look back current group, groups defined breaking index according period. cases, Inf also allowed force expanding window. assess_start, assess_stop combination arguments determines far future look constructing assessment set. Together construct range [index + assess_start, index + assess_stop] search rows include assessment set. 
Generally, assess_start always 1 indicate first value potentially include assessment set start one element current row, can increased larger value create \"gaps\" analysis assessment set worried high levels correlation short term forecasting. sliding_window(), single integers defining number rows look forward current row. sliding_index(), single objects added index compute range search rows include assessment set. often integer value corresponding number days look forward, lubridate Period object. sliding_period(), single integers defining number groups look forward current group, groups defined breaking index according period. complete single logical. using lookback compute analysis sets, complete windows considered? set FALSE, partial windows used possible create complete window (based lookback). way use expanding window certain point, switch sliding window. step single positive integer. computing resampling indices, step used thin results selecting every step-th result subsetting indices seq(1L, n_indices, = step). step applied skip. Note step independent time index used. skip single positive integer, zero. computing resampling indices, first skip results dropped subsetting indices seq(skip + 1L, n_indices). can especially useful combined lookback = Inf, creates expanding window starting first row. skipping forward, can drop first windows data points. skip applied step. Note skip independent time index used. index index compute resampling indices relative , specified bare column name. must existing column data. sliding_index(), commonly date vector, required. sliding_period(), required Date POSIXct vector. index must increasing vector, duplicate values allowed. Additionally, index contain missing values. period period group index . specified single string, \"year\" \"month\". See .period argument slider::slide_period() full list options explanation. every single positive integer. number periods group together. 
example, period set \"year\" every value 2, years 1970 1971 placed group. origin reference date time value. default left NULL epoch time 1970-01-01 00:00:00, time zone index. generally used define anchor time count , relevant every value > 1.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Time-based Resampling — slide-resampling","text":"","code":"library(vctrs) #> #> Attaching package: ‘vctrs’ #> The following object is masked from ‘package:tibble’: #> #> data_frame #> The following object is masked from ‘package:dplyr’: #> #> data_frame library(tibble) library(modeldata) data(\"Chicago\") index <- new_date(c(1, 3, 4, 7, 8, 9, 13, 15, 16, 17)) df <- tibble(x = 1:10, index = index) df #> # A tibble: 10 × 2 #> x index #> #> 1 1 1970-01-02 #> 2 2 1970-01-04 #> 3 3 1970-01-05 #> 4 4 1970-01-08 #> 5 5 1970-01-09 #> 6 6 1970-01-10 #> 7 7 1970-01-14 #> 8 8 1970-01-16 #> 9 9 1970-01-17 #> 10 10 1970-01-18 # Look back two rows beyond the current row, for a total of three rows # in each analysis set. Each assessment set is composed of the two rows after # the current row. sliding_window(df, lookback = 2, assess_stop = 2) #> # Sliding window resampling #> # A tibble: 6 × 2 #> splits id #> #> 1 Slice1 #> 2 Slice2 #> 3 Slice3 #> 4 Slice4 #> 5 Slice5 #> 6 Slice6 # Same as before, but step forward by 3 rows between each resampling slice, # rather than just by 1. rset <- sliding_window(df, lookback = 2, assess_stop = 2, step = 3) rset #> # Sliding window resampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice1 #> 2 Slice2 analysis(rset$splits[[1]]) #> # A tibble: 3 × 2 #> x index #> #> 1 1 1970-01-02 #> 2 2 1970-01-04 #> 3 3 1970-01-05 analysis(rset$splits[[2]]) #> # A tibble: 3 × 2 #> x index #> #> 1 4 1970-01-08 #> 2 5 1970-01-09 #> 3 6 1970-01-10 # Now slide relative to the `index` column in `df`. 
This time we look back # 2 days from the current row's `index` value, and 2 days forward from # it to construct the assessment set. Note that this series is irregular, # so it produces different results than `sliding_window()`. Additionally, # note that it is entirely possible for the assessment set to contain no # data if you have a highly irregular series and \"look forward\" into a # date range where no data points actually exist! sliding_index(df, index, lookback = 2, assess_stop = 2) #> # Sliding index resampling #> # A tibble: 7 × 2 #> splits id #> #> 1 Slice1 #> 2 Slice2 #> 3 Slice3 #> 4 Slice4 #> 5 Slice5 #> 6 Slice6 #> 7 Slice7 # With `sliding_period()`, we can break up our date index into less granular # chunks, and slide over them instead of the index directly. Here we'll use # the Chicago data, which contains daily data spanning 16 years, and we'll # break it up into rolling yearly chunks. Three years worth of data will # be used for the analysis set, and one years worth of data will be held out # for performance assessment. sliding_period( Chicago, date, \"year\", lookback = 2, assess_stop = 1 ) #> # Sliding period resampling #> # A tibble: 13 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 #> 3 Slice03 #> 4 Slice04 #> 5 Slice05 #> 6 Slice06 #> 7 Slice07 #> 8 Slice08 #> 9 Slice09 #> 10 Slice10 #> 11 Slice11 #> 12 Slice12 #> 13 Slice13 # Because `lookback = 2`, three years are required to form a \"complete\" # window of data. To allow partial windows, set `complete = FALSE`. # Here that first constructs two expanding windows until a complete three # year window can be formed, at which point we switch to a sliding window. 
sliding_period( Chicago, date, \"year\", lookback = 2, assess_stop = 1, complete = FALSE ) #> # Sliding period resampling #> # A tibble: 15 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 #> 3 Slice03 #> 4 Slice04 #> 5 Slice05 #> 6 Slice06 #> 7 Slice07 #> 8 Slice08 #> 9 Slice09 #> 10 Slice10 #> 11 Slice11 #> 12 Slice12 #> 13 Slice13 #> 14 Slice14 #> 15 Slice15 # Alternatively, you could break the resamples up by month. Here we'll # use an expanding monthly window by setting `lookback = Inf`, and each # assessment set will contain two months of data. To ensure that we have # enough data to fit our models, we'll `skip` the first 4 expanding windows. # Finally, to thin out the results, we'll `step` forward by 2 between # each resample. sliding_period( Chicago, date, \"month\", lookback = Inf, assess_stop = 2, skip = 4, step = 2 ) #> # Sliding period resampling #> # A tibble: 91 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 #> 3 Slice03 #> 4 Slice04 #> 5 Slice05 #> 6 Slice06 #> 7 Slice07 #> 8 Slice08 #> 9 Slice09 #> 10 Slice10 #> # ℹ 81 more rows"},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy Resampling Object — tidy.rsplit","title":"Tidy Resampling Object — tidy.rsplit","text":"tidy() function broom package can used rset rsplit objects generate tibbles rows analysis assessment sets.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy Resampling Object — tidy.rsplit","text":"","code":"# S3 method for class 'rsplit' tidy(x, unique_ind = TRUE, ...) # S3 method for class 'rset' tidy(x, unique_ind = TRUE, ...) # S3 method for class 'vfold_cv' tidy(x, ...) 
# S3 method for class 'nested_cv' tidy(x, unique_ind = TRUE, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy Resampling Object — tidy.rsplit","text":"x rset rsplit object unique_ind unique row identifiers returned? example, FALSE bootstrapping results include multiple rows sample row original data. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tidy Resampling Object — tidy.rsplit","text":"tibble columns Row Data. latter possible values \"Analysis\" \"Assessment\". rset inputs, identification columns also returned names values depend type resampling. vfold_cv(), contains column \"Fold\" , repeats used, another called \"Repeats\". bootstraps() mc_cv() use column \"Resample\".","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy Resampling Object — tidy.rsplit","text":"Note nested resampling, rows inner resample, named inner_Row, relative row indices correspond rows original data set.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy Resampling Object — tidy.rsplit","text":"","code":"library(ggplot2) theme_set(theme_bw()) set.seed(4121) cv <- tidy(vfold_cv(mtcars, v = 5)) ggplot(cv, aes(x = Fold, y = Row, fill = Data)) + geom_tile() + scale_fill_brewer() set.seed(4121) rcv <- tidy(vfold_cv(mtcars, v = 5, repeats = 2)) ggplot(rcv, aes(x = Fold, y = Row, fill = Data)) + geom_tile() + facet_wrap(~Repeat) + scale_fill_brewer() set.seed(4121) mccv <- tidy(mc_cv(mtcars, times = 5)) ggplot(mccv, aes(x = Resample, y = Row, fill = Data)) + geom_tile() + 
scale_fill_brewer() set.seed(4121) bt <- tidy(bootstraps(mtcars, time = 5)) ggplot(bt, aes(x = Resample, y = Row, fill = Data)) + geom_tile() + scale_fill_brewer() dat <- data.frame(day = 1:30) # Resample by week instead of day ts_cv <- rolling_origin(dat, initial = 7, assess = 7, skip = 6, cumulative = FALSE ) ts_cv <- tidy(ts_cv) ggplot(ts_cv, aes(x = Resample, y = factor(Row), fill = Data)) + geom_tile() + scale_fill_brewer()"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a Validation Split for Tuning — validation_set","title":"Create a Validation Split for Tuning — validation_set","text":"validation_set() creates validation split model tuning.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a Validation Split for Tuning — validation_set","text":"","code":"validation_set(split, ...) # S3 method for class 'val_split' analysis(x, ...) # S3 method for class 'val_split' assessment(x, ...) # S3 method for class 'val_split' training(x, ...) # S3 method for class 'val_split' validation(x, ...) # S3 method for class 'val_split' testing(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a Validation Split for Tuning — validation_set","text":"split object class initial_validation_split, resulting initial_validation_split() group_initial_validation_split(). ... dots future extensions must empty. x rsplit object produced validation_set().","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a Validation Split for Tuning — validation_set","text":"tibble classes validation_set, rset, tbl_df, tbl, data.frame. 
results include column data split object column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a Validation Split for Tuning — validation_set","text":"","code":"set.seed(1353) car_split <- initial_validation_split(mtcars) car_set <- validation_set(car_split)"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a Validation Set — validation_split","title":"Create a Validation Set — validation_split","text":"function deprecated part approach constructing training, validation, testing set sequence two binary splits: testing / -testing (initial_split() one variants) -testing split training/validation validation_split(). Instead, now use initial_validation_split() one variants construct three sets via one 3-way split. validation_split() takes single random sample (without replacement) original data set used analysis. data points added assessment set (used validation set). validation_time_split() , takes first prop samples training, instead random selection. group_validation_split() creates splits data based grouping variable, data \"group\" assigned split. Note input data validation_split(), validation_time_split(), group_validation_split() contain testing data. create three-way split directly entire data set, use initial_validation_split().","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a Validation Set — validation_split","text":"","code":"validation_split(data, prop = 3/4, strata = NULL, breaks = 4, pool = 0.1, ...) validation_time_split(data, prop = 3/4, lag = 0, ...) 
group_validation_split(data, group, prop = 3/4, ..., strata = NULL, pool = 0.1)"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a Validation Set — validation_split","text":"data data frame. prop proportion data retained modeling/analysis. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty. lag value include lag assessment analysis set. useful lagged predictors used training testing. group variable data (single character name) used grouping observations value either analysis assessment set within fold.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a Validation Set — validation_split","text":"tibble classes validation_split, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a Validation Set — validation_split","text":"strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a Validation Set — validation_split","text":"","code":"cars_split <- initial_split(mtcars) cars_not_testing <- training(cars_split) validation_split(cars_not_testing, prop = .9) #> Warning: `validation_split()` was deprecated in rsample 1.2.0. #> ℹ Please use `initial_validation_split()` instead. #> # Validation Set Split (0.9/0.1) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation group_validation_split(cars_not_testing, cyl) #> Warning: `group_validation_split()` was deprecated in rsample 1.2.0. #> ℹ Please use `group_initial_validation_split()` instead. #> # Group Validation Set Split (0.75/0.25) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation data(drinks, package = \"modeldata\") validation_time_split(drinks[1:200,]) #> Warning: `validation_time_split()` was deprecated in rsample 1.2.0. #> ℹ Please use `initial_validation_time_split()` instead. #> # Validation Set Split (0.75/0.25) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation # Alternative cars_split_3 <- initial_validation_split(mtcars) validation_set(cars_split_3) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation"},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"V-Fold Cross-Validation — vfold_cv","title":"V-Fold Cross-Validation — vfold_cv","text":"V-fold cross-validation (also known k-fold cross-validation) randomly splits data V groups roughly equal size (called \"folds\"). resample analysis data consists V-1 folds assessment set contains final fold. basic V-fold cross-validation (.e. 
repeats), number resamples equal V.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"V-Fold Cross-Validation — vfold_cv","text":"","code":"vfold_cv(data, v = 10, repeats = 1, strata = NULL, breaks = 4, pool = 0.1, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"V-Fold Cross-Validation — vfold_cv","text":"data data frame. v number partitions data set. repeats number times repeat V-fold partitioning. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"V-Fold Cross-Validation — vfold_cv","text":"tibble classes vfold_cv, rset, tbl_df, tbl, data.frame. results include column data split objects one identification variables. single repeat, one column called id character string fold identifier. repeats, id repeat number additional column called id2 contains fold information (within repeat).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"V-Fold Cross-Validation — vfold_cv","text":"one repeat, basic V-fold cross-validation conducted time. example, three repeats used v = 10, total 30 splits: three groups 10 generated separately. 
strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"V-Fold Cross-Validation — vfold_cv","text":"","code":"vfold_cv(mtcars, v = 10) #> # 10-fold cross-validation #> # A tibble: 10 × 2 #> splits id #> #> 1 Fold01 #> 2 Fold02 #> 3 Fold03 #> 4 Fold04 #> 5 Fold05 #> 6 Fold06 #> 7 Fold07 #> 8 Fold08 #> 9 Fold09 #> 10 Fold10 vfold_cv(mtcars, v = 10, repeats = 2) #> # 10-fold cross-validation repeated 2 times #> # A tibble: 20 × 3 #> splits id id2 #> #> 1 Repeat1 Fold01 #> 2 Repeat1 Fold02 #> 3 Repeat1 Fold03 #> 4 Repeat1 Fold04 #> 5 Repeat1 Fold05 #> 6 Repeat1 Fold06 #> 7 Repeat1 Fold07 #> 8 Repeat1 Fold08 #> 9 Repeat1 Fold09 #> 10 Repeat1 Fold10 #> 11 Repeat2 Fold01 #> 12 Repeat2 Fold02 #> 13 Repeat2 Fold03 #> 14 Repeat2 Fold04 #> 15 Repeat2 Fold05 #> 16 Repeat2 Fold06 #> 17 Repeat2 Fold07 #> 18 Repeat2 Fold08 #> 19 Repeat2 Fold09 #> 20 Repeat2 Fold10 library(purrr) data(wa_churn, package = \"modeldata\") set.seed(13) folds1 <- vfold_cv(wa_churn, v = 5) map_dbl( folds1$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2649982 0.2660632 0.2609159 0.2679681 0.2669033 set.seed(13) folds2 <- vfold_cv(wa_churn, strata = churn, v = 5) map_dbl( folds2$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2653532 0.2653532 0.2653532 0.2653532 0.2654365 set.seed(13) folds3 <- vfold_cv(wa_churn, strata = tenure, breaks = 6, v = 5) map_dbl( folds3$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2656250 0.2661104 0.2652228 
0.2638396 0.2660518"},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-development-version","dir":"Changelog","previous_headings":"","what":"rsample (development version)","title":"rsample (development version)","text":"new inner_split() function methods various resamples usage tune create inner resample analysis set fit preprocessor model one part post-processor part (#483, #488, #489). Started moving error messages cli (#499, #502). contributions @JamesHWade (#518). Fixed example nested_cv() (@seb09, #520). Removed trailing space printing mc_cv() objects (@ccani007, #464). Improved documentation initial_split() friends (@laurabrianna, #519). Formatting improvement: package names now backticks anymore (@agmurray, #525). Improved documentation formatting: function names now easily identifiable either () end links function documentation (@brshallo , #521).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"bug-fixes-development-version","dir":"Changelog","previous_headings":"","what":"Bug fixes","title":"rsample (development version)","text":"vfold_cv() now utilizes breaks argument correctly repeated cross-validation (@ZWael, #471). Grouped resampling functions now work explicit strata = NULL instead strata either name missing (#485).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"breaking-changes-development-version","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"rsample (development version)","text":"class grouped MC splits now group_mc_split instead grouped_mc_split, aligning grouped splits (#478). rsplit objects apparent() split now correct class inheritance structure. 
order now apparent_split rsplit rather way around (#477).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-121","dir":"Changelog","previous_headings":"","what":"rsample 1.2.1","title":"rsample 1.2.1","text":"CRAN release: 2024-03-25 nested_cv() longer errors outside long call (#459, #461). validation_set class now pretty() method (#456).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-120","dir":"Changelog","previous_headings":"","what":"rsample 1.2.0","title":"rsample 1.2.0","text":"CRAN release: 2023-08-23 new initial_validation_split(), along variants initial_validation_time_split() group_initial_validation_split(), generates three-way split data training, validation, test sets. new validation_set(), can turned rset object tuning (#403, #446). validation_split(), validation_time_split(), group_validation_split() soft-deprecated favor new functions implementing 3-way split (initial_validation_split(), initial_validation_time_split(), group_initial_validation_split()) (#449). Functions don’t use ellipsis ... now enforce empty dots (#429). make_splits() gained example documentation (@AngelFelizR, #432). training(), testing(), analysis(), assessment() now S3 generics methods rsplit objects. Previously manually required input rsplit object (#384). int_*() functions now S3 generics corresponding methods class bootstraps (#435). underlying mechanics data splitting changed Surv objects maintain class. change affects row names resulting objects; reindexed one instead subset original row names (#443). 
rsample re-export gather() anymore (#451).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-111","dir":"Changelog","previous_headings":"","what":"rsample 1.1.1","title":"rsample 1.1.1","text":"CRAN release: 2022-12-07 grouped resampling functions (group_vfold_cv(), group_mc_cv(), group_initial_split() group_validation_split(), group_bootstraps()) now support stratification. Strata must constant within group (@mikemahoney218, #317, #360, #363, #364, #365). Added new function, clustering_cv(), blocked cross-validation various predictor spaces. flexible function, taking arguments distance_function cluster_function, allowing used spatial clustering well potentially phylogenetic forms clustering (@mikemahoney218, #351). bootstraps() group_bootstraps() now warn resampling returns empty assessment sets. Previously, bootstraps() silent group_bootstraps() errored (@mikemahoney218, #356, #357). assessment set validation_time_split() now also contains lagged observations (#376). new helper get_rsplit() lets conveniently access rsplit objects inside rset objects (@mikemahoney218, #399). result initial_time_split() now subclass \"initial_time_split\", addition existing classes (#397). dependency ellipsis package removed (#393). Removed overly strict test preparation dplyr 1.1.0 (#380).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-110","dir":"Changelog","previous_headings":"","what":"rsample 1.1.0","title":"rsample 1.1.0","text":"CRAN release: 2022-08-08 rset objects now include parameters used create attributes (#329). Objects returned sliding functions now index attribute, appropriate, containing column name used index (#329). Objects returned permutations() now permutes attribute containing column name used permutation (#329). Added breaks pool attributes functions support stratification (#329). 
Changed “strata” attribute rset objects now either character vector identifying column used stratify data, present (set NULL) stratification used. (#329) Added new function, reshuffle_rset(), takes rset object generates new version using arguments current random seed. (#79, #329) Added arguments control group_vfold_cv() combines groups. Use balance = \"groups\" assign (roughly) number groups fold, balance = \"observations\" assign (roughly) number observations fold. Added repeats argument group_vfold_cv() (#330). Added new functions grouped resampling: group_mc_cv() (#313), group_initial_split() group_validation_split() (#315), group_bootstraps() (#316). Added new function, reverse_splits(), swap analysis assessment splits (#319, #284). Improved error thrown calling assessment() perm_split object created permutations() (#321, #322).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-100","dir":"Changelog","previous_headings":"","what":"rsample 1.0.0","title":"rsample 1.0.0","text":"CRAN release: 2022-06-24 Fixed nested_cv() handles call objects variables environment can used specifying resampling schemes (#81). Updated testthat 3e (#280) added better checking vfold_cv() (#293). Finally removed gather() method rset objects. Use tidyr::pivot_longer() instead (#280). Changed initial_split() avoid calling tidyselect twice strata (#296). fix stops initial_split() generating messages like: Added better printing methods initial split objects.","code":"Note: Using an external vector in selections is ambiguous. i Use `all_of(strata)` instead of `strata` to silence this message. i See ."},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-011","dir":"Changelog","previous_headings":"","what":"rsample 0.1.1","title":"rsample 0.1.1","text":"CRAN release: 2021-11-08 Updated documentation stratified sampling (#245). 
Changed make_splits() S3 generic, original functionality method list new method dataframes allows users create split existing analysis & assessment sets (@LiamBlake, #246). Added validation_time_split() single validation sample taking first samples training (@mine-cetinkaya-rundel, #256). Escalated deprecation gather() method rset objects hard deprecation. Use tidyr::pivot_longer() instead (#257). Changed resample “fingerprint” hash indices rather entire resample result (including data object). much faster still ensure resample original data object (#259).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-010","dir":"Changelog","previous_headings":"","what":"rsample 0.1.0","title":"rsample 0.1.0","text":"CRAN release: 2021-05-08 Fixed mc_cv(), initial_split(), validation_split() use prop argument first compute assessment indices, rather analysis indices. minor breaking change situations; previous implementation cause inconsistency sizes generated analysis assessment sets compared prop documented function (#217, @issactoast). Fixed problem creation apparent() (#223) caret2rsample() (#232) resamples. Re-licensed package GPL-2 MIT. See consent copyright holders . Attempts stratify Surv object now error informatively (#230). Exposed pool argument make_strata() user-facing resampling functions (#229). Deprecated gather() method rset objects favor tidyr::pivot_longer() (#233). Fixed bug make_strata() numeric variables NA values (@brian-j-smith, #236).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-009","dir":"Changelog","previous_headings":"","what":"rsample 0.0.9","title":"rsample 0.0.9","text":"CRAN release: 2021-02-17 New rset_reconstruct(), developer tool ease creation new rset subclasses (#210). Added permutations(), function creating permutation resamples performing column-wise shuffling (@mattwarkentin, #198). Fixed issue empty assessment sets couldn’t created make_splits() (#188). 
rset objects now contain “fingerprint” attribute can used check see object uses resamples. reg_intervals() function convenience function lm(), glm(), survreg(), coxph() models (#206). internal functions exported rsample-adjacent packages can use underlying code. obj_sum() method rsplit objects updated (#215). Changed inheritance structure rsplit objects specific general simplified methods complement() generic (#216).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-008","dir":"Changelog","previous_headings":"","what":"rsample 0.0.8","title":"rsample 0.0.8","text":"CRAN release: 2020-09-23 New manual_rset() constructing rset objects manually custom rsplits (tidymodels/tune#273). Three new time based resampling functions added: sliding_window(), sliding_index(), sliding_period(), flexibility pre-existing rolling_origin(). Correct alpha parameter handling bootstrap CI functions (#179, #184).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-007","dir":"Changelog","previous_headings":"","what":"rsample 0.0.7","title":"rsample 0.0.7","text":"CRAN release: 2020-06-04 Lower threshold pooling strata 10% (15%) (#149). print() methods rsplit val_split objects adjusted show \"\" , respectively. drinks, attrition, two_class_dat data sets removed. modeldata package. Compatability dplyr 1.0.0.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-006","dir":"Changelog","previous_headings":"","what":"rsample 0.0.6","title":"rsample 0.0.6","text":"CRAN release: 2020-03-31 Added validation_set() making single resample. Correct tidy method bootstraps (#115). Changes upcoming `tibble release. Exported constructors rset split objects (#40) initial_time_split() rolling_origin() now lag parameter ensures previous data available lagged variables can calculated. 
(#135, #136)","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-005","dir":"Changelog","previous_headings":"","what":"rsample 0.0.5","title":"rsample 0.0.5","text":"CRAN release: 2019-07-12 Added three functions compute different bootstrap confidence intervals. new function (add_resample_id()) augments data frame columns resampling identifier. Updated initial_split(), mc_cv(), vfold_cv(), bootstraps(), group_vfold_cv() use tidyselect stratification variable. Updated initial_split(), mc_cv(), vfold_cv(), bootstraps() new breaks parameter specifies number bins stratify numeric stratification variable.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-004","dir":"Changelog","previous_headings":"","what":"rsample 0.0.4","title":"rsample 0.0.4","text":"CRAN release: 2019-01-07 Small maintenance release.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"minor-improvements-and-fixes-0-0-4","dir":"Changelog","previous_headings":"","what":"Minor improvements and fixes","title":"rsample 0.0.4","text":"fill() removed per deprecation warning. Small changes made new version tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-003","dir":"Changelog","previous_headings":"","what":"rsample 0.0.3","title":"rsample 0.0.3","text":"CRAN release: 2018-11-20","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"new-features-0-0-3","dir":"Changelog","previous_headings":"","what":"New features","title":"rsample 0.0.3","text":"Added function initial_time_split() ordered initial sampling appropriate time series data.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"minor-improvements-and-fixes-0-0-3","dir":"Changelog","previous_headings":"","what":"Minor improvements and fixes","title":"rsample 0.0.3","text":"fill() renamed populate() avoid conflict tidyr::fill(). 
Changed R version requirement R >= 3.1 instead 3.3.3. recipes-related prepper() function moved recipes package. makes rsample install footprint much smaller. rsplit objects shown differently inside tibble. Moved broom package generics package.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-002","dir":"Changelog","previous_headings":"","what":"rsample 0.0.2","title":"rsample 0.0.2","text":"CRAN release: 2017-11-12 initial_split, training, testing added training/testing splits prior resampling. Another resampling method, group_vfold_cv, added. caret2rsample rsample2caret can convert rset objects used caret::trainControl vice-versa. function called form_pred can used determine original names predictors formula terms object. vignette function (prepper) included facilitate using recipes rsample. gather method added rset objects. labels method added rsplit objects. can help identify resample used even whole rset object available. variety dplyr methods added (e.g. filter(), mutate(), etc) work without dropping classes attributes rsample objects.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-001-2017-07-08","dir":"Changelog","previous_headings":"","what":"rsample 0.0.1 (2017-07-08)","title":"rsample 0.0.1 (2017-07-08)","text":"CRAN release: 2017-07-08 Initial public version CRAN","code":""}] +[{"path":[]},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-pledge","dir":"","previous_headings":"","what":"Our Pledge","title":"Contributor Covenant Code of Conduct","text":"members, contributors, leaders pledge make participation community harassment-free experience everyone, regardless age, body size, visible invisible disability, ethnicity, sex characteristics, gender identity expression, level experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, sexual identity orientation. 
pledge act interact ways contribute open, welcoming, diverse, inclusive, healthy community.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"our-standards","dir":"","previous_headings":"","what":"Our Standards","title":"Contributor Covenant Code of Conduct","text":"Examples behavior contributes positive environment community include: Demonstrating empathy kindness toward people respectful differing opinions, viewpoints, experiences Giving gracefully accepting constructive feedback Accepting responsibility apologizing affected mistakes, learning experience Focusing best just us individuals, overall community Examples unacceptable behavior include: use sexualized language imagery, sexual attention advances kind Trolling, insulting derogatory comments, personal political attacks Public private harassment Publishing others’ private information, physical email address, without explicit permission conduct reasonably considered inappropriate professional setting","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-responsibilities","dir":"","previous_headings":"","what":"Enforcement Responsibilities","title":"Contributor Covenant Code of Conduct","text":"Community leaders responsible clarifying enforcing standards acceptable behavior take appropriate fair corrective action response behavior deem inappropriate, threatening, offensive, harmful. Community leaders right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct, communicate reasons moderation decisions appropriate.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"scope","dir":"","previous_headings":"","what":"Scope","title":"Contributor Covenant Code of Conduct","text":"Code Conduct applies within community spaces, also applies individual officially representing community public spaces. 
Examples representing community include using official e-mail address, posting via official social media account, acting appointed representative online offline event.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement","dir":"","previous_headings":"","what":"Enforcement","title":"Contributor Covenant Code of Conduct","text":"Instances abusive, harassing, otherwise unacceptable behavior may reported community leaders responsible enforcement codeofconduct@posit.co. complaints reviewed investigated promptly fairly. community leaders obligated respect privacy security reporter incident.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"enforcement-guidelines","dir":"","previous_headings":"","what":"Enforcement Guidelines","title":"Contributor Covenant Code of Conduct","text":"Community leaders follow Community Impact Guidelines determining consequences action deem violation Code Conduct:","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_1-correction","dir":"","previous_headings":"Enforcement Guidelines","what":"1. Correction","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Use inappropriate language behavior deemed unprofessional unwelcome community. Consequence: private, written warning community leaders, providing clarity around nature violation explanation behavior inappropriate. public apology may requested.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_2-warning","dir":"","previous_headings":"Enforcement Guidelines","what":"2. Warning","title":"Contributor Covenant Code of Conduct","text":"Community Impact: violation single incident series actions. Consequence: warning consequences continued behavior. interaction people involved, including unsolicited interaction enforcing Code Conduct, specified period time. 
includes avoiding interactions community spaces well external channels like social media. Violating terms may lead temporary permanent ban.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_3-temporary-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"3. Temporary Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: serious violation community standards, including sustained inappropriate behavior. Consequence: temporary ban sort interaction public communication community specified period time. public private interaction people involved, including unsolicited interaction enforcing Code Conduct, allowed period. Violating terms may lead permanent ban.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"id_4-permanent-ban","dir":"","previous_headings":"Enforcement Guidelines","what":"4. Permanent Ban","title":"Contributor Covenant Code of Conduct","text":"Community Impact: Demonstrating pattern violation community standards, including sustained inappropriate behavior, harassment individual, aggression toward disparagement classes individuals. Consequence: permanent ban sort public interaction within community.","code":""},{"path":"https://rsample.tidymodels.org/dev/CODE_OF_CONDUCT.html","id":"attribution","dir":"","previous_headings":"","what":"Attribution","title":"Contributor Covenant Code of Conduct","text":"Code Conduct adapted Contributor Covenant, version 2.1, available https://www.contributor-covenant.org/version/2/1/code_of_conduct.html. Community Impact Guidelines inspired [Mozilla’s code conduct enforcement ladder][https://github.com/mozilla/inclusion]. answers common questions code conduct, see FAQ https://www.contributor-covenant.org/faq. 
Translations available https://www.contributor-covenant.org/translations.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing to tidymodels","title":"Contributing to tidymodels","text":"detailed information contributing tidymodels packages, see development contributing guide.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":"documentation","dir":"","previous_headings":"","what":"Documentation","title":"Contributing to tidymodels","text":"Typos grammatical errors documentation may edited directly using GitHub web interface, long changes made source file. YES ✅: edit roxygen comment .R file R/ directory. 🚫: edit .Rd file man/ directory. use roxygen2, Markdown syntax, documentation.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":"code","dir":"","previous_headings":"","what":"Code","title":"Contributing to tidymodels","text":"submit 🎯 pull request tidymodels package, always file issue confirm tidymodels team agrees idea happy basic proposal. tidymodels packages work together. package contains unit tests, integration tests tests using packages contained extratests. pull requests, recommend create fork repo usethis::create_from_github(), initiate new branch usethis::pr_init(). Look build status making changes. README contains badges continuous integration services used package. New code follow tidyverse style guide. can use styler package apply styles, please don’t restyle code nothing PR. user-facing changes, add bullet top NEWS.md current development version header describing changes made followed GitHub username, links relevant issue(s)/PR(s). use testthat. Contributions test cases included easier accept. contribution spans use one package, consider building extratests changes check breakages /adding new tests . 
Let us know PR ran extra tests.","code":""},{"path":"https://rsample.tidymodels.org/dev/CONTRIBUTING.html","id":"code-of-conduct","dir":"","previous_headings":"Code","what":"Code of Conduct","title":"Contributing to tidymodels","text":"project released Contributor Code Conduct. contributing project, agree abide terms.","code":""},{"path":"https://rsample.tidymodels.org/dev/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2021 rsample authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"a-nonlinear-regression-example","dir":"Articles > Applications","previous_headings":"","what":"A nonlinear regression example","title":"Bootstrap confidence intervals","text":"demonstrate computations different types intervals, ’ll use nonlinear regression example Baty et al (2015). showed data monitored oxygen uptake patient rest exercise phases (data frame O2K). authors fit segmented regression model transition point known (time exercise commenced). model : broom::tidy() returns analysis object standardized way. column names shown used types objects allows us use results easily. 
rsample, ’ll rely tidy() method work bootstrap estimates need confidence intervals. ’s example end univariate statistic isn’t automatically formatted tidy(). run model different bootstraps, ’ll write function uses split object input produces tidy data frame: First, let’s create set resamples fit separate models . options apparent = TRUE set. creates final resample copy original (unsampled) data set. required interval methods. Let’s look data see outliers aberrant results: Now let’s create scatterplot matrix: One potential outlier right VO2peak ’ll leave . univariate distributions :","code":"library(tidymodels) library(nlstools) library(GGally) data(O2K) ggplot(O2K, aes(x = t, y = VO2)) + geom_point() nonlin_form <- as.formula( VO2 ~ (t <= 5.883) * VO2rest + (t > 5.883) * (VO2rest + (VO2peak - VO2rest) * (1 - exp(-(t - 5.883) / mu))) ) # Starting values from visual inspection start_vals <- list(VO2rest = 400, VO2peak = 1600, mu = 1) res <- nls(nonlin_form, start = start_vals, data = O2K) tidy(res) ## # A tibble: 3 × 5 ## term estimate std.error statistic p.value ## ## 1 VO2rest 357. 11.4 31.3 4.27e-26 ## 2 VO2peak 1631. 21.5 75.9 1.29e-38 ## 3 mu 1.19 0.0766 15.5 1.08e-16 # Will be used to fit the models to different bootstrap data sets: fit_fun <- function(split, ...) { # We could check for convergence, make new parameters, etc. nls(nonlin_form, data = analysis(split), ...) %>% tidy() } set.seed(462) nlin_bt <- bootstraps(O2K, times = 2000, apparent = TRUE) %>% mutate(models = map(splits, ~ fit_fun(.x, start = start_vals))) nlin_bt ## # Bootstrap sampling with apparent sample ## # A tibble: 2,001 × 3 ## splits id models ## ## 1 Bootstrap0001 ## 2 Bootstrap0002 ## 3 Bootstrap0003 ## 4 Bootstrap0004 ## 5 Bootstrap0005 ## 6 Bootstrap0006 ## 7 Bootstrap0007 ## 8 Bootstrap0008 ## 9 Bootstrap0009 ## 10 Bootstrap0010 ## # ℹ 1,991 more rows nlin_bt$models[[1]] ## # A tibble: 3 × 5 ## term estimate std.error statistic p.value ## ## 1 VO2rest 359. 
10.7 33.5 4.59e-27 ## 2 VO2peak 1656. 31.1 53.3 1.39e-33 ## 3 mu 1.23 0.113 10.9 2.01e-12 library(tidyr) nls_coef <- nlin_bt %>% dplyr::select(-splits) %>% # Turn it into a tibble by stacking the `models` col unnest(cols = models) %>% # Get rid of unneeded columns dplyr::select(id, term, estimate) head(nls_coef) ## # A tibble: 6 × 3 ## id term estimate ## ## 1 Bootstrap0001 VO2rest 359. ## 2 Bootstrap0001 VO2peak 1656. ## 3 Bootstrap0001 mu 1.23 ## 4 Bootstrap0002 VO2rest 358. ## 5 Bootstrap0002 VO2peak 1662. ## 6 Bootstrap0002 mu 1.26 nls_coef %>% # Put different parameters in columns tidyr::pivot_wider(names_from = term, values_from = estimate) %>% # Keep only numeric columns dplyr::select(-id) %>% ggscatmat(alpha = .25) nls_coef %>% ggplot(aes(x = estimate)) + geom_histogram(bins = 20, col = \"white\") + facet_wrap(~ term, scales = \"free_x\")"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"percentile-intervals","dir":"Articles > Applications","previous_headings":"A nonlinear regression example","what":"Percentile intervals","title":"Bootstrap confidence intervals","text":"basic type interval uses percentiles resampling distribution. get percentile intervals, rset object passed first argument second argument list column tidy results: overlaid univariate distributions: intervals compare parametric asymptotic values? percentile intervals wider parametric intervals (assume asymptotic normality). estimates appear normally distributed? can look quantile-quantile plots:","code":"p_ints <- int_pctl(nlin_bt, models) p_ints ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1576. 1632. 1694. 0.05 percentile ## 2 VO2rest 344. 357. 370. 
0.05 percentile ## 3 mu 1.00 1.18 1.35 0.05 percentile nls_coef %>% ggplot(aes(x = estimate)) + geom_histogram(bins = 20, col = \"white\") + facet_wrap(~ term, scales = \"free_x\") + geom_vline(data = p_ints, aes(xintercept = .lower), col = \"red\") + geom_vline(data = p_ints, aes(xintercept = .upper), col = \"red\") parametric <- tidy(res, conf.int = TRUE) %>% dplyr::select( term, .lower = conf.low, .estimate = estimate, .upper = conf.high ) %>% mutate( .alpha = 0.05, .method = \"parametric\" ) intervals <- bind_rows(parametric, p_ints) %>% arrange(term, .method) intervals %>% split(intervals$term) ## $mu ## # A tibble: 2 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 mu 1.05 1.19 1.34 0.05 parametric ## 2 mu 1.00 1.18 1.35 0.05 percentile ## ## $VO2peak ## # A tibble: 2 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1590. 1631. 1675. 0.05 parametric ## 2 VO2peak 1576. 1632. 1694. 0.05 percentile ## ## $VO2rest ## # A tibble: 2 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2rest 334. 357. 380. 0.05 parametric ## 2 VO2rest 344. 357. 370. 0.05 percentile nls_coef %>% ggplot(aes(sample = estimate)) + stat_qq() + stat_qq_line(alpha = .25) + facet_wrap(~ term, scales = \"free\")"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"t-intervals","dir":"Articles > Applications","previous_headings":"A nonlinear regression example","what":"t-intervals","title":"Bootstrap confidence intervals","text":"Bootstrap t-intervals estimated computing intermediate statistics t-like structure. use , require estimated variance individual resampled estimate. example, comes along fitted model object. can extract standard errors parameters. Luckily, tidy() methods provide column named std.error. 
arguments intervals :","code":"t_stats <- int_t(nlin_bt, models) intervals <- bind_rows(intervals, t_stats) %>% arrange(term, .method) intervals %>% split(intervals$term) ## $mu ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 mu 1.05 1.19 1.34 0.05 parametric ## 2 mu 1.00 1.18 1.35 0.05 percentile ## 3 mu 1.00 1.18 1.35 0.05 student-t ## ## $VO2peak ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1590. 1631. 1675. 0.05 parametric ## 2 VO2peak 1576. 1632. 1694. 0.05 percentile ## 3 VO2peak 1568. 1632. 1691. 0.05 student-t ## ## $VO2rest ## # A tibble: 3 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2rest 334. 357. 380. 0.05 parametric ## 2 VO2rest 344. 357. 370. 0.05 percentile ## 3 VO2rest 342. 357. 370. 0.05 student-t"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"bias-corrected-and-accelerated-intervals","dir":"Articles > Applications","previous_headings":"A nonlinear regression example","what":"Bias-corrected and accelerated intervals","title":"Bootstrap confidence intervals","text":"bias-corrected accelerated (BCa) intervals, additional argument required. .fn argument function computes statistic interest. first argument rsplit object arguments can passed using ellipses. intervals use internal leave-one-resample compute Jackknife statistic recompute statistic every bootstrap resample. statistic expensive compute, may take time. calculations, use furrr package can computed parallel set parallel processing plan (see ?future::plan). 
user-facing function takes argument function ellipses.","code":"bias_corr <- int_bca(nlin_bt, models, .fn = fit_fun, start = start_vals) intervals <- bind_rows(intervals, bias_corr) %>% arrange(term, .method) intervals %>% split(intervals$term) ## $mu ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 mu 0.996 1.18 1.34 0.05 BCa ## 2 mu 1.05 1.19 1.34 0.05 parametric ## 3 mu 1.00 1.18 1.35 0.05 percentile ## 4 mu 1.00 1.18 1.35 0.05 student-t ## ## $VO2peak ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2peak 1561. 1632. 1680. 0.05 BCa ## 2 VO2peak 1590. 1631. 1675. 0.05 parametric ## 3 VO2peak 1576. 1632. 1694. 0.05 percentile ## 4 VO2peak 1568. 1632. 1691. 0.05 student-t ## ## $VO2rest ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 VO2rest 343. 357. 368. 0.05 BCa ## 2 VO2rest 334. 357. 380. 0.05 parametric ## 3 VO2rest 344. 357. 370. 0.05 percentile ## 4 VO2rest 342. 357. 370. 0.05 student-t"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"no-existing-tidy-method","dir":"Articles > Applications","previous_headings":"","what":"No existing tidy method","title":"Bootstrap confidence intervals","text":"case, function can emulate minimum results: character column called term, numeric column called estimate, , optionally, numeric column called std.error. last column needed int_t(). Suppose just want estimate fold-increase outcome 90th 10th percentiles course experiment. function might look like: Everything else works :","code":"fold_incr <- function(split, ...) 
{ dat <- analysis(split) quants <- quantile(dat$VO2, probs = c(.1, .9)) tibble( term = \"fold increase\", estimate = unname(quants[2]/quants[1]), # We don't know the analytical formula for this std.error = NA_real_ ) } nlin_bt <- nlin_bt %>% mutate(folds = map(splits, fold_incr)) int_pctl(nlin_bt, folds) ## # A tibble: 1 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 fold increase 4.42 4.76 5.05 0.05 percentile int_bca(nlin_bt, folds, .fn = fold_incr) ## # A tibble: 1 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 fold increase 4.53 4.76 5.36 0.05 BCa"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Intervals.html","id":"intervals-for-linearish-parametric-intervals","dir":"Articles > Applications","previous_headings":"","what":"Intervals for linear(ish) parametric intervals","title":"Bootstrap confidence intervals","text":"rsample also contains reg_intervals() function can used linear regression (via lm()), generalized linear models (glm()), log-linear survival models (survival::survreg() survival::coxph()). function makes easier get intervals models. simple example logistic regression using dementia data modeldata package: Let’s fit model predictors: Let’s use model student-t intervals: can also save resamples plotting: Now can unnest data use ggplot:","code":"data(ad_data, package = \"modeldata\") lr_mod <- glm(Class ~ male + age + Ab_42 + tau, data = ad_data, family = binomial) glance(lr_mod) ## # A tibble: 1 × 8 ## null.deviance df.null logLik AIC BIC deviance df.residual nobs ## ## 1 391. 332 -140. 289. 308. 279. 328 333 tidy(lr_mod) ## # A tibble: 5 × 5 ## term estimate std.error statistic p.value ## ## 1 (Intercept) 129. 112. 1.15 0.250 ## 2 male -0.744 0.307 -2.43 0.0152 ## 3 age -125. 114. 
-1.10 0.272 ## 4 Ab_42 0.534 0.104 5.14 0.000000282 ## 5 tau -1.78 0.309 -5.77 0.00000000807 set.seed(29832) lr_int <- reg_intervals(Class ~ male + age + Ab_42 + tau, data = ad_data, model_fn = \"glm\", family = binomial) lr_int ## # A tibble: 4 × 6 ## term .lower .estimate .upper .alpha .method ## ## 1 Ab_42 0.316 0.548 0.765 0.05 student-t ## 2 age -332. -133. 85.7 0.05 student-t ## 3 male -1.35 -0.755 -0.133 0.05 student-t ## 4 tau -2.38 -1.83 -1.17 0.05 student-t set.seed(29832) lr_int <- reg_intervals(Class ~ male + age + Ab_42 + tau, data = ad_data, keep_reps = TRUE, model_fn = \"glm\", family = binomial) lr_int ## # A tibble: 4 × 7 ## term .lower .estimate .upper .alpha .method .replicates ## > ## 1 Ab_42 0.316 0.548 0.765 0.05 student-t [1,001 × 2] ## 2 age -332. -133. 85.7 0.05 student-t [1,001 × 2] ## 3 male -1.35 -0.755 -0.133 0.05 student-t [1,001 × 2] ## 4 tau -2.38 -1.83 -1.17 0.05 student-t [1,001 × 2] lr_int %>% select(term, .replicates) %>% unnest(cols = .replicates) %>% ggplot(aes(x = estimate)) + geom_histogram(bins = 30) + facet_wrap(~ term, scales = \"free_x\") + geom_vline(data = lr_int, aes(xintercept = .lower), col = \"red\") + geom_vline(data = lr_int, aes(xintercept = .upper), col = \"red\") + geom_vline(xintercept = 0, col = \"green\")"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Recipes_and_rsample.html","id":"an-example-recipe","dir":"Articles > Applications","previous_headings":"","what":"An Example Recipe","title":"Recipes with rsample","text":"illustration, Ames housing data used. sale prices homes along various descriptors property: Suppose fit simple regression model formula: distribution lot size right-skewed: might benefit model estimate transformation data using Box-Cox procedure. Also, note frequencies neighborhoods can vary: resampled, neighborhoods included test set result column dummy variables zero entries. true House_Style variable. might want collapse rarely occurring values “” categories. 
define design matrix, initial recipe created: recreates work formula method traditionally uses additional steps. original data object ames used call, used define variables characteristics single recipe valid across resampled versions data. recipe can estimated analysis component resample. execute recipe entire data set: get values data, bake function can used: Note fewer dummy variables Neighborhood House_Style data. Also, code using prep() benefits default argument retain = TRUE, keeps processed version data set don’t reapply steps extract processed values. data used train recipe, used: next section explore recipes bootstrap resampling modeling:","code":"data(ames, package = \"modeldata\") log10(Sale_Price) ~ Neighborhood + House_Style + Year_Sold + Lot_Area library(ggplot2) theme_set(theme_bw()) ggplot(ames, aes(x = Lot_Area)) + geom_histogram(binwidth = 5000, col = \"red\", fill =\"red\", alpha = .5) ggplot(ames, aes(x = Neighborhood)) + geom_bar() + coord_flip() + xlab(\"\") library(recipes) # Apply log10 transformation outside the recipe # https://www.tmwr.org/recipes.html#skip-equals-true ames <- ames %>% mutate(Sale_Price = log10(Sale_Price)) rec <- recipe(Sale_Price ~ Neighborhood + House_Style + Year_Sold + Lot_Area, data = ames) %>% # Collapse rarely occurring jobs into \"other\" step_other(Neighborhood, House_Style, threshold = 0.05) %>% # Dummy variables on the qualitative predictors step_dummy(all_nominal()) %>% # Unskew a predictor step_BoxCox(Lot_Area) %>% # Normalize step_center(all_predictors()) %>% step_scale(all_predictors()) rec rec_training_set <- prep(rec, training = ames) rec_training_set ## ## ── Recipe ──────────────────────────────────────────────────────────────── ## ## ── Inputs ## Number of variables by role ## outcome: 1 ## predictor: 4 ## ## ── Training information ## Training data contained 2930 data points and no incomplete rows. 
## ## ── Operations ## • Collapsing factor levels for: Neighborhood and House_Style | Trained ## • Dummy variables from: Neighborhood and House_Style | Trained ## • Box-Cox transformation on: Lot_Area | Trained ## • Centering for: Year_Sold and Lot_Area, ... | Trained ## • Scaling for: Year_Sold and Lot_Area, ... | Trained # By default, the selector `everything()` is used to # return all the variables. Other selectors can be used too. bake(rec_training_set, new_data = head(ames)) ## # A tibble: 6 × 14 ## Year_Sold Lot_Area Sale_Price Neighborhood_College_Creek ## ## 1 1.68 2.70 5.33 -0.317 ## 2 1.68 0.506 5.02 -0.317 ## 3 1.68 0.930 5.24 -0.317 ## 4 1.68 0.423 5.39 -0.317 ## 5 1.68 0.865 5.28 -0.317 ## 6 1.68 0.197 5.29 -0.317 ## # ℹ 10 more variables: Neighborhood_Old_Town , ## # Neighborhood_Edwards , Neighborhood_Somerset , ## # Neighborhood_Northridge_Heights , Neighborhood_Gilbert , ## # Neighborhood_Sawyer , Neighborhood_other , ## # House_Style_One_Story , House_Style_Two_Story , ## # House_Style_other bake(rec_training_set, new_data = NULL) %>% head ## # A tibble: 6 × 14 ## Year_Sold Lot_Area Sale_Price Neighborhood_College_Creek ## ## 1 1.68 2.70 5.33 -0.317 ## 2 1.68 0.506 5.02 -0.317 ## 3 1.68 0.930 5.24 -0.317 ## 4 1.68 0.423 5.39 -0.317 ## 5 1.68 0.865 5.28 -0.317 ## 6 1.68 0.197 5.29 -0.317 ## # ℹ 10 more variables: Neighborhood_Old_Town , ## # Neighborhood_Edwards , Neighborhood_Somerset , ## # Neighborhood_Northridge_Heights , Neighborhood_Gilbert , ## # Neighborhood_Sawyer , Neighborhood_other , ## # House_Style_One_Story , House_Style_Two_Story , ## # House_Style_other library(rsample) set.seed(7712) bt_samples <- bootstraps(ames) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 2 ## splits id ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows bt_samples$splits[[1]] ## ## 
<2930/1095/2930>"},{"path":"https://rsample.tidymodels.org/dev/articles/Applications/Recipes_and_rsample.html","id":"working-with-resamples","dir":"Articles > Applications","previous_headings":"","what":"Working with Resamples","title":"Recipes with rsample","text":"can add recipe column tibble. recipes convenience function called prepper() can used call prep() split object first argument (easier purrring): Now, fit model, fit function needs recipe input. code implicitly used retain = TRUE option prep(). Otherwise, split objects also needed bake() recipe (prediction function ). get predictions, function needs three arguments: splits (get assessment data), recipe (process ), model. iterate , function purrr::pmap() used: Calculating RMSE:","code":"library(purrr) bt_samples$recipes <- map(bt_samples$splits, prepper, recipe = rec) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 3 ## splits id recipes ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows bt_samples$recipes[[1]] ## ## ── Recipe ──────────────────────────────────────────────────────────────── ## ## ── Inputs ## Number of variables by role ## outcome: 1 ## predictor: 4 ## ## ── Training information ## Training data contained 2930 data points and no incomplete rows. ## ## ── Operations ## • Collapsing factor levels for: Neighborhood and House_Style | Trained ## • Dummy variables from: Neighborhood and House_Style | Trained ## • Box-Cox transformation on: Lot_Area | Trained ## • Centering for: Year_Sold and Lot_Area, ... | Trained ## • Scaling for: Year_Sold and Lot_Area, ... | Trained fit_lm <- function(rec_obj, ...) lm(..., data = bake(rec_obj, new_data = NULL, everything())) bt_samples$lm_mod <- map( bt_samples$recipes, fit_lm, Sale_Price ~ . 
) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 4 ## splits id recipes lm_mod ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows pred_lm <- function(split_obj, rec_obj, model_obj, ...) { mod_data <- bake( rec_obj, new_data = assessment(split_obj), all_predictors(), all_outcomes() ) out <- mod_data %>% select(Sale_Price) out$predicted <- predict(model_obj, newdata = mod_data %>% select(-Sale_Price)) out } bt_samples$pred <- pmap( lst( split_obj = bt_samples$splits, rec_obj = bt_samples$recipes, model_obj = bt_samples$lm_mod ), pred_lm ) bt_samples ## # Bootstrap sampling ## # A tibble: 25 × 5 ## splits id recipes lm_mod pred ## ## 1 Bootstrap01 ## 2 Bootstrap02 ## 3 Bootstrap03 ## 4 Bootstrap04 ## 5 Bootstrap05 ## 6 Bootstrap06 ## 7 Bootstrap07 ## 8 Bootstrap08 ## 9 Bootstrap09 ## 10 Bootstrap10 ## # ℹ 15 more rows library(yardstick) results <- map(bt_samples$pred, rmse, Sale_Price, predicted) %>% list_rbind() results ## # A tibble: 25 × 3 ## .metric .estimator .estimate ## ## 1 rmse standard 0.132 ## 2 rmse standard 0.128 ## 3 rmse standard 0.129 ## 4 rmse standard 0.123 ## 5 rmse standard 0.125 ## 6 rmse standard 0.140 ## 7 rmse standard 0.129 ## 8 rmse standard 0.130 ## 9 rmse standard 0.122 ## 10 rmse standard 0.127 ## # ℹ 15 more rows mean(results$.estimate) ## [1] 0.129"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"random-resampling","dir":"Articles","previous_headings":"","what":"Random Resampling","title":"Common Resampling Patterns","text":"far away, common use rsample generate simple random resamples data. 
rsample package includes number functions specifically purpose.","code":""},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"initial-splits","dir":"Articles","previous_headings":"Random Resampling","what":"Initial Splits","title":"Common Resampling Patterns","text":"split data two sets – often referred “training” “testing” sets – rsample provides initial_split() function: output rsplit object observation assigned one two sets. can control proportion data assigned “training” set prop argument: get actual data assigned either set, use training() testing() functions:","code":"initial_split(ames) #> #> <2197/733/2930> initial_split(ames, prop = 0.8) #> #> <2344/586/2930> resample <- initial_split(ames, prop = 0.6) head(training(resample), 2) #> # A tibble: 2 × 74 #> MS_SubClass MS_Zoning Lot_Frontage Lot_Area Street Alley Lot_Shape #> #> 1 One_Story_1946_a… Resident… 110 14333 Pave No_A… Regular #> 2 One_Story_1946_a… Resident… 65 8450 Pave No_A… Regular #> # ℹ 67 more variables: Land_Contour , Utilities , #> # Lot_Config , Land_Slope , Neighborhood , #> # Condition_1 , Condition_2 , Bldg_Type , #> # House_Style , Overall_Cond , Year_Built , #> # Year_Remod_Add , Roof_Style , Roof_Matl , #> # Exterior_1st , Exterior_2nd , Mas_Vnr_Type , #> # Mas_Vnr_Area , Exter_Cond , Foundation , … head(testing(resample), 2) #> # A tibble: 2 × 74 #> MS_SubClass MS_Zoning Lot_Frontage Lot_Area Street Alley Lot_Shape #> #> 1 One_Story_1946_a… Resident… 141 31770 Pave No_A… Slightly… #> 2 One_Story_1946_a… Resident… 80 11622 Pave No_A… Regular #> # ℹ 67 more variables: Land_Contour , Utilities , #> # Lot_Config , Land_Slope , Neighborhood , #> # Condition_1 , Condition_2 , Bldg_Type , #> # House_Style , Overall_Cond , Year_Built , #> # Year_Remod_Add , Roof_Style , Roof_Matl , #> # Exterior_1st , Exterior_2nd , Mas_Vnr_Type , #> # Mas_Vnr_Area , Exter_Cond , Foundation , 
…"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"v-fold-cross-validation","dir":"Articles","previous_headings":"Random Resampling","what":"V-Fold Cross-Validation","title":"Common Resampling Patterns","text":"evaluate models test set , ’ve completely finished tuning training models. estimate performance model candidates, typically split training data one part used model fitting one part used measuring performance. distinguish set training test set, refer analysis assessment set, respectively. Typically, split training data analysis assessment sets multiple times get stable estimates model performance. Perhaps common cross-validation method V-fold cross-validation. Also known “k-fold cross-validation”, method creates V resamples splitting data V groups (also known “folds”) roughly equal size. analysis set resample made V-1 folds, remaining fold used assessment set. way, observation data used exactly one assessment set. use V-fold cross-validation rsample, use vfold_cv() function: One downside V-fold cross validation tends produce “noisy”, high-variance, estimates compared resampling methods. try reduce variance, ’s often helpful perform ’s known repeated cross-validation, effectively running V-fold resampling procedure multiple times data. 
perform repeated V-fold cross-validation rsample, can use repeats argument inside vfold_cv():","code":"vfold_cv(ames, v = 2) #> # 2-fold cross-validation #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2 vfold_cv(ames, v = 2, repeats = 2) #> # 2-fold cross-validation repeated 2 times #> # A tibble: 4 × 3 #> splits id id2 #> #> 1 Repeat1 Fold1 #> 2 Repeat1 Fold2 #> 3 Repeat2 Fold1 #> 4 Repeat2 Fold2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"monte-carlo-cross-validation","dir":"Articles","previous_headings":"Random Resampling","what":"Monte-Carlo Cross-Validation","title":"Common Resampling Patterns","text":"alternative V-fold cross-validation Monte-Carlo cross-validation. V-fold assigns observation data one (exactly one) assessment set, Monte-Carlo cross-validation takes random subset data assessment set, meaning observation can used 0, 1, many assessment sets. analysis set made observations weren’t selected. assessment set sampled independently, can repeat many times want. use Monte-Carlo cross-validation rsample, use mc_cv() function: Similar initial_split(), can control proportion data assigned analysis fold using prop. can also control number resamples create using times argument. Monte-Carlo cross-validation tends produce biased estimates V-fold. , computationally feasible typically recommend using five repeats 10-fold cross-validation model assessment.","code":"mc_cv(ames, prop = 0.8, times = 2) #> # Monte Carlo cross-validation (0.8/0.2) with 2 resamples #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"bootstrap-resampling","dir":"Articles","previous_headings":"Random Resampling","what":"Bootstrap Resampling","title":"Common Resampling Patterns","text":"last primary technique rsample creating resamples training data bootstrap resampling. 
“bootstrap sample” sample data set, size data set, taken replacement single observation might sampled multiple times. assessment set made observations weren’t selected analysis set. Generally, bootstrap resampling produces pessimistic estimates model accuracy. can create bootstrap resamples rsample using bootstraps() function. can’t control proportion data set – assessment set bootstrap resample always size training data – function otherwise works exactly like mc_cv():","code":"bootstraps(ames, times = 2) #> # Bootstrap sampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"validation-set","dir":"Articles","previous_headings":"Random Resampling","what":"Validation Set","title":"Common Resampling Patterns","text":"data vast enough reliable performance estimate just one assessment set, can three-way split data training, validation test set right start. (validation set role single assessment set.) Instead using initial_split() create binary split, can use initial_validation_split() create three-way split: prop argument two elements, specifying proportion data assigned training validation set. 
create rset object tuning, validation_set() bundles together training validation set, read use tune) package.","code":"three_way_split <- initial_validation_split(ames, prop = c(0.6, 0.2)) three_way_split #> #> <1758/586/586/2930> validation_set(three_way_split) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"stratified-resampling","dir":"Articles","previous_headings":"","what":"Stratified Resampling","title":"Common Resampling Patterns","text":"data heavily imbalanced (, distribution important continuous variable skewed, classes categorical variable much common others), simple random resampling may accidentally skew data even allocating “rare” observations disproportionately analysis assessment fold. situations, can useful instead use stratified resampling ensure analysis assessment folds similar distribution overall data. functions discussed far support stratified resampling strata argument. argument takes single column identifier uses stratify resampling procedure: default, rsample cut continuous variables four bins, ensure bin proportionally represented set. desired, behavior can changed using breaks argument:","code":"vfold_cv(ames, v = 2, strata = Sale_Price) #> # 2-fold cross-validation using stratification #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2 vfold_cv(ames, v = 2, strata = Sale_Price, breaks = 100) #> # 2-fold cross-validation using stratification #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"grouped-resampling","dir":"Articles","previous_headings":"","what":"Grouped Resampling","title":"Common Resampling Patterns","text":"Often, observations data “related” probable random chance, instance represent repeated measurements subject collected single location. 
situations, often want assign related observations either analysis assessment fold group, avoid assessment data ’s closely related data used fit model. functions discussed far “grouped resampling” variation handle situations. functions start group_ prefix, use argument group specify column used group observations. respecting groups, functions work like ungrouped variants: ’s important note , functions like group_mc_cv() still let specify proportion data analysis set (group_bootstraps() still attempts create analysis sets size original data), rsample won’t “split” groups order exactly meet proportion. functions start assigning one group random set (, group_vfold_cv(), fold) assign remaining groups, random order, whichever set brings relative sizes set closest target proportion. means resamples randomized, can safely use repeated cross-validation just ungrouped resampling, also means can wind differently sized analysis assessment sets anticipated groups unbalanced: grouped resampling functions always focused balancing proportion data analysis set, default group_vfold_cv() attempt balance number groups assigned fold. instead ’d like balance number observations fold (meaning assessment sets similar sizes, smaller groups likely assigned folds happen random chance), can use argument balance = \"observations\": ’re working spatial data, observations often related neighbors rest data set; Tobler’s first law geography puts , “everything related everything else, near things related distant things.” However, often won’t pre-defined “location” variable can use group related observations. 
spatialsample package provides functions spatial cross-validation using rsample syntax classes, often useful situations.","code":"resample <- group_initial_split(Orange, group = Tree) unique(training(resample)$Tree) #> [1] 1 2 3 4 #> Levels: 3 < 1 < 5 < 2 < 4 unique(testing(resample)$Tree) #> [1] 5 #> Levels: 3 < 1 < 5 < 2 < 4 set.seed(1) group_bootstraps(ames, Neighborhood, times = 2) #> # Group bootstrap sampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 group_vfold_cv(ames, Neighborhood, balance = \"observations\", v = 2) #> # Group 2-fold cross-validation #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2"},{"path":"https://rsample.tidymodels.org/dev/articles/Common_Patterns.html","id":"time-based-resampling","dir":"Articles","previous_headings":"","what":"Time-Based Resampling","title":"Common Resampling Patterns","text":"working time-based data, usually doesn’t make sense randomly resample data: random resampling likely result analysis set observations later assessment set, isn’t realistic way assess model performance. , rsample provides different functions make sure data assessment sets analysis set. First , two variants initial_split() initial_validation_split(), initial_time_split() initial_validation_time_split(), assign first rows data training set (number rows assigned determined prop): also several functions rsample help construct multiple analysis assessment sets time-based data. instance, sliding_window() create “windows” data, moving rows data frame: want create sliding windows data based specific variable, can use sliding_index() function: want set size windows based units time, instance window contain year data, can use sliding_period(): functions produce analysis sets size, start end analysis set “sliding” data frame. 
’d rather analysis set get progressively larger, ’re predicting new data based upon growing set older observations, can use sliding_window() function lookback = Inf: commonly referred “evaluation rolling forecasting origin”, colloquially, “rolling origin cross-validation”. Note time-based resampling functions deterministic: unlike rest package, running functions repeatedly different random seeds always return results.","code":"initial_time_split(Chicago) #> #> <4273/1425/5698> initial_validation_time_split(Chicago) #> #> <3418/1140/1140/5698> sliding_window(Chicago) %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice0001 #> 2 Slice0002 sliding_index(Chicago, date) %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice0001 #> 2 Slice0002 sliding_period(Chicago, date, \"year\") %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 sliding_window(Chicago, lookback = Inf) %>% head(2) #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice0001 #> 2 Slice0002"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Working with resampling sets","text":"rsample package can used create objects containing resamples original data. vignette contains demonstration objects can used data analysis. Let’s use attrition data set. documentation: data IBM Watson Analytics Lab. website describes data “Uncover factors lead employee attrition explore important questions ‘show breakdown distance home job role attrition’ ‘compare average monthly income education attrition’. fictional data set created IBM data scientists.” 1470 rows. 
data can accessed using","code":"library(rsample) data(\"attrition\", package = \"modeldata\") names(attrition) #> [1] \"Age\" \"Attrition\" \"BusinessTravel\" #> [4] \"DailyRate\" \"Department\" \"DistanceFromHome\" #> [7] \"Education\" \"EducationField\" \"EnvironmentSatisfaction\" #> [10] \"Gender\" \"HourlyRate\" \"JobInvolvement\" #> [13] \"JobLevel\" \"JobRole\" \"JobSatisfaction\" #> [16] \"MaritalStatus\" \"MonthlyIncome\" \"MonthlyRate\" #> [19] \"NumCompaniesWorked\" \"OverTime\" \"PercentSalaryHike\" #> [22] \"PerformanceRating\" \"RelationshipSatisfaction\" \"StockOptionLevel\" #> [25] \"TotalWorkingYears\" \"TrainingTimesLastYear\" \"WorkLifeBalance\" #> [28] \"YearsAtCompany\" \"YearsInCurrentRole\" \"YearsSinceLastPromotion\" #> [31] \"YearsWithCurrManager\" table(attrition$Attrition) #> #> No Yes #> 1233 237"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"model-assessment","dir":"Articles","previous_headings":"","what":"Model Assessment","title":"Working with resampling sets","text":"Let’s fit logistic regression model data model terms job satisfaction, gender, monthly income. fitting model entire data set, might model attrition using convenience, ’ll create formula object used later: evaluate model, use 10 repeats 10-fold cross-validation use 100 holdout samples evaluate overall accuracy model. First, let’s make splits data: Now let’s write function , resample: obtain analysis data set (.e. 90% used modeling) fit logistic regression model predict assessment data (10% used model) using broom package determine sample predicted correctly. function: example: model, .fitted value linear predictor log-odds units. compute data set 100 resamples, ’ll use map() function purrr package: Now can compute accuracy values assessment data sets: Keep mind baseline accuracy beat rate non-attrition, 0.839. 
great model far.","code":"glm(Attrition ~ JobSatisfaction + Gender + MonthlyIncome, data = attrition, family = binomial) mod_form <- as.formula(Attrition ~ JobSatisfaction + Gender + MonthlyIncome) library(rsample) set.seed(4622) rs_obj <- vfold_cv(attrition, v = 10, repeats = 10) rs_obj #> # 10-fold cross-validation repeated 10 times #> # A tibble: 100 × 3 #> splits id id2 #> #> 1 Repeat01 Fold01 #> 2 Repeat01 Fold02 #> 3 Repeat01 Fold03 #> 4 Repeat01 Fold04 #> 5 Repeat01 Fold05 #> 6 Repeat01 Fold06 #> 7 Repeat01 Fold07 #> 8 Repeat01 Fold08 #> 9 Repeat01 Fold09 #> 10 Repeat01 Fold10 #> # ℹ 90 more rows ## splits will be the `rsplit` object with the 90/10 partition holdout_results <- function(splits, ...) { # Fit the model to the 90% mod <- glm(..., data = analysis(splits), family = binomial) # Save the 10% holdout <- assessment(splits) # `augment` will save the predictions with the holdout data set res <- broom::augment(mod, newdata = holdout) # Class predictions on the assessment set from class probs lvls <- levels(holdout$Attrition) predictions <- factor(ifelse(res$.fitted > 0, lvls[2], lvls[1]), levels = lvls) # Calculate whether the prediction was correct res$correct <- predictions == holdout$Attrition # Return the assessment data set with the additional columns res } example <- holdout_results(rs_obj$splits[[1]], mod_form) dim(example) #> [1] 147 34 dim(assessment(rs_obj$splits[[1]])) #> [1] 147 31 ## newly added columns: example[1:10, setdiff(names(example), names(attrition))] #> # A tibble: 10 × 3 #> .rownames .fitted correct #> #> 1 11 -1.20 TRUE #> 2 24 -1.78 TRUE #> 3 30 -1.45 TRUE #> 4 39 -1.60 TRUE #> 5 53 -1.54 TRUE #> 6 72 -1.93 TRUE #> 7 73 -3.06 TRUE #> 8 80 -3.28 TRUE #> 9 83 -2.23 TRUE #> 10 90 -1.28 FALSE library(purrr) rs_obj$results <- map(rs_obj$splits, holdout_results, mod_form) rs_obj #> # 10-fold cross-validation repeated 10 times #> # A tibble: 100 × 4 #> splits id id2 results #> #> 1 Repeat01 Fold01 #> 2 Repeat01 Fold02 #> 3 Repeat01 
Fold03 #> 4 Repeat01 Fold04 #> 5 Repeat01 Fold05 #> 6 Repeat01 Fold06 #> 7 Repeat01 Fold07 #> 8 Repeat01 Fold08 #> 9 Repeat01 Fold09 #> 10 Repeat01 Fold10 #> # ℹ 90 more rows rs_obj$accuracy <- map_dbl(rs_obj$results, function(x) mean(x$correct)) summary(rs_obj$accuracy) #> Min. 1st Qu. Median Mean 3rd Qu. Max. #> 0.776 0.821 0.840 0.839 0.859 0.905"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"using-the-bootstrap-to-make-comparisons","dir":"Articles","previous_headings":"","what":"Using the Bootstrap to Make Comparisons","title":"Working with resampling sets","text":"Traditionally, bootstrap primarily used empirically determine sampling distribution test statistic. Given set samples replacement, statistic can calculated analysis set results can used make inferences (confidence intervals). example, differences median monthly income genders? wanted compare genders, conduct t-test rank-based test. Instead, let’s use bootstrap see difference median incomes two groups. need simple function compute statistic resample: Now create large number bootstrap samples (say 2000+). illustration, ’ll 500 document. function computed across resample: bootstrap distribution statistic slightly bimodal skewed distribution: variation considerable statistic. One method computing confidence interval take percentiles bootstrap distribution. 
95% confidence interval difference medians : calculated 95% confidence interval contains zero, don’t evidence difference median income genders confidence level 95%.","code":"ggplot(attrition, aes(x = Gender, y = MonthlyIncome)) + geom_boxplot() + scale_y_log10() median_diff <- function(splits) { x <- analysis(splits) median(x$MonthlyIncome[x$Gender == \"Female\"]) - median(x$MonthlyIncome[x$Gender == \"Male\"]) } set.seed(353) bt_resamples <- bootstraps(attrition, times = 500) bt_resamples$wage_diff <- map_dbl(bt_resamples$splits, median_diff) ggplot(bt_resamples, aes(x = wage_diff)) + geom_line(stat = \"density\", adjust = 1.25) + xlab(\"Difference in Median Monthly Income (Female - Male)\") quantile(bt_resamples$wage_diff, probs = c(0.025, 0.975)) #> 2.5% 97.5% #> -189 615"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"bootstrap-estimates-of-model-coefficients","dir":"Articles","previous_headings":"","what":"Bootstrap Estimates of Model Coefficients","title":"Working with resampling sets","text":"Unless already column resample object contains fitted model, function can used fit model save model coefficients. broom package package tidy() function save coefficients data frame. Instead returning data frame row model term, save data frame single row columns model term. , purrr::map() can used estimate save values split.","code":"glm_coefs <- function(splits, ...) { ## use `analysis` or `as.data.frame` to get the analysis data mod <- glm(..., data = analysis(splits), family = binomial) as.data.frame(t(coef(mod))) } bt_resamples$betas <- map(.x = bt_resamples$splits, .f = glm_coefs, mod_form) bt_resamples #> # Bootstrap sampling #> # A tibble: 500 × 4 #> splits id wage_diff betas #> #> 1 Bootstrap001 136 #> 2 Bootstrap002 282. 
#> 3 Bootstrap003 470 #> 4 Bootstrap004 -213 #> 5 Bootstrap005 453 #> 6 Bootstrap006 684 #> 7 Bootstrap007 60 #> 8 Bootstrap008 286 #> 9 Bootstrap009 -30 #> 10 Bootstrap010 410 #> # ℹ 490 more rows bt_resamples$betas[[1]] #> (Intercept) JobSatisfaction.L JobSatisfaction.Q JobSatisfaction.C GenderMale #> 1 -0.939 -0.501 -0.272 0.0842 0.0989 #> MonthlyIncome #> 1 -0.000129"},{"path":"https://rsample.tidymodels.org/dev/articles/Working_with_rsets.html","id":"keeping-tidy","dir":"Articles","previous_headings":"","what":"Keeping Tidy","title":"Working with resampling sets","text":"previously mentioned, broom package contains class called tidy created representations objects can easily used analysis, plotting, etc. rsample contains tidy methods rset rsplit objects. example: ","code":"first_resample <- bt_resamples$splits[[1]] class(first_resample) #> [1] \"boot_split\" \"rsplit\" tidy(first_resample) #> # A tibble: 1,470 × 2 #> Row Data #> #> 1 2 Analysis #> 2 3 Analysis #> 3 4 Analysis #> 4 7 Analysis #> 5 9 Analysis #> 6 10 Analysis #> 7 11 Analysis #> 8 13 Analysis #> 9 18 Analysis #> 10 19 Analysis #> # ℹ 1,460 more rows class(bt_resamples) #> [1] \"bootstraps\" \"rset\" \"tbl_df\" \"tbl\" \"data.frame\" tidy(bt_resamples) #> # A tibble: 735,000 × 3 #> Row Data Resample #> #> 1 1 Analysis Bootstrap002 #> 2 1 Analysis Bootstrap004 #> 3 1 Analysis Bootstrap007 #> 4 1 Analysis Bootstrap008 #> 5 1 Analysis Bootstrap009 #> 6 1 Analysis Bootstrap010 #> 7 1 Analysis Bootstrap011 #> 8 1 Analysis Bootstrap013 #> 9 1 Analysis Bootstrap015 #> 10 1 Analysis Bootstrap016 #> # ℹ 734,990 more rows"},{"path":"https://rsample.tidymodels.org/dev/articles/rsample.html","id":"terminology","dir":"Articles","previous_headings":"","what":"Terminology","title":"Introduction to rsample","text":"define resample result two-way split data set. example, bootstrapping, one part resample sample replacement original data. part split contains instances contained bootstrap sample. 
Cross-validation another type resampling.","code":""},{"path":"https://rsample.tidymodels.org/dev/articles/rsample.html","id":"rset-objects-contain-many-resamples","dir":"Articles","previous_headings":"","what":"rset Objects Contain Many Resamples","title":"Introduction to rsample","text":"main class package (rset) set collection resamples. 10-fold cross-validation, set consist 10 different resamples original data. Like modelr, resamples stored data-frame-like tibble object. simple example, small set bootstraps mtcars data:","code":"library(rsample) set.seed(8584) bt_resamples <- bootstraps(mtcars, times = 3) bt_resamples #> # Bootstrap sampling #> # A tibble: 3 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Bootstrap3"},{"path":"https://rsample.tidymodels.org/dev/articles/rsample.html","id":"individual-resamples-are-rsplit-objects","dir":"Articles","previous_headings":"","what":"Individual Resamples are rsplit Objects","title":"Introduction to rsample","text":"resamples stored splits column object class rsplit. package use following terminology two partitions comprise resample: analysis data selected resample. bootstrap, sample replacement. 10-fold cross-validation, 90% data. data often used fit model calculate statistic traditional bootstrapping. assessment data usually section original data covered analysis set. , 10-fold CV, 10% held . data often used evaluate performance model fit analysis data. (Aside: might use term “training” “testing” data sets, avoid since labels often conflict data result initial partition data typically done resampling. training/test split can conducted using initial_split() function package.) Let’s look one rsplit objects indicates 32 data points analysis set, 14 instances assessment set, original data contained 32 data points. results can also determined using dim function rsplit object. obtain either data sets rsplit, .data.frame() function can used. 
default, analysis set returned data option can used return assessment data: Alternatively, can use shortcuts analysis(first_resample) assessment(first_resample).","code":"first_resample <- bt_resamples$splits[[1]] first_resample #> #> <32/14/32> head(as.data.frame(first_resample)) #> mpg cyl disp hp drat wt qsec vs am gear carb #> Fiat 128...1 32.4 4 78.7 66 4.08 2.20 19.5 1 1 4 1 #> Toyota Corolla...2 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1 #> Toyota Corolla...3 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1 #> AMC Javelin...4 15.2 8 304.0 150 3.15 3.44 17.3 0 0 3 2 #> Valiant...5 18.1 6 225.0 105 2.76 3.46 20.2 1 0 3 1 #> Merc 450SLC...6 15.2 8 275.8 180 3.07 3.78 18.0 0 0 3 3 as.data.frame(first_resample, data = \"assessment\") #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.88 17.0 0 1 4 4 #> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.21 19.4 1 0 3 1 #> Merc 240D 24.4 4 146.7 62 3.69 3.19 20.0 1 0 4 2 #> Merc 230 22.8 4 140.8 95 3.92 3.15 22.9 1 0 4 2 #> Merc 280 19.2 6 167.6 123 3.92 3.44 18.3 1 0 4 4 #> Merc 280C 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4 #> Merc 450SE 16.4 8 275.8 180 3.07 4.07 17.4 0 0 3 3 #> Merc 450SL 17.3 8 275.8 180 3.07 3.73 17.6 0 0 3 3 #> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.25 18.0 0 0 3 4 #> Chrysler Imperial 14.7 8 440.0 230 3.23 5.34 17.4 0 0 3 4 #> Honda Civic 30.4 4 75.7 52 4.93 1.61 18.5 1 1 4 2 #> Fiat X1-9 27.3 4 79.0 66 4.08 1.94 18.9 1 1 4 1 #> Lotus Europa 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2 #> Volvo 142E 21.4 4 121.0 109 4.11 2.78 18.6 1 1 4 2"},{"path":"https://rsample.tidymodels.org/dev/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Hannah Frick. Author, maintainer. Fanny Chow. Author. Max Kuhn. Author. Michael Mahoney. Author. Julia Silge. Author. Hadley Wickham. Author. . 
Copyright holder, funder.","code":""},{"path":"https://rsample.tidymodels.org/dev/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Frick H, Chow F, Kuhn M, Mahoney M, Silge J, Wickham H (2024). rsample: General Resampling Infrastructure. R package version 1.2.1.9000, https://github.com/tidymodels/rsample, https://rsample.tidymodels.org.","code":"@Manual{, title = {rsample: General Resampling Infrastructure}, author = {Hannah Frick and Fanny Chow and Max Kuhn and Michael Mahoney and Julia Silge and Hadley Wickham}, year = {2024}, note = {R package version 1.2.1.9000, https://github.com/tidymodels/rsample}, url = {https://rsample.tidymodels.org}, }"},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/index.html","id":"overview","dir":"","previous_headings":"","what":"Overview","title":"General Resampling Infrastructure","text":"rsample package provides functions create different types resamples corresponding classes analysis. goal modular set methods can used : resampling estimating sampling distribution statistic estimating model performance using holdout set scope rsample provide basic building blocks creating analyzing resamples data set, package include code modeling calculating statistics. Working Resample Sets vignette gives demonstration rsample tools can used building models. Note resampled data sets created rsample directly accessible resampling object contain much overhead memory. Since original data modified, R make automatic copy. 
example, creating 50 bootstraps data set create object 50-fold larger memory: Created 2022-02-28 reprex package (v2.0.1) memory usage 50 bootstrap samples less 3-fold original data set.","code":"library(rsample) library(mlbench) data(LetterRecognition) lobstr::obj_size(LetterRecognition) #> 2,644,640 B set.seed(35222) boots <- bootstraps(LetterRecognition, times = 50) lobstr::obj_size(boots) #> 6,686,776 B # Object size per resample lobstr::obj_size(boots)/nrow(boots) #> 133,735.5 B # Fold increase is <<< 50 as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition)) #> [1] 2.528426"},{"path":"https://rsample.tidymodels.org/dev/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"General Resampling Infrastructure","text":"install , use: development version GitHub :","code":"install.packages(\"rsample\") # install.packages(\"pak\") pak::pak(\"tidymodels/rsample\")"},{"path":"https://rsample.tidymodels.org/dev/index.html","id":"contributing","dir":"","previous_headings":"","what":"Contributing","title":"General Resampling Infrastructure","text":"project released Contributor Code Conduct. contributing project, agree abide terms. questions discussions tidymodels packages, modeling, machine learning, please post Posit Community. think encountered bug, please submit issue. Either way, learn create share reprex (minimal, reproducible example), clearly communicate code. Check details contributing guidelines tidymodels packages get help.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":null,"dir":"Reference","previous_headings":"","what":"Augment a data set with resampling identifiers — add_resample_id","title":"Augment a data set with resampling identifiers — add_resample_id","text":"data set, add_resample_id() add least one new column identifies resample data came . 
cases, single column added resampling methods, two added.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Augment a data set with resampling identifiers — add_resample_id","text":"","code":"add_resample_id(.data, split, dots = FALSE)"},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Augment a data set with resampling identifiers — add_resample_id","text":".data data frame. split single rset object. dots single logical: id columns prefixed \".\" avoid name conflicts .data?","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Augment a data set with resampling identifiers — add_resample_id","text":"updated data frame.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/add_resample_id.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Augment a data set with resampling identifiers — add_resample_id","text":"","code":"library(dplyr) #> #> Attaching package: ‘dplyr’ #> The following objects are masked from ‘package:stats’: #> #> filter, lag #> The following objects are masked from ‘package:base’: #> #> intersect, setdiff, setequal, union set.seed(363) car_folds <- vfold_cv(mtcars, repeats = 3) analysis(car_folds$splits[[1]]) %>% add_resample_id(car_folds$splits[[1]]) %>% head() #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 #> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 #> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 #> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 #> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 #> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 #> id 
id2 #> Mazda RX4 Repeat1 Fold01 #> Mazda RX4 Wag Repeat1 Fold01 #> Datsun 710 Repeat1 Fold01 #> Hornet 4 Drive Repeat1 Fold01 #> Hornet Sportabout Repeat1 Fold01 #> Valiant Repeat1 Fold01 car_bt <- bootstraps(mtcars) analysis(car_bt$splits[[1]]) %>% add_resample_id(car_bt$splits[[1]]) %>% head() #> mpg cyl disp hp drat wt qsec vs am gear carb #> Toyota Corona...1 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 #> Mazda RX4...2 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> Chrysler Imperial...3 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> Volvo 142E...4 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> Chrysler Imperial...5 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 #> Volvo 142E...6 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 #> id #> Toyota Corona...1 Bootstrap01 #> Mazda RX4...2 Bootstrap01 #> Chrysler Imperial...3 Bootstrap01 #> Volvo 142E...4 Bootstrap01 #> Chrysler Imperial...5 Bootstrap01 #> Volvo 142E...6 Bootstrap01"},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":null,"dir":"Reference","previous_headings":"","what":"Sampling for the Apparent Error Rate — apparent","title":"Sampling for the Apparent Error Rate — apparent","text":"building model data set re-predicting data, performance estimate predictions often called \"apparent\" performance model. estimate can wildly optimistic. \"Apparent sampling\" means analysis assessment samples . resamples sometimes used analysis bootstrap samples otherwise avoided like old sushi.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sampling for the Apparent Error Rate — apparent","text":"","code":"apparent(data, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sampling for the Apparent Error Rate — apparent","text":"data data frame. ... 
dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Sampling for the Apparent Error Rate — apparent","text":"tibble single row classes apparent, rset, tbl_df, tbl, data.frame. results include column data split objects one column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/apparent.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Sampling for the Apparent Error Rate — apparent","text":"","code":"apparent(mtcars) #> # Apparent sampling #> # A tibble: 1 × 2 #> splits id #> #> 1 Apparent"},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert an rsplit object to a data frame — as.data.frame.rsplit","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"analysis assessment code can returned data frame (dictated data argument) using .data.frame.rsplit(). analysis() assessment() shortcuts.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"","code":"# S3 method for class 'rsplit' as.data.frame(x, row.names = NULL, optional = FALSE, data = \"analysis\", ...) analysis(x, ...) # Default S3 method analysis(x, ...) # S3 method for class 'rsplit' analysis(x, ...) assessment(x, ...) # Default S3 method assessment(x, ...) 
# S3 method for class 'rsplit' assessment(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"x rsplit object. row.names NULL character vector giving row names data frame. Missing values allowed. optional logical: column names data checked legality? data Either \"analysis\" \"assessment\" specify data returned. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/as.data.frame.rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Convert an rsplit object to a data frame — as.data.frame.rsplit","text":"","code":"library(dplyr) set.seed(104) folds <- vfold_cv(mtcars) model_data_1 <- folds$splits[[1]] %>% analysis() holdout_data_1 <- folds$splits[[1]] %>% assessment()"},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":null,"dir":"Reference","previous_headings":"","what":"Bootstrap Sampling — bootstraps","title":"Bootstrap Sampling — bootstraps","text":"bootstrap sample sample size original data set made using replacement. results analysis samples multiple replicates original rows data. assessment set defined rows original data included bootstrap sample. often referred \"--bag\" (OOB) sample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Bootstrap Sampling — bootstraps","text":"","code":"bootstraps( data, times = 25, strata = NULL, breaks = 4, pool = 0.1, apparent = FALSE, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Bootstrap Sampling — bootstraps","text":"data data frame. times number bootstrap samples. 
strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. apparent logical. extra resample added analysis holdout subset entire data set. required estimators used summary() function require apparent error rate. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Bootstrap Sampling — bootstraps","text":"tibble classes bootstraps, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Bootstrap Sampling — bootstraps","text":"argument apparent enables option additional \"resample\" analysis assessment data sets original data set. can required types analysis bootstrap results. strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/bootstraps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Bootstrap Sampling — bootstraps","text":"","code":"bootstraps(mtcars, times = 2) #> # Bootstrap sampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 bootstraps(mtcars, times = 2, apparent = TRUE) #> # Bootstrap sampling with apparent sample #> # A tibble: 3 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Apparent library(purrr) library(modeldata) data(wa_churn) set.seed(13) resample1 <- bootstraps(wa_churn, times = 3) map_dbl( resample1$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2798523 0.2639500 0.2648019 set.seed(13) resample2 <- bootstraps(wa_churn, strata = churn, times = 3) map_dbl( resample2$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2653699 0.2653699 0.2653699 set.seed(13) resample3 <- bootstraps(wa_churn, strata = tenure, breaks = 6, times = 3) map_dbl( resample3$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2625302 0.2659378 0.2696294"},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Cluster Cross-Validation — clustering_cv","title":"Cluster Cross-Validation — clustering_cv","text":"Cluster cross-validation splits data V groups disjointed sets using k-means clustering variables. resample analysis data consists V-1 folds/clusters assessment set contains final fold/cluster. basic cross-validation (.e. 
repeats), number resamples equal V.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cluster Cross-Validation — clustering_cv","text":"","code":"clustering_cv( data, vars, v = 10, repeats = 1, distance_function = \"dist\", cluster_function = c(\"kmeans\", \"hclust\"), ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cluster Cross-Validation — clustering_cv","text":"data data frame. vars vector bare variable names use cluster data. v number partitions data set. repeats number times repeat clustered partitioning. distance_function function used distance calculations? Defaults stats::dist(). can also provide function; see Details. cluster_function function used clustering? Options either \"kmeans\" (use stats::kmeans()) \"hclust\" (use stats::hclust()). can also provide function; see Details. ... Extra arguments passed cluster_function.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cluster Cross-Validation — clustering_cv","text":"tibble classes rset, tbl_df, tbl, data.frame. results include column data split objects identification variable id.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Cluster Cross-Validation — clustering_cv","text":"variables vars argument used k-means clustering data disjointed sets hierarchical clustering data. clusters used folds cross-validation. Depending data distributed, may equal number points fold. can optionally provide custom function distance_function. function take data frame (created via data[vars]) return stats::dist() object distances data points. 
can optionally provide custom function cluster_function. function must take three arguments: dists, stats::dist() object distances data points v, length-1 numeric number folds create ..., pass additional named arguments function function return vector cluster assignments length nrow(data), element vector corresponding matching row data frame.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/clustering_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Cluster Cross-Validation — clustering_cv","text":"","code":"data(ames, package = \"modeldata\") clustering_cv(ames, vars = c(Sale_Price, First_Flr_SF, Second_Flr_SF), v = 2) #> # 2-cluster cross-validation #> # A tibble: 2 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2"},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":null,"dir":"Reference","previous_headings":"","what":"Determine the Assessment Samples — complement","title":"Determine the Assessment Samples — complement","text":"method function help find data belong analysis assessment sets.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Determine the Assessment Samples — complement","text":"","code":"complement(x, ...) # S3 method for class 'rsplit' complement(x, ...) # S3 method for class 'rof_split' complement(x, ...) # S3 method for class 'sliding_window_split' complement(x, ...) # S3 method for class 'sliding_index_split' complement(x, ...) # S3 method for class 'sliding_period_split' complement(x, ...) # S3 method for class 'apparent_split' complement(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Determine the Assessment Samples — complement","text":"x rsplit object. ... 
currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Determine the Assessment Samples — complement","text":"integer vector.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Determine the Assessment Samples — complement","text":"Given rsplit object, complement() determine data rows contained assessment set. save space, many rsplit objects contain indices assessment split.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/complement.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Determine the Assessment Samples — complement","text":"","code":"set.seed(28432) fold_rs <- vfold_cv(mtcars) head(fold_rs$splits[[1]]$in_id) #> [1] 2 3 4 5 6 7 fold_rs$splits[[1]]$out_id #> [1] NA complement(fold_rs$splits[[1]]) #> [1] 1 9 25 27"},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the split arguments from an rset — .get_split_args","title":"Get the split arguments from an rset — .get_split_args","text":"Get split arguments rset","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the split arguments from an rset — .get_split_args","text":"","code":".get_split_args(x, allow_strata_false = FALSE)"},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the split arguments from an rset — .get_split_args","text":"x rset initial_split object. allow_strata_false logical specify value use stratification specified. 
default use strata = NULL, alternative strata = FALSE.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/dot-get_split_args.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the split arguments from an rset — .get_split_args","text":"list arguments used create rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":null,"dir":"Reference","previous_headings":"","what":"Extract Predictor Names from Formula or Terms — form_pred","title":"Extract Predictor Names from Formula or Terms — form_pred","text":".vars() returns variables used formula, function returns variables explicitly used right-hand side (.e., resolve dots unless object terms data set specified).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"","code":"form_pred(object, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"object model formula stats::terms() object. ... 
Arguments pass .vars()","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"character vector names","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/form_pred.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extract Predictor Names from Formula or Terms — form_pred","text":"","code":"form_pred(y ~ x + z) #> [1] \"x\" \"z\" form_pred(terms(y ~ x + z)) #> [1] \"x\" \"z\" form_pred(y ~ x + log(z)) #> [1] \"x\" \"z\" form_pred(log(y) ~ x + z) #> [1] \"x\" \"z\" form_pred(y1 + y2 ~ x + z) #> [1] \"x\" \"z\" form_pred(log(y1) + y2 ~ x + z) #> [1] \"x\" \"z\" # will fail: # form_pred(y ~ .) form_pred(terms(mpg ~ (.)^2, data = mtcars)) #> [1] \"cyl\" \"disp\" \"hp\" \"drat\" \"wt\" \"qsec\" \"vs\" \"am\" \"gear\" \"carb\" form_pred(terms(~ (.)^2, data = mtcars)) #> [1] \"mpg\" \"cyl\" \"disp\" \"hp\" \"drat\" \"wt\" \"qsec\" \"vs\" \"am\" \"gear\" #> [11] \"carb\""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":null,"dir":"Reference","previous_headings":"","what":"Obtain a identifier for the resamples — .get_fingerprint","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"function returns hash (NA) attribute created rset initially constructed. can used compare resampling objects see .","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"","code":".get_fingerprint(x, ...) # Default S3 method .get_fingerprint(x, ...) 
# S3 method for class 'rset' .get_fingerprint(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"x rset tune_results object. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"character value NA_character_ object created prior rsample version 0.1.0.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_fingerprint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Obtain a identifier for the resamples — .get_fingerprint","text":"","code":"set.seed(1) .get_fingerprint(vfold_cv(mtcars)) #> [1] \"10edc17b4467d256910fb9dc53c3599a\" set.seed(1) .get_fingerprint(vfold_cv(mtcars)) #> [1] \"10edc17b4467d256910fb9dc53c3599a\" set.seed(2) .get_fingerprint(vfold_cv(mtcars)) #> [1] \"9070fd5cd338c4757f525de2e2a7beaa\" set.seed(1) .get_fingerprint(vfold_cv(mtcars, repeats = 2)) #> [1] \"e2457324f2637e7f0f593755d1592d03\""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve individual rsplits objects from an rset — get_rsplit","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"Retrieve individual rsplits objects rset","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"","code":"get_rsplit(x, index, ...) # S3 method for class 'rset' get_rsplit(x, index, ...) 
# Default S3 method get_rsplit(x, index, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"x rset object retrieve rsplit . index integer indicating rsplit retrieve: 1 rsplit first row rset, 2 second, . ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"rsplit object row index rset","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/get_rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve individual rsplits objects from an rset — get_rsplit","text":"","code":"set.seed(123) (starting_splits <- group_vfold_cv(mtcars, cyl, v = 3)) #> # Group 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 get_rsplit(starting_splits, 1) #> #> <21/11/32>"},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":null,"dir":"Reference","previous_headings":"","what":"Group Bootstraps — group_bootstraps","title":"Group Bootstraps — group_bootstraps","text":"Group bootstrapping creates splits data based grouping variable (may single row associated ). common use kind resampling repeated measures subject. bootstrap sample sample size original data set made using replacement. results analysis samples multiple replicates original rows data. assessment set defined rows original data included bootstrap sample. 
often referred \"--bag\" (OOB) sample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Group Bootstraps — group_bootstraps","text":"","code":"group_bootstraps( data, group, times = 25, apparent = FALSE, ..., strata = NULL, pool = 0.1 )"},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Group Bootstraps — group_bootstraps","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. times number bootstrap samples. apparent logical. extra resample added analysis holdout subset entire data set. required estimators used summary() function require apparent error rate. ... dots future extensions must empty. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Group Bootstraps — group_bootstraps","text":"tibble classes group_bootstraps bootstraps, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Group Bootstraps — group_bootstraps","text":"argument apparent enables option additional \"resample\" analysis assessment data sets original data set. 
can required types analysis bootstrap results.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_bootstraps.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Group Bootstraps — group_bootstraps","text":"","code":"data(ames, package = \"modeldata\") set.seed(13) group_bootstraps(ames, Neighborhood, times = 3) #> # Group bootstrap sampling #> # A tibble: 3 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Bootstrap3 group_bootstraps(ames, Neighborhood, times = 3, apparent = TRUE) #> # Group bootstrap sampling with apparent sample #> # A tibble: 4 × 2 #> splits id #> #> 1 Bootstrap1 #> 2 Bootstrap2 #> 3 Bootstrap3 #> 4 Apparent"},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Group Monte Carlo Cross-Validation — group_mc_cv","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"Group Monte Carlo cross-validation creates splits data based grouping variable (may single row associated ). One resample Monte Carlo cross-validation takes random sample (without replacement) groups original data set used analysis. data points added assessment set. common use kind resampling repeated measures subject.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"","code":"group_mc_cv( data, group, prop = 3/4, times = 25, ..., strata = NULL, pool = 0.1 )"},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. prop proportion data retained modeling/analysis. 
times number times repeat sampling. ... dots future extensions must empty. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"tibble classes group_mc_cv, rset, tbl_df, tbl, data.frame. results include column data split objects identification variable.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_mc_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Group Monte Carlo Cross-Validation — group_mc_cv","text":"","code":"data(ames, package = \"modeldata\") set.seed(123) group_mc_cv(ames, group = Neighborhood, times = 5) #> # Group Monte Carlo cross-validation (0.75/0.25) with 5 resamples #> # A tibble: 5 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5"},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Group V-Fold Cross-Validation — group_vfold_cv","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"Group V-fold cross-validation creates splits data based grouping variable (may single row associated ). function can create many splits unique values grouping variable can create smaller set splits one group left time. 
common use kind resampling repeated measures subject.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"","code":"group_vfold_cv( data, group = NULL, v = NULL, repeats = 1, balance = c(\"groups\", \"observations\"), ..., strata = NULL, pool = 0.1 )"},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. v number partitions data set. left NULL (default), v set number unique values grouping variable, creating \"leave-one-group-\" splits. repeats number times repeat V-fold partitioning. balance v less number unique groups, groups combined folds? one \"groups\", assign roughly number groups fold, \"observations\", assign roughly number observations fold. ... dots future extensions must empty. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"tibble classes group_vfold_cv, rset, tbl_df, tbl, data.frame. 
results include column data split objects identification variable.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/group_vfold_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Group V-Fold Cross-Validation — group_vfold_cv","text":"","code":"data(ames, package = \"modeldata\") set.seed(123) group_vfold_cv(ames, group = Neighborhood, v = 5) #> # Group 5-fold cross-validation #> # A tibble: 5 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5 group_vfold_cv( ames, group = Neighborhood, v = 5, balance = \"observations\" ) #> # Group 5-fold cross-validation #> # A tibble: 5 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5 group_vfold_cv(ames, group = Neighborhood, v = 5, repeats = 2) #> # Group 5-fold cross-validation #> # A tibble: 10 × 3 #> splits id id2 #> #> 1 Repeat1 Resample1 #> 2 Repeat1 Resample2 #> 3 Repeat1 Resample3 #> 4 Repeat1 Resample4 #> 5 Repeat1 Resample5 #> 6 Repeat2 Resample1 #> 7 Repeat2 Resample2 #> 8 Repeat2 Resample3 #> 9 Repeat2 Resample4 #> 10 Repeat2 Resample5 # Leave-one-group-out CV group_vfold_cv(ames, group = Neighborhood) #> # Group 28-fold cross-validation #> # A tibble: 28 × 2 #> splits id #> #> 1 Resample01 #> 2 Resample02 #> 3 Resample03 #> 4 Resample04 #> 5 Resample05 #> 6 Resample06 #> 7 Resample07 #> 8 Resample08 #> 9 Resample09 #> 10 Resample10 #> # ℹ 18 more rows library(dplyr) data(Sacramento, package = \"modeldata\") city_strata <- Sacramento %>% group_by(city) %>% summarize(strata = mean(price)) %>% summarize(city = city, strata = cut(strata, quantile(strata), include.lowest = TRUE)) #> Warning: Returning more (or less) than 1 row per `summarise()` group was #> deprecated in dplyr 1.1.0. #> ℹ Please use `reframe()` instead. #> ℹ When switching from `summarise()` to `reframe()`, remember that #> `reframe()` always returns an ungrouped data frame and adjust #> accordingly. 
sacramento_data <- Sacramento %>% full_join(city_strata, by = \"city\") group_vfold_cv(sacramento_data, city, strata = strata) #> Warning: Leaving `v = NULL` while using stratification will set `v` to the number of groups present in the least common stratum. #> ℹ Set `v` explicitly to override this warning. #> # Group 14-fold cross-validation #> # A tibble: 14 × 2 #> splits id #> #> 1 Resample01 #> 2 Resample02 #> 3 Resample03 #> 4 Resample04 #> 5 Resample05 #> 6 Resample06 #> 7 Resample07 #> 8 Resample08 #> 9 Resample09 #> 10 Resample10 #> 11 Resample11 #> 12 Resample12 #> 13 Resample13 #> 14 Resample14"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Simple Training/Test Set Splitting — initial_split","title":"Simple Training/Test Set Splitting — initial_split","text":"initial_split() creates single binary split data training set testing set. initial_time_split() , takes first prop samples training, instead random selection. group_initial_split() creates splits data based grouping variable, data \"group\" assigned split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Simple Training/Test Set Splitting — initial_split","text":"","code":"initial_split(data, prop = 3/4, strata = NULL, breaks = 4, pool = 0.1, ...) initial_time_split(data, prop = 3/4, lag = 0, ...) training(x, ...) # Default S3 method training(x, ...) # S3 method for class 'rsplit' training(x, ...) testing(x, ...) # Default S3 method testing(x, ...) # S3 method for class 'rsplit' testing(x, ...) 
group_initial_split(data, group, prop = 3/4, ..., strata = NULL, pool = 0.1)"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Simple Training/Test Set Splitting — initial_split","text":"data data frame. prop proportion data retained modeling/analysis. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty. lag value include lag assessment analysis set. useful lagged predictors used training testing. x rsplit object produced initial_split() initial_time_split(). group variable data (single character name) used grouping observations value either analysis assessment set within fold.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Simple Training/Test Set Splitting — initial_split","text":"rsplit object can used training() testing() functions extract data split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Simple Training/Test Set Splitting — initial_split","text":"training() testing() used extract resulting data. strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_split.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Simple Training/Test Set Splitting — initial_split","text":"","code":"set.seed(1353) car_split <- initial_split(mtcars) train_data <- training(car_split) test_data <- testing(car_split) data(drinks, package = \"modeldata\") drinks_split <- initial_time_split(drinks) train_data <- training(drinks_split) test_data <- testing(drinks_split) c(max(train_data$date), min(test_data$date)) # no lag #> [1] \"2011-03-01\" \"2011-04-01\" # With 12 period lag drinks_lag_split <- initial_time_split(drinks, lag = 12) train_data <- training(drinks_lag_split) test_data <- testing(drinks_lag_split) c(max(train_data$date), min(test_data$date)) # 12 period lag #> [1] \"2011-03-01\" \"2010-04-01\" set.seed(1353) car_split <- group_initial_split(mtcars, cyl) train_data <- training(car_split) test_data <- testing(car_split)"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Create an Initial Train/Validation/Test Split — initial_validation_split","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"initial_validation_split() creates random three-way split data training set, validation set, testing set. initial_validation_time_split() , instead random selection training, validation, testing set order full data set, first observations put training set. 
group_initial_validation_split() creates similar random splits data based grouping variable, data \"group\" assigned partition.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"","code":"initial_validation_split( data, prop = c(0.6, 0.2), strata = NULL, breaks = 4, pool = 0.1, ... ) initial_validation_time_split(data, prop = c(0.6, 0.2), ...) group_initial_validation_split( data, group, prop = c(0.6, 0.2), ..., strata = NULL, pool = 0.1 ) # S3 method for class 'initial_validation_split' training(x, ...) # S3 method for class 'initial_validation_split' testing(x, ...) validation(x, ...) # Default S3 method validation(x, ...) # S3 method for class 'initial_validation_split' validation(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"data data frame. prop length-2 vector proportions data retained training validation data, respectively. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty. group variable data (single character name) used grouping observations value either analysis assessment set within fold. 
x object class initial_validation_split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"initial_validation_split object can used training(), validation(), testing() functions extract data split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"training(), validation(), testing() can used extract resulting data sets. Use validation_set() create rset object use functions tune package tune::tune_grid(). strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/initial_validation_split.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create an Initial Train/Validation/Test Split — initial_validation_split","text":"","code":"set.seed(1353) car_split <- initial_validation_split(mtcars) train_data <- training(car_split) validation_data <- validation(car_split) test_data <- testing(car_split) data(drinks, package = \"modeldata\") drinks_split <- initial_validation_time_split(drinks) train_data <- training(drinks_split) validation_data <- validation(drinks_split) c(max(train_data$date), min(validation_data$date)) #> [1] \"2007-05-01\" \"2007-06-01\" data(ames, package = \"modeldata\") set.seed(1353) ames_split <- group_initial_validation_split(ames, group = Neighborhood) train_data <- training(ames_split) validation_data <- validation(ames_split) test_data <- testing(ames_split)"},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Inner split of the analysis set for fitting a post-processor — inner_split","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"Inner split analysis set fitting post-processor","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"","code":"inner_split(x, ...) # S3 method for class 'mc_split' inner_split(x, split_args, ...) # S3 method for class 'group_mc_split' inner_split(x, split_args, ...) # S3 method for class 'vfold_split' inner_split(x, split_args, ...) # S3 method for class 'group_vfold_split' inner_split(x, split_args, ...) # S3 method for class 'boot_split' inner_split(x, split_args, ...) 
# S3 method for class 'group_boot_split' inner_split(x, split_args, ...) # S3 method for class 'val_split' inner_split(x, split_args, ...) # S3 method for class 'group_val_split' inner_split(x, split_args, ...) # S3 method for class 'time_val_split' inner_split(x, split_args, ...) # S3 method for class 'clustering_split' inner_split(x, split_args, ...) # S3 method for class 'apparent_split' inner_split(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"x rsplit object. ... currently used. split_args list arguments used inner split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"rsplit object.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/inner_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Inner split of the analysis set for fitting a post-processor — inner_split","text":"rsplit objects live commonly inside rset object. split_args argument can output .get_split_args() corresponding rset object, even arguments used create rset object needed inner split. mc_split group_mc_split objects, inner_split() ignore split_args$times. vfold_split group_vfold_split objects, ignore split_args$times split_args$repeats. split_args$v used set split_args$prop 1 - 1/v prop already set otherwise ignored. method group_vfold_split always use split_args$balance = NULL. boot_split group_boot_split objects, ignore split_args$times. val_split, group_val_split, time_val_split objects, interpret length-2 split_args$prop ratio training validation sets split inner analysis inner assessment set ratio. 
split_args$prop single value, used proportion inner analysis set. clustering_split objects, ignore split_args$repeats.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":null,"dir":"Reference","previous_headings":"","what":"Bootstrap confidence intervals — int_pctl","title":"Bootstrap confidence intervals — int_pctl","text":"Calculate bootstrap confidence intervals using various methods.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Bootstrap confidence intervals — int_pctl","text":"","code":"int_pctl(.data, ...) # S3 method for class 'bootstraps' int_pctl(.data, statistics, alpha = 0.05, ...) int_t(.data, ...) # S3 method for class 'bootstraps' int_t(.data, statistics, alpha = 0.05, ...) int_bca(.data, ...) # S3 method for class 'bootstraps' int_bca(.data, statistics, alpha = 0.05, .fn, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Bootstrap confidence intervals — int_pctl","text":".data data frame containing bootstrap resamples created using bootstraps(). t- BCa-intervals, apparent argument set TRUE. Even apparent argument set TRUE percentile method, apparent data never used calculating percentile confidence interval. ... Arguments pass .fn (int_bca() ). statistics unquoted column name dplyr selector identifies single column data set containing individual bootstrap estimates. must list column tidy tibbles (columns term estimate). t-intervals, standard tidy column (usually called std.err) required. See examples . alpha Level significance. .fn function calculate statistic interest. function take rsplit first argument ... 
required.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Bootstrap confidence intervals — int_pctl","text":"function returns tibble columns .lower, .estimate, .upper, .alpha, .method, term. .method type interval (eg. \"percentile\", \"student-t\", \"BCa\"). term name estimate. Note .estimate returned int_pctl() mean estimates bootstrap resamples estimate apparent model.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Bootstrap confidence intervals — int_pctl","text":"Percentile intervals standard method obtaining confidence intervals require thousands resamples accurate. T-intervals may need fewer resamples require corresponding variance estimate. Bias-corrected accelerated intervals require original function used create statistics interest computationally taxing.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"Bootstrap confidence intervals — int_pctl","text":"https://rsample.tidymodels.org/articles/Applications/Intervals.html Davison, ., & Hinkley, D. (1997). Bootstrap Methods Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/int_pctl.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Bootstrap confidence intervals — int_pctl","text":"","code":"# \\donttest{ library(broom) library(dplyr) library(purrr) library(tibble) lm_est <- function(split, ...) 
{ lm(mpg ~ disp + hp, data = analysis(split)) %>% tidy() } set.seed(52156) car_rs <- bootstraps(mtcars, 500, apparent = TRUE) %>% mutate(results = map(splits, lm_est)) int_pctl(car_rs, results) #> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`. #> # A tibble: 3 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 (Intercept) 27.5 30.7 33.6 0.05 percentile #> 2 disp -0.0440 -0.0300 -0.0162 0.05 percentile #> 3 hp -0.0572 -0.0260 -0.00840 0.05 percentile int_t(car_rs, results) #> # A tibble: 3 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 (Intercept) 28.1 30.7 34.6 0.05 student-t #> 2 disp -0.0446 -0.0300 -0.0170 0.05 student-t #> 3 hp -0.0449 -0.0260 -0.00337 0.05 student-t int_bca(car_rs, results, .fn = lm_est) #> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`. #> # A tibble: 3 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 (Intercept) 27.7 30.7 33.7 0.05 BCa #> 2 disp -0.0446 -0.0300 -0.0172 0.05 BCa #> 3 hp -0.0576 -0.0260 -0.00843 0.05 BCa # putting results into a tidy format rank_corr <- function(split) { dat <- analysis(split) tibble( term = \"corr\", estimate = cor(dat$sqft, dat$price, method = \"spearman\"), # don't know the analytical std.err so no t-intervals std.err = NA_real_ ) } set.seed(69325) data(Sacramento, package = \"modeldata\") bootstraps(Sacramento, 1000, apparent = TRUE) %>% mutate(correlations = map(splits, rank_corr)) %>% int_pctl(correlations) #> # A tibble: 1 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 corr 0.737 0.768 0.796 0.05 percentile # }"},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":null,"dir":"Reference","previous_headings":"","what":"Find Labels from rset Object — labels.rset","title":"Find Labels from rset Object — labels.rset","text":"Produce vector resampling labels (e.g. \"Fold1\") rset object. 
Currently, nested_cv() supported.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find Labels from rset Object — labels.rset","text":"","code":"# S3 method for class 'rset' labels(object, make_factor = FALSE, ...) # S3 method for class 'vfold_cv' labels(object, make_factor = FALSE, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find Labels from rset Object — labels.rset","text":"object rset object. make_factor logical whether results character factor. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find Labels from rset Object — labels.rset","text":"single character factor vector.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rset.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find Labels from rset Object — labels.rset","text":"","code":"labels(vfold_cv(mtcars)) #> [1] \"Fold01\" \"Fold02\" \"Fold03\" \"Fold04\" \"Fold05\" \"Fold06\" \"Fold07\" #> [8] \"Fold08\" \"Fold09\" \"Fold10\""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Find Labels from rsplit Object — labels.rsplit","title":"Find Labels from rsplit Object — labels.rsplit","text":"Produce tibble identification variables single splits can linked particular resample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Find Labels from rsplit Object — labels.rsplit","text":"","code":"# S3 method for class 'rsplit' labels(object, 
...)"},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Find Labels from rsplit Object — labels.rsplit","text":"object rsplit object ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Find Labels from rsplit Object — labels.rsplit","text":"tibble.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/labels.rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Find Labels from rsplit Object — labels.rsplit","text":"","code":"cv_splits <- vfold_cv(mtcars) labels(cv_splits$splits[[1]]) #> # A tibble: 1 × 1 #> id #> #> 1 Fold01"},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Leave-One-Out Cross-Validation — loo_cv","title":"Leave-One-Out Cross-Validation — loo_cv","text":"Leave-one-(LOO) cross-validation uses one data point original set assessment data data points analysis set. LOO resampling set many resamples rows original data set.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Leave-One-Out Cross-Validation — loo_cv","text":"","code":"loo_cv(data, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Leave-One-Out Cross-Validation — loo_cv","text":"data data frame. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Leave-One-Out Cross-Validation — loo_cv","text":"tibble classes loo_cv, rset, tbl_df, tbl, data.frame. 
results include column data split objects one column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/loo_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Leave-One-Out Cross-Validation — loo_cv","text":"","code":"loo_cv(mtcars) #> # Leave-one-out cross-validation #> # A tibble: 32 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 #> 4 Resample4 #> 5 Resample5 #> 6 Resample6 #> 7 Resample7 #> 8 Resample8 #> 9 Resample9 #> 10 Resample10 #> # ℹ 22 more rows"},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":null,"dir":"Reference","previous_headings":"","what":"Make groupings for grouped rsplits — make_groups","title":"Make groupings for grouped rsplits — make_groups","text":"function powers grouped resampling splitting data based upon grouping variable returning assessment set indices split.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Make groupings for grouped rsplits — make_groups","text":"","code":"make_groups( data, group, v, balance = c(\"groups\", \"observations\", \"prop\"), strata = NULL, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Make groupings for grouped rsplits — make_groups","text":"data data frame. group variable data (single character name) used grouping observations value either analysis assessment set within fold. v number partitions data set. balance v less number unique groups, groups combined folds? one \"groups\", \"observations\", \"prop\". strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. ... 
Arguments passed balance functions.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_groups.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Make groupings for grouped rsplits — make_groups","text":"balance options accepted – make sense – resampling functions. instance, balance = \"prop\" assigns groups folds random, meaning given observation guaranteed one (one) assessment set. means balance = \"prop\" used group_vfold_cv(), option available function. Similarly, group_mc_cv() derivatives assign data one (one) assessment set, rather allow observation assessment set zero--times. result, functions balance argument, hood always specify balance = \"prop\" call make_groups().","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":null,"dir":"Reference","previous_headings":"","what":"Constructors for split objects — make_splits","title":"Constructors for split objects — make_splits","text":"Constructors split objects","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Constructors for split objects — make_splits","text":"","code":"make_splits(x, ...) # Default S3 method make_splits(x, ...) # S3 method for class 'list' make_splits(x, data, class = NULL, ...) # S3 method for class 'data.frame' make_splits(x, assessment, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Constructors for split objects — make_splits","text":"x list integers names \"analysis\" \"assessment\", data frame analysis training data. ... currently used. data data frame. class optional class give object. 
assessment data frame assessment testing data, can empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_splits.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Constructors for split objects — make_splits","text":"","code":"df <- data.frame( year = 1900:1999, value = 10 + 8*1900:1999 + runif(100L, 0, 100) ) split_from_indices <- make_splits( x = list(analysis = which(df$year <= 1980), assessment = which(df$year > 1980)), data = df ) split_from_data_frame <- make_splits( x = df[df$year <= 1980,], assessment = df[df$year > 1980,] ) identical(split_from_indices, split_from_data_frame) #> [1] TRUE"},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":null,"dir":"Reference","previous_headings":"","what":"Create or Modify Stratification Variables — make_strata","title":"Create or Modify Stratification Variables — make_strata","text":"function can create strata numeric data make non-numeric data conducive stratification.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create or Modify Stratification Variables — make_strata","text":"","code":"make_strata(x, breaks = 4, nunique = 5, pool = 0.1, depth = 20)"},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create or Modify Stratification Variables — make_strata","text":"x input vector. breaks single number giving number bins desired stratify numeric stratification variable. nunique integer number unique value threshold algorithm. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. depth integer used determine best number percentiles used. number bins based min(5, floor(n / depth)) n = length(x). 
x numeric, must least 40 rows data set (depth = 20) conduct stratified sampling.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create or Modify Stratification Variables — make_strata","text":"factor vector.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create or Modify Stratification Variables — make_strata","text":"numeric data, number unique levels less nunique, data treated categorical data. categorical inputs, function find levels x occur data percentage less pool. values groups randomly assigned remaining strata (data points missing values x). numeric data unique values nunique, data converted categorical based percentiles data. percentile groups 20 percent data group. , missing values x randomly assigned groups.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/make_strata.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create or Modify Stratification Variables — make_strata","text":"","code":"set.seed(61) x1 <- rpois(100, lambda = 5) table(x1) #> x1 #> 1 2 3 4 5 6 7 8 9 10 11 #> 3 16 8 19 14 18 11 4 5 1 1 table(make_strata(x1)) #> #> [1,3] (3,5] (5,6] (6,11] #> 27 33 18 22 set.seed(554) x2 <- rpois(100, lambda = 1) table(x2) #> x2 #> 0 1 2 3 4 #> 36 34 19 6 5 table(make_strata(x2)) #> #> 0 1 2 #> 38 40 22 # small groups are randomly assigned x3 <- factor(x2) table(x3) #> x3 #> 0 1 2 3 4 #> 36 34 19 6 5 table(make_strata(x3)) #> #> 0 1 2 #> 41 35 24 x4 <- rep(LETTERS[1:7], c(37, 26, 3, 7, 11, 10, 2)) table(x4) #> x4 #> A B C D E F G #> 37 26 3 7 11 10 2 table(make_strata(x4)) #> #> A B E F #> 40 27 14 15 table(make_strata(x4, pool = 0.1)) #> #> A B E F #> 38 29 12 17 table(make_strata(x4, pool = 0.0)) #> Warning: Stratifying groups that make up 0% of the data may be 
statistically risky. #> • Consider increasing `pool` to at least 0.1 #> #> A B C D E F G #> 37 26 3 7 11 10 2 # not enough data to stratify x5 <- rnorm(20) table(make_strata(x5)) #> Warning: The number of observations in each quantile is below the recommended threshold of 20. #> • Stratification will use 1 breaks instead. #> Warning: Too little data to stratify. #> • Resampling will be unstratified. #> #> strata1 #> 20 set.seed(483) x6 <- rnorm(200) quantile(x6, probs = (0:10) / 10) #> 0% 10% 20% 30% 40% 50% #> -2.9114060 -1.4508635 -0.9513821 -0.6257852 -0.3286468 -0.0364388 #> 60% 70% 80% 90% 100% #> 0.2027140 0.4278573 0.7050643 1.2471852 2.6792505 table(make_strata(x6, breaks = 10)) #> #> [-2.91,-1.45] (-1.45,-0.951] (-0.951,-0.626] (-0.626,-0.329] #> 20 20 20 20 #> (-0.329,-0.0364] (-0.0364,0.203] (0.203,0.428] (0.428,0.705] #> 20 20 20 20 #> (0.705,1.25] (1.25,2.68] #> 20 20"},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":null,"dir":"Reference","previous_headings":"","what":"Manual resampling — manual_rset","title":"Manual resampling — manual_rset","text":"manual_rset() used constructing minimal rset possible. can useful custom rsplit objects built make_splits(), want create new rset splits contained within existing rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Manual resampling — manual_rset","text":"","code":"manual_rset(splits, ids)"},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Manual resampling — manual_rset","text":"splits list \"rsplit\" objects. easiest create using make_splits(). ids character vector ids. 
length ids must length splits.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/manual_rset.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Manual resampling — manual_rset","text":"","code":"df <- data.frame(x = c(1, 2, 3, 4, 5, 6)) # Create an rset from custom indices indices <- list( list(analysis = c(1L, 2L), assessment = 3L), list(analysis = c(4L, 5L), assessment = 6L) ) splits <- lapply(indices, make_splits, data = df) manual_rset(splits, c(\"Split 1\", \"Split 2\")) #> # Manual resampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Split 1 #> 2 Split 2 # You can also use this to create an rset from a subset of an # existing rset resamples <- vfold_cv(mtcars) best_split <- resamples[5, ] manual_rset(best_split$splits, best_split$id) #> # Manual resampling #> # A tibble: 1 × 2 #> splits id #> #> 1 Fold05"},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Monte Carlo Cross-Validation — mc_cv","title":"Monte Carlo Cross-Validation — mc_cv","text":"One resample Monte Carlo cross-validation takes random sample (without replacement) original data set used analysis. data points added assessment set.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Monte Carlo Cross-Validation — mc_cv","text":"","code":"mc_cv(data, prop = 3/4, times = 25, strata = NULL, breaks = 4, pool = 0.1, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Monte Carlo Cross-Validation — mc_cv","text":"data data frame. prop proportion data retained modeling/analysis. times number times repeat sampling. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. 
Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Monte Carlo Cross-Validation — mc_cv","text":"tibble classes mc_cv, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Monte Carlo Cross-Validation — mc_cv","text":"strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/mc_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Monte Carlo Cross-Validation — mc_cv","text":"","code":"mc_cv(mtcars, times = 2) #> # Monte Carlo cross-validation (0.75/0.25) with 2 resamples #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 mc_cv(mtcars, prop = .5, times = 2) #> # Monte Carlo cross-validation (0.5/0.5) with 2 resamples #> # A tibble: 2 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 library(purrr) data(wa_churn, package = \"modeldata\") set.seed(13) resample1 <- mc_cv(wa_churn, times = 3, prop = .5) map_dbl( resample1$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2709458 0.2621414 0.2632775 set.seed(13) resample2 <- mc_cv(wa_churn, strata = churn, times = 3, prop = .5) map_dbl( resample2$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2652655 0.2652655 0.2652655 set.seed(13) resample3 <- mc_cv(wa_churn, strata = tenure, breaks = 6, times = 3, prop = .5) map_dbl( resample3$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2636364 0.2599432 0.2576705"},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Nested or Double Resampling — nested_cv","title":"Nested or Double Resampling — nested_cv","text":"nested_cv() can used take results one resampling procedure conduct resamples within split. 
type resampling used rsample can used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Nested or Double Resampling — nested_cv","text":"","code":"nested_cv(data, outside, inside)"},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Nested or Double Resampling — nested_cv","text":"data data frame. outside initial resampling specification. can already created object expression new object (see examples ). latter used, data argument need specified , given, ignored. inside expression type resampling conducted within initial procedure.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Nested or Double Resampling — nested_cv","text":"tibble nested_cv class classes outer resampling process normally contains. 
results include column outer data split objects, one id columns, column nested tibbles called inner_resamples additional resamples.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Nested or Double Resampling — nested_cv","text":"bad idea use bootstrapping outer resampling procedure (see example )","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/nested_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Nested or Double Resampling — nested_cv","text":"","code":"## Using expressions for the resampling procedures: nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5)) #> # Nested resampling: #> # outer: 3-fold cross-validation #> # inner: Bootstrap sampling #> # A tibble: 3 × 3 #> splits id inner_resamples #> #> 1 Fold1 #> 2 Fold2 #> 3 Fold3 ## Using an existing object: folds <- vfold_cv(mtcars) nested_cv(mtcars, folds, inside = bootstraps(times = 5)) #> # Nested resampling: #> # outer: `folds` #> # inner: Bootstrap sampling #> # A tibble: 10 × 3 #> splits id inner_resamples #> #> 1 Fold01 #> 2 Fold02 #> 3 Fold03 #> 4 Fold04 #> 5 Fold05 #> 6 Fold06 #> 7 Fold07 #> 8 Fold08 #> 9 Fold09 #> 10 Fold10 ## The dangers of outer bootstraps: set.seed(2222) bad_idea <- nested_cv(mtcars, outside = bootstraps(times = 5), inside = vfold_cv(v = 3) ) #> Warning: Using bootstrapping as the outer resample is dangerous since the inner resample might have the same data point in both the analysis and assessment set. first_outer_split <- get_rsplit(bad_idea, 1) outer_analysis <- analysis(first_outer_split) sum(grepl(\"Camaro Z28\", rownames(outer_analysis))) #> [1] 3 ## For the 3-fold CV used inside of each bootstrap, how are the replicated ## `Camaro Z28` data partitioned? 
first_inner_split <- get_rsplit(bad_idea$inner_resamples[[1]], 1) inner_analysis <- analysis(first_inner_split) inner_assess <- assessment(first_inner_split) sum(grepl(\"Camaro Z28\", rownames(inner_analysis))) #> [1] 1 sum(grepl(\"Camaro Z28\", rownames(inner_assess))) #> [1] 2"},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":null,"dir":"Reference","previous_headings":"","what":"Constructor for new rset objects — new_rset","title":"Constructor for new rset objects — new_rset","text":"Constructor new rset objects","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Constructor for new rset objects — new_rset","text":"","code":"new_rset(splits, ids, attrib = NULL, subclass = character())"},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Constructor for new rset objects — new_rset","text":"splits list column rsplits tibble single column called \"splits\" list column rsplits. ids character vector tibble one columns begin \"id\". attrib optional named list attributes add object. subclass character vector subclasses add.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Constructor for new rset objects — new_rset","text":"rset object.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/new_rset.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Constructor for new rset objects — new_rset","text":"new rset constructed, additional attribute called \"fingerprint\" added hash rset. 
can used make sure objects exact resamples.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":null,"dir":"Reference","previous_headings":"","what":"Permutation sampling — permutations","title":"Permutation sampling — permutations","text":"permutation sample size original data set made permuting/shuffling one columns. results analysis samples columns original order columns permuted random order. Unlike sampling functions rsample, assessment set calling assessment() permutation split throw error.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Permutation sampling — permutations","text":"","code":"permutations(data, permute = NULL, times = 25, apparent = FALSE, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Permutation sampling — permutations","text":"data data frame. permute One columns shuffle. argument supports tidyselect selectors. Multiple expressions can combined c(). Variable names can used positions data frame, expressions like x:y can used select range variables. See language details. times number permutation samples. apparent logical. extra resample added analysis standard data set. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Permutation sampling — permutations","text":"tibble classes permutations, rset, tbl_df, tbl, data.frame. 
results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Permutation sampling — permutations","text":"argument apparent enables option additional \"resample\" analysis data set original data set. Permutation-based resampling can especially helpful computing statistic null hypothesis (e.g. t-statistic). forms basis permutation test, computes test statistic possible permutations data.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/permutations.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Permutation sampling — permutations","text":"","code":"permutations(mtcars, mpg, times = 2) #> # Permutation sampling #> # Permuted columns: [mpg] #> # A tibble: 2 × 2 #> splits id #> #> 1 Permutations1 #> 2 Permutations2 permutations(mtcars, mpg, times = 2, apparent = TRUE) #> # Permutation sampling with apparent sample #> # Permuted columns: [mpg] #> # A tibble: 3 × 2 #> splits id #> #> 1 Permutations1 #> 2 Permutations2 #> 3 Apparent library(purrr) resample1 <- permutations(mtcars, starts_with(\"c\"), times = 1) resample1$splits[[1]] %>% analysis() #> mpg cyl disp hp drat wt qsec vs am gear carb #> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 1 #> Mazda RX4 Wag 21.0 4 160.0 110 3.90 2.875 17.02 0 1 4 2 #> Datsun 710 22.8 8 108.0 93 3.85 2.320 18.61 1 1 4 4 #> Hornet 4 Drive 21.4 8 258.0 110 3.08 3.215 19.44 1 0 3 4 #> Hornet Sportabout 18.7 4 360.0 175 3.15 3.440 17.02 0 0 3 1 #> Valiant 18.1 8 225.0 105 2.76 3.460 20.22 1 0 3 4 #> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 2 #> Merc 240D 24.4 8 146.7 62 3.69 3.190 20.00 1 0 4 3 #> Merc 230 22.8 8 140.8 95 3.92 3.150 22.90 1 0 4 4 #> Merc 280 19.2 8 167.6 123 3.92 3.440 18.30 1 0 4 2 #> Merc 280C 17.8 8 167.6 123 3.92 3.440 18.90 1 0 4 2 #> Merc 
450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 4 #> Merc 450SL 17.3 6 275.8 180 3.07 3.730 17.60 0 0 3 4 #> Merc 450SLC 15.2 6 275.8 180 3.07 3.780 18.00 0 0 3 4 #> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 8 #> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 2 #> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 3 #> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 2 #> Honda Civic 30.4 6 75.7 52 4.93 1.615 18.52 1 1 4 1 #> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 2 #> Toyota Corona 21.5 6 120.1 97 3.70 2.465 20.01 1 0 3 4 #> Dodge Challenger 15.5 4 318.0 150 2.76 3.520 16.87 0 0 3 1 #> AMC Javelin 15.2 4 304.0 150 3.15 3.435 17.30 0 0 3 2 #> Camaro Z28 13.3 4 350.0 245 3.73 3.840 15.41 0 0 3 2 #> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 3 #> Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 #> Porsche 914-2 26.0 6 120.3 91 4.43 2.140 16.70 0 1 5 4 #> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 1 #> Ford Pantera L 15.8 4 351.0 264 4.22 3.170 14.50 0 1 5 1 #> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 #> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 4 #> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 resample2 <- permutations(mtcars, hp, times = 10, apparent = TRUE) map_dbl(resample2$splits, function(x) { t.test(hp ~ vs, data = analysis(x))$statistic }) #> [1] 1.831884490 0.360219662 -1.271345514 -1.086517310 0.884050160 #> [6] 1.130681222 0.369342268 -2.595445455 0.007920257 0.562836352 #> [11] 6.290837794"},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":null,"dir":"Reference","previous_headings":"","what":"Add Assessment Indices — populate","title":"Add Assessment Indices — populate","text":"Many rsplit rset objects contain indicators assessment samples. 
populate() can used fill slot appropriate indices.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add Assessment Indices — populate","text":"","code":"populate(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add Assessment Indices — populate","text":"x rsplit rset object. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Add Assessment Indices — populate","text":"object kind integer indices.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/populate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add Assessment Indices — populate","text":"","code":"set.seed(28432) fold_rs <- vfold_cv(mtcars) fold_rs$splits[[1]]$out_id #> [1] NA complement(fold_rs$splits[[1]]) #> [1] 1 9 25 27 populate(fold_rs$splits[[1]])$out_id #> [1] 1 9 25 27 fold_rs_all <- populate(fold_rs) fold_rs_all$splits[[1]]$out_id #> [1] 1 9 25 27"},{"path":"https://rsample.tidymodels.org/dev/reference/reexports.html","id":null,"dir":"Reference","previous_headings":"","what":"Objects exported from other packages — reexports","title":"Objects exported from other packages — reexports","text":"objects imported packages. Follow links see documentation. 
generics tidy tidyselect all_of, any_of, contains, ends_with, everything, last_col, matches, num_range, starts_with","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":null,"dir":"Reference","previous_headings":"","what":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"convenience function confidence intervals linear-ish parametric models","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"","code":"reg_intervals( formula, data, model_fn = \"lm\", type = \"student-t\", times = NULL, alpha = 0.05, filter = term != \"(Intercept)\", keep_reps = FALSE, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"formula R model formula one outcome least one predictor. data data frame. model_fn model fit. Allowable values \"lm\", \"glm\", \"survreg\", \"coxph\". latter two require survival package installed. type type bootstrap confidence interval. Values \"student-t\" \"percentile\" allowed. times single integer number bootstrap samples. left NULL, 1,001 used t-intervals 2,001 percentile intervals. alpha Level significance. filter logical expression used remove rows final result, NULL keep rows. keep_reps individual parameter estimates bootstrap sample retained? ... 
Options pass model function (family stats::glm()).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"tibble columns \"term\", \".lower\", \".estimate\", \".upper\", \".alpha\", \".method\". keep_reps = TRUE, additional list column called \".replicates\" also returned.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"references","dir":"Reference","previous_headings":"","what":"References","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"Davison, ., & Hinkley, D. (1997). Bootstrap Methods Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843 Bootstrap Confidence Intervals, https://rsample.tidymodels.org/articles/Applications/Intervals.html","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/reg_intervals.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"A convenience function for confidence intervals with linear-ish parametric models — reg_intervals","text":"","code":"# \\donttest{ set.seed(1) reg_intervals(mpg ~ I(1 / sqrt(disp)), data = mtcars) #> # A tibble: 1 × 6 #> term .lower .estimate .upper .alpha .method #> #> 1 I(1/sqrt(disp)) 207. 249. 290. 0.05 student-t set.seed(1) reg_intervals(mpg ~ I(1 / sqrt(disp)), data = mtcars, keep_reps = TRUE) #> # A tibble: 1 × 7 #> term .lower .estimate .upper .alpha .method .replicates #> #> 1 I(1/sqrt(disp)) 207. 249. 290. 
0.05 student-t [1,001 × 2] # }"},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":null,"dir":"Reference","previous_headings":"","what":"","title":"","text":"function re-generates rset object, using arguments used generate original.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"","text":"","code":"reshuffle_rset(rset)"},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"","text":"rset rset object reshuffled","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"","text":"rset class rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reshuffle_rset.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"","text":"","code":"set.seed(123) (starting_splits <- group_vfold_cv(mtcars, cyl, v = 3)) #> # Group 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3 reshuffle_rset(starting_splits) #> # Group 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Resample1 #> 2 Resample2 #> 3 Resample3"},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":null,"dir":"Reference","previous_headings":"","what":"Reverse the analysis and assessment sets — reverse_splits","title":"Reverse the analysis and assessment sets — reverse_splits","text":"functions \"swaps\" analysis assessment sets either single rsplit rsplits splits column rset object.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Reverse the analysis and assessment sets — 
reverse_splits","text":"","code":"reverse_splits(x, ...) # Default S3 method reverse_splits(x, ...) # S3 method for class 'permutations' reverse_splits(x, ...) # S3 method for class 'perm_split' reverse_splits(x, ...) # S3 method for class 'rsplit' reverse_splits(x, ...) # S3 method for class 'rset' reverse_splits(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Reverse the analysis and assessment sets — reverse_splits","text":"x rset rsplit object. ... currently used.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Reverse the analysis and assessment sets — reverse_splits","text":"object class x","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/reverse_splits.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Reverse the analysis and assessment sets — reverse_splits","text":"","code":"set.seed(123) starting_splits <- vfold_cv(mtcars, v = 3) reverse_splits(starting_splits) #> # 3-fold cross-validation #> # A tibble: 3 × 2 #> splits id #> #> 1 Fold1 #> 2 Fold2 #> 3 Fold3 reverse_splits(starting_splits$splits[[1]]) #> #> <11/21/32>"},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":null,"dir":"Reference","previous_headings":"","what":"Rolling Origin Forecast Resampling — rolling_origin","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"resampling method useful data set strong time component. resamples random contain data points consecutive values. function assumes original data set sorted time order. function superseded sliding_window(), sliding_index(), sliding_period() provide flexibility control. 
Superseded functions go away, active development focused new functions.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"","code":"rolling_origin( data, initial = 5, assess = 1, cumulative = TRUE, skip = 0, lag = 0, ... )"},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"data data frame. initial number samples used analysis/modeling initial resample. assess number samples used assessment resample. cumulative logical. analysis resample grow beyond size specified initial resample?. skip integer indicating many () additional resamples skip thin total amount data points analysis resample. See example . lag value include lag assessment analysis set. useful lagged predictors used training testing. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"tibble classes rolling_origin, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"main options, initial assess, control number data points original data analysis assessment set, respectively. cumulative = TRUE, analysis set grow resampling continues assessment set size always remain static. skip enables function use every data point resamples. 
skip = 0, resampling data sets increment one position. Suppose rows data set consecutive days. Using skip = 6 make analysis data set operate weeks instead days. assessment set size affected option.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/rolling_origin.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Rolling Origin Forecast Resampling — rolling_origin","text":"","code":"set.seed(1131) ex_data <- data.frame(row = 1:20, some_var = rnorm(20)) dim(rolling_origin(ex_data)) #> [1] 15 2 dim(rolling_origin(ex_data, skip = 2)) #> [1] 5 2 dim(rolling_origin(ex_data, skip = 2, cumulative = FALSE)) #> [1] 5 2 # You can also roll over calendar periods by first nesting by that period, # which is especially useful for irregular series where a fixed window # is not useful. This example slides over 5 years at a time. library(dplyr) library(tidyr) data(drinks, package = \"modeldata\") drinks_annual <- drinks %>% mutate(year = as.POSIXlt(date)$year + 1900) %>% nest(data = c(-year)) multi_year_roll <- rolling_origin(drinks_annual, cumulative = FALSE) analysis(multi_year_roll$splits[[1]]) #> # A tibble: 5 × 2 #> year data #> #> 1 1992 #> 2 1993 #> 3 1994 #> 4 1995 #> 5 1996 assessment(multi_year_roll$splits[[1]]) #> # A tibble: 1 × 2 #> year data #> #> 1 1997 "},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":null,"dir":"Reference","previous_headings":"","what":"Compatibility with dplyr — rsample-dplyr","title":"Compatibility with dplyr — rsample-dplyr","text":"page lays compatibility rsample dplyr. rset objects rsample specific subclass tibbles, hence standard dplyr operations like joins well row column modifications work. However, whether operation returns rset tibble depends details operation. overarching principle operation leaves specific characteristics rset intact return rset. 
operation modifies following characteristics, result tibble rather rset: Rows: number rows needs remain unchanged retain rset property. example, 10-fold CV object without 10 rows. order rows can changed though object remains rset. Columns: splits column id column(s) required rset need remain untouched. dropped, renamed, modified result remain rset.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":"joins","dir":"Reference","previous_headings":"","what":"Joins","title":"Compatibility with dplyr — rsample-dplyr","text":"following affect dplyr joins, left_join(), right_join(), full_join(), inner_join(). resulting object rset number rows unaffected. Rows can reordered added removed, otherwise resulting object tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":"row-operations","dir":"Reference","previous_headings":"","what":"Row Operations","title":"Compatibility with dplyr — rsample-dplyr","text":"resulting object rset number rows unaffected. Rows can reordered added removed, otherwise resulting object tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-dplyr.html","id":"column-operations","dir":"Reference","previous_headings":"","what":"Column Operations","title":"Compatibility with dplyr — rsample-dplyr","text":"resulting object rset required splits id columns remain unaltered. Otherwise resulting object tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-package.html","id":null,"dir":"Reference","previous_headings":"","what":"rsample: General Resampling Infrastructure — rsample-package","title":"rsample: General Resampling Infrastructure — rsample-package","text":"Classes functions create summarize different types resampling objects (e.g. 
bootstrap, cross-validation).","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/rsample-package.html","id":"author","dir":"Reference","previous_headings":"","what":"Author","title":"rsample: General Resampling Infrastructure — rsample-package","text":"Maintainer: Hannah Frick hannah@posit.co (ORCID) Authors: Fanny Chow fannybchow@gmail.com Max Kuhn max@posit.co Michael Mahoney mike.mahoney.218@gmail.com (ORCID) Julia Silge julia.silge@posit.co (ORCID) Hadley Wickham hadley@posit.co contributors: Posit Software, PBC [copyright holder, funder]","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert Resampling Objects to Other Formats — rsample2caret","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"functions can convert resampling objects rsample caret.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"","code":"rsample2caret(object, data = c(\"analysis\", \"assessment\")) caret2rsample(ctrl, data = NULL)"},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"object rset object. Currently, nested_cv() supported. data data originally used produce ctrl object. ctrl object produced caret::trainControl() index indexOut elements populated integers. 
One method getting extract control objects object produced train.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rsample2caret.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert Resampling Objects to Other Formats — rsample2caret","text":"rsample2caret() returns list mimics index indexOut elements trainControl object. caret2rsample() returns rset object appropriate class.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":null,"dir":"Reference","previous_headings":"","what":"Extending rsample with new rset subclasses — rset_reconstruct","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"rset_reconstruct() encapsulates logic allowing new rset subclasses work properly vctrs (vctrs::vec_restore()) dplyr (dplyr::dplyr_reconstruct()). intended developer tool, required normal usage rsample.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"","code":"rset_reconstruct(x, to)"},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"x data frame restore rset subclass. 
rset subclass restore .","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"x restored rset subclass .","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"rset objects considered \"reconstructable\" vctrs/dplyr operation : x identical column named \"splits\" (column row order matter). x identical columns prefixed \"id\" (column row order matter).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/rset_reconstruct.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Extending rsample with new rset subclasses — rset_reconstruct","text":"","code":"to <- bootstraps(mtcars, times = 25) # Imitate a vctrs/dplyr operation, # where the class might be lost along the way x <- tibble::as_tibble(to) # Say we added a new column to `x`. Here we mock a `mutate()`. x$foo <- \"bar\" # This is still reconstructable to `to` rset_reconstruct(x, to) #> # Bootstrap sampling #> # A tibble: 25 × 3 #> splits id foo #> #> 1 Bootstrap01 bar #> 2 Bootstrap02 bar #> 3 Bootstrap03 bar #> 4 Bootstrap04 bar #> 5 Bootstrap05 bar #> 6 Bootstrap06 bar #> 7 Bootstrap07 bar #> 8 Bootstrap08 bar #> 9 Bootstrap09 bar #> 10 Bootstrap10 bar #> # ℹ 15 more rows # Say we lose the first row x <- x[-1, ] # This is no longer reconstructable to `to`, as `x` is no longer an rset # bootstraps object with 25 bootstraps if one is lost! 
rset_reconstruct(x, to) #> # A tibble: 24 × 3 #> splits id foo #> #> 1 Bootstrap02 bar #> 2 Bootstrap03 bar #> 3 Bootstrap04 bar #> 4 Bootstrap05 bar #> 5 Bootstrap06 bar #> 6 Bootstrap07 bar #> 7 Bootstrap08 bar #> 8 Bootstrap09 bar #> 9 Bootstrap10 bar #> 10 Bootstrap11 bar #> # ℹ 14 more rows"},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":null,"dir":"Reference","previous_headings":"","what":"Time-based Resampling — slide-resampling","title":"Time-based Resampling — slide-resampling","text":"resampling functions focused various forms time series resampling. sliding_window() uses row number computing resampling indices. independent time index, useful completely regular series. sliding_index() computes resampling indices relative index column. often Date POSIXct column, . useful resampling irregular series, using irregular lookback periods lookback = lubridate::years(1) daily data (number days year may vary). sliding_period() first breaks index less granular groups based period, uses construct resampling indices. 
extremely useful constructing rolling monthly yearly windows daily data.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Time-based Resampling — slide-resampling","text":"","code":"sliding_window( data, ..., lookback = 0L, assess_start = 1L, assess_stop = 1L, complete = TRUE, step = 1L, skip = 0L ) sliding_index( data, index, ..., lookback = 0L, assess_start = 1L, assess_stop = 1L, complete = TRUE, step = 1L, skip = 0L ) sliding_period( data, index, period, ..., lookback = 0L, assess_start = 1L, assess_stop = 1L, complete = TRUE, step = 1L, skip = 0L, every = 1L, origin = NULL )"},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Time-based Resampling — slide-resampling","text":"data data frame. ... dots future extensions must empty. lookback number elements look back current element computing resampling indices analysis set. current row always included analysis set. sliding_window(), single integer defining number rows look back current row. sliding_index(), single object subtracted index index - lookback define boundary start searching rows include current resample. often integer value corresponding number days look back, lubridate Period object. sliding_period(), single integer defining number groups look back current group, groups defined breaking index according period. cases, Inf also allowed force expanding window. assess_start, assess_stop combination arguments determines far future look constructing assessment set. Together construct range [index + assess_start, index + assess_stop] search rows include assessment set. 
Generally, assess_start always 1 indicate first value potentially include assessment set start one element current row, can increased larger value create \"gaps\" analysis assessment set worried high levels correlation short term forecasting. sliding_window(), single integers defining number rows look forward current row. sliding_index(), single objects added index compute range search rows include assessment set. often integer value corresponding number days look forward, lubridate Period object. sliding_period(), single integers defining number groups look forward current group, groups defined breaking index according period. complete single logical. using lookback compute analysis sets, complete windows considered? set FALSE, partial windows used possible create complete window (based lookback). way use expanding window certain point, switch sliding window. step single positive integer. computing resampling indices, step used thin results selecting every step-th result subsetting indices seq(1L, n_indices, = step). step applied skip. Note step independent time index used. skip single positive integer, zero. computing resampling indices, first skip results dropped subsetting indices seq(skip + 1L, n_indices). can especially useful combined lookback = Inf, creates expanding window starting first row. skipping forward, can drop first windows data points. skip applied step. Note skip independent time index used. index index compute resampling indices relative , specified bare column name. must existing column data. sliding_index(), commonly date vector, required. sliding_period(), required Date POSIXct vector. index must increasing vector, duplicate values allowed. Additionally, index contain missing values. period period group index . specified single string, \"year\" \"month\". See .period argument slider::slide_period() full list options explanation. every single positive integer. number periods group together. 
example, period set \"year\" every value 2, years 1970 1971 placed group. origin reference date time value. default left NULL epoch time 1970-01-01 00:00:00, time zone index. generally used define anchor time count , relevant every value > 1.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/slide-resampling.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Time-based Resampling — slide-resampling","text":"","code":"library(vctrs) #> #> Attaching package: ‘vctrs’ #> The following object is masked from ‘package:tibble’: #> #> data_frame #> The following object is masked from ‘package:dplyr’: #> #> data_frame library(tibble) library(modeldata) data(\"Chicago\") index <- new_date(c(1, 3, 4, 7, 8, 9, 13, 15, 16, 17)) df <- tibble(x = 1:10, index = index) df #> # A tibble: 10 × 2 #> x index #> #> 1 1 1970-01-02 #> 2 2 1970-01-04 #> 3 3 1970-01-05 #> 4 4 1970-01-08 #> 5 5 1970-01-09 #> 6 6 1970-01-10 #> 7 7 1970-01-14 #> 8 8 1970-01-16 #> 9 9 1970-01-17 #> 10 10 1970-01-18 # Look back two rows beyond the current row, for a total of three rows # in each analysis set. Each assessment set is composed of the two rows after # the current row. sliding_window(df, lookback = 2, assess_stop = 2) #> # Sliding window resampling #> # A tibble: 6 × 2 #> splits id #> #> 1 Slice1 #> 2 Slice2 #> 3 Slice3 #> 4 Slice4 #> 5 Slice5 #> 6 Slice6 # Same as before, but step forward by 3 rows between each resampling slice, # rather than just by 1. rset <- sliding_window(df, lookback = 2, assess_stop = 2, step = 3) rset #> # Sliding window resampling #> # A tibble: 2 × 2 #> splits id #> #> 1 Slice1 #> 2 Slice2 analysis(rset$splits[[1]]) #> # A tibble: 3 × 2 #> x index #> #> 1 1 1970-01-02 #> 2 2 1970-01-04 #> 3 3 1970-01-05 analysis(rset$splits[[2]]) #> # A tibble: 3 × 2 #> x index #> #> 1 4 1970-01-08 #> 2 5 1970-01-09 #> 3 6 1970-01-10 # Now slide relative to the `index` column in `df`. 
This time we look back # 2 days from the current row's `index` value, and 2 days forward from # it to construct the assessment set. Note that this series is irregular, # so it produces different results than `sliding_window()`. Additionally, # note that it is entirely possible for the assessment set to contain no # data if you have a highly irregular series and \"look forward\" into a # date range where no data points actually exist! sliding_index(df, index, lookback = 2, assess_stop = 2) #> # Sliding index resampling #> # A tibble: 7 × 2 #> splits id #> #> 1 Slice1 #> 2 Slice2 #> 3 Slice3 #> 4 Slice4 #> 5 Slice5 #> 6 Slice6 #> 7 Slice7 # With `sliding_period()`, we can break up our date index into less granular # chunks, and slide over them instead of the index directly. Here we'll use # the Chicago data, which contains daily data spanning 16 years, and we'll # break it up into rolling yearly chunks. Three years worth of data will # be used for the analysis set, and one years worth of data will be held out # for performance assessment. sliding_period( Chicago, date, \"year\", lookback = 2, assess_stop = 1 ) #> # Sliding period resampling #> # A tibble: 13 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 #> 3 Slice03 #> 4 Slice04 #> 5 Slice05 #> 6 Slice06 #> 7 Slice07 #> 8 Slice08 #> 9 Slice09 #> 10 Slice10 #> 11 Slice11 #> 12 Slice12 #> 13 Slice13 # Because `lookback = 2`, three years are required to form a \"complete\" # window of data. To allow partial windows, set `complete = FALSE`. # Here that first constructs two expanding windows until a complete three # year window can be formed, at which point we switch to a sliding window. 
sliding_period( Chicago, date, \"year\", lookback = 2, assess_stop = 1, complete = FALSE ) #> # Sliding period resampling #> # A tibble: 15 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 #> 3 Slice03 #> 4 Slice04 #> 5 Slice05 #> 6 Slice06 #> 7 Slice07 #> 8 Slice08 #> 9 Slice09 #> 10 Slice10 #> 11 Slice11 #> 12 Slice12 #> 13 Slice13 #> 14 Slice14 #> 15 Slice15 # Alternatively, you could break the resamples up by month. Here we'll # use an expanding monthly window by setting `lookback = Inf`, and each # assessment set will contain two months of data. To ensure that we have # enough data to fit our models, we'll `skip` the first 4 expanding windows. # Finally, to thin out the results, we'll `step` forward by 2 between # each resample. sliding_period( Chicago, date, \"month\", lookback = Inf, assess_stop = 2, skip = 4, step = 2 ) #> # Sliding period resampling #> # A tibble: 91 × 2 #> splits id #> #> 1 Slice01 #> 2 Slice02 #> 3 Slice03 #> 4 Slice04 #> 5 Slice05 #> 6 Slice06 #> 7 Slice07 #> 8 Slice08 #> 9 Slice09 #> 10 Slice10 #> # ℹ 81 more rows"},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":null,"dir":"Reference","previous_headings":"","what":"Tidy Resampling Object — tidy.rsplit","title":"Tidy Resampling Object — tidy.rsplit","text":"tidy() function broom package can used rset rsplit objects generate tibbles rows analysis assessment sets.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tidy Resampling Object — tidy.rsplit","text":"","code":"# S3 method for class 'rsplit' tidy(x, unique_ind = TRUE, ...) # S3 method for class 'rset' tidy(x, unique_ind = TRUE, ...) # S3 method for class 'vfold_cv' tidy(x, ...) 
# S3 method for class 'nested_cv' tidy(x, unique_ind = TRUE, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tidy Resampling Object — tidy.rsplit","text":"x rset rsplit object unique_ind unique row identifiers returned? example, FALSE bootstrapping results include multiple rows sample row original data. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tidy Resampling Object — tidy.rsplit","text":"tibble columns Row Data. latter possible values \"Analysis\" \"Assessment\". rset inputs, identification columns also returned names values depend type resampling. vfold_cv(), contains column \"Fold\" , repeats used, another called \"Repeats\". bootstraps() mc_cv() use column \"Resample\".","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Tidy Resampling Object — tidy.rsplit","text":"Note nested resampling, rows inner resample, named inner_Row, relative row indices correspond rows original data set.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/tidy.rsplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tidy Resampling Object — tidy.rsplit","text":"","code":"library(ggplot2) theme_set(theme_bw()) set.seed(4121) cv <- tidy(vfold_cv(mtcars, v = 5)) ggplot(cv, aes(x = Fold, y = Row, fill = Data)) + geom_tile() + scale_fill_brewer() set.seed(4121) rcv <- tidy(vfold_cv(mtcars, v = 5, repeats = 2)) ggplot(rcv, aes(x = Fold, y = Row, fill = Data)) + geom_tile() + facet_wrap(~Repeat) + scale_fill_brewer() set.seed(4121) mccv <- tidy(mc_cv(mtcars, times = 5)) ggplot(mccv, aes(x = Resample, y = Row, fill = Data)) + geom_tile() + 
scale_fill_brewer() set.seed(4121) bt <- tidy(bootstraps(mtcars, time = 5)) ggplot(bt, aes(x = Resample, y = Row, fill = Data)) + geom_tile() + scale_fill_brewer() dat <- data.frame(day = 1:30) # Resample by week instead of day ts_cv <- rolling_origin(dat, initial = 7, assess = 7, skip = 6, cumulative = FALSE ) ts_cv <- tidy(ts_cv) ggplot(ts_cv, aes(x = Resample, y = factor(Row), fill = Data)) + geom_tile() + scale_fill_brewer()"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a Validation Split for Tuning — validation_set","title":"Create a Validation Split for Tuning — validation_set","text":"validation_set() creates validation split model tuning.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a Validation Split for Tuning — validation_set","text":"","code":"validation_set(split, ...) # S3 method for class 'val_split' analysis(x, ...) # S3 method for class 'val_split' assessment(x, ...) # S3 method for class 'val_split' training(x, ...) # S3 method for class 'val_split' validation(x, ...) # S3 method for class 'val_split' testing(x, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a Validation Split for Tuning — validation_set","text":"split object class initial_validation_split, resulting initial_validation_split() group_initial_validation_split(). ... dots future extensions must empty. x rsplit object produced validation_set().","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a Validation Split for Tuning — validation_set","text":"tibble classes validation_set, rset, tbl_df, tbl, data.frame. 
results include column data split object column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_set.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a Validation Split for Tuning — validation_set","text":"","code":"set.seed(1353) car_split <- initial_validation_split(mtcars) car_set <- validation_set(car_split)"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a Validation Set — validation_split","title":"Create a Validation Set — validation_split","text":"function deprecated part approach constructing training, validation, testing set sequence two binary splits: testing / -testing (initial_split() one variants) -testing split training/validation validation_split(). Instead, now use initial_validation_split() one variants construct three sets via one 3-way split. validation_split() takes single random sample (without replacement) original data set used analysis. data points added assessment set (used validation set). validation_time_split() , takes first prop samples training, instead random selection. group_validation_split() creates splits data based grouping variable, data \"group\" assigned split. Note input data validation_split(), validation_time_split(), group_validation_split() contain testing data. create three-way split directly entire data set, use initial_validation_split().","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a Validation Set — validation_split","text":"","code":"validation_split(data, prop = 3/4, strata = NULL, breaks = 4, pool = 0.1, ...) validation_time_split(data, prop = 3/4, lag = 0, ...) 
group_validation_split(data, group, prop = 3/4, ..., strata = NULL, pool = 0.1)"},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a Validation Set — validation_split","text":"data data frame. prop proportion data retained modeling/analysis. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty. lag value include lag assessment analysis set. useful lagged predictors used training testing. group variable data (single character name) used grouping observations value either analysis assessment set within fold.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a Validation Set — validation_split","text":"tibble classes validation_split, rset, tbl_df, tbl, data.frame. results include column data split objects column called id character string resample identifier.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a Validation Set — validation_split","text":"strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. 
Strata 10% total pooled together; see make_strata() details.","code":""},{"path":[]},{"path":"https://rsample.tidymodels.org/dev/reference/validation_split.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a Validation Set — validation_split","text":"","code":"cars_split <- initial_split(mtcars) cars_not_testing <- training(cars_split) validation_split(cars_not_testing, prop = .9) #> Warning: `validation_split()` was deprecated in rsample 1.2.0. #> ℹ Please use `initial_validation_split()` instead. #> # Validation Set Split (0.9/0.1) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation group_validation_split(cars_not_testing, cyl) #> Warning: `group_validation_split()` was deprecated in rsample 1.2.0. #> ℹ Please use `group_initial_validation_split()` instead. #> # Group Validation Set Split (0.75/0.25) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation data(drinks, package = \"modeldata\") validation_time_split(drinks[1:200,]) #> Warning: `validation_time_split()` was deprecated in rsample 1.2.0. #> ℹ Please use `initial_validation_time_split()` instead. #> # Validation Set Split (0.75/0.25) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation # Alternative cars_split_3 <- initial_validation_split(mtcars) validation_set(cars_split_3) #> # A tibble: 1 × 2 #> splits id #> #> 1 validation"},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"V-Fold Cross-Validation — vfold_cv","title":"V-Fold Cross-Validation — vfold_cv","text":"V-fold cross-validation (also known k-fold cross-validation) randomly splits data V groups roughly equal size (called \"folds\"). resample analysis data consists V-1 folds assessment set contains final fold. basic V-fold cross-validation (.e. 
repeats), number resamples equal V.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"V-Fold Cross-Validation — vfold_cv","text":"","code":"vfold_cv(data, v = 10, repeats = 1, strata = NULL, breaks = 4, pool = 0.1, ...)"},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"V-Fold Cross-Validation — vfold_cv","text":"data data frame. v number partitions data set. repeats number times repeat V-fold partitioning. strata variable data (single character name) used conduct stratified sampling. NULL, resample created within stratification variable. Numeric strata binned quartiles. breaks single number giving number bins desired stratify numeric stratification variable. pool proportion data used determine particular group small pooled another group. recommend decreasing argument default 0.1 dangers stratifying groups small. ... dots future extensions must empty.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"V-Fold Cross-Validation — vfold_cv","text":"tibble classes vfold_cv, rset, tbl_df, tbl, data.frame. results include column data split objects one identification variables. single repeat, one column called id character string fold identifier. repeats, id repeat number additional column called id2 contains fold information (within repeat).","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"V-Fold Cross-Validation — vfold_cv","text":"one repeat, basic V-fold cross-validation conducted time. example, three repeats used v = 10, total 30 splits: three groups 10 generated separately. 
strata argument, random sampling conducted within stratification variable. can help ensure resamples equivalent proportions original data set. categorical variable, sampling conducted separately within class. numeric stratification variable, strata binned quartiles, used stratify. Strata 10% total pooled together; see make_strata() details.","code":""},{"path":"https://rsample.tidymodels.org/dev/reference/vfold_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"V-Fold Cross-Validation — vfold_cv","text":"","code":"vfold_cv(mtcars, v = 10) #> # 10-fold cross-validation #> # A tibble: 10 × 2 #> splits id #> #> 1 Fold01 #> 2 Fold02 #> 3 Fold03 #> 4 Fold04 #> 5 Fold05 #> 6 Fold06 #> 7 Fold07 #> 8 Fold08 #> 9 Fold09 #> 10 Fold10 vfold_cv(mtcars, v = 10, repeats = 2) #> # 10-fold cross-validation repeated 2 times #> # A tibble: 20 × 3 #> splits id id2 #> #> 1 Repeat1 Fold01 #> 2 Repeat1 Fold02 #> 3 Repeat1 Fold03 #> 4 Repeat1 Fold04 #> 5 Repeat1 Fold05 #> 6 Repeat1 Fold06 #> 7 Repeat1 Fold07 #> 8 Repeat1 Fold08 #> 9 Repeat1 Fold09 #> 10 Repeat1 Fold10 #> 11 Repeat2 Fold01 #> 12 Repeat2 Fold02 #> 13 Repeat2 Fold03 #> 14 Repeat2 Fold04 #> 15 Repeat2 Fold05 #> 16 Repeat2 Fold06 #> 17 Repeat2 Fold07 #> 18 Repeat2 Fold08 #> 19 Repeat2 Fold09 #> 20 Repeat2 Fold10 library(purrr) data(wa_churn, package = \"modeldata\") set.seed(13) folds1 <- vfold_cv(wa_churn, v = 5) map_dbl( folds1$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2649982 0.2660632 0.2609159 0.2679681 0.2669033 set.seed(13) folds2 <- vfold_cv(wa_churn, strata = churn, v = 5) map_dbl( folds2$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2653532 0.2653532 0.2653532 0.2653532 0.2654365 set.seed(13) folds3 <- vfold_cv(wa_churn, strata = tenure, breaks = 6, v = 5) map_dbl( folds3$splits, function(x) { dat <- as.data.frame(x)$churn mean(dat == \"Yes\") } ) #> [1] 0.2656250 0.2661104 0.2652228 
0.2638396 0.2660518"},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-development-version","dir":"Changelog","previous_headings":"","what":"rsample (development version)","title":"rsample (development version)","text":"new inner_split() function methods various resamples usage tune create inner resample analysis set fit preprocessor model one part post-processor part (#483, #488, #489). Started moving error messages cli (#499, #502). contributions @JamesHWade (#518). Fixed example nested_cv() (@seb09, #520). rolling_origin() now superseded sliding_window(), sliding_index(), sliding_period() provide flexibility control (@nmercadeb, #524). Removed trailing space printing mc_cv() objects (@ccani007, #464). Improved documentation initial_split() friends (@laurabrianna, #519). Formatting improvement: package names now backticks anymore (@agmurray, #525). Improved documentation formatting: function names now easily identifiable either () end links function documentation (@brshallo , #521).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"bug-fixes-development-version","dir":"Changelog","previous_headings":"","what":"Bug fixes","title":"rsample (development version)","text":"vfold_cv() now utilizes breaks argument correctly repeated cross-validation (@ZWael, #471). Grouped resampling functions now work explicit strata = NULL instead strata either name missing (#485).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"breaking-changes-development-version","dir":"Changelog","previous_headings":"","what":"Breaking changes","title":"rsample (development version)","text":"class grouped MC splits now group_mc_split instead grouped_mc_split, aligning grouped splits (#478). rsplit objects apparent() split now correct class inheritance structure. 
order now apparent_split rsplit rather way around (#477).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-121","dir":"Changelog","previous_headings":"","what":"rsample 1.2.1","title":"rsample 1.2.1","text":"CRAN release: 2024-03-25 nested_cv() longer errors outside long call (#459, #461). validation_set class now pretty() method (#456).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-120","dir":"Changelog","previous_headings":"","what":"rsample 1.2.0","title":"rsample 1.2.0","text":"CRAN release: 2023-08-23 new initial_validation_split(), along variants initial_validation_time_split() group_initial_validation_split(), generates three-way split data training, validation, test sets. new validation_set(), can turned rset object tuning (#403, #446). validation_split(), validation_time_split(), group_validation_split() soft-deprecated favor new functions implementing 3-way split (initial_validation_split(), initial_validation_time_split(), group_initial_validation_split()) (#449). Functions don’t use ellipsis ... now enforce empty dots (#429). make_splits() gained example documentation (@AngelFelizR, #432). training(), testing(), analysis(), assessment() now S3 generics methods rsplit objects. Previously manually required input rsplit object (#384). int_*() functions now S3 generics corresponding methods class bootstraps (#435). underlying mechanics data splitting changed Surv objects maintain class. change affects row names resulting objects; reindexed one instead subset original row names (#443). 
rsample re-export gather() anymore (#451).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-111","dir":"Changelog","previous_headings":"","what":"rsample 1.1.1","title":"rsample 1.1.1","text":"CRAN release: 2022-12-07 grouped resampling functions (group_vfold_cv(), group_mc_cv(), group_initial_split() group_validation_split(), group_bootstraps()) now support stratification. Strata must constant within group (@mikemahoney218, #317, #360, #363, #364, #365). Added new function, clustering_cv(), blocked cross-validation various predictor spaces. flexible function, taking arguments distance_function cluster_function, allowing used spatial clustering well potentially phylogenetic forms clustering (@mikemahoney218, #351). bootstraps() group_bootstraps() now warn resampling returns empty assessment sets. Previously, bootstraps() silent group_bootstraps() errored (@mikemahoney218, #356, #357). assessment set validation_time_split() now also contains lagged observations (#376). new helper get_rsplit() lets conveniently access rsplit objects inside rset objects (@mikemahoney218, #399). result initial_time_split() now subclass \"initial_time_split\", addition existing classes (#397). dependency ellipsis package removed (#393). Removed overly strict test preparation dplyr 1.1.0 (#380).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-110","dir":"Changelog","previous_headings":"","what":"rsample 1.1.0","title":"rsample 1.1.0","text":"CRAN release: 2022-08-08 rset objects now include parameters used create attributes (#329). Objects returned sliding functions now index attribute, appropriate, containing column name used index (#329). Objects returned permutations() now permutes attribute containing column name used permutation (#329). Added breaks pool attributes functions support stratification (#329). 
Changed “strata” attribute rset objects now either character vector identifying column used stratify data, present (set NULL) stratification used. (#329) Added new function, reshuffle_rset(), takes rset object generates new version using arguments current random seed. (#79, #329) Added arguments control group_vfold_cv() combines groups. Use balance = \"groups\" assign (roughly) number groups fold, balance = \"observations\" assign (roughly) number observations fold. Added repeats argument group_vfold_cv() (#330). Added new functions grouped resampling: group_mc_cv() (#313), group_initial_split() group_validation_split() (#315), group_bootstraps() (#316). Added new function, reverse_splits(), swap analysis assessment splits (#319, #284). Improved error thrown calling assessment() perm_split object created permutations() (#321, #322).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-100","dir":"Changelog","previous_headings":"","what":"rsample 1.0.0","title":"rsample 1.0.0","text":"CRAN release: 2022-06-24 Fixed nested_cv() handles call objects variables environment can used specifying resampling schemes (#81). Updated testthat 3e (#280) added better checking vfold_cv() (#293). Finally removed gather() method rset objects. Use tidyr::pivot_longer() instead (#280). Changed initial_split() avoid calling tidyselect twice strata (#296). fix stops initial_split() generating messages like: Added better printing methods initial split objects.","code":"Note: Using an external vector in selections is ambiguous. i Use `all_of(strata)` instead of `strata` to silence this message. i See ."},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-011","dir":"Changelog","previous_headings":"","what":"rsample 0.1.1","title":"rsample 0.1.1","text":"CRAN release: 2021-11-08 Updated documentation stratified sampling (#245). 
Changed make_splits() S3 generic, original functionality method list new method dataframes allows users create split existing analysis & assessment sets (@LiamBlake, #246). Added validation_time_split() single validation sample taking first samples training (@mine-cetinkaya-rundel, #256). Escalated deprecation gather() method rset objects hard deprecation. Use tidyr::pivot_longer() instead (#257). Changed resample “fingerprint” hash indices rather entire resample result (including data object). much faster still ensure resample original data object (#259).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-010","dir":"Changelog","previous_headings":"","what":"rsample 0.1.0","title":"rsample 0.1.0","text":"CRAN release: 2021-05-08 Fixed mc_cv(), initial_split(), validation_split() use prop argument first compute assessment indices, rather analysis indices. minor breaking change situations; previous implementation cause inconsistency sizes generated analysis assessment sets compared prop documented function (#217, @issactoast). Fixed problem creation apparent() (#223) caret2rsample() (#232) resamples. Re-licensed package GPL-2 MIT. See consent copyright holders . Attempts stratify Surv object now error informatively (#230). Exposed pool argument make_strata() user-facing resampling functions (#229). Deprecated gather() method rset objects favor tidyr::pivot_longer() (#233). Fixed bug make_strata() numeric variables NA values (@brian-j-smith, #236).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-009","dir":"Changelog","previous_headings":"","what":"rsample 0.0.9","title":"rsample 0.0.9","text":"CRAN release: 2021-02-17 New rset_reconstruct(), developer tool ease creation new rset subclasses (#210). Added permutations(), function creating permutation resamples performing column-wise shuffling (@mattwarkentin, #198). Fixed issue empty assessment sets couldn’t created make_splits() (#188). 
rset objects now contain “fingerprint” attribute can used check see object uses resamples. reg_intervals() function convenience function lm(), glm(), survreg(), coxph() models (#206). internal functions exported rsample-adjacent packages can use underlying code. obj_sum() method rsplit objects updated (#215). Changed inheritance structure rsplit objects specific general simplified methods complement() generic (#216).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-008","dir":"Changelog","previous_headings":"","what":"rsample 0.0.8","title":"rsample 0.0.8","text":"CRAN release: 2020-09-23 New manual_rset() constructing rset objects manually custom rsplits (tidymodels/tune#273). Three new time based resampling functions added: sliding_window(), sliding_index(), sliding_period(), flexibility pre-existing rolling_origin(). Correct alpha parameter handling bootstrap CI functions (#179, #184).","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-007","dir":"Changelog","previous_headings":"","what":"rsample 0.0.7","title":"rsample 0.0.7","text":"CRAN release: 2020-06-04 Lower threshold pooling strata 10% (15%) (#149). print() methods rsplit val_split objects adjusted show \"\" , respectively. drinks, attrition, two_class_dat data sets removed. modeldata package. Compatability dplyr 1.0.0.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-006","dir":"Changelog","previous_headings":"","what":"rsample 0.0.6","title":"rsample 0.0.6","text":"CRAN release: 2020-03-31 Added validation_set() making single resample. Correct tidy method bootstraps (#115). Changes upcoming `tibble release. Exported constructors rset split objects (#40) initial_time_split() rolling_origin() now lag parameter ensures previous data available lagged variables can calculated. 
(#135, #136)","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-005","dir":"Changelog","previous_headings":"","what":"rsample 0.0.5","title":"rsample 0.0.5","text":"CRAN release: 2019-07-12 Added three functions compute different bootstrap confidence intervals. new function (add_resample_id()) augments data frame columns resampling identifier. Updated initial_split(), mc_cv(), vfold_cv(), bootstraps(), group_vfold_cv() use tidyselect stratification variable. Updated initial_split(), mc_cv(), vfold_cv(), bootstraps() new breaks parameter specifies number bins stratify numeric stratification variable.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-004","dir":"Changelog","previous_headings":"","what":"rsample 0.0.4","title":"rsample 0.0.4","text":"CRAN release: 2019-01-07 Small maintenance release.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"minor-improvements-and-fixes-0-0-4","dir":"Changelog","previous_headings":"","what":"Minor improvements and fixes","title":"rsample 0.0.4","text":"fill() removed per deprecation warning. Small changes made new version tibble.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-003","dir":"Changelog","previous_headings":"","what":"rsample 0.0.3","title":"rsample 0.0.3","text":"CRAN release: 2018-11-20","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"new-features-0-0-3","dir":"Changelog","previous_headings":"","what":"New features","title":"rsample 0.0.3","text":"Added function initial_time_split() ordered initial sampling appropriate time series data.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"minor-improvements-and-fixes-0-0-3","dir":"Changelog","previous_headings":"","what":"Minor improvements and fixes","title":"rsample 0.0.3","text":"fill() renamed populate() avoid conflict tidyr::fill(). 
Changed R version requirement R >= 3.1 instead 3.3.3. recipes-related prepper() function moved recipes package. makes rsample install footprint much smaller. rsplit objects shown differently inside tibble. Moved broom package generics package.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-002","dir":"Changelog","previous_headings":"","what":"rsample 0.0.2","title":"rsample 0.0.2","text":"CRAN release: 2017-11-12 initial_split, training, testing added training/testing splits prior resampling. Another resampling method, group_vfold_cv, added. caret2rsample rsample2caret can convert rset objects used caret::trainControl vice-versa. function called form_pred can used determine original names predictors formula terms object. vignette function (prepper) included facilitate using recipes rsample. gather method added rset objects. labels method added rsplit objects. can help identify resample used even whole rset object available. variety dplyr methods added (e.g. filter(), mutate(), etc) work without dropping classes attributes rsample objects.","code":""},{"path":"https://rsample.tidymodels.org/dev/news/index.html","id":"rsample-001-2017-07-08","dir":"Changelog","previous_headings":"","what":"rsample 0.0.1 (2017-07-08)","title":"rsample 0.0.1 (2017-07-08)","text":"CRAN release: 2017-07-08 Initial public version CRAN","code":""}]