rep_slice_sample on groups with multiple n values #527

@adrie-stclair

Hello package maintainers!
I am building bootstrap confidence intervals for grouped data, and I'm having trouble creating the multiple resampled datasets from which to build them.

Using the palmerpenguins library as an example:

library(tidyverse)
library(infer)
library(palmerpenguins)

There are 344 total observations and each species has a different number of observations:

nrow(penguins)
# [1] 344

penguins %>% group_by(species) %>% count()

# A tibble: 3 × 2
# Groups:   species [3]
#   species       n
#   <fct>     <int>
# 1 Adelie      152
# 2 Chinstrap    68
# 3 Gentoo      124

I want to group by species and, for each species, draw multiple samples that each keep that species' original number of observations.

set.seed(100)

slices <- penguins %>% 
    group_by(species) %>% 
    rep_slice_sample(prop = 1, replace = TRUE, reps = 10)

That should give me 344 * 10 = 3440 rows in the full new data set. It does, but when you look at the data you can see that the number of observations per species differs from replicate to replicate. For every replicate, n should be 152 for Adelie, 68 for Chinstrap, and 124 for Gentoo. Instead we find this:

slices %>% group_by(species, replicate) %>% count()

# A tibble: 30 × 3
# Groups:   species, replicate [30]
#   species replicate     n
#   <fct>       <int> <int>
# 1 Adelie          1   148
# 2 Adelie          2   147
# 3 Adelie          3   148
# 4 Adelie          4   151
# 5 Adelie          5   138
# 6 Adelie          6   157
# 7 Adelie          7   161
# 8 Adelie          8   157
# 9 Adelie          9   151
#10 Adelie         10   138
# ℹ 20 more rows
# ℹ Use `print(n = ...)` to see more rows

What am I missing?
Thanks for your insight.
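
One possible reading of the counts above: they hover around 152 for Adelie, which is what you would expect if each replicate drew 344 rows from the pooled data rather than resampling within each species. That is an inference from the output shown here, not something confirmed against infer's source. Under that assumption, a minimal workaround sketch is to let dplyr's `slice_sample()`, which does sample within groups, build each replicate and then stack the replicates. The helper name `one_replicate` and the object `slices_by_group` are hypothetical names introduced only for this illustration.

```r
library(dplyr)
library(palmerpenguins)

set.seed(100)

# Hypothetical helper: draw one bootstrap replicate by resampling
# with replacement *within* each species, keeping each group's size.
one_replicate <- function(data) {
  data %>%
    group_by(species) %>%
    slice_sample(prop = 1, replace = TRUE) %>%
    ungroup()
}

# Build 10 replicates and stack them; bind_rows() records the list
# position in a "replicate" column, converted here to integer.
slices_by_group <- lapply(seq_len(10), function(i) one_replicate(penguins)) %>%
  bind_rows(.id = "replicate") %>%
  mutate(replicate = as.integer(replicate))

# Each species/replicate pair should now have 152, 68, or 124 rows.
slices_by_group %>% count(species, replicate)
```

The result is an ordinary tibble with a `replicate` column rather than the grouped object that `rep_slice_sample()` returns, so downstream infer verbs may need an explicit `group_by(species, replicate)` first.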


Labels: feature (a feature request or enhancement)
