Clarify number of samples taken in the resample function #17

@tompollard

In https://github.com/carpentries-incubator/machine-learning-novice-python/blob/gh-pages/_episodes/07-bootstrapping.md, the following chunk is used to resample the datasets for bootstrapping:

X_bs, y_bs = resample(x_train, y_train, replace=True)

The number of samples isn't specified in the function call, so it is unclear how many samples are being taken.

According to the documentation at https://scikit-learn.org/stable/modules/generated/sklearn.utils.resample.html the number of samples is specified in the n_samples argument:

"n_samples int, default=None
Number of samples to generate. If left to None this is automatically set to the first dimension of the arrays. If replace is False it should not be larger than the length of arrays."

By default, resample draws as many samples as there are rows in the input arrays (here, the length of x_train). We should either (1) note this default in the episode or (2) pass the n_samples argument explicitly.
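A minimal sketch of the two options, in case it helps the discussion. The toy arrays are hypothetical stand-ins for the lesson's x_train and y_train, which are not shown in this issue, and random_state is added only so the two calls can be compared.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical stand-in data: 10 rows, 2 features. Not the lesson's dataset.
x_train = np.arange(20).reshape(10, 2)
y_train = np.arange(10)

# Option 1: rely on the default (n_samples=None), which draws
# as many samples as there are rows in the input arrays.
X_bs, y_bs = resample(x_train, y_train, replace=True, random_state=0)
print(X_bs.shape, y_bs.shape)  # (10, 2) (10,)

# Option 2: state the sample size in the call.
X_bs_explicit, y_bs_explicit = resample(
    x_train, y_train, replace=True, n_samples=len(x_train), random_state=0
)
print(X_bs_explicit.shape, y_bs_explicit.shape)  # (10, 2) (10,)
```

Both calls return a bootstrap sample of the same size as the training set; the second only makes that explicit for the reader.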
