Sample Size for shap-select

Hello, 

I’m trying to use shap-select and would like to know how many samples are needed to calculate SHAP values with this method. I’ve included the relevant code below. In the `shap_select` call, it appears that only **10%** of all samples (the validation split) are used to compute SHAP values. Is that correct?

Thank you for your help.

```python
import shap
import xgboost as xgb
from shap_select import shap_select
from sklearn.model_selection import train_test_split

X, y = shap.datasets.diabetes()
feature_names = X.columns.tolist()

# Hold out 10% of the rows as the validation set (test_size=0.1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.1, random_state=42
)

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
params = {
    "objective": "reg:squarederror",
    "eval_metric": "rmse",
    "verbosity": 0,
}

model = xgb.train(
    params, dtrain, num_boost_round=1000, evals=[(dval, "valid")], early_stopping_rounds=50
)
# Only the validation split (X_val, y_val) is passed to shap_select
selected_features_df = shap_select(model, X_val, y_val, task="regression", threshold=0.05)
```
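To make the question concrete, here is a rough sketch of the row counts involved. It assumes the diabetes dataset has 442 rows and that `train_test_split` rounds a fractional `test_size` up to the next whole row. `split_sizes` is a hypothetical helper written for illustration only; it is not part of scikit-learn or shap-select.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test share is rounded up and the remainder goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 442 rows is the assumed size of the diabetes dataset
n_train, n_val = split_sizes(442, 0.1)
print(n_train, n_val)  # 397 45
```

Under those assumptions, `X_val` would hold about 45 rows, so that would be the sample size reaching `shap_select` if it uses every row it is given.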


Thanks

Sample Size for shap-select #22
