Hello,
I’m trying to use shap-select and would like to know how many samples this method uses to calculate SHAP values. My code is below. In the `shap_select` call, it appears that only 10% of all samples (the validation split) are used to compute SHAP values. Is that correct?
Thank you for your help.
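```python
import shap
import xgboost as xgb
from sklearn.model_selection import train_test_split
from shap_select import shap_select

# Load the diabetes regression dataset bundled with shap
X, y = shap.datasets.diabetes()
feature_names = X.columns.tolist()

# Hold out 10% of the rows as the validation set
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=42)

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
params = {
    "objective": "reg:squarederror",
    "eval_metric": "rmse",
    "verbosity": 0,
}

# Train with early stopping monitored on the validation set
model = xgb.train(
    params, dtrain, num_boost_round=1000, evals=[(dval, "valid")], early_stopping_rounds=50
)

# Feature selection is run on the 10% validation split (X_val, y_val)
selected_features_df = shap_select(model, X_val, y_val, task="regression", threshold=0.05)
```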
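To put numbers on the question, here is a small sketch of the split arithmetic. It assumes scikit-learn's rounding rule for a float `test_size` (the test count is `ceil(test_size * n_samples)` and the training set gets the remainder) and that `shap.datasets.diabetes()` returns 442 rows. The helper `split_sizes` is hypothetical and only for illustration.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a float test_size.

    Hypothetical helper; assumes scikit-learn's train_test_split rounding,
    where the test count is rounded up and the train set gets the rest.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# The diabetes dataset from shap.datasets.diabetes() has 442 rows
n_train, n_test = split_sizes(442, 0.1)
print(f"train rows: {n_train}, validation rows: {n_test}")
print(f"validation share: {n_test / 442:.1%}")
```

Under these assumptions, the snippet above passes 45 rows (`X_val`, `y_val`) to `shap_select`, while the remaining 397 rows are used only to train the model.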
Thanks