Pred_contribs in 2.1.1 takes significantly more gpu memory than in 1.4.2 #10936
Thank you for raising the issue. I just did a simple test; I think different models and imprecise measurement account for the difference in observed output. I generated a sample model using the latest XGBoost and ran the SHAP prediction using both the latest and the 1.7 branches. The results from the two runs are consistent, with peak memory around 4.8-4.9 GB. Following is a screenshot of the profiler output: [screenshot not preserved]
Thank you for the answer. I will keep this in mind for any future cases.
Hi again @trivialfis, I've rerun the tests based on your feedback. I agree that there is no degradation between the 1.7.6 and 2.1.2 versions of the library; however, I still find such degradation between 1.4.2 and the later versions. What I have done is write one script that trains and saves a model, and a second, profiled script that simply loads the saved model and calculates the SHAP values. Here are the results, with the scripts also provided below (the screenshots are not preserved here):
- 1.4.2 peak:
- 1.7.6 peak:
- 2.1.2 peak:

And here are the scripts.

Training:

from typing import Tuple

import pandas as pd
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier, __version__ as xgb_version


def download_data() -> Tuple[pd.DataFrame, pd.DataFrame]:
    # fetch dataset
    diabetes_binary = fetch_ucirepo(id=891)
    # data (as pandas dataframes)
    X = diabetes_binary.data.features
    y = diabetes_binary.data.targets
    return X, y


def prep_dataset(X: pd.DataFrame, y: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    # split dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    return X_train, X_test, y_train, y_test


def train_save_model(X_train: pd.DataFrame, y_train: pd.DataFrame) -> XGBClassifier:
    # train a model
    xgb_params = {
        "objective": "binary:logistic",
        "n_estimators": 2000,
        "max_depth": 13,
        "learning_rate": 0.1,
        "tree_method": "gpu_hist",
    }
    model = XGBClassifier(**xgb_params)
    model.fit(X_train, y_train["Diabetes_binary"])
    model.save_model("xgb_model.json")
    return model


if __name__ == '__main__':
    if xgb_version != '1.4.2':
        print("Training only on 1.4.2.")
        exit(1)
    X, y = download_data()
    X_train, X_test, y_train, y_test = prep_dataset(X, y)
    model = train_save_model(X_train, y_train)

Shap calc:

from typing import Tuple
import pandas as pd
from ucimlrepo import fetch_ucirepo
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier, DMatrix, __version__ as xgb_version


def download_data() -> Tuple[pd.DataFrame, pd.DataFrame]:
    # fetch dataset
    diabetes_binary = fetch_ucirepo(id=891)
    # data (as pandas dataframes)
    X = diabetes_binary.data.features
    y = diabetes_binary.data.targets
    return X, y


def prep_dataset(X: pd.DataFrame, y: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    # split dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    return X_train, X_test, y_train, y_test


def load_model(file_path: str) -> XGBClassifier:
    model = XGBClassifier()
    model.load_model(file_path)
    return model


def call_shap_values(model: XGBClassifier, test_data: pd.DataFrame) -> pd.DataFrame:
    booster = model.get_booster()
    booster.set_param({"predictor": "gpu_predictor"})
    dmatrix = DMatrix(test_data)
    shap_values = booster.predict(dmatrix, pred_contribs=True)
    shap_values_df = pd.DataFrame(shap_values[:, :-1], columns=test_data.columns)
    shap_values_df["base_value"] = shap_values[:, -1]
    shap_values_df.to_csv(f"shap_values_{xgb_version}.csv", index=False)
    return shap_values_df


if __name__ == '__main__':
    X, y = download_data()
    X_train, X_test, y_train, y_test = prep_dataset(X, y)
    model = load_model("./xgb_model.json")
    calc_save_shap_vals = call_shap_values(model, X_test)

As you can probably see from the screenshots, these tests were run on a Win10 machine, as I had some trouble running Nsight Systems on the remote instance I previously used. I've found that this problem appears as early as version 1.5.0. Looking at the patch notes, there is the following sentence: "Most of the other features, including prediction, SHAP value computation, feature [...]". In any case, please let me know if there is an issue with the testing methodology; I think it is more precise this time. As a side note, I have another question: isn't using Nsight Systems equivalent (for the purposes of memory usage measurement) to calling nvidia-smi at a high enough frequency (like 10 kHz) and logging the results?
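For reference, here is a minimal sketch of the nvidia-smi polling approach that question refers to. It is not from the thread; the device index and sampling interval are illustrative assumptions, and the comments note why such polling is not fully equivalent to a tracing profiler such as Nsight Systems.

# Hypothetical measurement helper, not from this thread: poll nvidia-smi in a
# background thread and keep the largest "memory.used" reading observed while
# the SHAP computation runs.  Each nvidia-smi invocation takes tens of
# milliseconds, so the effective sampling rate is far below 10 kHz and
# short-lived allocations between samples can be missed.
import subprocess
import threading
import time


def poll_gpu_memory(stop_event: threading.Event, samples: list) -> None:
    # Used memory in MiB for GPU 0, printed as a bare number.
    cmd = [
        "nvidia-smi",
        "--query-gpu=memory.used",
        "--format=csv,noheader,nounits",
        "--id=0",
    ]
    while not stop_event.is_set():
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        if result.returncode == 0:
            samples.append(int(result.stdout.strip()))
        time.sleep(0.001)


if __name__ == "__main__":
    samples: list = []
    stop_event = threading.Event()
    poller = threading.Thread(target=poll_gpu_memory, args=(stop_event, samples))
    poller.start()
    try:
        # Run the workload to be measured here, e.g. the call_shap_values()
        # function from the script above.
        pass
    finally:
        stop_event.set()
        poller.join()
    print(f"peak observed GPU memory: {max(samples, default=0)} MiB "
          f"({len(samples)} samples)")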
Thank you for sharing the info and reminding me of the categorical feature support. Yes, I can confirm the memory usage increase, and it is indeed caused by categorical support. Specifically, it comes from this member variable: xgboost/src/predictor/gpu_predictor.cu, line 427 (commit 197c0ae).
Probably; the underlying mechanism of event sampling is beyond my knowledge.
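Not discussed in the thread, but for readers hitting out-of-memory here, one possible mitigation is to compute the contributions in row batches and concatenate the results. This is only a sketch: the batch size below is an arbitrary assumption, and since part of the increase comes from model-dependent buffers added for categorical support, it may not lower the peak much; that would need to be verified with the same profiling setup.

# Sketch of a possible workaround (not an official recommendation): request
# pred_contribs for row chunks instead of the full test set, so the per-call
# prediction output and working memory are bounded by the chunk size.
import numpy as np
import pandas as pd
from xgboost import Booster, DMatrix


def shap_values_in_batches(booster: Booster, data: pd.DataFrame,
                           batch_rows: int = 50_000) -> np.ndarray:
    parts = []
    for start in range(0, len(data), batch_rows):
        chunk = DMatrix(data.iloc[start:start + batch_rows])
        # Each result has shape (rows, n_features + 1); the last column is the bias.
        parts.append(booster.predict(chunk, pred_contribs=True))
    return np.concatenate(parts, axis=0)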
I've noticed that using pred_contribs to generate SHAP values takes significantly more GPU memory in XGBoost 2.1.1 than in 1.4.2.
This can lead to issues with generating SHAP values where no issue was previously present.
GPU memory comparison:
1.4.2 - 3090
1.7.6 - 4214
2.1.1 - 5366
Short example used to demonstrate:
with the following bash script used for generating memory usage:
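Neither the example snippet nor the bash script is preserved in this copy. Purely as an illustration of what such a reproduction looks like, here is a hedged sketch along the lines of the scripts shared in the comments above; the synthetic dataset, model parameters, and device/tree_method settings are assumptions and would need to be adapted per XGBoost version.

# Illustrative reproduction sketch (the original snippets are not shown here):
# train a GPU model and request SHAP contributions with pred_contribs.
from sklearn.datasets import make_classification
from xgboost import DMatrix, XGBClassifier

X, y = make_classification(n_samples=200_000, n_features=20, random_state=0)

model = XGBClassifier(
    objective="binary:logistic",
    n_estimators=500,
    max_depth=10,
    tree_method="hist",
    device="cuda",  # for 1.x one would use tree_method="gpu_hist" instead
)
model.fit(X, y)

booster = model.get_booster()
contribs = booster.predict(DMatrix(X), pred_contribs=True)
# Shape is (n_samples, n_features + 1); the last column is the base value.
print(contribs.shape)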
All tests run on Ubuntu 20.04.6 LTS.
Requirements with only the xgb version (and the device/tree method parameters) being changed between tests: