Problem
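xgboost changed the order of the RabitTracker constructor parameters in 2.1.0.
In 2.0.3, host_ip comes first.
In 2.1.0, host_ip is second.
This breaks the positional call _RabitTracker(host, num_workers) in _start_rabit_tracker (xgboost_ray/main.py): with xgboost 2.1.0, num_workers is bound to host_ip, and the integer is then rejected by socket.getaddrinfo (see the traceback under Error).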
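The mechanism can be sketched without installing either package. In the snippet below, `tracker_old_order` and `tracker_new_order` are hypothetical stand-ins that only mimic the two parameter orders described above; they are not the real `RabitTracker`. `get_family` copies the one-line helper visible in the traceback.

```python
import socket


def get_family(addr):
    # Same one-liner as the helper shown in the traceback.
    return socket.getaddrinfo(addr, None)[0][0]


def tracker_old_order(host_ip, n_workers):
    # Hypothetical stand-in for the 2.0.3 order (host_ip first).
    return get_family(host_ip), n_workers


def tracker_new_order(n_workers, host_ip):
    # Hypothetical stand-in for the 2.1.0 order (host_ip second).
    return get_family(host_ip), n_workers


host, num_workers = "127.0.0.1", 2

# Positional call written for the old order: host really is the host.
family, workers = tracker_old_order(host, num_workers)

# The same positional call against the new order: the integer
# num_workers is bound to host_ip and getaddrinfo rejects it.
try:
    tracker_new_order(host, num_workers)
    positional_error = None
except TypeError as exc:
    positional_error = exc

print(type(positional_error).__name__)
```

The exception raised by the second call is the same `TypeError` from `getaddrinfo` that appears at the bottom of the traceback.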
Steps to reproduce
- Create a new venv. Tested with Python 3.10.13.
- Install packages:
pip install xgboost_ray==0.1.19 xgboost==2.1.0 scikit-learn ray[train]
- Run example:
from xgboost_ray import RayDMatrix, RayParams, train
from sklearn.datasets import load_breast_cancer
train_x, train_y = load_breast_cancer(return_X_y=True)
train_set = RayDMatrix(train_x, train_y)
evals_result = {}
bst = train(
    {
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
    },
    train_set,
    evals_result=evals_result,
    evals=[(train_set, "train")],
    verbose_eval=False,
    ray_params=RayParams(
        num_actors=2,  # Number of remote actors
        cpus_per_actor=1))
bst.save_model("model.xgb")
print("Final training error: {:.4f}".format(
    evals_result["train"]["error"][-1]))
Error
2024-07-09 15:35:11,003 INFO main.py:1191 -- [RayXGBoost] Starting XGBoost training.
Traceback (most recent call last):
  File "/home/jovyan/run.py", line 10, in <module>
    bst = train(
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 1612, in train
    bst, train_evals_result, train_additional_results = _train(
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 1194, in _train
    rabit_process, rabit_args = _start_rabit_tracker(alive_actors)
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost_ray/main.py", line 261, in _start_rabit_tracker
    rabit_tracker = _RabitTracker(host, num_workers)
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost/tracker.py", line 64, in __init__
    get_family(host_ip)  # use python socket to stop early for invalid address
  File "/home/jovyan/venv/lib/python3.10/site-packages/xgboost/tracker.py", line 14, in get_family
    return socket.getaddrinfo(addr, None)[0][0]
  File "/opt/conda/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
TypeError: getaddrinfo() argument 1 must be string or None
Proposed solution
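Either pin the xgboost dependency to <2.1.0,
OR
change the call in _start_rabit_tracker (xgboost_ray/main.py, line 261 in the traceback above) to pass keyword arguments: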
rabit_tracker = _RabitTracker(host_ip=host, n_workers=num_workers)
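A minimal sketch of why the keyword form should be insensitive to the reordering, assuming both releases keep the parameter names `host_ip` and `n_workers`. The two functions are hypothetical stand-ins for the two parameter orders, not the real `RabitTracker`.

```python
def tracker_old_order(host_ip, n_workers):
    # Hypothetical stand-in for the 2.0.3 order (host_ip first).
    return {"host_ip": host_ip, "n_workers": n_workers}


def tracker_new_order(n_workers, host_ip):
    # Hypothetical stand-in for the 2.1.0 order (host_ip second).
    return {"host_ip": host_ip, "n_workers": n_workers}


host, num_workers = "127.0.0.1", 2

# Keyword arguments are matched by name, so the declaration order
# of the parameters no longer matters.
results = [
    ctor(host_ip=host, n_workers=num_workers)
    for ctor in (tracker_old_order, tracker_new_order)
]

print(results[0] == results[1])
```

Compared with pinning, this keeps xgboost_ray usable on both sides of the 2.1.0 boundary, provided no other incompatible changes are involved.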