Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to XGBoost 2.1.0 #167

Merged
merged 123 commits into from
Jul 26, 2024
Merged

Upgrade to XGBoost 2.1.0 #167

merged 123 commits into from
Jul 26, 2024

Conversation

hcho3
Copy link
Contributor

@hcho3 hcho3 commented Jun 29, 2024

Checklist

Fixes #172

@hcho3
Copy link
Contributor Author

hcho3 commented Jun 29, 2024

@conda-forge-admin, please rerender

@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

  • It looks like the 'libxgboost' output doesn't have any tests.
  • It looks like the '_py-xgboost-mutex' output doesn't have any tests.
  • It looks like the 'py-xgboost-cpu' output doesn't have any tests.
  • It looks like the 'py-xgboost-gpu' output doesn't have any tests.
  • It looks like the 'xgboost' output doesn't have any tests.
  • It looks like the '_r-xgboost-mutex' output doesn't have any tests.
  • It looks like the 'r-xgboost-cpu' output doesn't have any tests.
  • It looks like the 'r-xgboost-gpu' output doesn't have any tests.

@jakirkham
Copy link
Member

@conda-forge-admin , please restart CI

@jakirkham
Copy link
Member

Did the source layout change somehow? CI is noting that it had trouble finding the xgboost directory

Not seeing this issue when building for the previous release

@trivialfis
Copy link

The rabit directory is removed.

This directory doesn't exist in XGBoost 2.1.0.
@jakirkham
Copy link
Member

Still seeing the same error on CI:

cp: cannot stat ‘/home/conda/feedstock_root/build_artifacts/xgboost-split_1719641630231/work/xgboost’: No such file or directory

Are there other layout changes that occurred?

recipe/meta.yaml Outdated Show resolved Hide resolved
@trivialfis
Copy link

Is It possible that we skip R build on this release? The rewrite is still on going.

recipe/bld.bat Outdated
Comment on lines 6 to 18
:: CCCL has different logic depending on the `__cplusplus` macro.
:: However Visual Studio leaves `__cplusplus` as `199711L` by default.
:: As a result code paths needing new C++ standard won't be run.
:: This happens in spite of the fact that VS 2019 & 2022 default to C++14.
:: To make sure `__cplusplus` matches, we pass an additional flag to VC.
:: Ideally other tooling would do this for us, for now we do it ourselves.
::
:: https://learn.microsoft.com/en-us/cpp/build/reference/zc-cplusplus?view=msvc-160
:: https://learn.microsoft.com/en-us/cpp/build/reference/zc-cplusplus?view=msvc-170
:: https://github.com/NVIDIA/cccl/blob/82a3ed0282893d6316abde04394f59a9a37a3747/libcudacxx/include/nv/target#L27-L30
:: https://github.com/NVIDIA/cutlass/issues/1403#issuecomment-2130383392
:: https://gitlab.kitware.com/cmake/cmake/-/issues/18837
set "CXXFLAGS=%CXXFLAGS% /Zc:__cplusplus"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok did a bit of a deep dive and have a theory of what is going on. Tried to explain this in the comment above

Basically some C++ detection logic in libraries (like CCCL) doesn't work correct with Visual Studio due to its default behavior. Have added a flag to Visual Studio to make it conform more to typical C++ compiler behavior (setting __cplusplus correctly)

Based on similar experiences from Visual Studio users with CUTLASS ( NVIDIA/cutlass#1403 (comment) ), who saw a similar error to us and fixed it with this flag, am hopeful this will help in our case too

Ideally other tooling would handle this for us, but it appears CMake hasn't done this yet and Visual Studio continues to require users to add this flag if they want a correct __cplusplus value. So for now adding to this build script. If this works, we may want to add this to XGBoost as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have raised an upstream CCCL issue: NVIDIA/cccl#2072

Though after looking more closely at the code there, they are probably ok with their current approach. This would be a potential simplification for them

@jakirkham
Copy link
Member

jakirkham commented Jul 25, 2024

Let's consider dropping 11.8 for Windows then?

If the latest change doesn't work, then we can turn this into an issue and skip it

@jakirkham
Copy link
Member

@conda-forge-admin , please re-render

Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/xgboost-feedstock/actions/runs/10088820119.

r_base:
- '4.1'
python:
- 3.8.* *_cpython
target_platform:
- win-64
Copy link
Member

@jakirkham jakirkham Jul 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the Windows CUDA 11.8 file. It should be removed by the skip, but it is not for some reason. Will need to take a closer look

Note part of the issue here is even if the build doesn't create any packages simply downloading the CTK can take over 20mins. So will want to make sure this job is properly removed so it isn't wasting CI cycles

Edit: Looks like the build is still running, which is also an issue. So the skip needs a fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakirkham I added back noarch: python and re-rendered the recipe. It looks like our replacement for noarch: python somehow broke the selector logic. Once I added back noarch: python to meta.yaml, the build for Windows CUDA 11.8 is finally removed.

recipe/meta.yaml Outdated Show resolved Hide resolved
@conda-forge-webservices
Copy link
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found some lint.

Here's what I've got...

For recipe/meta.yaml:

  • noarch packages can't have skips with selectors. If the selectors are necessary, please remove noarch: python.

recipe/meta.yaml Outdated
Comment on lines 27 to 32
noarch: python
# Windows CUDA 11.8 fails due to a compilation error
# xref: https://github.com/conda-forge/xgboost-feedstock/issues/173
skip: True # [win and cuda_compiler_version == "11.8"]
{% if python is defined %}
skip: {{ python.split(".")[:2] != min_python.split(".")[:2] }}
{% endif %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good intuition Hyunsu! 🙏

Think we can fix this up to handle both cases then and fix the linter

Though ok doing this as a follow up

@jakirkham
Copy link
Member

Thanks Hyunsu

Looks good. Please feel free to add the automerge label when you are ready

The bot will then merge when CI completes. If that doesn't happen for some reason, it will leave a comment letting us know why

@hcho3 hcho3 added the automerge Merge the PR when CI passes label Jul 26, 2024
Copy link
Contributor

Hi! This is the friendly conda-forge automerge bot!

I considered the following status checks when analyzing this PR:

  • linter: failed
  • azure: passed

Thus the PR was not passing and not merged.

@jakirkham jakirkham merged commit be9dd6a into conda-forge:main Jul 26, 2024
14 of 15 checks passed
@jakirkham
Copy link
Member

Thanks Hyunsu! 🙏

@trivialfis
Copy link

trivialfis commented Jul 27, 2024

Hi @jakirkham @hcho3 is it normal to have this under a CUDA 12.5 environment:

  + libxgboost                                   2.1.0  cuda120_h9dfd3e9_1                 conda-forge/linux-64       93MB
  + xgboost                                      2.1.0  cuda118_pyh98e67c5_1               conda-forge/noarch         17kB

I'm updating an environment that has XGBoost as a dependency. The above two lines are part of the mamba output, and it seems a bit weird to me to have mismatched CUDA versions.

Second question, since all C++ includes are distributed, maybe it makes sense to distribute the CMake config files as well?

@jakirkham
Copy link
Member

jakirkham commented Jul 31, 2024

@trivialfis this is a good question

AIUI in terms of CUDA usage, only libxgboost has CUDA bits. So xgboost doesn't have CUDA bits (as it is pure Python)

That said, it certainly is confusing to have xgboost mislabeled in this way. Also users that constrain xgboost based on CUDA version may also expect this to carry through to libxgboost

So agree we should fix this. Though think the risk is relatively low. Started doing this in PR: #188

@hcho3 hcho3 deleted the xgb_210 branch July 31, 2024 17:36
@jakirkham
Copy link
Member

Second question, since all C++ includes are distributed, maybe it makes sense to distribute the CMake config files as well?

Also sorry for not answering the CMake question earlier

Thank you for opening a new issue to discuss: #189

Hopefully the answers there are helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Merge the PR when CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Package XGBoost 2.1.0
5 participants