Skip to content

[FEA] Convert Dask into an optional dependency #5934

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
beckernick opened this issue Jun 14, 2024 · 1 comment · May be fixed by #6668
Open

[FEA] Convert Dask into an optional dependency #5934

beckernick opened this issue Jun 14, 2024 · 1 comment · May be fixed by #6668
Assignees
Labels
cuml-cpu Cython / Python Cython or Python issue feature request New feature or request

Comments

@beckernick
Copy link
Member

Dask is the primary runtime through which cuML provides multi-GPU functionality in Python. As a result, it's historically been a required cuML dependency. This is a non-issue for users who want the multi-GPU capabilities in cuML.

However, for users who only want to use cuML for single GPU use cases, this can cause unnecessary dependency alignment challenges in their environments -- because it's common practice to tightly pin Dask versions to avoid breakages in stable releases and platforms. We don't expect this behavior to change.

We should explore converting Dask into an optional dependency only necessary if someone attempts to use the Dask multi-GPU capabilities in cuML.

@beckernick beckernick added feature request New feature or request Cython / Python Cython or Python issue labels Jun 14, 2024
@dantegd
Copy link
Member

dantegd commented Jun 14, 2024

@beckernick the code in cuML already has this capability (it enables cuml-cpu), all the way to the import structure. The main challenge is making it super easy to users to install the correct versions of: dask, dask-cuda, raft-dask.

There are a few ways we can tackle this problem that I can think off of the top of my head:

  • Simply add the instructions to install the correct versions of those packages to documentation and warnings of cuML. This is the easiest from an engineering perspective, but probably the least ideal from a user perspective.
  • Create a conda metapackage, cuml-dask, and perhaps an option for pip wheels (kind of like cuml[optionals] or cuml[dask]) to make it very easy. Has some additional engineering cost, but as far as I can see it, it's not particularly big, since it's not that much more than what we already do.
  • Perhaps other solutions I have not thought of yet.

What are your thoughts about that @beckernick ?

@jcrist jcrist self-assigned this Jan 21, 2025
rapids-bot bot pushed a commit that referenced this issue Apr 29, 2025
This PR started because I noticed most of the functions in `import_utils.py` were effectively dead code. It spiraled out a bit from there to remove _almost all_ of `import_utils.py` in favor of:

- Using `pytest.importorskip` to handle conditional imports in tests. This is the proper `pytest` pattern to do this, and is both easier to get correct and uses fewer lines of code.
- Localized import checks (see `hdbscan.pyx`). Keeping the import check local to the module is easier to read IMO, and also helps avoid dead code accumulating in a global utils file.
- No import checks. Things like `cupy` are required dependencies and don't need to be gated at all.

The remaining functions will be removed in other PRs:

- `has_dask` will go away once `dask` is made optional (see #5934)
- `has_scipy` is removed in #6596
- `has_sklearn` will go away once `sklearn` is made a required dependency

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #6599
@jcrist jcrist linked a pull request May 9, 2025 that will close this issue
Ofek-Haim pushed a commit to Ofek-Haim/cuml that referenced this issue May 13, 2025
This PR started because I noticed most of the functions in `import_utils.py` were effectively dead code. It spiraled out a bit from there to remove _almost all_ of `import_utils.py` in favor of:

- Using `pytest.importorskip` to handle conditional imports in tests. This is the proper `pytest` pattern to do this, and is both easier to get correct and uses fewer lines of code.
- Localized import checks (see `hdbscan.pyx`). Keeping the import check local to the module is easier to read IMO, and also helps avoid dead code accumulating in a global utils file.
- No import checks. Things like `cupy` are required dependencies and don't need to be gated at all.

The remaining functions will be removed in other PRs:

- `has_dask` will go away once `dask` is made optional (see rapidsai#5934)
- `has_scipy` is removed in rapidsai#6596
- `has_sklearn` will go away once `sklearn` is made a required dependency

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: rapidsai#6599
rapids-bot bot pushed a commit that referenced this issue May 23, 2025
This is split out from #6668 and contains just the code changes. After this PR `dask`, `distributed`, and friends won't be imported on `import cuml`, they'll only be imported after `import cuml.dask`. This speeds up import time, simplifies our codebase a bit, and should be a non-controversial step towards #5934.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Tim Head (https://github.com/betatim)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6788
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuml-cpu Cython / Python Cython or Python issue feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants