[FEA] Expose runtime configurability of default stream behavior #17626

Open
vyasr opened this issue Dec 19, 2024 · 0 comments
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. pylibcudf Issues specific to the pylibcudf package

Is your feature request related to a problem? Please describe.
Currently, all of libcudf operates on the default stream (stream 0) by default, and on cudaStreamPerThread if compiled with CUDF_USE_PER_THREAD_DEFAULT_STREAM. Some consumers of libcudf wish to use the per-thread default stream (PTDS) instead, for reasons such as improved performance. Historically, we have supported this by compiling with CUDA_API_PER_THREAD_DEFAULT_STREAM and CUDF_USE_PER_THREAD_DEFAULT_STREAM, because compile-time control was the only reasonable way to achieve it, and consumers like spark-rapids rely on this. However, as #13744 comes to a close, we will have a fully stream-ordered API that is also completely tested to ensure that streams are passed through everywhere, so that nothing unintentionally runs on the default stream when the user provides one. This affords us some additional options for enabling PTDS behavior.

Describe the solution you'd like
We should modify get_default_stream so that its behavior is configurable at runtime: when PTDS is requested, it would return the per-thread default stream instead of having every thread run on the default stream. This could easily be done in a thread-safe manner using a function-local static:

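#include <cstdlib>
#include <rmm/cuda_stream_view.hpp>

rmm::cuda_stream_view const get_default_stream() {
  // Function-local static: initialized once, thread-safely, on the first call.
  static auto const default_stream = []() {
    // Any value of the PTDS environment variable (even empty) selects PTDS.
    if (std::getenv("PTDS") != nullptr) {
      return rmm::cuda_stream_per_thread;
    } else {
      return rmm::cuda_stream_legacy;
    }
  }();
  return default_stream;
}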

The above uses an environment variable, but we could just as easily expose a public API for setting this configuration, with the requirement that it be called before the first call to get_default_stream. The end result would be that we control the default stream behavior entirely at runtime, without needing to build separate binaries to support PTDS. This would allow us to support various newer higher-level APIs, such as pylibcudf, while still supporting Spark's needs.
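
As a rough illustration of the public-API variant, here is a minimal sketch. Everything in it is hypothetical: `default_stream_mode`, `set_default_stream_mode`, and `get_default_stream_mode` are made-up names, and the enum is only a stand-in for `rmm::cuda_stream_legacy` and `rmm::cuda_stream_per_thread` so that the snippet compiles without CUDA or RMM. The idea is that the getter latches the configured value the first time it is called, and the setter refuses to change it afterwards.

```cpp
#include <mutex>
#include <stdexcept>

// Stand-in for the two candidate default streams (assumption for this sketch);
// a real version would hand back an rmm::cuda_stream_view instead.
enum class default_stream_mode { legacy, per_thread };

namespace detail {
std::mutex config_mutex;                                       // guards the two fields below
default_stream_mode requested_mode = default_stream_mode::legacy;
bool mode_latched = false;                                     // true once the getter has run
}  // namespace detail

// Hypothetical setter: only valid before the first get_default_stream_mode() call.
void set_default_stream_mode(default_stream_mode mode) {
  std::lock_guard<std::mutex> lock{detail::config_mutex};
  if (detail::mode_latched) {
    throw std::logic_error("default stream mode is already in use");
  }
  detail::requested_mode = mode;
}

default_stream_mode get_default_stream_mode() {
  // Function-local static: the lambda runs exactly once, thread-safely,
  // and every later call returns the latched value.
  static default_stream_mode const mode = [] {
    std::lock_guard<std::mutex> lock{detail::config_mutex};
    detail::mode_latched = true;
    return detail::requested_mode;
  }();
  return mode;
}
```

Throwing on a late call is just one possible policy; silently ignoring the call or logging a warning would work too. The point is that the value cannot change once any libcudf call has observed it.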

Describe alternatives you've considered
We could ship separate binaries compiled for PTDS, or change our default to always build with PTDS. The former has generally been rejected because it would require double the resources. The latter was previously attempted but rejected because PTDS builds of cudf are not safe drop-in replacements for non-PTDS builds: PTDS allows race conditions that would not be possible with non-PTDS builds.
