[FEA] Expose runtime configurability of default stream behavior #17626

vyasr · 2024-12-19T01:52:22Z

Is your feature request related to a problem? Please describe.
Currently all of libcudf operates on the default stream (stream 0) by default, and on cudaStreamPerThread if compiled with CUDF_USE_PER_THREAD_DEFAULT_STREAM. Some consumers of libcudf wish to use the per-thread default stream (PTDS) instead, for reasons such as improved performance. Historically, we have supported this by compiling with CUDA_API_PER_THREAD_DEFAULT_STREAM and CUDF_USE_PER_THREAD_DEFAULT_STREAM, because compile-time control was the only reasonable way to achieve it, and consumers like spark-rapids rely on this. However, as #13744 comes to a close, we will have a fully stream-ordered API that is also completely tested to ensure that streams are passed through everywhere, so that nothing unintentionally runs on the default stream when the user provides one. This gives us additional options for enabling PTDS behavior.

Describe the solution you'd like
We should modify get_default_stream so that its behavior can be configured at runtime to mean PTDS instead of every thread running on the default stream. This could easily be done in a thread-safe manner using a function-local static:

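#include <cstdlib>
#include <rmm/cuda_stream_view.hpp>

rmm::cuda_stream_view const get_default_stream() {
  // Function-local static: initialized exactly once, thread-safely (C++11 and later).
  static rmm::cuda_stream_view const default_stream = []() -> rmm::cuda_stream_view {
    // Illustrative environment variable; any value (even empty) enables PTDS.
    if (std::getenv("PTDS") != nullptr) {
      return rmm::cuda_stream_per_thread;
    } else {
      return rmm::cuda_stream_legacy;
    }
  }();
  return default_stream;
}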

The above uses an environment variable, but we could just as easily expose a public API for setting this configuration, which would have to be called before the first call to get_default_stream. The end result is that default stream behavior would be controlled entirely at runtime, without needing to build separate binaries to support PTDS. This would allow us to support various newer higher-level APIs, such as pylibcudf, while still supporting Spark's needs.
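For illustration, here is a minimal sketch of what the public-API variant might look like. It is not a proposed libcudf interface: `stream_mode`, `set_default_stream_mode`, and `get_default_stream_mode` are hypothetical names, and the enum stands in for `rmm::cuda_stream_legacy` and `rmm::cuda_stream_per_thread` so that the sketch compiles without CUDA. The idea is that the setter is only honored until the default is first read, after which the choice is fixed for the lifetime of the process.

```cpp
#include <mutex>

// Stand-in for the two rmm stream views in the snippet above, so that the
// sketch compiles without CUDA or rmm. All names here are hypothetical.
enum class stream_mode { legacy, per_thread };

namespace detail {
struct default_stream_config {
  std::mutex mtx;
  stream_mode requested = stream_mode::legacy;
  bool resolved         = false;  // true once the default has been read
};

inline default_stream_config& config()
{
  static default_stream_config cfg;
  return cfg;
}
}  // namespace detail

// Request a default stream mode. Returns false (and changes nothing) if the
// default has already been resolved by a call to get_default_stream_mode().
inline bool set_default_stream_mode(stream_mode mode)
{
  auto& cfg = detail::config();
  std::lock_guard<std::mutex> lock(cfg.mtx);
  if (cfg.resolved) { return false; }
  cfg.requested = mode;
  return true;
}

// Resolve the mode on first use and return the same value on every later call.
inline stream_mode get_default_stream_mode()
{
  static stream_mode const resolved = [] {
    auto& cfg = detail::config();
    std::lock_guard<std::mutex> lock(cfg.mtx);
    cfg.resolved = true;
    return cfg.requested;
  }();
  return resolved;
}
```

Returning false from the setter lets a caller detect that the configuration arrived too late. Throwing an exception would be an equally reasonable choice. The mutex is only taken by the setter and by the one-time initialization, so steady-state calls to the getter pay only for the function-local static check.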

Describe alternatives you've considered
We could ship separate binaries compiled for PTDS, or change our default to always build with PTDS. The former has generally been rejected on the grounds of requiring double the resources. The latter was previously attempted but rejected because PTDS builds of cudf are not safe drop-in replacements for non-PTDS builds: PTDS allows for race conditions that would not be possible with non-PTDS builds.


vyasr added the feature request, libcudf, and pylibcudf labels Dec 19, 2024

github-project-automation bot added this to cuDF Python Dec 19, 2024

github-project-automation bot moved this to Todo in cuDF Python Dec 19, 2024

vyasr mentioned this issue Dec 20, 2024

RFC: Enable per-thread default stream in free-threading builds NVIDIA/cuda-python#133

Open
