Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Expose public stream-ordered C++ APIs #13744

Open
vyasr opened this issue Jul 24, 2023 · 5 comments · Fixed by #16317
Open

[FEA] Expose public stream-ordered C++ APIs #13744

vyasr opened this issue Jul 24, 2023 · 5 comments · Fixed by #16317
Labels
2 - In Progress Currently a work in progress feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.

Comments

@vyasr
Copy link
Contributor

vyasr commented Jul 24, 2023

Is your feature request related to a problem? Please describe.
Currently most libcudf APIs operate implicitly on the default CUDA stream. As documented in #925, we would like to expose stream-ordered APIs publicly. Prior to doing so, we needed to do a great deal of work to ensure that libcudf's internals were properly stream-ordered. Among the important tasks here were removing implicit APIs that ran on a stream (see #11968) and implementing a strategy for verifying that streams are properly forwarded from APIs all the way down to the lowest level functions (see #11943). Now that all such work that we are aware of has been completed, libcudf should start exposing streams in public APIs.

We originally considered adding stream support via a monolithic feature branch where APIs would be modified one at a time. From various discussions, however, we decided that the overhead of maintaining a feature branch would grow too large to be worthwhile, so we have instead decided to proceed by incrementally exposing streams in libcudf APIs over time. PRs adding new APIs should always include a stream in the public API. Existing APIs will be transitioned one header file at a time, with PRs going directly into the current main branch of cudf.

The purpose of this PR is to document progress on the task of transitioning existing headers, as well as to describe the process by which new stream-ordered APIs should be added.

Describe the solution you'd like
When adding stream support to an API, the stream parameter should be placed just before the mr parameter with a default value of cudf::get_default_stream(). A new test should be added that validates that the API properly forwards the stream to all CUDA calls, which can be done by following the instructions at the bottom of libcudf's testing developer docs.

To divide up the work, we consider public headers organized by directories (all relative to cpp/include/cudf). Devs may assign themselves a directory within which to own all headers and the public APIs contained therein.

/cudf APIs

/nvtext APIs

Additional context
Before we began making public stream parameters, libcudf public APIs would be paired with detail APIs that are nearly identical except for having a required stream parameter. In some cases, though, the public API may not currently have a detail "mirror" function but instead call multiple detail APIs. In those cases we may need to make a case-by-case determination of whether a new detail function should also be added. Some discussion of whether we want this pattern to be present everywhere, and the reasons why we might want that, are discussed in #14276.

Public stream API readiness:

  • revamp the entire test suite to use "test streams"
  • create a few multi-stream benchmarks
  • create a multi-stream example
@vyasr vyasr added feature request New feature or request Needs Triage Need team to review and classify labels Jul 24, 2023
@vyasr vyasr added this to the Enable streams milestone Jul 24, 2023
@GregoryKimball GregoryKimball added libcudf Affects libcudf (C++/CUDA) code. 2 - In Progress Currently a work in progress and removed Needs Triage Need team to review and classify labels Jul 25, 2023
@GregoryKimball GregoryKimball moved this to Story Issue in libcudf Aug 2, 2023
rapids-bot bot pushed a commit that referenced this issue Sep 18, 2023
Add stream parameter to public strings APIs:
- `cudf::strings::capitalize()` 
- `cudf::strings::title()`
- `cudf::strings::is_title()`
- `cudf::strings::to_lower()`
- `cudf::strings::to_upper()`
- `cudf::strings::swapcase()`

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14056
rapids-bot bot pushed a commit that referenced this issue Sep 21, 2023
Add stream parameter to public APIs:

- `cudf::strings::find()`
- `cudf::strings::rfind()`
- `cudf::strings::contains()`
- `cudf::strings::starts_with()`
- `cudf::strings::ends_with()`
- `cudf::strings::findall()`
- `cudf::strings::find_multiple()`

Also cleaned up some of the doxygen comments. 

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #14060
rapids-bot bot pushed a commit that referenced this issue Sep 22, 2023
Add stream parameter to public APIs:

- `nvtext::generate_ngrams()`
- `nvtext::generate_character_ngrams()`
- `nvtext::hash_character_ngrams()`
- `nvtext::ngrams_tokenize()`

Also cleaned up some of the doxygen comments.
And also fixed a spelling mistake in the jaccard.cu source that was bothering me. 

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14061
rapids-bot bot pushed a commit that referenced this issue Sep 25, 2023
This PR adds stream parameter to public dictionary APIs, which include:

1. `cudf::dictionary::encode`
2. `cudf::dictionary::decode`
3. `cudf::dictionary::get_index`
4. `cudf::dictionary::add_keys`
5. `cudf::dictionary::remove_keys`
6. `cudf::dictionary::remove_unused_keys`
7. `cudf::dictionary::set_keys` 
8. `cudf::dictionary::match_dictionaries`

Reference [13744](#13744)

Authors:
  - Suraj Aralihalli (https://github.com/SurajAralihalli)
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14115
rapids-bot bot pushed a commit that referenced this issue Oct 10, 2023
Add stream parameter to public APIs:

- `cudf::strings::strip()`
- `cudf::strings::slice_strings()`
- `cudf::strings::pad()`
- `cudf::strings::zfill()`
- `cudf::strings::wrap()`

Also cleaned up some of the doxygen comments and added stream-tests.

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14260
rapids-bot bot pushed a commit that referenced this issue Oct 11, 2023
Follow on to PR #13997 which did not include all the split APIs or a stream test.
Add stream parameter to public APIs:

- `cudf::strings::partition()`
- `cudf::strings::rpartition()`
- `cudf::strings::split_re()`
- `cudf::strings::rsplit_re()`
- `cudf::strings::split_record_re()`
- `cudf::strings::rsplit_record_re()`

Also cleaned up some of the doxygen comments. 

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mark Harris (https://github.com/harrism)
  - Bradley Dice (https://github.com/bdice)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14247
rapids-bot bot pushed a commit that referenced this issue Oct 12, 2023
Add stream parameter to public APIs:

- `cudf::strings::replace()` (x2)
- `cudf::strings::replace_slice()`
- `cudf::strings::replace_re()` (x2)
- `cudf::strings::replace_with_backrefs()`

Also cleaned up some of the doxygen comments and added stream-tests.

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14261
rapids-bot bot pushed a commit that referenced this issue Oct 16, 2023
Add stream parameter to public APIs:

- `cudf::strings::to_booleans()`
- `cudf::strings::from_booleans()`
- `cudf::strings::to_timestamps()`
- `cudf::strings::from_timestamps()`
- `cudf::strings::is_timestamp()`
- `cudf::strings::to_durations()`
- `cudf::strings::from_durations()`
- `cudf::strings::to_fixed_point()`
- `cudf::strings::from_fixed_point()`
- `cudf::strings::to_floats()`
- `cudf::strings::from_floats()`
- `cudf::strings::is_float()`
- `cudf::strings::to_integers()`
- `cudf::strings::from_integers()`
- `cudf::strings::is_integer()`
- `cudf::strings::hex_to_integers()`
- `cudf::strings::integers_to_hex()`
- `cudf::strings::is_hex()`
- `cudf::strings::ipv4_to_integers()`
- `cudf::strings::integers_to_ipv4()`
- `cudf::strings::is_ipv4()`
- `cudf::strings::url_encode()`
- `cudf::strings::url_decode()`
- `cudf::strings::format_list_column()`

Also cleaned up some of the doxygen comments and removed some default parameters.

Reference #13744

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14255
rapids-bot bot pushed a commit that referenced this issue Oct 16, 2023
This PR introduces the stream parameter to the List Sorting and Filtering APIs.

Sorting and Filtering (`extract.hpp`, `filling.hpp`, `gather.hpp`, `reverse.hpp`, `sorting.hpp`, `stream_compaction.hpp`)

```
extract_list_element - index
extract_list_element - indices
segmented_gather
sequences - without steps
sequences - with steps
reverse
sort_lists
stable_sort_lists
apply_boolean_mask
distinct
```

Reference [13744](#13744)

Authors:
  - Suraj Aralihalli (https://github.com/SurajAralihalli)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: #14272
@GregoryKimball
Copy link
Contributor

Thanks @lamarrr for reworking this issue!

rapids-bot bot pushed a commit that referenced this issue Sep 20, 2024
Adds stream ordering to the public join APIs:

- `inner_join`
- `left_join`
- `full_join`
- `left_semi_join`
- `left_anti_join`
- `cross_join`
- `conditional_inner_join`
- `conditional_left_join`
- `conditional_full_join`
- `conditional_left_semi_join`
- `conditional_left_anti_join`
- `mixed_inner_join`
- `mixed_left_join`
- `mixed_full_join`
- `mixed_left_semi_join`
- `mixed_left_anti_join`
- `mixed_inner_join_size`
- `mixed_left_join_size`
- `conditional_inner_join_size`
- `conditional_left_join_size`
- `conditional_left_semi_join_size`
- `conditional_left_anti_join_size`

closes #16792 
follows up #13744

Authors:
  - Basit Ayantunde (https://github.com/lamarrr)

Approvers:
  - Paul Mattione (https://github.com/pmattione-nvidia)
  - Nghia Truong (https://github.com/ttnghia)
  - David Wendt (https://github.com/davidwendt)

URL: #16793
rapids-bot bot pushed a commit that referenced this issue Sep 21, 2024
This merge request exposes stream-ordering to the public-facing datetime APIs.

- `extract_year`
- `extract_month`
- `extract_day`
- `extract_weekday`
- `extract_hour`
- `extract_minute`
- `extract_second`
- `extract_millisecond_fraction`
- `extract_microsecond_fraction`
- `extract_nanosecond_fraction`
- `last_day_of_month`
- `day_of_year`
- `add_calendrical_months`
- `is_leap_year`
- `days_in_month`
- `extract_quarter`
- `ceil_datetimes`
- `floor_datetimes`
- `round_datetimes`



 Follows-up #13744
Closes #16775

Authors:
  - Basit Ayantunde (https://github.com/lamarrr)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #16774
rapids-bot bot pushed a commit that referenced this issue Oct 31, 2024
Add stream parameter to public APIs:
```
cudf::partition
cudf::round_robin_partition
```
Added stream gtest for above two functions and for `hash_partition`.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #17213
rapids-bot bot pushed a commit that referenced this issue Nov 1, 2024
Contributes to #13744

Authors:
  - Matthew Murray (https://github.com/Matt711)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #16925
rapids-bot bot pushed a commit that referenced this issue Nov 4, 2024
Add stream parameter to public APIs:
```
nvtext::subword_tokenize
nvtext::load_vocabulary_file
```
Added stream gtest.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - David Wendt (https://github.com/davidwendt)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #17206
rapids-bot bot pushed a commit that referenced this issue Nov 12, 2024
Adds stream parameter to 
```
cudf::quantile
cudf::quantiles
cudf::percentile_approx
```
Added stream gtests to verify correct stream forwarding.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - Paul Mattione (https://github.com/pmattione-nvidia)
  - David Wendt (https://github.com/davidwendt)

URL: #17257
rapids-bot bot pushed a commit that referenced this issue Nov 13, 2024
Adds stream parameter to `cudf::transpose`.
Verifies correct stream forwarding with stream gtests.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - Nghia Truong (https://github.com/ttnghia)
  - David Wendt (https://github.com/davidwendt)

URL: #17294
rapids-bot bot pushed a commit that referenced this issue Nov 21, 2024
Adds stream parameter to
```
cudf::from_dlpack
cudf::to_dlpack
```

Added stream gtests to verify correct stream forwarding.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Tianyu Liu (https://github.com/kingcrimsontianyu)

URL: #17397
rapids-bot bot pushed a commit that referenced this issue Nov 25, 2024
Adds stream parameter to
```
cudf::groupby::scan
cudf::groupby::aggregate
cudf::groupby::shift
cudf::groupby::get_groups
cudf::groupby::replace_nulls
```

Added stream gtests to verify correct stream forwarding.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - Paul Mattione (https://github.com/pmattione-nvidia)
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #17324
rapids-bot bot pushed a commit that referenced this issue Nov 25, 2024
Adds stream parameter to
```
cudf::strings::count_characters
cudf::strings::count_bytes
cudf::strings::code_points
```
Added stream gtests to verify correct stream forwarding.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

URL: #17398
rapids-bot bot pushed a commit that referenced this issue Dec 1, 2024
Adds stream parameter to
```
cudf::normalize_nans_and_zeros
```
Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)

Approvers:
  - MithunR (https://github.com/mythrocks)
  - Nghia Truong (https://github.com/ttnghia)

URL: #17436
rapids-bot bot pushed a commit that referenced this issue Dec 4, 2024
Adds stream parameter to
```
cudf::nvtext::byte_pair_encoding
```
Added stream gtests to verify correct stream forwarding.

Reference: #13744

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)

URL: #17446
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress feature request New feature or request libcudf Affects libcudf (C++/CUDA) code.
Projects
Status: Story Issue
Development

Successfully merging a pull request may close this issue.

5 participants