Releases · rapidsai/cudf

29 Jun 13:28

raydouglass

v23.06.01

6a548b0

🚨 Breaking Changes

Fix batch processing for parquet writer (#13438) @ttnghia
Use <NA> instead of null to match pandas. (#13415) @bdice
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Use std::overflow_error when output would exceed column size limit (#13323) @davidwendt
Remove null mask and null count from column_view constructors (#13311) @vyasr
Change default value of the observed= argument in groupby to True to reflect the actual behaviour (#13296) @shwina
Throw error if UNINITIALIZED is passed to cudf::state_null_count (#13292) @davidwendt
Remove default null-count parameter from cudf::make_strings_column factory (#13227) @davidwendt
Remove UNKNOWN_NULL_COUNT where it can be easily computed (#13205) @vyasr
Update minimum Python version to Python 3.9 (#13196) @shwina
Refactor contiguous_split API into contiguous_split.hpp (#13186) @abellina
Cleanup Parquet chunked writer (#13094) @ttnghia
Cleanup ORC chunked writer (#13091) @ttnghia
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Remove deprecated regex functions from libcudf (#13067) @davidwendt
[REVIEW] Upgrade to arrow-11 (#12757) @galipremsagar
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller
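
The last entry above (#11656) moves Python drop_duplicates onto a stable distinct operation: one row is kept per group of duplicates, and the retained rows stay in their original relative order. A minimal pure-Python sketch of that idea follows. It is not cuDF's implementation; the function name and the `keep` parameter are illustrative stand-ins that loosely mirror the pandas-style option.

```python
def stable_distinct(rows, keep="first"):
    """Return unique rows in their original relative order.

    Illustrative only: `keep` accepts "first" or "last", loosely mirroring
    the pandas-style drop_duplicates option. Rows must be hashable.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    # Map each distinct row to the index of the occurrence we retain.
    chosen = {}
    for i, row in enumerate(rows):
        if keep == "last" or row not in chosen:
            chosen[row] = i
    # Emit retained rows sorted by original position, which keeps the order stable.
    return [rows[i] for i in sorted(chosen.values())]


rows = [("b", 2), ("a", 1), ("b", 2), ("c", 3), ("a", 1)]
first = stable_distinct(rows)
last = stable_distinct(rows, keep="last")
```

With `keep="first"` the result follows the order in which each distinct row first appears; with `keep="last"` it follows the order of the final occurrences.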

🐛 Bug Fixes

Fix valid count computation in offset_bitmask_binop kernel (#13489) @davidwendt
Fix writing of ORC files with empty rowgroups (#13466) @vuule
Fix cudf::repeat logic when count is zero (#13459) @davidwendt
Fix batch processing for parquet writer (#13438) @ttnghia
Fix invalid use of std::exclusive_scan in Parquet writer (#13434) @etseidl
Patch numba if it is imported first to ensure minor version compatibility works. (#13433) @bdice
Fix cudf::strings::replace_with_backrefs hang on empty match result (#13418) @davidwendt
Use <NA> instead of null to match pandas. (#13415) @bdice
Fix tokenize with non-space delimiter (#13403) @shwina
Fix groupby head/tail for empty dataframe (#13398) @shwina
Default to closed="right" in IntervalIndex constructor (#13394) @shwina
Correctly reorder and reindex scan groupbys with null keys (#13389) @wence-
Fix unused argument errors in nvcc 11.5 (#13387) @abellina
Updates needed to work with jitify that leverages libcudacxx (#13383) @robertmaynard
Fix unused parameter warning/error in parquet/page_data.cu (#13367) @davidwendt
Fix page size estimation in Parquet writer (#13364) @etseidl
Fix subword_tokenize error when input contains no tokens (#13320) @davidwendt
Support gcc 12 as the C++ compiler (#13316) @robertmaynard
Correctly set bitmask size in from_column_view (#13315) @wence-
Fix approach to detecting assignment for gte/lte operators (#13285) @vyasr
Fix parquet schema interpretation issue (#13277) @hyperbolic2346
Fix 64bit shift bug in avro reader (#13276) @karthikeyann
Fix unused variables/parameters in parquet/writer_impl.cu (#13263) @davidwendt
Clean up buffers in case of AssertionError (#13262) @razajafri
Allow empty input table in ast compute_column (#13245) @wence-
Fix structs_column_wrapper constructors to copy input column wrappers (#13243) @davidwendt
Fix the row index stream order in ORC reader (#13242) @vuule
Make is_decompression_disabled and is_compression_disabled thread-safe (#13240) @vuule
Add [[maybe_unused]] to nvbench environment. (#13219) @bdice
Fix race in ORC string dictionary creation (#13214) @revans2
Add scalar argtypes to udf cache keys (#13194) @brandon-b-miller
Fix unused parameter warning/error in grouped_rolling.cu (#13192) @davidwendt
Avoid skbuild 0.17.2 which affected the cmake -DPython_LIBRARY string (#13188) @sevagh
Fix hostdevice_vector::subspan (#13187) @ttnghia
Use custom nvbench entry point to ensure cudf::nvbench_base_fixture usage (#13183) @robertmaynard
Fix slice_strings to return empty strings for stop < start indices (#13178) @davidwendt
Allow compilation with any GTest version 1.11+ (#13153) @robertmaynard
Fix a few clang-format style check errors (#13146) @davidwendt
[REVIEW] Fix Series and DataFrame constructors to validate index lengths (#13122) @galipremsagar
Fix hash join when the input tables have nulls on only one side (#13120) @ttnghia
Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. (#13117) @davidwendt
Adds checks to make sure json reader won't overflow (#13115) @elstehle
Fix null_count of columns returned by chunked_parquet_reader (#13111) @vuule
Fixes sliced list and struct column bug in JSON chunked writer (#13108) @karthikeyann
[REVIEW] Fix missing confluent kafka version (#13101) @galipremsagar
Use make_empty_lists_column instead of make_empty_column(type_id::LIST) (#13099) @davidwendt
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Fix column selection read_parquet benchmarks (#13082) @vuule
Fix bugs in iterative groupby apply algorithm (#13078) @brandon-b-miller
Add algorithm include in data_sink.hpp (#13068) @ahendriksen
Fix tests/identify_stream_usage.cpp (#13066) @ahendriksen
Prevent overflow with skip_rows in ORC and Parquet readers (#13063) @vuule
Add except declaration in Cython interface for regex_program::create (#13054) @davidwendt
[REVIEW] Fix branch version in CI scripts (#13029) @galipremsagar
Fix OOB memory access in CSV reader when reading without NA values (#13011) @vuule
Fix read_avro() skip_rows and num_rows. (#12912) @tpn
Purge nonempty nulls from byte_cast list outputs. (#11971) @bdice
Fix consumption of CPU-backed interchange protocol dataframes (#11392) @shwina

🚀 New Features

Remove numba JIT kernel usage from dataframe copy tests (#13385) @brandon-b-miller
Add JNI for ORC/Parquet writer compression statistics (#13376) @ttnghia
Use _compile_or_get in JIT groupby apply (#13350) @brandon-b-miller
cuDF numba cuda 12 updates (#13337) @brandon-b-miller
Add tz_convert method to convert between timestamps (#13328) @shwina
Optionally return compression statistics from ORC and Parquet writers (#13294) @vuule
Support the case=False argument to str.contains (#13290) @shwina
Add an event handler for ColumnVector.close (#13279) @abellina
JNI api for cudf::chunked_pack (#13278) @abellina
Implement a chunked_pack API (#13260) @abellina
Update cudf recipes to use GTest version >=1.13 (#13207) @robertmaynard
JNI changes for range-extents in window functions. (#13199) @mythrocks
Add support for DatetimeTZDtype and tz_localize (#13163) @shwina
Add IS_NULL operator to AST (#13145) @karthikeyann
STRING order-by column for RANGE window functions (#13143) @mythrocks
Update contains_table to experimental row hasher and equality comparator (#13119) @divyegala
Automatically select GroupBy.apply algorithm based on if the UDF is jittable (#13113) @brandon-b-miller
Refactor Parquet chunked writer (#13076) @ttnghia
Add Python bindings for string literal support in AST (#13073) @karthikeyann
Add Java bindings for string literal support in AST (#13072) @karthikeyann
Add string scalar support in AST (#13061) @karthikeyann
Log cuIO warnings using the libcudf logger (#13043) @vuule
Update mixed_join to use experimental row hasher and comparator (#13028) @divyegala
Support structs of lists in row lexicographic comparator (#13005) @ttnghia
Adding hostdevice_span that is a span createable from hostdevice_vector (#12981) @hyperbolic2346
Add nvtext::minhash function (#12961) @davidwendt
Support lists of structs in row lexicographic comparator (#12953) @ttnghia
Update join to use experimental row hasher and comparator (#12787) @divyegala
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller
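
Entries #13163 and #13328 above add tz_localize and tz_convert. In pandas terms, localizing attaches a time zone to naive timestamps without changing the wall-clock fields, while converting re-expresses the same instant in another zone. The sketch below illustrates that distinction with the standard library only. It does not use the cuDF API; the helper names `localize` and `convert` and the fixed-offset zones are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones keep the sketch independent of any tz database.
UTC = timezone.utc
UTC_MINUS_5 = timezone(timedelta(hours=-5), name="UTC-05:00")


def localize(naive, tz):
    """Attach a zone to a naive timestamp; the wall-clock fields are unchanged."""
    if naive.tzinfo is not None:
        raise TypeError("timestamp is already timezone-aware")
    return naive.replace(tzinfo=tz)


def convert(aware, tz):
    """Re-express an aware timestamp in another zone; the instant is unchanged."""
    if aware.tzinfo is None:
        raise TypeError("timestamp must be timezone-aware; localize it first")
    return aware.astimezone(tz)


naive = datetime(2023, 6, 29, 13, 28)
local = localize(naive, UTC_MINUS_5)  # still reads 13:28, now tagged UTC-05:00
as_utc = convert(local, UTC)          # same instant, shown in UTC
```

Localizing keeps the clock reading and changes what instant it denotes; converting keeps the instant and changes the clock reading.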

🛠️ Improvements

Bump typing_extensions minimum version to 4.0.0 (#13618) @shwina
Drop extraneous dependencies from cudf conda recipe. (#13406) @bdice
Handle some corner-cases in indexing with boolean masks (#13402) @wence-
Add cudf::stable_distinct public API, tests, and benchmarks. (#13392) @bdice
[JNI] Pass this ColumnVector to the onClosed event handler (#13386) @abellina
Fix JNI method with mismatched parameter list (#13384) @ttnghia
Split up experimental_row_operator_tests.cu to improve its compile time (#13382) @davidwendt
Deprecate cudf::strings::slice_strings APIs that accept delimiters (#13373) @davidwendt
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Move some nvtext benchmarks to nvbench (#13368) @davidwendt
run docs nightly too (#13366) @AyodeAwe
Add warning for default dtype parameter in get_dummies (#13365) @galipremsagar
Add log messages about kvikIO compatibility mode (#13363) @vuule
Switch back to using primary shared-action-workflows branch (#13362) @vyasr
Deprecate StringIndex and use Index instead (#13361) @galipremsagar
Ensure columns have valid null counts in CUDF JNI. (#13355) @mythrocks
Expunge most uses of TypeVar(bound="Foo") (#13346) @wence-
Remove all references to UNKNOWN_NULL_COUNT in Python (#13345) @vyasr
Improve distinct_count with cuco::static_set (#13343) @PointKernel
Fix contiguous_split performance (#13342) @ttnghia
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Update mypy to 1.3 (#13340) @wence-
[Java] Purge non-empty nulls when setting validity (#13335) @razajafri
Add row-wise filtering step to read_parquet (#13334) @rjzamora
Performance improvement for nvtext::minhash (#13333) @davidwendt
Fix some libcudf functions to set the null count on returning columns (#13331) @davidwendt
Change cudf::detail::concatenate_masks to return null-count (#13330) @davidwendt
Move meta calculation in `dask_cu...

Contributors

robertmaynard, gmarkall, and 28 other contributors

07 Jun 15:25

raydouglass

v23.06.00

f881d40

🚨 Breaking Changes

Fix batch processing for parquet writer (#13438) @ttnghia
Use <NA> instead of null to match pandas. (#13415) @bdice
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Use std::overflow_error when output would exceed column size limit (#13323) @davidwendt
Remove null mask and null count from column_view constructors (#13311) @vyasr
Change default value of the observed= argument in groupby to True to reflect the actual behaviour (#13296) @shwina
Throw error if UNINITIALIZED is passed to cudf::state_null_count (#13292) @davidwendt
Remove default null-count parameter from cudf::make_strings_column factory (#13227) @davidwendt
Remove UNKNOWN_NULL_COUNT where it can be easily computed (#13205) @vyasr
Update minimum Python version to Python 3.9 (#13196) @shwina
Refactor contiguous_split API into contiguous_split.hpp (#13186) @abellina
Cleanup Parquet chunked writer (#13094) @ttnghia
Cleanup ORC chunked writer (#13091) @ttnghia
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Remove deprecated regex functions from libcudf (#13067) @davidwendt
[REVIEW] Upgrade to arrow-11 (#12757) @galipremsagar
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🐛 Bug Fixes

Fix valid count computation in offset_bitmask_binop kernel (#13489) @davidwendt
Fix writing of ORC files with empty rowgroups (#13466) @vuule
Fix cudf::repeat logic when count is zero (#13459) @davidwendt
Fix batch processing for parquet writer (#13438) @ttnghia
Fix invalid use of std::exclusive_scan in Parquet writer (#13434) @etseidl
Patch numba if it is imported first to ensure minor version compatibility works. (#13433) @bdice
Fix cudf::strings::replace_with_backrefs hang on empty match result (#13418) @davidwendt
Use <NA> instead of null to match pandas. (#13415) @bdice
Fix tokenize with non-space delimiter (#13403) @shwina
Fix groupby head/tail for empty dataframe (#13398) @shwina
Default to closed="right" in IntervalIndex constructor (#13394) @shwina
Correctly reorder and reindex scan groupbys with null keys (#13389) @wence-
Fix unused argument errors in nvcc 11.5 (#13387) @abellina
Updates needed to work with jitify that leverages libcudacxx (#13383) @robertmaynard
Fix unused parameter warning/error in parquet/page_data.cu (#13367) @davidwendt
Fix page size estimation in Parquet writer (#13364) @etseidl
Fix subword_tokenize error when input contains no tokens (#13320) @davidwendt
Support gcc 12 as the C++ compiler (#13316) @robertmaynard
Correctly set bitmask size in from_column_view (#13315) @wence-
Fix approach to detecting assignment for gte/lte operators (#13285) @vyasr
Fix parquet schema interpretation issue (#13277) @hyperbolic2346
Fix 64bit shift bug in avro reader (#13276) @karthikeyann
Fix unused variables/parameters in parquet/writer_impl.cu (#13263) @davidwendt
Clean up buffers in case of AssertionError (#13262) @razajafri
Allow empty input table in ast compute_column (#13245) @wence-
Fix structs_column_wrapper constructors to copy input column wrappers (#13243) @davidwendt
Fix the row index stream order in ORC reader (#13242) @vuule
Make is_decompression_disabled and is_compression_disabled thread-safe (#13240) @vuule
Add [[maybe_unused]] to nvbench environment. (#13219) @bdice
Fix race in ORC string dictionary creation (#13214) @revans2
Add scalar argtypes to udf cache keys (#13194) @brandon-b-miller
Fix unused parameter warning/error in grouped_rolling.cu (#13192) @davidwendt
Avoid skbuild 0.17.2 which affected the cmake -DPython_LIBRARY string (#13188) @sevagh
Fix hostdevice_vector::subspan (#13187) @ttnghia
Use custom nvbench entry point to ensure cudf::nvbench_base_fixture usage (#13183) @robertmaynard
Fix slice_strings to return empty strings for stop < start indices (#13178) @davidwendt
Allow compilation with any GTest version 1.11+ (#13153) @robertmaynard
Fix a few clang-format style check errors (#13146) @davidwendt
[REVIEW] Fix Series and DataFrame constructors to validate index lengths (#13122) @galipremsagar
Fix hash join when the input tables have nulls on only one side (#13120) @ttnghia
Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. (#13117) @davidwendt
Adds checks to make sure json reader won't overflow (#13115) @elstehle
Fix null_count of columns returned by chunked_parquet_reader (#13111) @vuule
Fixes sliced list and struct column bug in JSON chunked writer (#13108) @karthikeyann
[REVIEW] Fix missing confluent kafka version (#13101) @galipremsagar
Use make_empty_lists_column instead of make_empty_column(type_id::LIST) (#13099) @davidwendt
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Fix column selection read_parquet benchmarks (#13082) @vuule
Fix bugs in iterative groupby apply algorithm (#13078) @brandon-b-miller
Add algorithm include in data_sink.hpp (#13068) @ahendriksen
Fix tests/identify_stream_usage.cpp (#13066) @ahendriksen
Prevent overflow with skip_rows in ORC and Parquet readers (#13063) @vuule
Add except declaration in Cython interface for regex_program::create (#13054) @davidwendt
[REVIEW] Fix branch version in CI scripts (#13029) @galipremsagar
Fix OOB memory access in CSV reader when reading without NA values (#13011) @vuule
Fix read_avro() skip_rows and num_rows. (#12912) @tpn
Purge nonempty nulls from byte_cast list outputs. (#11971) @bdice
Fix consumption of CPU-backed interchange protocol dataframes (#11392) @shwina

🚀 New Features

Remove numba JIT kernel usage from dataframe copy tests (#13385) @brandon-b-miller
Add JNI for ORC/Parquet writer compression statistics (#13376) @ttnghia
Use _compile_or_get in JIT groupby apply (#13350) @brandon-b-miller
cuDF numba cuda 12 updates (#13337) @brandon-b-miller
Add tz_convert method to convert between timestamps (#13328) @shwina
Optionally return compression statistics from ORC and Parquet writers (#13294) @vuule
Support the case=False argument to str.contains (#13290) @shwina
Add an event handler for ColumnVector.close (#13279) @abellina
JNI api for cudf::chunked_pack (#13278) @abellina
Implement a chunked_pack API (#13260) @abellina
Update cudf recipes to use GTest version >=1.13 (#13207) @robertmaynard
JNI changes for range-extents in window functions. (#13199) @mythrocks
Add support for DatetimeTZDtype and tz_localize (#13163) @shwina
Add IS_NULL operator to AST (#13145) @karthikeyann
STRING order-by column for RANGE window functions (#13143) @mythrocks
Update contains_table to experimental row hasher and equality comparator (#13119) @divyegala
Automatically select GroupBy.apply algorithm based on if the UDF is jittable (#13113) @brandon-b-miller
Refactor Parquet chunked writer (#13076) @ttnghia
Add Python bindings for string literal support in AST (#13073) @karthikeyann
Add Java bindings for string literal support in AST (#13072) @karthikeyann
Add string scalar support in AST (#13061) @karthikeyann
Log cuIO warnings using the libcudf logger (#13043) @vuule
Update mixed_join to use experimental row hasher and comparator (#13028) @divyegala
Support structs of lists in row lexicographic comparator (#13005) @ttnghia
Adding hostdevice_span that is a span createable from hostdevice_vector (#12981) @hyperbolic2346
Add nvtext::minhash function (#12961) @davidwendt
Support lists of structs in row lexicographic comparator (#12953) @ttnghia
Update join to use experimental row hasher and comparator (#12787) @divyegala
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🛠️ Improvements

Drop extraneous dependencies from cudf conda recipe. (#13406) @bdice
Handle some corner-cases in indexing with boolean masks (#13402) @wence-
Add cudf::stable_distinct public API, tests, and benchmarks. (#13392) @bdice
[JNI] Pass this ColumnVector to the onClosed event handler (#13386) @abellina
Fix JNI method with mismatched parameter list (#13384) @ttnghia
Split up experimental_row_operator_tests.cu to improve its compile time (#13382) @davidwendt
Deprecate cudf::strings::slice_strings APIs that accept delimiters (#13373) @davidwendt
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Move some nvtext benchmarks to nvbench (#13368) @davidwendt
run docs nightly too (#13366) @AyodeAwe
Add warning for default dtype parameter in get_dummies (#13365) @galipremsagar
Add log messages about kvikIO compatibility mode (#13363) @vuule
Switch back to using primary shared-action-workflows branch (#13362) @vyasr
Deprecate StringIndex and use Index instead (#13361) @galipremsagar
Ensure columns have valid null counts in CUDF JNI. (#13355) @mythrocks
Expunge most uses of TypeVar(bound="Foo") (#13346) @wence-
Remove all references to UNKNOWN_NULL_COUNT in Python (#13345) @vyasr
Improve distinct_count with cuco::static_set (#13343) @PointKernel
Fix contiguous_split performance (#13342) @ttnghia
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Update mypy to 1.3 (#13340) @wence-
[Java] Purge non-empty nulls when setting validity (#13335) @razajafri
Add row-wise filtering step to read_parquet (#13334) @rjzamora
Performance improvement for nvtext::minhash (#13333) @davidwendt
Fix some libcudf functions to set the null count on returning columns (#13331) @davidwendt
Change cudf::detail::concatenate_masks to return null-count (#13330) @davidwendt
Move meta calculation in dask_cudf.read_parquet (#13327) @rjzamora
Changes to support Numpy >...

Contributors

robertmaynard, gmarkall, and 28 other contributors

07 Nov 21:43

raydouglass

v23.04.01

7e070fc

🚨 Breaking Changes

Pin dask and distributed for release (#13070) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Update minimum pandas and numpy pinnings (#12887) @galipremsagar
Deprecate names & dtype in Index.copy (#12825) @galipremsagar
Deprecate Index.is_* methods (#12820) @galipremsagar
Deprecate datetime_is_numeric from describe (#12818) @galipremsagar
Deprecate na_sentinel in factorize (#12817) @galipremsagar
Make string methods return a Series with a useful Index (#12814) @shwina
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Remove cudf::strings::repeat_strings_output_sizes and optional parameter from cudf::strings::repeat_strings (#12609) @davidwendt
Replace message parsing with throwing more specific exceptions (#12426) @vyasr

🐛 Bug Fixes

Pin curand version (#13127) @vyasr
Fix memcheck script to execute only _TEST files found in bin/gtests/libcudf (#13006) @davidwendt
Fix DataFrame constructor to broadcast scalar inputs properly (#12997) @galipremsagar
Drop force_nullable_schema from chunked parquet writer (#12996) @galipremsagar
Fix gtest column utility comparator diff reporting (#12995) @davidwendt
Handle index names while performing groupby (#12992) @galipremsagar
Fix __setitem__ on string columns when the scalar value ends in a null byte (#12991) @wence-
Fix sort_values when column is all empty strings (#12988) @eriknw
Remove unused variable and fix memory issue in ORC writer (#12984) @ttnghia
Pre-emptive fix for upstream dask.dataframe.read_parquet changes (#12983) @rjzamora
Remove MANIFEST.in use auto-generated one for sdists and package_data for wheels (#12960) @vyasr
Update to use rapids-export(COMPONENTS) feature. (#12959) @robertmaynard
cudftestutil supports static gtest dependencies (#12957) @robertmaynard
Include gtest in build environment. (#12956) @vyasr
Correctly handle scalar indices in Index.__getitem__ (#12955) @wence-
Avoid building cython twice (#12945) @galipremsagar
Fix set index error for Series rolling window operations (#12942) @galipremsagar
Fix calculation of null counts for Parquet statistics (#12938) @etseidl
Preserve integer dtype of hive-partitioned column containing nulls (#12930) @rjzamora
Use get_current_device_resource for intermediate allocations in COLLECT_LIST window code (#12927) @karthikeyann
Mark dlpack tensor deleter as noexcept to match PyCapsule_Destructor signature. (#12921) @bdice
Fix conda recipe post-link.sh typo (#12916) @pentschev
min_rows and num_rows are swapped in ComputePageSizes declaration in Parquet reader (#12886) @etseidl
Expect cupy to now support bool arrays for dlpack. (#12883) @vyasr
Use python -m pytest for nightly wheel tests (#12871) @bdice
Parquet writer column_size() should return a size_t (#12870) @etseidl
Fix cudf::hash_partition kernel launch error with decimal128 types (#12863) @davidwendt
Fix an issue with parquet chunked reader undercounting string lengths. (#12859) @nvdbaranec
Remove tokenizers pre-install pinning. (#12854) @vyasr
Fix parquet RangeIndex bug (#12838) @rjzamora
Remove KAFKA_HOST_TEST from compute-sanitizer check (#12831) @davidwendt
Make string methods return a Series with a useful Index (#12814) @shwina
Tell cudf_kafka to use header-only fmt (#12796) @vyasr
Add GroupBy.dtypes (#12783) @galipremsagar
Fix a leak in a test and clarify some test names (#12781) @revans2
Fix bug in all-null list due to join_list_elements special handling (#12767) @karthikeyann
Add try/except for expected null-schema error in read_parquet (#12756) @rjzamora
Throw an exception if an unsupported page encoding is detected in Parquet reader (#12754) @etseidl
Fix a bug with num_keys in _scatter_by_slice (#12749) @thomcom
Bump pinned rapids wheel deps to 23.4 (#12735) @sevagh
Rework logic in cudf::strings::split_record to improve performance (#12729) @davidwendt
Add always_nullable flag to Dremel encoding (#12727) @divyegala
Fix memcheck read error in compound segmented reduce (#12722) @davidwendt
Fix faulty conditional logic in JIT GroupBy.apply (#12706) @brandon-b-miller
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Handle parquet list data corner case (#12698) @nvdbaranec
Fix missing trailing comma in json writer (#12688) @karthikeyann
Remove child from newCudaAsyncMemoryResource (#12681) @abellina
Handle bool types in round API (#12670) @galipremsagar
Ensure all of device bitmask is initialized in from_arrow (#12668) @wence-
Fix from_arrow to load a sliced arrow table (#12665) @galipremsagar
Fix dask-cudf read_parquet bug for multi-file aggregation (#12663) @rjzamora
Fix AllocateLikeTest gtests reading uninitialized null-mask (#12643) @davidwendt
Fix find_common_dtype and values to handle complex dtypes (#12537) @galipremsagar
Fix fetching of MultiIndex values when a label is passed (#12521) @galipremsagar
Fix Series comparison vs scalars (#12519) @brandon-b-miller
Allow casting from UDFString back to StringView to call methods in strings_udf (#12363) @brandon-b-miller

📖 Documentation

Fix GroupBy.apply doc examples rendering (#12994) @brandon-b-miller
add sphinx building and s3 uploading for dask-cudf docs (#12982) @quasiben
Add developer documentation forbidding default parameters in detail APIs (#12978) @vyasr
Add README symlink for dask-cudf. (#12946) @bdice
Remove return type from @return doxygen tags (#12908) @davidwendt
Fix docs build to be pydata-sphinx-theme=0.13.0 compatible (#12874) @galipremsagar
Add skeleton API and prose documentation for dask-cudf (#12725) @wence-
Enable doctests for GroupBy methods (#12658) @brandon-b-miller
Add comment about CUB patch for SegmentedSortInt.Bool gtest (#12611) @davidwendt

🚀 New Features

Add JNI method for strings::replace multi variety (#12979) @NVnavkumar
Add nunique aggregation support for cudf::segmented_reduce (#12972) @davidwendt
Refactor orc chunked writer (#12949) @ttnghia
Make Parquet writer nullable option applicable to single table writes (#12933) @vuule
Refactor io::orc::ProtobufWriter (#12877) @ttnghia
Make timezone table independent from ORC (#12805) @vuule
Cache JIT GroupBy.apply functions (#12802) @brandon-b-miller
Implement initial support for avro logical types (#6482) (#12788) @tpn
Update tests/column_utilities to use experimental::equality row comparator (#12777) @divyegala
Update distinct/unique_count to experimental::row hasher/comparator (#12776) @divyegala
Update hash_partition to use experimental::row::row_hasher (#12761) @divyegala
Update is_sorted to use experimental::row::lexicographic (#12752) @divyegala
Update default data source in cuio reader benchmarks (#12740) @PointKernel
Reenable stream identification library in CI (#12714) @vyasr
Add regex_program strings splitting java APIs and tests (#12713) @cindyyuanjiang
Add regex_program strings replacing java APIs and tests (#12701) @cindyyuanjiang
Add regex_program strings extract java APIs and tests (#12699) @cindyyuanjiang
Variable fragment sizes for Parquet writer (#12685) @etseidl
Add segmented reduction support for fixed-point types (#12680) @davidwendt
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Add regex_program searching APIs and related java classes (#12666) @cindyyuanjiang
Add logging to libcudf (#12637) @vuule
Add compound aggregations to cudf::segmented_reduce (#12573) @davidwendt
Convert rank to use experimental row comparators (#12481) @divyegala
Use rapids-cmake parallel testing feature (#12451) @robertmaynard
Enable detection of undesired stream usage (#12089) @vyasr
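
Several entries above extend cudf::segmented_reduce, including nunique support (#12972). A segmented reduction applies an aggregation independently to each slice of a column, with the slices delimited by consecutive offsets. The pure-Python sketch below shows the shape of that computation for a distinct count. It is not libcudf code: the function name is hypothetical, and excluding nulls (None) from the count is an assumption made here for illustration.

```python
def segmented_nunique(values, offsets):
    """Count distinct non-null values in each segment.

    Segment i covers values[offsets[i]:offsets[i + 1]], so `offsets`
    has one more entry than there are segments.
    """
    counts = []
    for begin, end in zip(offsets, offsets[1:]):
        # Assumption for this sketch: None marks a null and is not counted.
        distinct = {v for v in values[begin:end] if v is not None}
        counts.append(len(distinct))
    return counts


values = [1, 1, 2, None, 3, 3, 3, 7]
offsets = [0, 3, 4, 8]  # three segments: [1, 1, 2], [None], [3, 3, 3, 7]
result = segmented_nunique(values, offsets)
```

Each segment is reduced on its own, so a segment containing only nulls yields a count of zero under the assumption above.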

🛠️ Improvements

Pin dask and distributed for release (#13070) @galipremsagar
Pin cupy in wheel tests to supported versions (#13041) @vyasr
Pin numba version (#13001) @vyasr
Rework gtests SequenceTest to remove using namespace cudf (#12985) @davidwendt
Stop setting package version attribute in wheels (#12977) @vyasr
Move detail reduction functions to cudf::reduction::detail namespace (#12971) @davidwendt
Remove default detail mrs: part7 (#12970) @vyasr
Remove default detail mrs: part6 (#12969) @vyasr
Remove default detail mrs: part5 (#12968) @vyasr
Remove default detail mrs: part4 (#12967) @vyasr
Remove default detail mrs: part3 (#12966) @vyasr
Remove default detail mrs: part2 (#12965) @vyasr
Remove default detail mrs: part1 (#12964) @vyasr
Add force_nullable_schema parameter to Parquet writer. (#12952) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Remove remaining default stream parameters (#12943) @vyasr
Fix cudf::segmented_reduce gtest for ANY aggregation (#12940) @davidwendt
Implement groupby.head and groupby.tail (#12939) @wence-
Fix libcudf gtests to pass null-count=0 for empty validity masks (#12923) @davidwendt
Migrate parquet encoding to use experimental row operators (#12918) @PointKernel
Fix benchmarks coded in namespace cudf and using namespace cudf (#12915) @karthikeyann
Fix io/text gtests coded in namespace cudf::test (#12914) @karthikeyann
Pass SCCACHE_S3_USE_SSL to conda builds (#12910) @ajschmidt8
Fix FST, JSON gtests & benchmarks coded in namespace cudf::test (#12907) @karthikeyann
Generate pyproject dependencies using dfg (#12906) @vyasr
Update libcudf counting functions to specify cudf::size_type (#12904) @davidwendt
Fix moto env vars & pass AWS_SESSION_TOKEN to conda builds (#12902) @ajschmidt8
Rewrite CSV wri...

Contributors

robertmaynard, thomcom, and 38 other contributors

12 Apr 14:26

raydouglass

v23.04.00

4d31a6f

🚨 Breaking Changes

Pin dask and distributed for release (#13070) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Update minimum pandas and numpy pinnings (#12887) @galipremsagar
Deprecate names & dtype in Index.copy (#12825) @galipremsagar
Deprecate Index.is_* methods (#12820) @galipremsagar
Deprecate datetime_is_numeric from describe (#12818) @galipremsagar
Deprecate na_sentinel in factorize (#12817) @galipremsagar
Make string methods return a Series with a useful Index (#12814) @shwina
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Remove cudf::strings::repeat_strings_output_sizes and optional parameter from cudf::strings::repeat_strings (#12609) @davidwendt
Replace message parsing with throwing more specific exceptions (#12426) @vyasr

🐛 Bug Fixes

Fix memcheck script to execute only _TEST files found in bin/gtests/libcudf (#13006) @davidwendt
Fix DataFrame constructor to broadcast scalar inputs properly (#12997) @galipremsagar
Drop force_nullable_schema from chunked parquet writer (#12996) @galipremsagar
Fix gtest column utility comparator diff reporting (#12995) @davidwendt
Handle index names while performing groupby (#12992) @galipremsagar
Fix __setitem__ on string columns when the scalar value ends in a null byte (#12991) @wence-
Fix sort_values when column is all empty strings (#12988) @eriknw
Remove unused variable and fix memory issue in ORC writer (#12984) @ttnghia
Pre-emptive fix for upstream dask.dataframe.read_parquet changes (#12983) @rjzamora
Remove MANIFEST.in use auto-generated one for sdists and package_data for wheels (#12960) @vyasr
Update to use rapids-export(COMPONENTS) feature. (#12959) @robertmaynard
cudftestutil supports static gtest dependencies (#12957) @robertmaynard
Include gtest in build environment. (#12956) @vyasr
Correctly handle scalar indices in Index.__getitem__ (#12955) @wence-
Avoid building cython twice (#12945) @galipremsagar
Fix set index error for Series rolling window operations (#12942) @galipremsagar
Fix calculation of null counts for Parquet statistics (#12938) @etseidl
Preserve integer dtype of hive-partitioned column containing nulls (#12930) @rjzamora
Use get_current_device_resource for intermediate allocations in COLLECT_LIST window code (#12927) @karthikeyann
Mark dlpack tensor deleter as noexcept to match PyCapsule_Destructor signature. (#12921) @bdice
Fix conda recipe post-link.sh typo (#12916) @pentschev
min_rows and num_rows are swapped in ComputePageSizes declaration in Parquet reader (#12886) @etseidl
Expect cupy to now support bool arrays for dlpack. (#12883) @vyasr
Use python -m pytest for nightly wheel tests (#12871) @bdice
Parquet writer column_size() should return a size_t (#12870) @etseidl
Fix cudf::hash_partition kernel launch error with decimal128 types (#12863) @davidwendt
Fix an issue with parquet chunked reader undercounting string lengths. (#12859) @nvdbaranec
Remove tokenizers pre-install pinning. (#12854) @vyasr
Fix parquet RangeIndex bug (#12838) @rjzamora
Remove KAFKA_HOST_TEST from compute-sanitizer check (#12831) @davidwendt
Make string methods return a Series with a useful Index (#12814) @shwina
Tell cudf_kafka to use header-only fmt (#12796) @vyasr
Add GroupBy.dtypes (#12783) @galipremsagar
Fix a leak in a test and clarify some test names (#12781) @revans2
Fix bug in all-null list due to join_list_elements special handling (#12767) @karthikeyann
Add try/except for expected null-schema error in read_parquet (#12756) @rjzamora
Throw an exception if an unsupported page encoding is detected in Parquet reader (#12754) @etseidl
Fix a bug with num_keys in _scatter_by_slice (#12749) @thomcom
Bump pinned rapids wheel deps to 23.4 (#12735) @sevagh
Rework logic in cudf::strings::split_record to improve performance (#12729) @davidwendt
Add always_nullable flag to Dremel encoding (#12727) @divyegala
Fix memcheck read error in compound segmented reduce (#12722) @davidwendt
Fix faulty conditional logic in JIT GroupBy.apply (#12706) @brandon-b-miller
Produce useful guidance on overflow error in to_csv (#12705) @wence-
Handle parquet list data corner case (#12698) @nvdbaranec
Fix missing trailing comma in json writer (#12688) @karthikeyann
Remove child from newCudaAsyncMemoryResource (#12681) @abellina
Handle bool types in round API (#12670) @galipremsagar
Ensure all of device bitmask is initialized in from_arrow (#12668) @wence-
Fix from_arrow to load a sliced arrow table (#12665) @galipremsagar
Fix dask-cudf read_parquet bug for multi-file aggregation (#12663) @rjzamora
Fix AllocateLikeTest gtests reading uninitialized null-mask (#12643) @davidwendt
Fix find_common_dtype and values to handle complex dtypes (#12537) @galipremsagar
Fix fetching of MultiIndex values when a label is passed (#12521) @galipremsagar
Fix Series comparison vs scalars (#12519) @brandon-b-miller
Allow casting from UDFString back to StringView to call methods in strings_udf (#12363) @brandon-b-miller

📖 Documentation

Fix GroupBy.apply doc examples rendering (#12994) @brandon-b-miller
Add sphinx building and s3 uploading for dask-cudf docs (#12982) @quasiben
Add developer documentation forbidding default parameters in detail APIs (#12978) @vyasr
Add README symlink for dask-cudf. (#12946) @bdice
Remove return type from @return doxygen tags (#12908) @davidwendt
Fix docs build to be pydata-sphinx-theme=0.13.0 compatible (#12874) @galipremsagar
Add skeleton API and prose documentation for dask-cudf (#12725) @wence-
Enable doctests for GroupBy methods (#12658) @brandon-b-miller
Add comment about CUB patch for SegmentedSortInt.Bool gtest (#12611) @davidwendt

🚀 New Features

Add JNI method for strings::replace multi variety (#12979) @NVnavkumar
Add nunique aggregation support for cudf::segmented_reduce (#12972) @davidwendt
Refactor orc chunked writer (#12949) @ttnghia
Make Parquet writer nullable option applicable to single table writes (#12933) @vuule
Refactor io::orc::ProtobufWriter (#12877) @ttnghia
Make timezone table independent from ORC (#12805) @vuule
Cache JIT GroupBy.apply functions (#12802) @brandon-b-miller
Implement initial support for avro logical types (#6482) (#12788) @tpn
Update tests/column_utilities to use experimental::equality row comparator (#12777) @divyegala
Update distinct/unique_count to experimental::row hasher/comparator (#12776) @divyegala
Update hash_partition to use experimental::row::row_hasher (#12761) @divyegala
Update is_sorted to use experimental::row::lexicographic (#12752) @divyegala
Update default data source in cuio reader benchmarks (#12740) @PointKernel
Reenable stream identification library in CI (#12714) @vyasr
Add regex_program strings splitting java APIs and tests (#12713) @cindyyuanjiang
Add regex_program strings replacing java APIs and tests (#12701) @cindyyuanjiang
Add regex_program strings extract java APIs and tests (#12699) @cindyyuanjiang
Variable fragment sizes for Parquet writer (#12685) @etseidl
Add segmented reduction support for fixed-point types (#12680) @davidwendt
Move strings_udf code into cuDF (#12669) @brandon-b-miller
Add regex_program searching APIs and related java classes (#12666) @cindyyuanjiang
Add logging to libcudf (#12637) @vuule
Add compound aggregations to cudf::segmented_reduce (#12573) @davidwendt
Convert rank to use experimental row comparators (#12481) @divyegala
Use rapids-cmake parallel testing feature (#12451) @robertmaynard
Enable detection of undesired stream usage (#12089) @vyasr
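
Several entries above extend cudf::segmented_reduce (nunique, compound aggregations, fixed-point types). As a rough illustration of what a segmented nunique computes, here is a plain-Python sketch. It is not libcudf code: the function name `segmented_nunique` and the offsets layout (segment i spans `offsets[i]` to `offsets[i + 1]`) are assumptions made for the example, and null handling is left out entirely.

```python
from typing import Hashable, List, Sequence


def segmented_nunique(values: Sequence[Hashable], offsets: Sequence[int]) -> List[int]:
    """Count distinct values in each segment [offsets[i], offsets[i + 1])."""
    counts = []
    for start, stop in zip(offsets[:-1], offsets[1:]):
        # A set keeps one copy of each value, so its size is the distinct count.
        counts.append(len(set(values[start:stop])))
    return counts


values = [1, 1, 2, 7, 7, 7, 3, 4]
offsets = [0, 3, 6, 6, 8]  # four segments; the third one is empty
print(segmented_nunique(values, offsets))  # [2, 1, 0, 2]
```

This sketch reports 0 for the empty segment; what libcudf returns for empty or all-null segments is not covered here.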

🛠️ Improvements

Pin dask and distributed for release (#13070) @galipremsagar
Pin cupy in wheel tests to supported versions (#13041) @vyasr
Pin numba version (#13001) @vyasr
Rework gtests SequenceTest to remove using namespace cudf (#12985) @davidwendt
Stop setting package version attribute in wheels (#12977) @vyasr
Move detail reduction functions to cudf::reduction::detail namespace (#12971) @davidwendt
Remove default detail mrs: part7 (#12970) @vyasr
Remove default detail mrs: part6 (#12969) @vyasr
Remove default detail mrs: part5 (#12968) @vyasr
Remove default detail mrs: part4 (#12967) @vyasr
Remove default detail mrs: part3 (#12966) @vyasr
Remove default detail mrs: part2 (#12965) @vyasr
Remove default detail mrs: part1 (#12964) @vyasr
Add force_nullable_schema parameter to Parquet writer. (#12952) @galipremsagar
Declare a different name for nan_equality.UNEQUAL to prevent Cython warnings. (#12947) @bdice
Remove remaining default stream parameters (#12943) @vyasr
Fix cudf::segmented_reduce gtest for ANY aggregation (#12940) @davidwendt
Implement groupby.head and groupby.tail (#12939) @wence-
Fix libcudf gtests to pass null-count=0 for empty validity masks (#12923) @davidwendt
Migrate parquet encoding to use experimental row operators (#12918) @PointKernel
Fix benchmarks coded in namespace cudf and using namespace cudf (#12915) @karthikeyann
Fix io/text gtests coded in namespace cudf::test (#12914) @karthikeyann
Pass SCCACHE_S3_USE_SSL to conda builds (#12910) @ajschmidt8
Fix FST, JSON gtests & benchmarks coded in namespace cudf::test (#12907) @karthikeyann
Generate pyproject dependencies using dfg (#12906) @vyasr
Update libcudf counting functions to specify cudf::size_type (#12904) @davidwendt
Fix moto env vars & pass AWS_SESSION_TOKEN to conda builds (#12902) @ajschmidt8
Rewrite CSV writer benchmark with nvbench (#12901) @PointKernel
Rework some code logic to reduce iterator and comparator inlining to improve compile time (#12900) @davidwendt
Deprecate `line_te...

Contributors

robertmaynard, thomcom, and 38 other contributors

29 Jun 13:28

rapids-bot

v23.06.00a

302054d

[NIGHTLY] v23.06.00 Pre-release

🚨 Breaking Changes

Fix batch processing for parquet writer (#13438) @ttnghia
Use <NA> instead of null to match pandas. (#13415) @bdice
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Use std::overflow_error when output would exceed column size limit (#13323) @davidwendt
Remove null mask and null count from column_view constructors (#13311) @vyasr
Change default value of the observed= argument in groupby to True to reflect the actual behaviour (#13296) @shwina
Throw error if UNINITIALIZED is passed to cudf::state_null_count (#13292) @davidwendt
Remove default null-count parameter from cudf::make_strings_column factory (#13227) @davidwendt
Remove UNKNOWN_NULL_COUNT where it can be easily computed (#13205) @vyasr
Update minimum Python version to Python 3.9 (#13196) @shwina
Refactor contiguous_split API into contiguous_split.hpp (#13186) @abellina
Cleanup Parquet chunked writer (#13094) @ttnghia
Cleanup ORC chunked writer (#13091) @ttnghia
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Remove deprecated regex functions from libcudf (#13067) @davidwendt
[REVIEW] Upgrade to arrow-11 (#12757) @galipremsagar
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller
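
The last entry above moves Python drop_duplicates onto cudf::stable_distinct. The idea of a stable distinct, where duplicates are dropped and the surviving rows keep their input order, can be sketched in plain Python. This is an illustration only, not the libcudf implementation: the helper name `stable_distinct` is reused here for readability, and keeping the first occurrence is an assumption of the example.

```python
def stable_distinct(rows):
    """Drop duplicate rows, keeping the first occurrence in input order."""
    seen = set()
    kept = []
    for row in rows:
        if row not in seen:
            seen.add(row)      # remember the row so later copies are skipped
            kept.append(row)   # first occurrence keeps its relative position
    return kept


rows = [("b", 2), ("a", 1), ("b", 2), ("c", 3), ("a", 1)]
print(stable_distinct(rows))  # [('b', 2), ('a', 1), ('c', 3)]
```

An unordered distinct may return the same rows in any order, which is why callers that rely on row order need the stable variant.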

🐛 Bug Fixes

Fix valid count computation in offset_bitmask_binop kernel (#13489) @davidwendt
Fix writing of ORC files with empty rowgroups (#13466) @vuule
Fix cudf::repeat logic when count is zero (#13459) @davidwendt
Fix batch processing for parquet writer (#13438) @ttnghia
Fix invalid use of std::exclusive_scan in Parquet writer (#13434) @etseidl
Patch numba if it is imported first to ensure minor version compatibility works. (#13433) @bdice
Fix cudf::strings::replace_with_backrefs hang on empty match result (#13418) @davidwendt
Use <NA> instead of null to match pandas. (#13415) @bdice
Fix tokenize with non-space delimiter (#13403) @shwina
Fix groupby head/tail for empty dataframe (#13398) @shwina
Default to closed="right" in IntervalIndex constructor (#13394) @shwina
Correctly reorder and reindex scan groupbys with null keys (#13389) @wence-
Fix unused argument errors in nvcc 11.5 (#13387) @abellina
Updates needed to work with jitify that leverages libcudacxx (#13383) @robertmaynard
Fix unused parameter warning/error in parquet/page_data.cu (#13367) @davidwendt
Fix page size estimation in Parquet writer (#13364) @etseidl
Fix subword_tokenize error when input contains no tokens (#13320) @davidwendt
Support gcc 12 as the C++ compiler (#13316) @robertmaynard
Correctly set bitmask size in from_column_view (#13315) @wence-
Fix approach to detecting assignment for gte/lte operators (#13285) @vyasr
Fix parquet schema interpretation issue (#13277) @hyperbolic2346
Fix 64bit shift bug in avro reader (#13276) @karthikeyann
Fix unused variables/parameters in parquet/writer_impl.cu (#13263) @davidwendt
Clean up buffers in case AssertionError (#13262) @razajafri
Allow empty input table in ast compute_column (#13245) @wence-
Fix structs_column_wrapper constructors to copy input column wrappers (#13243) @davidwendt
Fix the row index stream order in ORC reader (#13242) @vuule
Make is_decompression_disabled and is_compression_disabled thread-safe (#13240) @vuule
Add [[maybe_unused]] to nvbench environment. (#13219) @bdice
Fix race in ORC string dictionary creation (#13214) @revans2
Add scalar argtypes to udf cache keys (#13194) @brandon-b-miller
Fix unused parameter warning/error in grouped_rolling.cu (#13192) @davidwendt
Avoid skbuild 0.17.2 which affected the cmake -DPython_LIBRARY string (#13188) @sevagh
Fix hostdevice_vector::subspan (#13187) @ttnghia
Use custom nvbench entry point to ensure cudf::nvbench_base_fixture usage (#13183) @robertmaynard
Fix slice_strings to return empty strings for stop < start indices (#13178) @davidwendt
Allow compilation with any GTest version 1.11+ (#13153) @robertmaynard
Fix a few clang-format style check errors (#13146) @davidwendt
[REVIEW] Fix Series and DataFrame constructors to validate index lengths (#13122) @galipremsagar
Fix hash join when the input tables have nulls on only one side (#13120) @ttnghia
Fix GPU_ARCHS setting in Java CMake build and CMAKE_CUDA_ARCHITECTURES in Python package build. (#13117) @davidwendt
Adds checks to make sure json reader won't overflow (#13115) @elstehle
Fix null_count of columns returned by chunked_parquet_reader (#13111) @vuule
Fixes sliced list and struct column bug in JSON chunked writer (#13108) @karthikeyann
[REVIEW] Fix missing confluent kafka version (#13101) @galipremsagar
Use make_empty_lists_column instead of make_empty_column(type_id::LIST) (#13099) @davidwendt
Raise NotImplementedError when attempting to construct cuDF objects from timezone-aware datetimes (#13086) @shwina
Fix column selection read_parquet benchmarks (#13082) @vuule
Fix bugs in iterative groupby apply algorithm (#13078) @brandon-b-miller
Add algorithm include in data_sink.hpp (#13068) @ahendriksen
Fix tests/identify_stream_usage.cpp (#13066) @ahendriksen
Prevent overflow with skip_rows in ORC and Parquet readers (#13063) @vuule
Add except declaration in Cython interface for regex_program::create (#13054) @davidwendt
[REVIEW] Fix branch version in CI scripts (#13029) @galipremsagar
Fix OOB memory access in CSV reader when reading without NA values (#13011) @vuule
Fix read_avro() skip_rows and num_rows. (#12912) @tpn
Purge nonempty nulls from byte_cast list outputs. (#11971) @bdice
Fix consumption of CPU-backed interchange protocol dataframes (#11392) @shwina
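
For the slice_strings fix listed above (#13178), the expected result for stop < start matches ordinary Python slicing with a positive step, which yields an empty string. A minimal plain-Python sketch follows; `slice_all` is a hypothetical helper for this example and not a cuDF API.

```python
def slice_all(strings, start, stop):
    """Apply the same [start:stop] slice to every string in the list."""
    return [s[start:stop] for s in strings]


# stop < start selects nothing, so every result is the empty string
print(slice_all(["hello", "gpu", ""], 3, 1))  # ['', '', '']
```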

🚀 New Features

Remove numba JIT kernel usage from dataframe copy tests (#13385) @brandon-b-miller
Add JNI for ORC/Parquet writer compression statistics (#13376) @ttnghia
Use _compile_or_get in JIT groupby apply (#13350) @brandon-b-miller
cuDF numba cuda 12 updates (#13337) @brandon-b-miller
Add tz_convert method to convert between timestamps (#13328) @shwina
Optionally return compression statistics from ORC and Parquet writers (#13294) @vuule
Support the case=False argument to str.contains (#13290) @shwina
Add an event handler for ColumnVector.close (#13279) @abellina
JNI api for cudf::chunked_pack (#13278) @abellina
Implement a chunked_pack API (#13260) @abellina
Update cudf recipes to use GTest version >=1.13 (#13207) @robertmaynard
JNI changes for range-extents in window functions. (#13199) @mythrocks
Add support for DatetimeTZDtype and tz_localize (#13163) @shwina
Add IS_NULL operator to AST (#13145) @karthikeyann
STRING order-by column for RANGE window functions (#13143) @mythrocks
Update contains_table to experimental row hasher and equality comparator (#13119) @divyegala
Automatically select GroupBy.apply algorithm based on if the UDF is jittable (#13113) @brandon-b-miller
Refactor Parquet chunked writer (#13076) @ttnghia
Add Python bindings for string literal support in AST (#13073) @karthikeyann
Add Java bindings for string literal support in AST (#13072) @karthikeyann
Add string scalar support in AST (#13061) @karthikeyann
Log cuIO warnings using the libcudf logger (#13043) @vuule
Update mixed_join to use experimental row hasher and comparator (#13028) @divyegala
Support structs of lists in row lexicographic comparator (#13005) @ttnghia
Adding hostdevice_span that is a span createable from hostdevice_vector (#12981) @hyperbolic2346
Add nvtext::minhash function (#12961) @davidwendt
Support lists of structs in row lexicographic comparator (#12953) @ttnghia
Update join to use experimental row hasher and comparator (#12787) @divyegala
Implement Python drop_duplicates with cudf::stable_distinct. (#11656) @brandon-b-miller

🛠️ Improvements

Bump typing_extensions minimum version to 4.0.0 (#13618) @shwina
Drop extraneous dependencies from cudf conda recipe. (#13406) @bdice
Handle some corner-cases in indexing with boolean masks (#13402) @wence-
Add cudf::stable_distinct public API, tests, and benchmarks. (#13392) @bdice
[JNI] Pass this ColumnVector to the onClosed event handler (#13386) @abellina
Fix JNI method with mismatched parameter list (#13384) @ttnghia
Split up experimental_row_operator_tests.cu to improve its compile time (#13382) @davidwendt
Deprecate cudf::strings::slice_strings APIs that accept delimiters (#13373) @davidwendt
Remove UNKNOWN_NULL_COUNT (#13372) @vyasr
Move some nvtext benchmarks to nvbench (#13368) @davidwendt
Run docs nightly too (#13366) @AyodeAwe
Add warning for default dtype parameter in get_dummies (#13365) @galipremsagar
Add log messages about kvikIO compatibility mode (#13363) @vuule
Switch back to using primary shared-action-workflows branch (#13362) @vyasr
Deprecate StringIndex and use Index instead (#13361) @galipremsagar
Ensure columns have valid null counts in CUDF JNI. (#13355) @mythrocks
Expunge most uses of TypeVar(bound="Foo") (#13346) @wence-
Remove all references to UNKNOWN_NULL_COUNT in Python (#13345) @vyasr
Improve distinct_count with cuco::static_set (#13343) @PointKernel
Fix contiguous_split performance (#13342) @ttnghia
Remove default UNKNOWN_NULL_COUNT from cudf::column member functions (#13341) @davidwendt
Update mypy to 1.3 (#13340) @wence-
[Java] Purge non-empty nulls when setting validity (#13335) @razajafri
Add row-wise filtering step to read_parquet (#13334) @rjzamora
Performance improvement for nvtext::minhash (#13333) @davidwendt
Fix some libcudf functions to ...

Contributors

robertmaynard, gmarkall, and 28 other contributors

09 Feb 16:14

raydouglass

v23.02.00

5ad4a85

v23.02.00

🚨 Breaking Changes

Pin dask and distributed for release (#12695) @galipremsagar
Change ways to access ptr in Buffer (#12587) @galipremsagar
Remove column names (#12578) @vuule
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Remove deprecated code for 23.02 (#12281) @vyasr
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Floor division uses integer division for integral arguments (#12131) @wence-
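
The floor-division entry above (#12131) is easier to appreciate with a concrete case. The plain-Python sketch below shows how routing integer floor division through a 64-bit float can change the result for large values. It assumes this precision loss is the kind of problem the change avoids; the release notes do not state the motivation, and this is not cuDF code.

```python
import math

a, b = 2**53 + 1, 1  # a cannot be represented exactly as a 64-bit float

via_float = math.floor(a / b)  # true division rounds a to the nearest float first
via_int = a // b               # integer floor division stays exact

print(via_float)  # 9007199254740992
print(via_int)    # 9007199254740993
```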

🐛 Bug Fixes

Fix a mask data corruption in UDF (#12647) @galipremsagar
pre-commit: Update isort version to 5.12.0 (#12645) @wence-
tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
Revert regex program java APIs and tests (#12639) @cindyyuanjiang
Fix leaks in ColumnVectorTest (#12625) @jlowe
Handle when spillable buffers own each other (#12607) @madsbk
Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
lists: Transfer dtypes correctly through list.get (#12586) @wence-
timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
Fix bug: get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
partition_by_hash(): support index (#12554) @madsbk
Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
Update List Lexicographical Comparator (#12538) @divyegala
Dynamically read PTX version (#12534) @brandon-b-miller
build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
Loosen runtime arrow pinning (#12522) @vyasr
Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
Fix issues with parquet chunked reader (#12488) @nvdbaranec
Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
Rename libcudf substring source files to slice (#12484) @davidwendt
Fix compile issue with arrow 10 (#12465) @ttnghia
Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
Fix xfail incompatibilities (#12423) @vyasr
Fix bug in Parquet column index encoding (#12404) @etseidl
When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
Fix get_json_object to return empty column on empty input (#12384) @davidwendt
Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
Fix reductions any/all return value for empty input (#12374) @davidwendt
Fix debug compile errors in parquet.hpp (#12372) @davidwendt
Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
Use correct memory resource in io::make_column (#12364) @vyasr
Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
Fix NumericPairIteratorTest for float values (#12306) @davidwendt
Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
Change reductions any/all to return valid values for empty input (#12279) @davidwendt
Only exclude join keys that are indices from key columns (#12271) @wence-
Fix spill to device limit (#12252) @madsbk
Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
Fix page size calculation in Parquet writer (#12182) @etseidl
Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
Floor division uses integer division for integral arguments (#12131) @wence-

📖 Documentation

Fix link to NVTX (#12598) @sameerz
Include missing groupby functions in documentation (#12580) @quasiben
Fix documentation author (#12527) @bdice
Update libcudf reduction docs for casting output types (#12526) @davidwendt
Add JSON reader page in user guide (#12499) @GregoryKimball
Link unsupported iteration API docstrings (#12482) @galipremsagar
strings_udf doc update (#12469) @brandon-b-miller
Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
Update pre-commit hooks guide (#12395) @bdice
Update test docs to not use detail comparison utilities (#12332) @PointKernel
Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
Add eval to docs. (#12322) @vyasr
Turn on xfail_strict=true (#12244) @wence-
Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

Use kvikIO as the default IO backend (#12574) @vuule
Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
Add strings methods removeprefix and removesuffix (#12557) @davidwendt
Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Make string quoting optional on CSV write (#12539) @mythrocks
Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
one_hot_encode to use experimental row comparators (#12478) @divyegala
Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
Add JSON Writer (#12474) @karthikeyann
Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
JNI bindings to write CSV (#12425) @mythrocks
Nested JSON depth benchmark (#12371) @karthikeyann
Implement lists::reverse (#12336) @ttnghia
Use device_read in experimental read_json (#12314) @vuule
Implement JNI for strings::reverse (#12283) @ttnghia
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
Add cudf::strings::reverse function (#12227) @davidwendt
Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
Support replace in strings_udf (#12207) @brandon-b-miller
Add support to read binary encoded decimals in parquet (#12205) @PointKernel
Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
Add device buffer datasource (#12024) @PointKernel
Implement groupby apply with JIT (#11452) @bwyogatama
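
The removeprefix and removesuffix entry above (#12557) shares its method names with Python's built-in `str` methods (Python 3.9+). Assuming the cuDF string methods follow the same semantics, which these notes do not spell out, the behavior on plain Python strings looks like this:

```python
names = ["cudf_column", "cudf_table", "rmm_buffer"]

# removeprefix drops the exact prefix once and leaves non-matching strings alone
print([s.removeprefix("cudf_") for s in names])  # ['column', 'table', 'rmm_buffer']

# removesuffix does the same at the end of the string
print("reader.hpp".removesuffix(".hpp"))  # reader

# lstrip is not a substitute: it strips any leading characters from the given set
print("cudf_column".lstrip("cudf_"))  # olumn
```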

🛠️ Improvements

Update shared workflow branches (#12696) @ajschmidt8
Pin dask and distributed for release (#12695) @galipremsagar
Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
Change ways to access ptr in Buffer (#12587) @galipremsagar
Version a parquet writer xfail (#12579) @galipremsagar
Remove column names (#12578) @vuule
Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
Add support for category dtypes in CSV reader (#12571) @galipremsagar
Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
Optimize cudf::make_lists_column (#12547) @ttnghia
Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
Guard CUDA runtime APIs with error checking (#12531) @PointKernel
Update TODOs from issue 10432. (#12528) @bdice
Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Fix SUM/MEAN aggregation type support. (#12503) @bdice
Stop using pandas._testing (#12492) @vyasr
Fix ROLLING_TEST gtests coded in namespace cudf::test (#12490) @davidwendt
Fix erroneously skipped ORC ZSTD test (#12486) @vuule
Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt
Raise warnings as errors in the test suite (#12468) @v...

Contributors

benfred, robertmaynard, and 29 other contributors

08 Dec 19:18

GPUtester

v22.12.01

f700408

v22.12.01

🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Fix type promotion edge cases in numerical binops (#12074) @wence-
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Rollback of DeviceBufferLike (#12009) @madsbk
Remove unused managed_allocator (#12005) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Remove validation that requires introspection (#11938) @vyasr
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
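
Two entries above change collect-set and merge-sets aggregations to treat NaNs as equal by default (#11621, #11952). The plain-Python sketch below shows why the setting matters: under IEEE comparison NaN never equals NaN, so a set built with unequal NaNs keeps every NaN. `collect_set` and its `nans_equal` flag are hypothetical names for this illustration, not the libcudf API.

```python
import math


def _is_nan(v):
    return isinstance(v, float) and math.isnan(v)


def collect_set(values, nans_equal=True):
    """Collect distinct values, optionally treating every NaN as the same value."""
    out = []
    for v in values:
        if _is_nan(v):
            # With nans_equal, a second NaN is a duplicate of the first one.
            if nans_equal and any(_is_nan(x) for x in out):
                continue
            out.append(v)
        elif v not in out:
            out.append(v)
    return out


data = [1.0, float("nan"), 1.0, float("nan"), 2.0]
print(len(collect_set(data, nans_equal=True)))   # 3
print(len(collect_set(data, nans_equal=False)))  # 4
```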

🐛 Bug Fixes

strings_udf: use libcudf caching of character tables (#12343) @wence-
Fix include line for IO Cython modules (#12250) @vyasr
Make dask pinning looser (#12231) @vyasr
Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
Fix compression in ORC writer (#12194) @vuule
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
Fix decimal binary operations (#12142) @galipremsagar
Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
Fix/disable jitify lto (#12122) @robertmaynard
Fix conditional_full_join benchmark (#12121) @GregoryKimball
Fix regex working-memory-size refactor error (#12119) @davidwendt
Add in negative size checks for columns (#12118) @revans2
Add JNI for substring without 'end' parameter. (#12113) @firestarman
Fix reading of CSV files with blank second row (#12098) @vuule
Fix an error in IO with GzipFile type (#12085) @galipremsagar
Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
Fix alignment of compressed blocks in ORC writer (#12077) @vuule
Fix singleton-range __setitem__ edge case (#12075) @wence-
Fix type promotion edge cases in numerical binops (#12074) @wence-
Force using old fmt in nvbench. (#12067) @vyasr
Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
Force black exclusions for pre-commit. (#12036) @bdice
Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
Fix maximum page size estimate in Parquet writer (#11962) @vuule
Fix local offset handling in bgzip reader (#11918) @upsj
Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
Fix type casting in Series.setitem (#11904) @wence-
Fix memcheck error in get_dremel_data (#11903) @davidwendt
Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
Fix writing of Parquet files with many fragments (#11869) @etseidl
Fix RangeIndex unary operators. (#11868) @vyasr
JNI Avoid NPE for reading host binary data (#11865) @revans2
Fix decimal benchmark input data generation (#11863) @karthikeyann
Fix pre-commit copyright check (#11860) @galipremsagar
Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
Add V2 page header support to parquet reader (#11778) @etseidl
Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
Add symlinks to notebooks. (#12128) @bdice
Add truncate API to python doc pages (#12109) @galipremsagar
Update Numba docs links. (#12107) @bdice
Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
Add pivot_table and crosstab to docs. (#12014) @bdice
Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
Rename libcudf++ to libcudf. (#11953) @bdice
Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
Support + in strings_udf (#12117) @brandon-b-miller
Support upper and lower in strings_udf (#12099) @brandon-b-miller
Add wheel builds (#12096) @vyasr
Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
Mark nvcomp zstd compression stable (#12059) @jbrennan333
Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
Enable building against the libarrow contained in pyarrow (#12034) @vyasr
Add strings like jni and native method (#12032) @cindyyuanjiang
Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
byte_range support for JSON Lines format (#12017) @karthikeyann
Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
Implement JNI for chunked Parquet reader (#11961) @ttnghia
Add method argument to DataFrame.quantile (#11957) @rjzamora
Add gpu memory watermark apis to JNI (#11950) @abellina
Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Enable CEC for strings_udf (#11884) @brandon-b-miller
ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
Implement chunked Parquet reader (#11867) @ttnghia
Add read_orc_metadata to libcudf (#11815) @vuule
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk
Pin dask and distributed for release (#12165) @galipremsagar
Don't rely on GNU find in headers_test.sh (#12164) @wence-
Update cp.clip call (#12148) @quasiben
Enable automatic column projection in groupby().agg (#12124) @rjzamora
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Spilling to host memory (#12106) @madsbk
First pass of pd.read_orc changes in tests (#12103) @galipremsagar
Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
Remove CUDA 10 compatibility code. (#12088) @bdice
Move and update dask nightly install in CI (#12082) @galipremsagar
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Remove macros that inspect the contents of exceptions (#12076) @vyasr
Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
Remove overflow err...

Contributors

trxcllnt, robertmaynard, and 32 other contributors

08 Dec 15:20

GPUtester

v22.12.00

baae3a6

v22.12.00

🚨 Breaking Changes

Add JNI for substring without 'end' parameter. (#12113) @firestarman
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Fix type promotion edge cases in numerical binops (#12074) @wence-
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Rollback of DeviceBufferLike (#12009) @madsbk
Remove unused managed_allocator (#12005) @vyasr
Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
Remove validation that requires introspection (#11938) @vyasr
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

🐛 Bug Fixes

Fix include line for IO Cython modules (#12250) @vyasr
Make dask pinning looser (#12231) @vyasr
Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
Fix compression in ORC writer (#12194) @vuule
Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
Fix decimal binary operations (#12142) @galipremsagar
Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
Fix/disable jitify lto (#12122) @robertmaynard
Fix conditional_full_join benchmark (#12121) @GregoryKimball
Fix regex working-memory-size refactor error (#12119) @davidwendt
Add in negative size checks for columns (#12118) @revans2
Add JNI for substring without 'end' parameter. (#12113) @firestarman
Fix reading of CSV files with blank second row (#12098) @vuule
Fix an error in IO with GzipFile type (#12085) @galipremsagar
Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
Fix alignment of compressed blocks in ORC writer (#12077) @vuule
Fix singleton-range __setitem__ edge case (#12075) @wence-
Fix type promotion edge cases in numerical binops (#12074) @wence-
Force using old fmt in nvbench. (#12067) @vyasr
Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
Force black exclusions for pre-commit. (#12036) @bdice
Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
Fix maximum page size estimate in Parquet writer (#11962) @vuule
Fix local offset handling in bgzip reader (#11918) @upsj
Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
Fix type casting in Series.setitem (#11904) @wence-
Fix memcheck error in get_dremel_data (#11903) @davidwendt
Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
Fix writing of Parquet files with many fragments (#11869) @etseidl
Fix RangeIndex unary operators. (#11868) @vyasr
JNI Avoid NPE for reading host binary data (#11865) @revans2
Fix decimal benchmark input data generation (#11863) @karthikeyann
Fix pre-commit copyright check (#11860) @galipremsagar
Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
add V2 page header support to parquet reader (#11778) @etseidl
Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

📖 Documentation

Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
Add symlinks to notebooks. (#12128) @bdice
Add truncate API to python doc pages (#12109) @galipremsagar
Update Numba docs links. (#12107) @bdice
Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
Add pivot_table and crosstab to docs. (#12014) @bdice
Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
Rename libcudf++ to libcudf. (#11953) @bdice
Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
Add developer docs for writing tests (#11199) @vyasr

🚀 New Features

Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
Support + in strings_udf (#12117) @brandon-b-miller
Support upper and lower in strings_udf (#12099) @brandon-b-miller
Add wheel builds (#12096) @vyasr
Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
Mark nvcomp zstd compression stable (#12059) @jbrennan333
Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
Enable building against the libarrow contained in pyarrow (#12034) @vyasr
Add strings like jni and native method (#12032) @cindyyuanjiang
Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
byte_range support for JSON Lines format (#12017) @karthikeyann
Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
Implement JNI for chunked Parquet reader (#11961) @ttnghia
Add method argument to DataFrame.quantile (#11957) @rjzamora
Add gpu memory watermark apis to JNI (#11950) @abellina
Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
Enable CEC for strings_udf (#11884) @brandon-b-miller
ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
Implement chunked Parquet reader (#11867) @ttnghia
Add read_orc_metadata to libcudf (#11815) @vuule
Support nested types as groupby keys in libcudf (#11792) @PointKernel
Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

🛠️ Improvements

Reduce number of tests marked spilling (#12197) @madsbk
Pin dask and distributed for release (#12165) @galipremsagar
Don't rely on GNU find in headers_test.sh (#12164) @wence-
Update cp.clip call (#12148) @quasiben
Enable automatic column projection in groupby().agg (#12124) @rjzamora
Refactor purge_nonempty_nulls (#12111) @ttnghia
Create an int8 column in read_csv when all elements are missing (#12110) @vuule
Spilling to host memory (#12106) @madsbk
First pass of pd.read_orc changes in tests (#12103) @galipremsagar
Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
Remove CUDA 10 compatibility code. (#12088) @bdice
Move and update dask nightly install in CI (#12082) @galipremsagar
Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
Remove macros that inspect the contents of exceptions (#12076) @vyasr
Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
Remove overflow error during decimal binops (#12063) @galipremsagar
Change cudf::detail::...

Contributors

trxcllnt, robertmaynard, and 32 other contributors

09 Feb 16:17

rapids-bot

v23.02.00a

480b4cc

[NIGHTLY] v23.02.00 Pre-release

Pre-release

🚨 Breaking Changes

Pin dask and distributed for release (#12695) @galipremsagar
Change ways to access ptr in Buffer (#12587) @galipremsagar
Remove column names (#12578) @vuule
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Upgrade to arrow-10.0.1 (#12327) @galipremsagar
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
Remove deprecated code for 23.02 (#12281) @vyasr
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
Remove JIT type names, refactor id_to_type. (#12158) @bdice
Floor division uses integer division for integral arguments (#12131) @wence-
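The last entry above (#12131) concerns floor division on integral arguments. As a plain-Python illustration (not cuDF code, and not taken from the PR itself), the sketch below shows why a floor division computed through a floating-point intermediate can differ from exact integer division once values exceed 2**53. The function names are hypothetical.

```python
import math

def floordiv_via_float(a: int, b: int) -> int:
    """Floor-divide through a float64 intermediate (lossy for large ints)."""
    return math.floor(a / b)

def floordiv_integer(a: int, b: int) -> int:
    """Floor-divide using exact integer arithmetic."""
    return a // b

big = 2**53 + 1  # not exactly representable as a float64
via_float = floordiv_via_float(big, 1)
exact = floordiv_integer(big, 1)

# The two results disagree because a / b rounds to the nearest float64.
print(via_float == exact)
```

For small operands the two approaches agree; the difference only appears when an operand or the quotient cannot be represented exactly in float64.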

🐛 Bug Fixes

Fix update-version.sh (#12745) @raydouglass
Fix a mask data corruption in UDF (#12647) @galipremsagar
pre-commit: Update isort version to 5.12.0 (#12645) @wence-
tests: Skip cuInit tests if cuda-gdb is not found or not working (#12644) @wence-
Revert regex program java APIs and tests (#12639) @cindyyuanjiang
Fix leaks in ColumnVectorTest (#12625) @jlowe
Handle when spillable buffers own each other (#12607) @madsbk
Fix incorrect null counts for sliced columns in JCudfSerialization (#12589) @jlowe
lists: Transfer dtypes correctly through list.get (#12586) @wence-
timedelta: Don't go via float intermediates for floordiv (#12585) @wence-
Fixing BUG, get_next_chunk() should use the blocking function device_read() (#12584) @madsbk
Make JNI QuoteStyle accessible outside ai.rapids.cudf (#12572) @mythrocks
partition_by_hash(): support index (#12554) @madsbk
Mixed Join benchmark bug due to wrong conditional column (#12553) @divyegala
Update List Lexicographical Comparator (#12538) @divyegala
Dynamically read PTX version (#12534) @brandon-b-miller
build.sh switch to use RAPIDS magic value (#12525) @robertmaynard
Loosen runtime arrow pinning (#12522) @vyasr
Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
Fix issues with parquet chunked reader (#12488) @nvdbaranec
Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
Rename libcudf substring source files to slice (#12484) @davidwendt
Fix compile issue with arrow 10 (#12465) @ttnghia
Fix List offsets bug in mixed type list column in nested JSON reader (#12447) @karthikeyann
Fix xfail incompatibilities (#12423) @vyasr
Fix bug in Parquet column index encoding (#12404) @etseidl
When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
Fix get_json_object to return empty column on empty input (#12384) @davidwendt
Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
Fix reductions any/all return value for empty input (#12374) @davidwendt
Fix debug compile errors in parquet.hpp (#12372) @davidwendt
Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
Use correct memory resource in io::make_column (#12364) @vyasr
Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
Fix NumericPairIteratorTest for float values (#12306) @davidwendt
Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
Reconstruct dtypes correctly for list aggs of struct columns (#12290) @wence-
Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
Change reductions any/all to return valid values for empty input (#12279) @davidwendt
Only exclude join keys that are indices from key columns (#12271) @wence-
Fix spill to device limit (#12252) @madsbk
Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
Fix page size calculation in Parquet writer (#12182) @etseidl
Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
Floor division uses integer division for integral arguments (#12131) @wence-
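One fix in this list (#12282) makes the regex anchors \A and \Z match strictly at the beginning and end of the string. The entry does not describe the engine's behavior in detail, so the following is only a reference sketch using Python's standard re module, where \A and \Z have the same strict meaning, in contrast to ^ and $.

```python
import re

text = "ab\ncd\n"

# With re.MULTILINE, ^ matches at the start of every line...
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
# ...while \A matches only at the very start of the string.
string_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# $ can match just before a trailing newline; \Z matches only at the very end.
dollar_match = re.search(r"cd$", text) is not None
strict_end_match = re.search(r"cd\Z", text) is not None

print(line_starts, string_start, dollar_match, strict_end_match)
```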

📖 Documentation

Fix link to NVTX (#12598) @sameerz
Include missing groupby functions in documentation (#12580) @quasiben
Fix documentation author (#12527) @bdice
Update libcudf reduction docs for casting output types (#12526) @davidwendt
Add JSON reader page in user guide (#12499) @GregoryKimball
Link unsupported iteration API docstrings (#12482) @galipremsagar
strings_udf doc update (#12469) @brandon-b-miller
Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
Update pre-commit hooks guide (#12395) @bdice
Update test docs to not use detail comparison utilities (#12332) @PointKernel
Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
Add eval to docs. (#12322) @vyasr
Turn on xfail_strict=true (#12244) @wence-
Update 10 minutes to cuDF (#12114) @wence-

🚀 New Features

Use kvikIO as the default IO backend (#12574) @vuule
Use has_nonempty_nulls instead of may_contain_non_empty_nulls in superimpose_nulls and push_down_nulls (#12560) @ttnghia
Add strings methods removeprefix and removesuffix (#12557) @davidwendt
Add regex_program java APIs and unit tests (#12548) @cindyyuanjiang
Default cudf::io::read_json to nested JSON parser (#12544) @vuule
Make string quoting optional on CSV write (#12539) @mythrocks
Use new nvCOMP API to optimize the compression temp memory size (#12533) @vuule
Support "values" orient (array of arrays) in Nested JSON reader (#12498) @karthikeyann
one_hot_encode to use experimental row comparators (#12478) @divyegala
Support %W and %w format specifiers in cudf::strings::to_timestamps (#12475) @davidwendt
Add JSON Writer (#12474) @karthikeyann
Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
Add trailing comma support for nested JSON reader (#12448) @karthikeyann
Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
JNI bindings to write CSV (#12425) @mythrocks
Nested JSON depth benchmark (#12371) @karthikeyann
Implement lists::reverse (#12336) @ttnghia
Use device_read in experimental read_json (#12314) @vuule
Implement JNI for strings::reverse (#12283) @ttnghia
Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
Add cudf::strings::reverse function (#12227) @davidwendt
Selectively use dictionary encoding in Parquet writer (#12211) @etseidl
Support replace in strings_udf (#12207) @brandon-b-miller
Add support to read binary encoded decimals in parquet (#12205) @PointKernel
Support regex EOL where the string ends with a new-line character (#12181) @davidwendt
Updating stream_compaction/unique to use new row comparators (#12159) @divyegala
Add device buffer datasource (#12024) @PointKernel
Implement groupby apply with JIT (#11452) @bwyogatama
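Among the features above, #12557 adds the strings methods removeprefix and removesuffix. Assuming they mirror the Python 3.9 str methods of the same names (the entry does not say so explicitly), the reference behavior in plain Python looks like this; the sample strings are made up for illustration.

```python
# Plain-Python reference behavior (Python 3.9+); this is not cuDF code.
names = ["test_read_csv", "test_read_json", "read_orc"]

# removeprefix strips the prefix only when the string starts with it.
without_prefix = [s.removeprefix("test_") for s in names]

# removesuffix strips the suffix only when the string ends with it.
without_suffix = [s.removesuffix("_csv") for s in names]

print(without_prefix)
print(without_suffix)
```

Strings that do not carry the given prefix or suffix are returned unchanged, which is what distinguishes these methods from a blanket replace.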

🛠️ Improvements

Update shared workflow branches (#12696) @ajschmidt8
Pin dask and distributed for release (#12695) @galipremsagar
Don't upload libcudf-example to Anaconda.org (#12671) @ajschmidt8
Pin wheel dependencies to same RAPIDS release (#12659) @sevagh
Use CTK 118/cp310 branch of wheel workflows (#12602) @sevagh
Change ways to access ptr in Buffer (#12587) @galipremsagar
Version a parquet writer xfail (#12579) @galipremsagar
Remove column names (#12578) @vuule
Parquet reader optimization to address V100 regression. (#12577) @nvdbaranec
Add support for category dtypes in CSV reader (#12571) @galipremsagar
Remove spill_lock parameter from SpillableBuffer.get_ptr() (#12564) @madsbk
Optimize cudf::make_lists_column (#12547) @ttnghia
Remove cudf::strings::repeat_strings_output_sizes from Java and JNI (#12546) @ttnghia
Test that cuInit is not called when RAPIDS_NO_INITIALIZE is set (#12545) @wence-
Rework repeat_strings to use sizes-to-offsets utility (#12543) @davidwendt
Replace exclusive_scan with sizes_to_offsets in cudf::lists::sequences (#12541) @davidwendt
Rework nvtext::ngrams_tokenize to use sizes-to-offsets utility (#12540) @davidwendt
Fix binary-ops gtests coded in namespace cudf::test (#12536) @davidwendt
More @acquire_spill_lock() and as_buffer(..., exposed=False) (#12535) @madsbk
Guard CUDA runtime APIs with error checking (#12531) @PointKernel
Update TODOs from issue 10432. (#12528) @bdice
Update rapids-cmake definitions version in GitHub Actions style checks. (#12511) @bdice
Switch engine=cudf to the new JSON reader (#12509) @galipremsagar
Fix SUM/MEAN aggregation type support. (#12503) @bdice
Stop using pandas._testing (#12492) @vyasr
Fix ROLLING_TEST gtests coded in namespace cudf::test...

Contributors

benfred, robertmaynard, and 30 other contributors

03 Nov 17:32

GPUtester

v22.10.01

d90f7e9

v22.10.01

🚨 Breaking Changes

Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Disable nvCOMP DEFLATE integration (#11811) @vuule
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Upgrade pandas to 1.5 (#11617) @galipremsagar
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
Adding optional parquet reader schema (#11524) @hyperbolic2346
Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
Disable Arrow S3 support by default. (#11470) @bdice
Convert thrust::optional usages to std::optional (#11455) @robertmaynard
Remove unused is_struct trait. (#11450) @bdice
Refactor the Buffer class (#11447) @madsbk
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
Use the new JSON parser when the experimental reader is selected (#11364) @vuule
Remove deprecated Series.applymap. (#11031) @bdice
Remove deprecated expand parameter from str.findall. (#11030) @bdice
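The zfill change listed above (#11634) is described only as matching Python output. For reference, this is how Python's own str.zfill behaves, including its handling of a leading sign; the sketch shows the Python side only and makes no claim about cuDF's earlier behavior.

```python
# Python's str.zfill pads with zeros on the left, after any leading sign.
samples = ["42", "-42", "+7", "abc", "12345"]
padded = [s.zfill(5) for s in samples]

for original, result in zip(samples, padded):
    print(repr(original), "->", repr(result))
```

Strings already at or beyond the requested width are returned unchanged.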

🐛 Bug Fixes

Update cuda-python dependency to 11.7.1 (#11994) @shwina
Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
Handle ptx file paths during strings_udf import (#11862) @galipremsagar
Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
Fix is_valid checks in Scalar._binaryop (#11818) @wence-
Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
Disable nvCOMP DEFLATE integration (#11811) @vuule
Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
Fix ORC string sum statistics (#11740) @vuule
Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
Don't assume stream is a compile-time constant expression (#11725) @vyasr
Fix get_thrust.cmake format at patch command (#11715) @davidwendt
Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
Fix compile error due to missing header (#11697) @ttnghia
Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
Transfer correct dtype to exploded column (#11687) @wence-
Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
Maintain the index name after .loc (#11677) @shwina
Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
Fix multi-file remote datasource bug (#11655) @rjzamora
Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
fixes overflows in benchmarks (#11649) @elstehle
Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
Update zfill to match Python output (#11634) @davidwendt
Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
Fix host scalars construction of nested types (#11612) @galipremsagar
Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
Add is_timestamp test for leap second (60) (#11594) @davidwendt
Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
Fix exception in segmented-reduce benchmark (#11588) @davidwendt
Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
Correct distribution data type in quantiles benchmark (#11584) @vuule
Fix multibyte_split benchmark for host buffers (#11583) @upsj
xfail custreamz display test for now (#11567) @shwina
Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
Fix groupby failures in dask_cudf CI (#11561) @rjzamora
Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
Fix regex quantifier check to include capture groups (#11373) @davidwendt
Fix read_text when byte_range is aligned with field (#11371) @upsj
Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
column: calculate null_count before release()ing the cudf::column (#11365) @wence-

📖 Documentation

Update guide-to-udfs notebook (#11861) @brandon-b-miller
Update docstring for cudf.read_text (#11799) @GregoryKimball
Add doc section for list & struct handling (#11770) @galipremsagar
Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
Enable more Pydocstyle rules (#11582) @bdice
Remove unused cpp/img folder (#11554) @davidwendt
Publish C++ developer docs (#11475) @vyasr
Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
Update contributing doc to include links to the developer guides (#11390) @davidwendt
Fix table_view_base doxygen format (#11340) @davidwendt
Create main developer guide for Python (#11235) @vyasr
Add developer documentation for benchmarking (#11122) @vyasr
cuDF error handling document (#7917) @isVoid

🚀 New Features

Add hasNull statistic reading ability to ORC (#11747) @devavret
Add istitle to string UDFs (#11738) @brandon-b-miller
JSON Column creation in GPU (#11714) @karthikeyann
Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
Add BGZIP data_chunk_reader (#11652) @upsj
Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
Generic type casting to support the new nested JSON reader (#11613) @elstehle
JSON tree traversal (#11610) @karthikeyann
Add casting operators to masked UDFs (#11578) @brandon-b-miller
Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
Add strings 'like' function (#11558) @davidwendt
Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
Adds support for json lines format to the nested JSON reader (#11534) @elstehle
Adding optional parquet reader schema (#11524) @hyperbolic2346
Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
Add gdb pretty-printers for simple types (#...

Contributors

rgommers, trxcllnt, and 33 other contributors

