Release v0.19.2 · rapidsai/cudf

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

unsnap: busy wait a number of cycles (#8073) @vuule
Fix returned column type when extracting from an empty list column (#8031) @jlowe
Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
Adds explode API (#7607) @isVoid
Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
Implement cudf::label_bins() (#7554) @vyasr
Add Python bindings for lists::contains (#7547) @skirui-source
cudf::row_bit_count() support. (#7534) @nvdbaranec
Implement drop_list_duplicates (#7528) @ttnghia
Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Enable type conversion from float to decimal type (#7450) @ChrisJar
Add cython for converting strings/fixed-point functions (#7429) @davidwendt
Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
Implement groupby collect_set (#7420) @ttnghia
Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
Refactor strings column factories (#7397) @harrism
Add groupby scan operations (sort groupby) (#7387) @karthikeyann
Add cudf::explode_position (#7376) @hyperbolic2346
Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
Add Series.drop api (#7304) @isVoid
get_json_object() implementation (#7286) @nvdbaranec
Python API for ListMethods.len() (#7283) @isVoid
Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
Fix inplace update of data and add Series.update (#7201) @galipremsagar
Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

🛠️ Improvements

fix GDS include path for version 0.95 (#7877) @rongou
Update dask + distributed to 2021.4.0 (#7858) @jakirkham
Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
Add USE_GDS as an option in build script (#7833) @pxLi
add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
Revert dask versioning of concat dispatch (#7823) @galipremsagar
add copy methods in Java memory buffer (#7791) @rongou
Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
Allow hash_partition to take a seed value (#7771) @magnatelee
Turn on NVTX by default in java build (#7761) @tgravescs
Add Java bindings to join gather map APIs (#7751) @jlowe
Add replacements column support for Java replaceNulls (#7750) @jlowe
Add Java bindings for row_bit_count (#7749) @jlowe
Remove unused JVM array creation (#7748) @jlowe
Added JNI support for new is_integer (#7739) @revans2
Create and promote library aliases in libcudf installations (#7734) @trxcllnt
Support groupby operations for decimal dtypes (#7731) @vyasr
Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
Use stream in groupby calls (#7705) @karthikeyann
Update codeowners file (#7701) @ajschmidt8
Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
Misc Python/Cython optimizations (#7686) @shwina
Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
Add column_device_view to orc writer (#7676) @kaatish
cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
Feature/optimize accessor copy (#7660) @vyasr
Fix find_package(cudf) (#7658) @trxcllnt
Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
Add in JNI support for count_elements (#7651) @revans2
Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
Add in JNI support for table partition (#7637) @revans2
Add explicit fixed_point merge test (#7635) @codereport
Add JNI support for IDENTITY hash partitioning (#7626) @revans2
Java support on explode_outer (#7625) @sperlingxx
Java support of casting string from/to decimal (#7623) @sperlingxx
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
Add gbenchmarks for string substrings functions (#7603) @davidwendt
Refactor string conversion check (#7599) @ttnghia
JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
Fix auto-detecting GPU architectures (#7593) @trxcllnt
Reduce cudf library size (#7583) @robertmaynard
Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
Add gbenchmark for strings::concatenate (#7560) @davidwendt
Update Changelog Link (#7550) @ajschmidt8
Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
Add __repr__ for Column and ColumnAccessor (#7531) @shwina
Support Decimal DIV changes in cudf (#7527) @razajafri
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
Add gbenchmarks for strings extract function (#7522) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Reduce compile time/size for scan.cu (#7516) @davidwendt
Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
Removed unneeded includes from traits.hpp (#7509) @davidwendt
FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
JNI bit cast (#7493) @revans2
Combine rolling window function tests (#7480) @mythrocks
Prepare Changelog for Automation (#7477) @ajschmidt8
Java support for explode position (#7471) @sperlingxx
Update 0.18 changelog entry (#7463) @ajschmidt8
JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
Join APIs that return gathermaps (#7454) @shwina
Remove dependence on managed memory for multimap test (#7451) @jrhemstad
Use cuFile for Parquet IO when available (#7444) @vuule
Statistics cleanup (#7439) @kaatish
Add gbenchmarks for strings filter functions (#7438) @davidwendt
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Improve string gather performance (#7433) @jlowe
Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
Detail APIs for datetime functions (#7430) @magnatelee
Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Simplify type dispatch with device_storage_dispatch (#7419) @codereport
Java support for casting of nested child columns (#7417) @razajafri
Improve scalar string replace performance for long strings (#7415) @jlowe
Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
bitmask_or implementation with bitmask refactor (#7406) @rwlee
Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
Clean up included headers in device_operators.cuh (#7401) @codereport
Move nullable index iterator to indexalator factory (#7399) @davidwendt
ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
Add gbenchmark for strings find/contains functions (#7392) @davidwendt
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
Added in JNI support for out of core sort algorithm (#7381) @revans2
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
jitify 2 support (#7372) @cwharris
compile_udf: Cache PTX for similar functions (#7371) @gmarkall
Add string scalar replace benchmark (#7369) @jlowe
Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
Update orc reader and writer fuzz tests (#7357) @galipremsagar
Improve url_decode performance for long strings (#7353) @jlowe
cudf::ast Small Refactorings (#7352) @codereport
Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
Change block size parameter from a global to a template param. (#7333) @nvdbaranec
Partial clean up of ORC writer (#7324) @vuule
Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
Use string literals in fixed_point release_asserts (#7303) @codereport
Fix merge conflicts for #7295 (#7297) @ajschmidt8
Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
Refactor dictionary support for reductions any/all (#7242) @davidwendt
Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
Interval index and interval_range (#7182) @marlenezw
avro reader integration tests (#7156) @cwharris
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
Adding Interval Dtype (#6984) @marlenezw
Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport
