Releases · rapidsai/cudf

03 Dec 19:17

GPUtester

v21.12.00

f1ef2d2

🚨 Breaking Changes

Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Remove sizeof and standardize on memory_usage (#9544) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Refactor sorting APIs (#9464) @vyasr
Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
Various internal MultiIndex improvements (#9243) @vyasr
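As a rough illustration of the bitmask_and / bitmask_or change listed above (#9616), the sketch below shows what "a pair of resulting mask and count of unset bits" could look like. It is plain Python and not the libcudf API: the function name is reused only for readability, and the `num_bits` parameter and the use of Python integers as masks are assumptions made for this example.

```python
from typing import Iterable, Tuple


def bitmask_and(masks: Iterable[int], num_bits: int) -> Tuple[int, int]:
    """AND validity masks together and return (mask, count of unset bits).

    Hypothetical helper for illustration only; not the libcudf signature.
    Each mask is a Python int whose bit i is 1 when row i is valid.
    """
    full = (1 << num_bits) - 1          # mask with every row valid
    result = full
    for mask in masks:
        result &= mask & full           # ignore bits beyond num_bits
    unset = num_bits - bin(result).count("1")
    return result, unset


mask, null_count = bitmask_and([0b1101, 0b0111], num_bits=4)
print(bin(mask), null_count)  # prints: 0b101 2
```

Returning the count alongside the mask means a caller does not need a second pass over the result to learn how many rows are null.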

🐛 Bug Fixes

Fix read_parquet bug for bytes input (#9669) @rjzamora
Use _gather internal for sort_* (#9668) @isVoid
Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
Don't recompute output size if it is already available (#9649) @abellina
Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
Fix debrotli issue on CUDA 11.5 (#9632) @vuule
Use std::size_t when computing join output size (#9626) @jlowe
Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar
Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar
Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar
compile libnvcomp with PTDS if requested (#9540) @jbrennan333
Fix segmented_gather() for null LIST rows (#9537) @mythrocks
Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca
Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev
Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
Fix regex handling of embedded null characters (#9470) @davidwendt
Fix memcheck error in copy-if-else (#9467) @davidwendt
Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
Preserve the decimal scale when creating a default scalar (#9449) @revans2
Push down parent nulls when flattening nested columns. (#9443) @mythrocks
Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
Allow int-like objects for the decimals argument in round (#9428) @shwina
Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia
Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar
Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
Fix the crash in stats code (#9368) @devavret
Make Series.hash_encode results reproducible. (#9366) @bdice
Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar
Use f-string in join helper warning message. (#9325) @bdice
Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
Fix null count in statistics for parquet (#9303) @devavret
Potential overflow of decimal32 when casting to int64_t (#9287) @codereport
Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid
BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source

📖 Documentation

Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport
Add dedicated page for StringHandling in python docs (#9624) @galipremsagar
Update docstring of DataFrame.merge (#9572) @galipremsagar
Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
Add example to docstrings in rolling.apply (#9522) @isVoid
Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
Improve Python docstring formatting. (#9493) @bdice
Update table of I/O supported types (#9476) @vuule
Document invalid regex patterns as undefined behavior (#9473) @davidwendt
Miscellaneous documentation fixes to cudf (#9471) @galipremsagar
Fix many documentation errors in libcudf. (#9355) @karthikeyann
Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
Improved deprecation warnings. (#9347) @bdice
doc reorder mr, stream to stream, mr (#9308) @karthikeyann
Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
Added deprecation warning for .label_encoding() (#9289) @mayankanand007

🚀 New Features

Enable Series.divide and DataFrame.divide (#9630) @vyasr
Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar
Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia
Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia
Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard
Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
Support args= in apply (#9514) @brandon-b-miller
Add groupby scan min/max support for strings values (#9502) @davidwendt
Add list output option to character_ngrams() function (#9499) @davidwendt
More granular column selection in ORC reader (#9496) @vuule
add min_periods, ddof to groupby covariance, & correlation aggregation (#9492) @karthikeyann
Implement Series.datetime.floor (#9488) @skirui-source
Enable linting of CMake files using pre-commit (#9484) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Augment order_by to Accept a List of null_precedence (#9455) @isVoid
Add format API for list column of strings (#9454) @davidwendt
Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
Add cudf python groupby.diff (#9446) @karthikeyann
Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia
add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
Support Unary Operations in Masked UDF (#9409) @isVoid
Move Several Series Function to Frame (#9394) @isVoid
MD5 Python hash API (#9390) @bdice
Add cudf strings is_title API (#9380) @davidwendt
Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
Add support for writing ORC with map columns (#9369) @vuule
extract_list_elements() with column_view indices (#9367) @mythrocks
Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
Add na_position param to dask-cudf sort_values (#9264) @charlesbluca
Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca
New array conversion methods (#9236) @vyasr
Series apply method backed by masked UDFs (#9217) @brandon-b-miller
Grouping by frequency and resampling (#9178) @shwina
Pure-python masked UDFs (#9174) @brandon-b-miller
Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina
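The min_periods and ddof parameters added to the groupby covariance and correlation aggregations (#9492) can be pictured with a small pure-Python sketch for a single group. This is not cuDF code. It assumes pandas-like semantics: pairs with a missing value are skipped, `min_periods` is the minimum number of complete pairs required for a result, and `ddof` is subtracted from the pair count in the divisor. The `covariance` helper is hypothetical.

```python
import math
from typing import Optional, Sequence


def covariance(xs: Sequence[Optional[float]],
               ys: Sequence[Optional[float]],
               min_periods: int = 1,
               ddof: int = 1) -> float:
    """Covariance of one group, skipping pairs that contain a None.

    Hypothetical helper; semantics are assumed to mirror pandas.
    """
    # Keep only rows where both values are present.
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    # Too few observations, or a non-positive divisor, yields NaN.
    if n < min_periods or n - ddof <= 0:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return total / (n - ddof)


x = [1.0, 2.0, 3.0, None]
y = [2.0, 4.0, 6.0, 8.0]
print(covariance(x, y))                 # prints: 2.0
print(covariance(x, y, min_periods=4))  # prints: nan
```

With `ddof=0` the same data is divided by the full pair count instead of the count minus one.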

🛠️ Improvements

Followup to PR 9088 comments (#9659) @cwharris
Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
Add 11.5 dev.yml to cudf (#9617) @galipremsagar
Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar
remove deprecated Rmm.initialize method (#9607) @rongou
Use HostColumnVectorCore for ch...

Contributors

robertmaynard, rongou, and 37 other contributors

12 Oct 20:19

GPUtester

v21.10.01

a1d2d13

06 Oct 15:51

GPUtester

v21.10.00

072fd86

🚨 Breaking Changes

Remove Cython APIs for table view generation (#9199) @vyasr
Upgrade pandas version in cudf (#9147) @galipremsagar
Make AST operators nullable (#9096) @vyasr
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
Support additional format specifiers in from_timestamps (#9047) @davidwendt
Expose expression base class publicly and simplify public AST API (#9045) @vyasr
Add support for struct type in ORC writer (#9025) @vuule
Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
Java bindings for conditional join output sizes (#9002) @jlowe
Move compute_column API out of ast namespace (#8957) @vyasr
cudf.dtype function (#8949) @shwina
Refactor Frame reductions (#8944) @vyasr
Add nested column selection to parquet reader (#8933) @devavret
JNI Aggregation Type Changes (#8919) @revans2
Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar
Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
Make groupby transform-like op order match original data order (#8720) @isVoid

🐛 Bug Fixes

fixed_point cudf::groupby for mean aggregation (#9296) @codereport
Fix interleave_columns when the input string lists column having empty child column (#9292) @ttnghia
Update nvcomp to include fixes for installation of headers (#9276) @devavret
Fix Java column leak in testParquetWriteMap (#9271) @jlowe
Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
Fixing empty input to getMapValue crashing (#9262) @hyperbolic2346
Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar
Dataframe.sort_index optimizations (#9238) @galipremsagar
Temporarily disabling problematic test in parquet writer (#9230) @devavret
Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
Fix gather for sliced input structs column (#9218) @ttnghia
Fix JNI code for left semi and anti joins (#9207) @jlowe
Only install thrust when using a non 'system' version (#9206) @robertmaynard
Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks
get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca
Approximate overflow detection in ORC statistics (#9163) @vuule
Use decimal precision metadata when reading from parquet files (#9162) @shwina
Fix variable name in Java build script (#9161) @jlowe
Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
Fix conditional joins with empty left table (#9146) @vyasr
Fix joining on indexes with duplicate level names (#9137) @shwina
Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
Apply type metadata after column is slice-copied (#9131) @isVoid
Fix a bug: inner_join_size returns zero if build table is empty (#9128) @PointKernel
Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
Support null literals in expressions (#9117) @vyasr
Fix cudf::hash_join output size for struct joins (#9107) @jlowe
Import fix (#9104) @shwina
Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks
Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
Preserve float16 upscaling (#9069) @galipremsagar
Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
Various multiindex related fixes (#9036) @shwina
Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar
cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar
Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick
Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca
Fix Dataframe indexer setitem when array is passed (#9006) @galipremsagar
Fix ORC reading of files with struct columns that have null values (#9005) @vuule
Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
Fix debug compile error for csv_test.cpp (#8981) @davidwendt
Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar
Java conditional joins should not require matching column counts (#8955) @jlowe
Fix concatenate empty structs (#8947) @sperlingxx
Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca
cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca
Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick
Support casting with Pandas dtype aliases (#8920) @sarahyurick
Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick
Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
Fix libcudf memory errors (#8884) @karthikeyann
Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
replace auto with auto& ref for cast<&> (#8866) @karthikeyann
Add missing include<optional> in binops (#8864) @karthikeyann
Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick
Re-enable JSON tests (#8843) @vuule
Support header with embedded delimiter in csv writer (#8798) @davidwendt

📖 Documentation

Add IO docs page in cudf documentation (#9145) @galipremsagar
use correct namespace in cuio code examples (#9037) @cwharris
Restructuring Contributing doc (#9026) @iskode
Update stable version in readme (#9008) @galipremsagar
Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
List GDS-enabled formats in the docs (#8805) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar

🚀 New Features

Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller
Add struct type support for drop_list_duplicates (#9202) @ttnghia
support CUDA async memory resource in JNI (#9201) @rongou
Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
Superimpose null masks for STRUCT columns. (#9144) @mythrocks
Implemented bindings for ceil timestamp operation (#9141) @shaneding
Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia
Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid
Make AST operators nullable (#9096) @vyasr
Java bindings for approx_percentile (#9094) @andygrove
Add dseries.struct.explode (#9086) @isVoid
Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
Support nested types for nth_element reduction (#9043) @sperlingxx
Update sort groupby to use non-atomic operation (#9035) @karthikeyann
Add support for struct type in ORC writer (#9025) @vuule
Implement interleave_columns for structs columns (#9012) @ttnghia
Add groupby first and last aggregations (#9004) @shwina
Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid
Python/Cython bindings for multibyte_split (#8998) @jdye64
Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid
Added Series.dt.is_month_end (#8989) @TravisHester
Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
Support "unflatten" of columns flattened via flatten_nested_columns(): (#8956) @mythrocks
Implement timestamp ceil (#8942) @shaneding
Add nested column selection to parquet reader (#8933) @devavret
Expose conditional join size calculation (#8928) @vyasr
Support Nulls in Timeseries Generator (#8925) @isVoid
Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca
Add dot product binary op (#8909) @charlesbluca
Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid
Series string repeat (#8882) @sarahyurick
Python binding for quarters (#8862) @shaneding
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Add Java bindings for AST ...

Contributors

trxcllnt, robertmaynard, and 40 other contributors

16 Sep 19:23

GPUtester

v21.08.03

e4313b6

06 Aug 20:26

GPUtester

v21.08.02

f6d31fa

06 Aug 14:56

GPUtester

v21.08.01

e0a8114

04 Aug 15:26

GPUtester

v21.08.00

106039c

🚨 Breaking Changes

Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
Remove unused cudf::strings::create_offsets (#8663) @davidwendt
Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
Change default datetime index resolution to ns to match pandas (#8611) @vyasr
Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
String-to-boolean conversion is different from Pandas (#8549) @skirui-source
Add accurate hash join size functions (#8453) @PointKernel
Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
Remove special Index class from the general index class hierarchy (#8309) @vyasr
Add first-class dtype utilities (#8308) @vyasr
ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
Upgrade arrow to 4.0.1 (#7495) @galipremsagar

🐛 Bug Fixes

Fix contains check in string column (#8834) @galipremsagar
Remove unused variable from row_bit_count_test. (#8829) @mythrocks
Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
Handle empty child columns in row_bit_count() (#8791) @mythrocks
Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
Fix isort error in utils.pyx (#8771) @charlesbluca
Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
Fix issues with _CPackedColumns.serialize() handling of host and device data (#8759) @charlesbluca
Fix issues with MultiIndex in dropna, stack & reset_index (#8753) @galipremsagar
Write pandas extension types to parquet file metadata (#8749) @devavret
Fix where to handle DataFrame & Series input combination (#8747) @galipremsagar
Fix replace to handle null values correctly (#8744) @galipremsagar
Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
Fix cudf.Series constructor to handle list of sequences (#8735) @galipremsagar
Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
Fix orc reader assert on create data_type in debug (#8706) @davidwendt
Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
Bug fix: replace_nulls_policy functor not returning correct indices for gathermap (#8699) @isVoid
Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
Add post-processing steps to dask_cudf.groupby.CudfSeriesGroupby.aggregate (#8694) @charlesbluca
JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
Pin *arrow to use *cuda in run (#8651) @jakirkham
Add proper support for tolerances in testing methods. (#8649) @vyasr
Support multi-char case conversion in capitalize function (#8647) @davidwendt
Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
Temporarily disable libcudf example build tests (#8642) @isVoid
Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
Fix bug that columns only initialized once when specified columns and index in dataframe ctor (#8628) @isVoid
Propagate **kwargs through to as_*_column methods (#8618) @shwina
Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
Fix missed renumbering of Aggregation values (#8600) @revans2
Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
Apply metadata to keys before returning in Frame._encode (#8560) @charlesbluca
Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
String-to-boolean conversion is different from Pandas (#8549) @skirui-source
Fix __repr__ output with display.max_rows is None (#8547) @galipremsagar
Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
Properly retrieve last column when -1 is specified for column index (#8529) @isVoid
Fix importing apply from dask (#8517) @galipremsagar
Fix offset of the string dictionary length stream (#8515) @vuule
Fix double counting of selected columns in CSV reader (#8508) @ochan1
Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
Disallow groupby aggs for StructColumns (#8499) @charlesbluca
Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
Adding support for writing empty dataframe (#8490) @shaneding
Fix exclusive scan when including nulls and improve testing (#8478) @harrism
Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
Add nightly version for ucx-py in ci script (#8419) @galipremsagar
Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca

📖 Documentation

Update Python UDFs notebook (#8810) @brandon-b-miller
Fix dask.dataframe API docs links after reorg (#8772) @jsignell
Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
Custom Sphinx Extension: PandasCompat (#8643) @isVoid
Fix README.md (#8535) @ajschmidt8
Change namespace contains_nulls to struct (#8523) @davidwendt
Add info about NVTX ranges to dev guide (#8461) @jrhemstad
Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar

🚀 New Features

Fix concatenating structs (#8811) @shaneding
Implement JNI for groupby aggregations M2 and MERGE_M2 (#8763) @ttnghia
Bump isort to 5.6.4 and remove isort overrides made for 5.0.7 (#8755) @charlesbluca
Implement __setitem__ for StructColumn (#8737) @shaneding
Add is_leap_year to DateTimeProperties and DatetimeIndex (#8736) @isVoid
Add struct.explode() method (#8729) @shwina
Add DataFrame.to_struct() method to convert a DataFrame to a struct Series (#8728) @shwina
Add support for list type in ORC writer (#8723) @vuule
Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
Add datetime::is_leap_year (#8711) @isVoid
Accessing struct columns from dask_cudf (#8675) @shaneding
Added pct_change to Series (#8650) @TravisHester
Add strings support to cudf::shift function (#8648) @davidwendt
Support Scatter struct_scalar (#8630) @isVoid
Struct scalar from host dictionary (#8629) @shaneding
Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
JNI support for capitalize (#8624) @firestarman
Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
Add NVBench in CMake (#8619) @PointKernel
Change default datetime index resolution to ns to match pandas (#8611) @vyasr
ListColumn __setitem__ (#8606) @brandon-b-miller
Implement groupby aggregations M2 and MERGE_M2 (#8605) @ttnghia
Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
Benchmark for strings::repeat_strings APIs (#8589) @ttnghia
Nested scalar support for copy if else (#8588) @gerashegalov
User specified decimal columns to float64 (#8587) @jdye64
Add get_element for struct column (#8578) @isVoid
Python changes for adding __getitem__ for struct (#8577) @shaneding
Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
Refactor tests/iterator_utilities.hpp functions (#8540) @ttnghia
Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
Decimal support csv reader (#8511) @elstehle
Add column type tests (#8505) @isVoid
Warn when downscaling decimal columns (#8492) @ChrisJar
Add JNI for strings::repeat_strings (#8491) @ttnghia
Add Index.get_loc for Numerical, String Index support (#8489) @isVoid
Expose half_up rounding in cuDF (#8477) @shwina
Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
Add str.edit_distance_matrix (#8463) @isVoid
Support const...

Contributors

trxcllnt, robertmaynard, and 48 other contributors

17 Jun 15:12

GPUtester

v21.06.01

101fc0f

09 Jun 17:23

GPUtester

v21.06.00

ae44046

🚨 Breaking Changes

Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
Update ORC statistics API to use C++17 standard library (#8241) @vuule
Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
Groupby.shift c++ API refactor and python binding (#8131) @isVoid

🐛 Bug Fixes

Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
Compilation fix: Remove redefinition for std::is_same_v() (#8369) @mythrocks
Add backward compatibility for dask-cudf to work with other versions of dask (#8368) @galipremsagar
Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
Raise error when unsupported arguments are passed to dask_cudf.DataFrame.sort_values (#8349) @galipremsagar
Raise NotImplementedError for axis=1 in rank (#8347) @galipremsagar
Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
Update Java string concatenate test for single column (#8330) @tgravescs
Use empty_like in scatter (#8314) @revans2
Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
Update io util to convert path like object to string (#8275) @ayushdg
Fix result column types for empty inputs to rolling window (#8274) @mythrocks
Actually test equality in assert_groupby_results_equal (#8272) @shwina
CMake always explicitly specify a source files extension (#8270) @robertmaynard
Fix struct binary search and struct flattening (#8268) @ttnghia
Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
upgrade dlpack to 0.5 (#8262) @cwharris
Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
Fix incorrect assertion in Java concat (#8258) @sperlingxx
Copy nested types upon construction (#8244) @isVoid
Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
Clip decimal binary op precision at max precision (#8194) @ChrisJar

📖 Documentation

Add docstring for dask_cudf.read_csv (#8355) @galipremsagar
Fix cudf release version in readme (#8331) @galipremsagar
Fix structs column description in dev docs (#8318) @isVoid
Update readme with correct CUDA versions (#8315) @raydouglass
Add description of the cuIO GDS integration (#8293) @vuule
Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard

🚀 New Features

Add support for merging between categorical data (#8332) @galipremsagar
Java: Support struct scalar (#8327) @sperlingxx
added _is_homogeneous property (#8299) @shaneding
Added decimal writing for CSV writer (#8296) @kaatish
Java: Support creating a scalar from utf8 string (#8294) @firestarman
Add Java API for Concatenate strings with separator (#8289) @tgravescs
strings::join_list_elements options for empty list inputs (#8285) @ttnghia
Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
add unit tests for lead/lag on list for row window (#8259) @wbo4958
Create a String column from UTF8 String byte arrays (#8257) @firestarman
Support scattering list_scalar (#8256) @isVoid
Implement lists::concatenate_list_elements (#8231) @ttnghia
Support for struct scalars. (#8220) @nvdbaranec
Add support for decimal types in ORC writer (#8198) @vuule
Support create lists column from a list_scalar (#8185) @isVoid
Groupby.shift c++ API refactor and python binding (#8131) @isVoid
Add groupby::replace_nulls(replace_policy) api (#7118) @isVoid

🛠️ Improvements

Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
Add aliases for string methods (#8353) @shwina
Update environment variable used to determine cuda_version (#8321) @ajschmidt8
JNI: Refactor the code of making column from scalar (#8310) @firestarman
Update CHANGELOG.md links for calver (#8303) @ajschmidt8
Merge branch-0.19 into branch-21.06 (#8302) @ajschmidt8
use address and length for GDS reads/writes (#8301) @rongou
Update cudfjni version to 21.06.0 (#8292) @pxLi
Update docs build script (#8284) @ajschmidt8
Make device_buffer streams explicit and enforce move construction (#8280) @harrism
Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
Update cudfjni version to 21.06 (#8267) @pxLi
support RMM aligned resource adapter in JNI (#8266) @rongou
Pass compiler environment variables to conda python build (#8260) @Ethyling
Remove abc inheritance from Serializable (#8254) @vyasr
Move more methods into SingleColumnFrame (#8253) @vyasr
Update ORC statistics API to use C++17 standard library (#8241) @vuule
Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
Correct unused parameters in the copying algorithms (#8232) @robertmaynard
IO statistics cleanup (#8191) @kaatish
Refactor of rolling_window implementation. (#8158) @nvdbaranec
Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
Column refactoring 2 (#8130) @vyasr
support space in workspace (#7956) @jolorunyomi
Support collect_set on rolling window (#7881) @sperlingxx

28 Apr 18:30

GPUtester

v0.19.2

ab3b3f6

v0.19.2

🚨 Breaking Changes

Allow hash_partition to take a seed value (#7771) @magnatelee
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Replace device_vector with device_uvector in null_mask (#7715) @harrism
Don't identify decimals as strings. (#7710) @vyasr
Fix Java Parquet write after writer API changes (#7655) @revans2
Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
Add struct support to parquet writer (#7461) @devavret
Join APIs that return gathermaps (#7454) @shwina
fixed_point + cudf::binary_operation API Changes (#7435) @codereport
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
Refactor strings column factories (#7397) @harrism
Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
Upgrade pandas to 1.2 (#7375) @galipremsagar
Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

🐛 Bug Fixes

unsnap: busy wait a number of cycles (#8073) @vuule
Fix returned column type when extracting from an empty list column (#8031) @jlowe
Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
Fix a NameError in meta dispatch API (#7996) @galipremsagar
Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
jitify direct-to-cubin compilation and caching. (#7919) @cwharris
Use dynamic cudart for nvcomp in java build (#7896) @abellina
fix "incompatible redefinition" warnings (#7894) @cwharris
cudf consistently specifies the cuda runtime (#7887) @robertmaynard
disable verbose output for jitify_preprocess (#7886) @cwharris
CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
Sort by index in groupby tests more consistently (#7802) @shwina
Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
Add decimal column handling in copy_type_metadata (#7788) @shwina
Add column names validation in parquet writer (#7786) @galipremsagar
Fix Java explode outer unit tests (#7782) @jlowe
Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
User resource fix for replace_nulls (#7769) @magnatelee
Fix type dispatch for columnar replace_nulls (#7768) @jlowe
Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
Fix slicing and arrow representations of decimal columns (#7755) @vyasr
Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
Implement scatter for struct columns (#7752) @ttnghia
Fix data corruption in string columns (#7746) @galipremsagar
Fix string length in stripe dictionary building (#7744) @kaatish
Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
Fix dictionary size computation in ORC writer (#7737) @vuule
Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
Disable column_view data accessors for unsupported types (#7725) @jrhemstad
Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
Don't identify decimals as strings. (#7710) @vyasr
Fix return type of DataFrame.argsort (#7706) @galipremsagar
Fix/correct cudf installed package requirements (#7688) @robertmaynard
Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
Fix Java Parquet write after writer API changes (#7655) @revans2
Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
Fix internal compiler error during JNI Docker build (#7645) @jlowe
Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
Fix specifying GPU architecture in JNI build (#7612) @jlowe
Fix ORC writer OOM issue (#7605) @vuule
Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
Fix missing Dask imports (#7580) @kkraus14
CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
Fix ORC writer output corruption with string columns (#7565) @vuule
Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
FIX Fix Anaconda upload args (#7558) @dillon-cullinan
Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
Update missing docstring examples in python public APIs (#7546) @galipremsagar
Decimal32 Build Fix (#7544) @razajafri
FIX Retry conda output location (#7540) @dillon-cullinan
fix missing renames of dask git branches from master to main (#7535) @kkraus14
Remove detail from device_span (#7533) @rwlee
Change dask and distributed branch to main (#7532) @dantegd
Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
Change jit launch to safe_launch (#7510) @devavret
Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
Correctly compile benchmarks (#7485) @robertmaynard
Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
Fix __repr__ for categorical dtype (#7476) @galipremsagar
Java cleaner synchronization (#7474) @abellina
Fix java float/double parsing tests (#7473) @revans2
Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
fix cuFile JNI compile errors (#7445) @rongou
Support Series.__setitem__ with key to a new row (#7443) @isVoid
Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
fix Arrow CMake file (#7358) @rongou
Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

📖 Documentation

Fix join API doxygen (#7890) @shwina
Add Resources to README. (#7697) @bdice
Add isin examples in Docstring (#7479) @galipremsagar
Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
Fix typo in regex.md doc page (#7363) @davidwendt
Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

🚀 New Features

Enable basic reductions for decimal columns (#7776) @ChrisJar
Enable join on decimal columns (#7764) @ChrisJar
Allow merging index column with data column using keyword "on" (#7736) @skirui-source
Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
Add support for unique groupby aggregation (#7726) @shwina
Expose libcudf's label_bins function to cudf (#7724) @vyasr
Adding support for equi-join on struct (#7720) @hyperbolic2346
Add decimal column comparison operations (#7716) @isVoid
Implement scan operations for decimal columns (#7707) @ChrisJar
Enable typecasting between decimal and int (#7691) @ChrisJar
Enable decimal support in parquet writer (#7673) @devavret
Adds list.unique API (#7664) @isVoid
Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
Add lists.sort_values API (#7657) @isVoid
Add is_integer API that can check for the validity of...
