Releases: rapidsai/cudf
v21.12.00
🚨 Breaking Changes
- Update `bitmask_and` and `bitmask_or` to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
- Remove sizeof and standardize on memory_usage (#9544) @vyasr
- Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
- Refactor sorting APIs (#9464) @vyasr
- Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
- Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
- JNI: Support nested types in ORC writer (#9334) @firestarman
- Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
- Refactor cuIO timestamp processing with `cuda::std::chrono` (#9278) @PointKernel
- Various internal MultiIndex improvements (#9243) @vyasr
🐛 Bug Fixes
- Fix read_parquet bug for bytes input (#9669) @rjzamora
- Use `_gather` internal for `sort_*` (#9668) @isVoid
- Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
- Don't recompute output size if it is already available (#9649) @abellina
- Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
- add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
- Fix debrotli issue on CUDA 11.5 (#9632) @vuule
- Use std::size_t when computing join output size (#9626) @jlowe
- Fix `usecols` parameter handling in `dask_cudf.read_csv` (#9618) @galipremsagar
- Add support for string `'nan', 'inf' & '-inf'` values while type-casting to `float` (#9613) @galipremsagar
- Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
- Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
- Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
- Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
- Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
- Fix pytests failing in `cuda-11.5` environment (#9547) @galipremsagar
- compile libnvcomp with PTDS if requested (#9540) @jbrennan333
- Fix `segmented_gather()` for null LIST rows (#9537) @mythrocks
- Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
- Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
- Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
- Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
- Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
- Make sure all dask-cudf supported aggs are handled in `_tree_node_agg` (#9487) @charlesbluca
- Resolve `hash_columns` `FutureWarning` in `dask_cudf` (#9481) @pentschev
- Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
- Fix regex handling of embedded null characters (#9470) @davidwendt
- Fix memcheck error in copy-if-else (#9467) @davidwendt
- Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
- Preserve the decimal scale when creating a default scalar (#9449) @revans2
- Push down parent nulls when flattening nested columns. (#9443) @mythrocks
- Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
- Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
- Allow int-like objects for the `decimals` argument in `round` (#9428) @shwina
- Fix stream compaction's `drop_duplicates` API to use stable sort (#9417) @ttnghia
- Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
- Fix `StructColumn.to_pandas` type handling issues (#9388) @galipremsagar
- Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
- Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
- Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
- Fix the crash in stats code (#9368) @devavret
- Make Series.hash_encode results reproducible. (#9366) @bdice
- Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
- Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
- Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
- Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
- Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
- Optimizations for `cudf.concat` when `axis=1` (#9333) @galipremsagar
- Use f-string in join helper warning message. (#9325) @bdice
- Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
- Fix null count in statistics for parquet (#9303) @devavret
- Potential overflow of `decimal32` when casting to `int64_t` (#9287) @codereport
- Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
- Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
- Implement `one_hot_encoding` in libcudf and bind to python (#9229) @isVoid
- BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source
📖 Documentation
- Update Documentation to use `TYPED_TEST_SUITE` (#9654) @codereport
- Add dedicated page for `StringHandling` in python docs (#9624) @galipremsagar
- Update docstring of `DataFrame.merge` (#9572) @galipremsagar
- Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
- Add example to docstrings in `rolling.apply` (#9522) @isVoid
- Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
- Improve Python docstring formatting. (#9493) @bdice
- Update table of I/O supported types (#9476) @vuule
- Document invalid regex patterns as undefined behavior (#9473) @davidwendt
- Miscellaneous documentation fixes to `cudf` (#9471) @galipremsagar
- Fix many documentation errors in libcudf. (#9355) @karthikeyann
- Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
- Improved deprecation warnings. (#9347) @bdice
- doc reorder mr, stream to stream, mr (#9308) @karthikeyann
- Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
- Added deprecation warning for `.label_encoding()` (#9289) @mayankanand007
🚀 New Features
- Enable Series.divide and DataFrame.divide (#9630) @vyasr
- Update `bitmask_and` and `bitmask_or` to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
- Add handling of mixed numeric types in `to_dlpack` (#9585) @galipremsagar
- Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
- Add JNI for `lists::drop_list_duplicates` with keys-values input column (#9553) @ttnghia
- Support structs column in `min`, `max`, `argmin` and `argmax` groupby aggregate() and scan() (#9545) @ttnghia
- Move libcudacxx to use `rapids_cpm` and use newer versions (#9539) @robertmaynard
- Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
- Support `args=` in `apply` (#9514) @brandon-b-miller
- Add groupby scan min/max support for strings values (#9502) @davidwendt
- Add list output option to character_ngrams() function (#9499) @davidwendt
- More granular column selection in ORC reader (#9496) @vuule
- add min_periods, ddof to groupby covariance, & correlation aggregation (#9492) @karthikeyann
- Implement Series.datetime.floor (#9488) @skirui-source
- Enable linting of CMake files using pre-commit (#9484) @vyasr
- Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
- Augment `order_by` to Accept a List of `null_precedence` (#9455) @isVoid
- Add format API for list column of strings (#9454) @davidwendt
- Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
- Add cudf python groupby.diff (#9446) @karthikeyann
- Implement `lists::stable_sort_lists` for stable sorting of elements within each row of lists column (#9425) @ttnghia
- add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
- Support Unary Operations in Masked UDF (#9409) @isVoid
- Move Several Series Function to Frame (#9394) @isVoid
- MD5 Python hash API (#9390) @bdice
- Add cudf strings is_title API (#9380) @davidwendt
- Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
- Add support for writing ORC with map columns (#9369) @vuule
- extract_list_elements() with column_view indices (#9367) @mythrocks
- Reimplement `lists::drop_list_duplicates` for keys-values lists columns (#9345) @ttnghia
- Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
- JNI: Support nested types in ORC writer (#9334) @firestarman
- Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
- Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
- Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
- Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
- Add `na_position` param to dask-cudf `sort_values` (#9264) @charlesbluca
- Add `ascending` parameter for dask-cudf `sort_values` (#9250) @charlesbluca
- New array conversion methods (#9236) @vyasr
- Series `apply` method backed by masked UDFs (#9217) @brandon-b-miller
- Grouping by frequency and resampling (#9178) @shwina
- Pure-python masked UDFs (#9174) @brandon-b-miller
- Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
- Add `calendrical_month_sequence` in c++ and `date_range` in python (#8886) @shwina
🛠️ Improvements
- Followup to PR 9088 comments (#9659) @cwharris
- Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
- Add `11.5` dev.yml to `cudf` (#9617) @galipremsagar
- Add `xfail` for parquet reader `11.5` issue (#9612) @galipremsagar
- remove deprecated Rmm.initialize method (#9607) @rongou
- Use HostColumnVectorCore for ch...
v21.10.01
v21.10.00
🚨 Breaking Changes
- Remove Cython APIs for table view generation (#9199) @vyasr
- Upgrade `pandas` version in `cudf` (#9147) @galipremsagar
- Make AST operators nullable (#9096) @vyasr
- Remove the option to pass data types as strings to `read_csv` and `read_json` (#9079) @vuule
- Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
- Support additional format specifiers in from_timestamps (#9047) @davidwendt
- Expose expression base class publicly and simplify public AST API (#9045) @vyasr
- Add support for struct type in ORC writer (#9025) @vuule
- Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
- Java bindings for conditional join output sizes (#9002) @jlowe
- Move compute_column API out of ast namespace (#8957) @vyasr
- `cudf.dtype` function (#8949) @shwina
- Refactor Frame reductions (#8944) @vyasr
- Add nested column selection to parquet reader (#8933) @devavret
- JNI Aggregation Type Changes (#8919) @revans2
- Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
- Expand CSV and JSON reader APIs to accept `dtypes` as a vector or map of `data_type` objects (#8856) @vuule
- Change cudf docs theme to pydata theme (#8746) @galipremsagar
- Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
- Make groupby transform-like op order match original data order (#8720) @isVoid
🐛 Bug Fixes
- `fixed_point` `cudf::groupby` for `mean` aggregation (#9296) @codereport
- Fix `interleave_columns` when the input string lists column having empty child column (#9292) @ttnghia
- Update nvcomp to include fixes for installation of headers (#9276) @devavret
- Fix Java column leak in testParquetWriteMap (#9271) @jlowe
- Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
- Fixing empty input to getMapValue crashing (#9262) @hyperbolic2346
- Fix duplicate names issue in `MultiIndex.deserialize` (#9258) @galipremsagar
- `Dataframe.sort_index` optimizations (#9238) @galipremsagar
- Temporarily disabling problematic test in parquet writer (#9230) @devavret
- Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
- Fix `gather` for sliced input structs column (#9218) @ttnghia
- Fix JNI code for left semi and anti joins (#9207) @jlowe
- Only install thrust when using a non 'system' version (#9206) @robertmaynard
- Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
- Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
- Fix `gather()` for `STRUCT` inputs with no nulls in members. (#9194) @mythrocks
- get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
- rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
- Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
- Add handling for nulls in `dask_cudf.sorting.quantile_divisions` (#9171) @charlesbluca
- Approximate overflow detection in ORC statistics (#9163) @vuule
- Use decimal precision metadata when reading from parquet files (#9162) @shwina
- Fix variable name in Java build script (#9161) @jlowe
- Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
- Fix conditional joins with empty left table (#9146) @vyasr
- Fix joining on indexes with duplicate level names (#9137) @shwina
- Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
- Apply type metadata after column is slice-copied (#9131) @isVoid
- Fix a bug: inner_join_size return zero if build table is empty (#9128) @PointKernel
- Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
- Support null literals in expressions (#9117) @vyasr
- Fix cudf::hash_join output size for struct joins (#9107) @jlowe
- Import fix (#9104) @shwina
- Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
- Fix branch_stack calculation in `row_bit_count()` (#9076) @mythrocks
- Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
- Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
- Preserve float16 upscaling (#9069) @galipremsagar
- Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
- Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
- Various multiindex related fixes (#9036) @shwina
- Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
- Add support for percentile dispatch in `dask_cudf` (#9031) @galipremsagar
- cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
- Fetch correct grouping keys `agg` of dask groupby (#9022) @galipremsagar
- Allow `where()` to work with a Series and `other=cudf.NA` (#9019) @sarahyurick
- Use correct index when returning Series from `GroupBy.apply()` (#9016) @charlesbluca
- Fix `Dataframe` indexer setitem when array is passed (#9006) @galipremsagar
- Fix ORC reading of files with struct columns that have null values (#9005) @vuule
- Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
- Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
- Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
- Fix debug compile error for csv_test.cpp (#8981) @davidwendt
- Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
- Fix concatenation of `cudf.RangeIndex` (#8970) @galipremsagar
- Java conditional joins should not require matching column counts (#8955) @jlowe
- Fix concatenate empty structs (#8947) @sperlingxx
- Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
- Apply series name to result of `SeriesGroupby.apply()` (#8939) @charlesbluca
- `cdef packed_columns` as `cppclass` instead of `struct` (#8936) @charlesbluca
- Inserting a `cudf.NA` into a DataFrame (#8923) @sarahyurick
- Support casting with Pandas dtype aliases (#8920) @sarahyurick
- Allow `sort_values` to accept same `kind` values as Pandas (#8912) @sarahyurick
- Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
- Fix libcudf memory errors (#8884) @karthikeyann
- Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
- replace auto with auto& ref for cast<&> (#8866) @karthikeyann
- Add missing include<optional> in binops (#8864) @karthikeyann
- Fix `select_dtypes` to work when non-class dtypes present in dataframe (#8849) @sarahyurick
- Re-enable JSON tests (#8843) @vuule
- Support header with embedded delimiter in csv writer (#8798) @davidwendt
📖 Documentation
- Add IO docs page in `cudf` documentation (#9145) @galipremsagar
- use correct namespace in cuio code examples (#9037) @cwharris
- Restructuring `Contributing doc` (#9026) @iskode
- Update stable version in readme (#9008) @galipremsagar
- Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
- Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
- List GDS-enabled formats in the docs (#8805) @vuule
- Change cudf docs theme to pydata theme (#8746) @galipremsagar
🚀 New Features
- Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
- Align `DataFrame.apply` signature with pandas (#9275) @brandon-b-miller
- Add struct type support for `drop_list_duplicates` (#9202) @ttnghia
- support CUDA async memory resource in JNI (#9201) @rongou
- Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
- Superimpose null masks for STRUCT columns. (#9144) @mythrocks
- Implemented bindings for `ceil` timestamp operation (#9141) @shaneding
- Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
- Implement `interleave_columns` for lists with arbitrary nested type (#9130) @ttnghia
- Add python bindings to fixed-size window and groupby `rolling.var`, `rolling.std` (#9097) @isVoid
- Make AST operators nullable (#9096) @vyasr
- Java bindings for approx_percentile (#9094) @andygrove
- Add `dseries.struct.explode` (#9086) @isVoid
- Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
- Remove the option to pass data types as strings to `read_csv` and `read_json` (#9079) @vuule
- Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
- Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
- Support nested types for nth_element reduction (#9043) @sperlingxx
- Update sort groupby to use non-atomic operation (#9035) @karthikeyann
- Add support for struct type in ORC writer (#9025) @vuule
- Implement `interleave_columns` for structs columns (#9012) @ttnghia
- Add groupby first and last aggregations (#9004) @shwina
- Add `DecimalBaseColumn` and move `as_decimal_column` (#9001) @isVoid
- Python/Cython bindings for multibyte_split (#8998) @jdye64
- Support scalar `months` in `add_calendrical_months`, extends API to INT32 support (#8991) @isVoid
- Added Series.dt.is_month_end (#8989) @TravisHester
- Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
- Support "unflatten" of columns flattened via `flatten_nested_columns()`: (#8956) @mythrocks
- Implement timestamp ceil (#8942) @shaneding
- Add nested column selection to parquet reader (#8933) @devavret
- Expose conditional join size calculation (#8928) @vyasr
- Support Nulls in Timeseries Generator (#8925) @isVoid
- Avoid index equality check in `_CPackedColumns.from_py_table()` (#8917) @charlesbluca
- Add dot product binary op (#8909) @charlesbluca
- Expose `days_in_month` function in libcudf and add python bindings (#8892) @isVoid
- Series string repeat (#8882) @sarahyurick
- Python binding for quarters (#8862) @shaneding
- Expand CSV and JSON reader APIs to accept `dtypes` as a vector or map of `data_type` objects (#8856) @vuule
- Add Java bindings for AST ...
v21.08.03
v21.08.02
v21.08.01
v21.08.00
🚨 Breaking Changes
- Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
- Remove unused cudf::strings::create_offsets (#8663) @davidwendt
- Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
- Change default datetime index resolution to ns to match pandas (#8611) @vyasr
- Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
- Add `strings::repeat_strings` API that can repeat each string a different number of times (#8561) @ttnghia
- String-to-boolean conversion is different from Pandas (#8549) @skirui-source
- Add accurate hash join size functions (#8453) @PointKernel
- Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
- Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
- Adapt `cudf::scalar` classes to changes in `rmm::device_scalar` (#8411) @harrism
- Remove special Index class from the general index class hierarchy (#8309) @vyasr
- Add first-class dtype utilities (#8308) @vyasr
- ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
- Upgrade arrow to 4.0.1 (#7495) @galipremsagar
🐛 Bug Fixes
- Fix `contains` check in string column (#8834) @galipremsagar
- Remove unused variable from `row_bit_count_test`. (#8829) @mythrocks
- Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
- Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
- Handle empty child columns in row_bit_count() (#8791) @mythrocks
- Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
- Fix isort error in utils.pyx (#8771) @charlesbluca
- Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
- Fix issues with `_CPackedColumns.serialize()` handling of host and device data (#8759) @charlesbluca
- Fix issues with `MultiIndex` in `dropna`, `stack` & `reset_index` (#8753) @galipremsagar
- Write pandas extension types to parquet file metadata (#8749) @devavret
- Fix `where` to handle `DataFrame` & `Series` input combination (#8747) @galipremsagar
- Fix `replace` to handle null values correctly (#8744) @galipremsagar
- Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
- Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
- Fix `cudf.Series` constructor to handle list of sequences (#8735) @galipremsagar
- Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
- Fix orc reader assert on create data_type in debug (#8706) @davidwendt
- Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
- JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
- Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
- Bug fix: `replace_nulls_policy` functor not returning correct indices for gathermap (#8699) @isVoid
- Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
- Add post-processing steps to `dask_cudf.groupby.CudfSeriesGroupby.aggregate` (#8694) @charlesbluca
- JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
- Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
- Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
- Pin `*arrow` to use `*cuda` in `run` (#8651) @jakirkham
- Add proper support for tolerances in testing methods. (#8649) @vyasr
- Support multi-char case conversion in capitalize function (#8647) @davidwendt
- Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
- Temporarily disable libcudf example build tests (#8642) @isVoid
- Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
- Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
- Fix bug that columns only initialized once when specified `columns` and `index` in dataframe ctor (#8628) @isVoid
- Propagate **kwargs through to as_*_column methods (#8618) @shwina
- Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
- Fix missed renumbering of Aggregation values (#8600) @revans2
- Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
- Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
- Apply metadata to keys before returning in `Frame._encode` (#8560) @charlesbluca
- Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
- Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
- String-to-boolean conversion is different from Pandas (#8549) @skirui-source
- Fix `__repr__` output with `display.max_rows` is `None` (#8547) @galipremsagar
- Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
- Properly retrieve last column when `-1` is specified for column index (#8529) @isVoid
- Fix importing `apply` from `dask` (#8517) @galipremsagar
- Fix offset of the string dictionary length stream (#8515) @vuule
- Fix double counting of selected columns in CSV reader (#8508) @ochan1
- Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
- replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
- Disallow groupby aggs for `StructColumns` (#8499) @charlesbluca
- Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
- Adding support for writing empty dataframe (#8490) @shaneding
- Fix exclusive scan when including nulls and improve testing (#8478) @harrism
- Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
- Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
- Add nightly version for ucx-py in ci script (#8419) @galipremsagar
- Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
- CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
- Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
- Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
- Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
- BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
- Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
- Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca
📖 Documentation
- Update Python UDFs notebook (#8810) @brandon-b-miller
- Fix dask.dataframe API docs links after reorg (#8772) @jsignell
- Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
- Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
- Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
- Custom Sphinx Extension: `PandasCompat` (#8643) @isVoid
- Fix README.md (#8535) @ajschmidt8
- Change namespace contains_nulls to struct (#8523) @davidwendt
- Add info about NVTX ranges to dev guide (#8461) @jrhemstad
- Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar
🚀 New Features
- Fix concatenating structs (#8811) @shaneding
- Implement JNI for groupby aggregations `M2` and `MERGE_M2` (#8763) @ttnghia
- Bump `isort` to `5.6.4` and remove `isort` overrides made for 5.0.7 (#8755) @charlesbluca
- Implement `__setitem__` for `StructColumn` (#8737) @shaneding
- Add `is_leap_year` to `DateTimeProperties` and `DatetimeIndex` (#8736) @isVoid
- Add `struct.explode()` method (#8729) @shwina
- Add `DataFrame.to_struct()` method to convert a DataFrame to a struct Series (#8728) @shwina
- Add support for list type in ORC writer (#8723) @vuule
- Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
- Add `datetime::is_leap_year` (#8711) @isVoid
- Accessing struct columns from `dask_cudf` (#8675) @shaneding
- Added pct_change to Series (#8650) @TravisHester
- Add strings support to cudf::shift function (#8648) @davidwendt
- Support Scatter `struct_scalar` (#8630) @isVoid
- Struct scalar from host dictionary (#8629) @shaneding
- Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
- JNI support for capitalize (#8624) @firestarman
- Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
- Add NVBench in CMake (#8619) @PointKernel
- Change default datetime index resolution to ns to match pandas (#8611) @vyasr
- ListColumn `__setitem__` (#8606) @brandon-b-miller
- Implement groupby aggregations `M2` and `MERGE_M2` (#8605) @ttnghia
- Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
- Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
- Benchmark for `strings::repeat_strings` APIs (#8589) @ttnghia
- Nested scalar support for copy if else (#8588) @gerashegalov
- User specified decimal columns to float64 (#8587) @jdye64
- Add `get_element` for struct column (#8578) @isVoid
- Python changes for adding `__getitem__` for `struct` (#8577) @shaneding
- Add `strings::repeat_strings` API that can repeat each string a different number of times (#8561) @ttnghia
- Refactor `tests/iterator_utilities.hpp` functions (#8540) @ttnghia
- Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
- Decimal support csv reader (#8511) @elstehle
- Add column type tests (#8505) @isVoid
- Warn when downscaling decimal columns (#8492) @ChrisJar
- Add JNI for `strings::repeat_strings` (#8491) @ttnghia
- Add `Index.get_loc` for Numerical, String Index support (#8489) @isVoid
- Expose half_up rounding in cuDF (#8477) @shwina
- Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
- Add `str.edit_distance_matrix` (#8463) @isVoid
- Support const...
v21.06.01
v21.06.00
🚨 Breaking Changes
- Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342) @galipremsagar
- Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
- Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
- Update ORC statistics API to use C++17 standard library (#8241) @vuule
- Preserve column hierarchy when getting NULL row from `LIST` column (#8206) @isVoid
- `Groupby.shift` c++ API refactor and python binding (#8131) @isVoid
🐛 Bug Fixes
- Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
- Compilation fix: Remove redefinition for `std::is_same_v()` (#8369) @mythrocks
- Add backward compatibility for `dask-cudf` to work with other versions of `dask` (#8368) @galipremsagar
- Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
- Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
- Raise error when unsupported arguments are passed to `dask_cudf.DataFrame.sort_values` (#8349) @galipremsagar
- Raise `NotImplementedError` for axis=1 in `rank` (#8347) @galipremsagar
- Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342) @galipremsagar
- Update Java string concatenate test for single column (#8330) @tgravescs
- Use empty_like in scatter (#8314) @revans2
- Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
- Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
- COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
- Update io util to convert path like object to string (#8275) @ayushdg
- Fix result column types for empty inputs to rolling window (#8274) @mythrocks
- Actually test equality in assert_groupby_results_equal (#8272) @shwina
- CMake always explicitly specify a source files extension (#8270) @robertmaynard
- Fix struct binary search and struct flattening (#8268) @ttnghia
- Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
- upgrade dlpack to 0.5 (#8262) @cwharris
- Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
- Fix incorrect assertion in Java concat (#8258) @sperlingxx
- Copy nested types upon construction (#8244) @isVoid
- Preserve column hierarchy when getting NULL row from `LIST` column (#8206) @isVoid
- Clip decimal binary op precision at max precision (#8194) @ChrisJar
📖 Documentation
- Add docstring for `dask_cudf.read_csv` (#8355) @galipremsagar
- Fix cudf release version in readme (#8331) @galipremsagar
- Fix structs column description in dev docs (#8318) @isVoid
- Update readme with correct CUDA versions (#8315) @raydouglass
- Add description of the cuIO GDS integration (#8293) @vuule
- Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard
🚀 New Features
- Add support merging b/w categorical data (#8332) @galipremsagar
- Java: Support struct scalar (#8327) @sperlingxx
- added _is_homogeneous property (#8299) @shaneding
- Added decimal writing for CSV writer (#8296) @kaatish
- Java: Support creating a scalar from utf8 string (#8294) @firestarman
- Add Java API for Concatenate strings with separator (#8289) @tgravescs
- `strings::join_list_elements` options for empty list inputs (#8285) @ttnghia
- Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
- add unit tests for lead/lag on list for row window (#8259) @wbo4958
- Create a String column from UTF8 String byte arrays (#8257) @firestarman
- Support scattering `list_scalar` (#8256) @isVoid
- Implement `lists::concatenate_list_elements` (#8231) @ttnghia
- Support for struct scalars. (#8220) @nvdbaranec
- Add support for decimal types in ORC writer (#8198) @vuule
- Support create lists column from a `list_scalar` (#8185) @isVoid
- `Groupby.shift` c++ API refactor and python binding (#8131) @isVoid
- Add `groupby::replace_nulls(replace_policy)` api (#7118) @isVoid
🛠️ Improvements
- Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
- Add aliases for string methods (#8353) @shwina
- Update environment variable used to determine `cuda_version` (#8321) @ajschmidt8
- JNI: Refactor the code of making column from scalar (#8310) @firestarman
- Update `CHANGELOG.md` links for calver (#8303) @ajschmidt8
- Merge `branch-0.19` into `branch-21.06` (#8302) @ajschmidt8
- use address and length for GDS reads/writes (#8301) @rongou
- Update cudfjni version to 21.06.0 (#8292) @pxLi
- Update docs build script (#8284) @ajschmidt8
- Make device_buffer streams explicit and enforce move construction (#8280) @harrism
- Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
- Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
- Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
- Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
- Update cudfjni version to 21.06 (#8267) @pxLi
- support RMM aligned resource adapter in JNI (#8266) @rongou
- Pass compiler environment variables to conda python build (#8260) @Ethyling
- Remove abc inheritance from Serializable (#8254) @vyasr
- Move more methods into SingleColumnFrame (#8253) @vyasr
- Update ORC statistics API to use C++17 standard library (#8241) @vuule
- Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
- Correct unused parameters in the copying algorithms (#8232) @robertmaynard
- IO statistics cleanup (#8191) @kaatish
- Refactor of rolling_window implementation. (#8158) @nvdbaranec
- Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
- Column refactoring 2 (#8130) @vyasr
- support space in workspace (#7956) @jolorunyomi
- Support collect_set on rolling window (#7881) @sperlingxx
v0.19.2
🚨 Breaking Changes
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- `fixed_point` + `cudf::binary_operation` API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename `logical_cast` to `bit_cast` and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
🐛 Bug Fixes
- unsnap: busy wait a number of cycles (#8073) @vuule
- Fix returned column type when extracting from an empty list column (#8031) @jlowe
- Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
- Fix a `NameError` in meta dispatch API (#7996) @galipremsagar
- Reindex in `DataFrame.__setitem__` (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add `ignore_order` parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's `is_categorical_dtype` for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix `cudf::cast` overflow for `decimal64` to `int32_t` or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize `RangeIndex` when `index=True` in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of `DataFrame.argsort` (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix `__repr__` for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing `device_storage_dispatch` change affecting `cudf::gather` (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support `Series.__setitem__` with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in `Dataframe.__setitem__` (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
📖 Documentation
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add `isin` examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
🚀 New Features
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for `unique` groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds `list.unique` API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add `lists.sort_values` API (#7657) @isVoid
- Add is_integer API that can check for the validity of...