Conversation

@vyasr
Contributor

@vyasr vyasr commented Jan 21, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

vyasr and others added 18 commits January 20, 2026 12:53
IntervalDtype now inherits from _BaseDtype instead of StructDtype.
This change includes:
- Updated class declaration
- Store _fields directly in __init__ instead of calling super().__init__()
- Added @property fields that returns the stored _fields dict
- Added @property type that returns pd.Interval
- Added @cached_property itemsize that computes size from fields
- Removed outdated comment about subclassing StructDtype

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
The _recursively_replace_fields method converts dict with numeric/string keys
to {"left": ..., "right": ...} format. This is needed when results come from
pylibcudf without preserved nested field names.

The method:
- Converts dict keys (0, 1 or "0", "1") to proper field names ("left", "right")
- Handles nested StructDtype recursively
- Handles nested ListDtype recursively

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
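
The key renaming described above can be sketched on plain dicts and lists. This is not the method from the PR: `replace_interval_fields` is a hypothetical helper, nested dicts stand in for nested StructDtype fields, lists stand in for ListDtype elements, and applying the rename at every nesting level is an assumption.

```python
# Positional keys that may come back without field names.
_POSITIONAL_TO_NAME = {0: "left", 1: "right", "0": "left", "1": "right"}


def replace_interval_fields(fields):
    """Rename positional keys to "left"/"right", recursing into nested values."""
    if isinstance(fields, dict):
        return {
            _POSITIONAL_TO_NAME.get(key, key): replace_interval_fields(value)
            for key, value in fields.items()
        }
    if isinstance(fields, list):
        return [replace_interval_fields(item) for item in fields]
    # Leaf value (e.g. a dtype name): nothing to rename.
    return fields


flat = replace_interval_fields({0: "int64", "1": "int64"})
nested = replace_interval_fields({"0": {"0": "a", "1": "b"}, "1": ["x"]})
```

Keys that are already named, such as "left", pass through unchanged, so calling the helper twice gives the same result.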
Update helper functions to handle IntervalDtype properly:
- recursively_update_struct_names: Add handling for IntervalDtype to recursively
  update nested struct/list subtypes
- _dtype_to_metadata: Add handling for IntervalDtype to generate proper
  ColumnMetadata for arrow conversion

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
IntervalColumn should explicitly raise NotImplementedError for
__cuda_array_interface__ since intervals are structured types
that cannot be represented as contiguous arrays. This matches
the behavior of StructColumn and prevents incorrect usage.

When passing a list of pandas Interval objects to as_column (e.g., via
IntervalIndex constructor), PyArrow cannot infer the interval type from
the raw list. This commit adds pd.Interval to the special case handling
that converts to pandas Series first, similar to pd.Timestamp and
pd.Timedelta.

The fix ensures that IntervalIndex([pd.Interval(0, 1)]) now correctly
creates an IntervalColumn instead of a StructColumn.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
IntervalColumn now handles its own metadata via its own _with_type_metadata
method, so this special case is no longer needed in StructColumn.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Now that IntervalDtype no longer inherits from StructDtype, we can use
isinstance(dtype, StructDtype) instead of type(dtype) is not StructDtype.
The exact type check was only needed to exclude the IntervalDtype subclass.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
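
To see why the exact type check was needed only while the interval dtype subclassed the struct dtype, consider this small sketch. All three classes are hypothetical stand-ins: `OldIntervalDtype` models the hierarchy before this PR and `NewIntervalDtype` models it after.

```python
class StructDtype:
    """Stand-in for the struct dtype."""


class OldIntervalDtype(StructDtype):
    """Stand-in for the interval dtype before the split (a subclass)."""


class NewIntervalDtype:
    """Stand-in for the interval dtype after the split (unrelated class)."""


old, new, struct = OldIntervalDtype(), NewIntervalDtype(), StructDtype()

# Before the split, isinstance also matched interval dtypes, so an exact
# type comparison was required to single out plain structs.
old_matches_isinstance = isinstance(old, StructDtype)
old_matches_exact = type(old) is StructDtype

# After the split, isinstance alone separates the two.
new_matches_isinstance = isinstance(new, StructDtype)
struct_matches_isinstance = isinstance(struct, StructDtype)
```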
StructColumn should create a regular Index, not IntervalIndex.
Before the split, IntervalColumn was a subclass of StructColumn,
so the tuple (IntervalColumn, StructColumn) was correct. Now that
they are independent sibling classes, they need separate handling.

Add proper handling for pandas IntervalDtype in as_column to ensure
it returns an IntervalColumn instead of a StructColumn.

Changes:
- In as_column: When converting pandas Series with IntervalDtype,
  wrap the result using _with_type_metadata with IntervalDtype
- In StructColumn._with_type_metadata: Add dispatch logic to convert
  to IntervalColumn when given an IntervalDtype, similar to how
  IntervalColumn.from_arrow works

This ensures that IntervalIndex constructor receives an IntervalColumn
as expected, fixing the "data must be an iterable of Interval data" error.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
IntervalDtype is stored as STRUCT at the pylibcudf level,
so dtype_to_pylibcudf_type should return plc.TypeId.STRUCT
for IntervalDtype, just like it does for StructDtype.

IntervalDtype has children (left and right fields) like StructDtype,
so it needs the same special handling in column_empty to create
child columns rather than trying to create a scalar.

When gathering columns (used by iloc, loc, etc.), apply type metadata
to ensure the original column type is preserved. This is critical for
IntervalColumn which is stored as STRUCT at the libcudf level but needs
to be reconstituted as IntervalColumn with IntervalDtype after gather.

Without this fix, operations like series.iloc[0] on interval series would
return StructColumn instead of preserving the IntervalColumn type.
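
The gather fix described above can be pictured with a toy model. Everything below is a hypothetical stand-in: dtypes are plain strings, `low_level_gather` imitates a backend that only knows the STRUCT storage type, and only the `_with_type_metadata` name is taken from the commit messages.

```python
class Column:
    """Toy column: row data plus a dtype label."""

    def __init__(self, rows, dtype):
        self.rows = list(rows)
        self.dtype = dtype


class IntervalColumn(Column):
    """Toy interval column, a sibling of StructColumn rather than a subclass."""


class StructColumn(Column):
    def _with_type_metadata(self, dtype):
        # Dispatch to IntervalColumn when handed an interval dtype.
        if dtype == "interval":
            return IntervalColumn(self.rows, dtype)
        return StructColumn(self.rows, dtype)


def low_level_gather(column, indices):
    # The backend result only knows it holds struct data.
    return StructColumn([column.rows[i] for i in indices], "struct")


def gather(column, indices):
    # Reapply the source dtype so the original column class is restored.
    return low_level_gather(column, indices)._with_type_metadata(column.dtype)


intervals = IntervalColumn([(0, 1), (1, 2), (2, 3)], "interval")
taken = gather(intervals, [2, 0])
```

Without the final `_with_type_metadata` call, `taken` would be a `StructColumn`, which mirrors the series.iloc[0] symptom described in the commit message.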
…ture

- Remove _apply_child_metadata method from IntervalColumn
- Update _get_sliced_child to use plc_column.num_children()
- Update _with_type_metadata to use plc_column.children()
- Remove children= parameter from _from_preprocessed calls
- RESTORE IntervalDtype handling in StructColumn._with_type_metadata
  (needed for construction path: StructColumn -> IntervalColumn)
- Add strict=True to zip() call

The key insight is that StructColumn must retain the ability to convert
itself to IntervalColumn via _with_type_metadata(IntervalDtype), even
though IntervalColumn no longer inherits from StructColumn. This is
because the construction path creates StructColumn first (from Arrow
interval storage), then converts to IntervalColumn.

All 2,711 interval tests now pass.

This method is not needed because:
- IntervalColumn no longer inherits from StructColumn
- The struct accessor only registers for StructDtype, not IntervalDtype
- IntervalColumn's left/right properties directly access plc_column.children()

All 2,711 interval tests still pass.

@vyasr vyasr self-assigned this Jan 21, 2026
@vyasr vyasr requested a review from a team as a code owner January 21, 2026 07:42
@vyasr vyasr added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Jan 21, 2026
@github-actions github-actions bot added the Python (Affects Python cuDF API) label Jan 21, 2026
@GPUtester GPUtester moved this to In Progress in cuDF Python Jan 21, 2026

When finding a common type between two IntervalDtype with different subtypes
(e.g., interval[int64, left] and interval[float64, left]), find_common_type
was falling back to object dtype because numpy doesn't know about IntervalDtype.

Now, when all dtypes are IntervalDtype with the same closed parameter, we:
1. Find the common type of their subtypes
2. Return IntervalDtype(common_subtype, closed)

This fixes union() operations on empty IntervalIndexes with different subtypes,
which was returning Index with dtype='object' instead of IntervalIndex.

Fixes pandas compatibility test:
tests/indexes/interval/test_setops.py::TestIntervalIndex::test_union_empty_result
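
The two-step rule above might look roughly like the following. This is a sketch under stated assumptions, not the code from the PR: `IntervalDtype` here is a hypothetical named tuple, `toy_common_subtype` covers only int64 and float64, and returning `None` stands in for falling back to the generic path.

```python
from typing import NamedTuple, Optional, Sequence


class IntervalDtype(NamedTuple):
    """Toy interval dtype described by a subtype name and a closed side."""

    subtype: str
    closed: str


def toy_common_subtype(subtypes: Sequence[str]) -> str:
    # Toy promotion rule that only understands int64 and float64.
    return "float64" if "float64" in subtypes else "int64"


def common_interval_dtype(dtypes: Sequence[IntervalDtype]) -> Optional[IntervalDtype]:
    closed = {dtype.closed for dtype in dtypes}
    if not dtypes or len(closed) != 1:
        # Differing closed sides: leave the decision to the generic fallback.
        return None
    # 1. Find the common type of the subtypes.
    subtype = toy_common_subtype([dtype.subtype for dtype in dtypes])
    # 2. Rebuild an interval dtype around it with the shared closed side.
    return IntervalDtype(subtype, closed.pop())


same_closed = common_interval_dtype(
    [IntervalDtype("int64", "left"), IntervalDtype("float64", "left")]
)
mixed_closed = common_interval_dtype(
    [IntervalDtype("int64", "left"), IntervalDtype("int64", "right")]
)
```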
Contributor

@mroeschke mroeschke left a comment

Big +1 with this decoupling

# Wrap StructColumn as IntervalColumn with proper metadata
result = result._with_type_metadata(
    IntervalDtype(
        subtype=cudf.dtype(arbitrary.dtype.subtype),
Contributor

Nit: cudf.dtype is already called in the IntervalDtype constructor, so it isn't needed here. It would be nice to have fewer places use cudf.dtype.

)
# For pandas dtypes, store them directly in the column's dtype property
elif isinstance(dtype, pd.ArrowDtype) and isinstance(
    dtype.pyarrow_dtype, pa.lib.StructType
Contributor

It would probably be more correct to check whether dtype.pyarrow_dtype is an instance of ArrowIntervalType (we have similar handling for this in ColumnBase.from_arrow).

# Check IntervalDtype first because it's a subclass of StructDtype
if isinstance(dtype, IntervalDtype):
# TODO: Rewrite this to avoid needing to round-trip via ColumnBase
# Dispatch to IntervalColumn when given IntervalDtype
Contributor

Could this entire branch just call return IntervalColumn._with_type_metadata?

vyasr added 2 commits January 21, 2026 12:58
1. Remove redundant cudf.dtype() call in column.py
   - IntervalDtype constructor already calls cudf.dtype() internally

2. Check for ArrowIntervalType instead of pa.lib.StructType in interval.py
   - More correct type check following pattern in ColumnBase.from_arrow

3. Simplify IntervalDtype handling in struct.py _with_type_metadata
   - Instead of manually reconstructing IntervalColumn, create it with
     current dtype and let IntervalColumn._with_type_metadata handle
     the child conversion
   - Reduces code duplication

All 2,711 interval tests pass.
@github-actions github-actions bot added the cudf.pandas (Issues specific to cudf.pandas) label Jan 21, 2026

Labels

cudf.pandas: Issues specific to cudf.pandas
improvement: Improvement / enhancement to an existing function
non-breaking: Non-breaking change
Python: Affects Python cuDF API

Projects

Status: In Progress

2 participants