Removal IntervalDtype/StructDtype inheritance #21114

vyasr · 2026-01-21T07:42:16Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

IntervalDtype now inherits from _BaseDtype instead of StructDtype. This change includes: - Updated class declaration - Store _fields directly in __init__ instead of calling super().__init__() - Added @Property fields that returns the stored _fields dict - Added @Property type that returns pd.Interval - Added @cached_property itemsize that computes size from fields - Removed outdated comment about subclassing StructDtype Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

The _recursively_replace_fields method converts dict with numeric/string keys to {"left": ..., "right": ...} format. This is needed when results come from pylibcudf without preserved nested field names. The method: - Converts dict keys (0, 1 or "0", "1") to proper field names ("left", "right") - Handles nested StructDtype recursively - Handles nested ListDtype recursively Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Update helper functions to handle IntervalDtype properly: - recursively_update_struct_names: Add handling for IntervalDtype to recursively update nested struct/list subtypes - _dtype_to_metadata: Add handling for IntervalDtype to generate proper ColumnMetadata for arrow conversion Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

IntervalColumn should explicitly raise NotImplementedError for __cuda_array_interface__ since intervals are structured types that cannot be represented as contiguous arrays. This matches the behavior of StructColumn and prevents incorrect usage.

When passing a list of pandas Interval objects to as_column (e.g., via IntervalIndex constructor), PyArrow cannot infer the interval type from the raw list. This commit adds pd.Interval to the special case handling that converts to pandas Series first, similar to pd.Timestamp and pd.Timedelta. The fix ensures that IntervalIndex([pd.Interval(0, 1)]) now correctly creates an IntervalColumn instead of a StructColumn. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

IntervalColumn now handles its own metadata via its own _with_type_metadata method, so this special case is no longer needed in StructColumn. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Now that IntervalDtype no longer inherits from StructDtype, we can use isinstance(dtype, StructDtype) instead of type(dtype) is not StructDtype. The exact type check was only needed to exclude the IntervalDtype subclass. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

StructColumn should create a regular Index, not IntervalIndex. Before the split, IntervalColumn was a subclass of StructColumn, so the tuple (IntervalColumn, StructColumn) was correct. Now that they are independent sibling classes, they need separate handling.

Add proper handling for pandas IntervalDtype in as_column to ensure it returns an IntervalColumn instead of a StructColumn. Changes: - In as_column: When converting pandas Series with IntervalDtype, wrap the result using _with_type_metadata with IntervalDtype - In StructColumn._with_type_metadata: Add dispatch logic to convert to IntervalColumn when given an IntervalDtype, similar to how IntervalColumn.from_arrow works This ensures that IntervalIndex constructor receives an IntervalColumn as expected, fixing the "data must be an iterable of Interval data" error. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

IntervalDtype is stored as STRUCT at the pylibcudf level, so dtype_to_pylibcudf_type should return plc.TypeId.STRUCT for IntervalDtype, just like it does for StructDtype.

IntervalDtype has children (left and right fields) like StructDtype, so it needs the same special handling in column_empty to create child columns rather than trying to create a scalar.

When gathering columns (used by iloc, loc, etc.), apply type metadata to ensure the original column type is preserved. This is critical for IntervalColumn which is stored as STRUCT at the libcudf level but needs to be reconstituted as IntervalColumn with IntervalDtype after gather. Without this fix, operations like series.iloc[0] on interval series would return StructColumn instead of preserving the IntervalColumn type.

…ture - Remove _apply_child_metadata method from IntervalColumn - Update _get_sliced_child to use plc_column.num_children() - Update _with_type_metadata to use plc_column.children() - Remove children= parameter from _from_preprocessed calls - RESTORE IntervalDtype handling in StructColumn._with_type_metadata (needed for construction path: StructColumn -> IntervalColumn) - Add strict=True to zip() call The key insight is that StructColumn must retain the ability to convert itself to IntervalColumn via _with_type_metadata(IntervalDtype), even though IntervalColumn no longer inherits from StructColumn. This is because the construction path creates StructColumn first (from Arrow interval storage), then converts to IntervalColumn. All 2,711 interval tests now pass.

This method is not needed because: - IntervalColumn no longer inherits from StructColumn - The struct accessor only registers for StructDtype, not IntervalDtype - IntervalColumn's left/right properties directly access plc_column.children() All 2,711 interval tests still pass.

When finding a common type between two IntervalDtype with different subtypes (e.g., interval[int64, left] and interval[float64, left]), find_common_type was falling back to object dtype because numpy doesn't know about IntervalDtype. Now, when all dtypes are IntervalDtype with the same closed parameter, we: 1. Find the common type of their subtypes 2. Return IntervalDtype(common_subtype, closed) This fixes union() operations on empty IntervalIndexes with different subtypes, which was returning Index with dtype='object' instead of IntervalIndex. Fixes pandas compatibility test: tests/indexes/interval/test_setops.py::TestIntervalIndex::test_union_empty_result

mroeschke

Big +1 with this decoupling

mroeschke · 2026-01-21T19:41:34Z

python/cudf/cudf/core/column/column.py

+                # Wrap StructColumn as IntervalColumn with proper metadata
+                result = result._with_type_metadata(
+                    IntervalDtype(
+                        subtype=cudf.dtype(arbitrary.dtype.subtype),


Nit: cudf.dtype will be called in the IntervalDtype constructor as it would be nice to have less places use cudf.dtype

mroeschke · 2026-01-21T19:45:32Z

python/cudf/cudf/core/column/interval.py

+            )
+        # For pandas dtypes, store them directly in the column's dtype property
+        elif isinstance(dtype, pd.ArrowDtype) and isinstance(
+            dtype.pyarrow_dtype, pa.lib.StructType


It's probably More Correct if dtype.pyarrow_dtype was an instance of ArrowIntervalType (we have similar handling for this in ColumnBase.from_arrow)

mroeschke · 2026-01-21T19:50:50Z

python/cudf/cudf/core/column/struct.py

        # Check IntervalDtype first because it's a subclass of StructDtype
        if isinstance(dtype, IntervalDtype):
-            # TODO: Rewrite this to avoid needing to round-trip via ColumnBase
+            # Dispatch to IntervalColumn when given IntervalDtype


Could this entire branch just call return IntervalColumn._with_type_metadata?

1. Remove redundant cudf.dtype() call in column.py - IntervalDtype constructor already calls cudf.dtype() internally 2. Check for ArrowIntervalType instead of pa.lib.StructType in interval.py - More correct type check following pattern in ColumnBase.from_arrow 3. Simplify IntervalDtype handling in struct.py _with_type_metadata - Instead of manually reconstructing IntervalColumn, create it with current dtype and let IntervalColumn._with_type_metadata handle the child conversion - Reduces code duplication All 2,711 interval tests pass.

…rval_struct_clean

vyasr and others added 18 commits January 20, 2026 12:53

Change IntervalColumn to inherit from ColumnBase

243ae71

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Add infrastructure methods to IntervalColumn

c76e792

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Add _with_type_metadata method to IntervalColumn

f3840c6

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Fix element_indexing and imports in IntervalColumn

7c8fe07

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Remove IntervalDtype handling from StructColumn._with_type_metadata

c8e20b4

IntervalColumn now handles its own metadata via its own _with_type_metadata method, so this special case is no longer needed in StructColumn. Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Add IntervalDtype handling to dtype_to_pylibcudf_type

add9078

IntervalDtype is stored as STRUCT at the pylibcudf level, so dtype_to_pylibcudf_type should return plc.TypeId.STRUCT for IntervalDtype, just like it does for StructDtype.

Handle IntervalDtype in column_empty like StructDtype

34ed654

IntervalDtype has children (left and right fields) like StructDtype, so it needs the same special handling in column_empty to create child columns rather than trying to create a scalar.

vyasr self-assigned this Jan 21, 2026

vyasr requested a review from a team as a code owner January 21, 2026 07:42

vyasr requested review from TomAugspurger and mroeschke January 21, 2026 07:42

vyasr added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 21, 2026

github-actions bot added the Python Affects Python cuDF API. label Jan 21, 2026

github-project-automation bot added this to cuDF Python Jan 21, 2026

GPUtester moved this to In Progress in cuDF Python Jan 21, 2026

mroeschke reviewed Jan 21, 2026

View reviewed changes

Fix outstanding tests

8da1713

vyasr added 2 commits January 21, 2026 12:58

Remove now passing test

a17ad0b

github-actions bot added the cudf.pandas Issues specific to cudf.pandas label Jan 21, 2026

vyasr added 2 commits January 21, 2026 21:39

One more CoW test

825d8f4

Merge remote-tracking branch 'upstream/main' into refactor/split_inte…

fec7b1a

…rval_struct_clean

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Removal IntervalDtype/StructDtype inheritance #21114

Removal IntervalDtype/StructDtype inheritance #21114

vyasr commented Jan 21, 2026

Uh oh!

mroeschke left a comment

Uh oh!

mroeschke Jan 21, 2026

Uh oh!

mroeschke Jan 21, 2026

Uh oh!

mroeschke Jan 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Removal IntervalDtype/StructDtype inheritance #21114

Are you sure you want to change the base?

Removal IntervalDtype/StructDtype inheritance #21114

Conversation

vyasr commented Jan 21, 2026

Description

Checklist

Uh oh!

mroeschke left a comment

Choose a reason for hiding this comment

Uh oh!

mroeschke Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

mroeschke Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

mroeschke Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants