Description
Modin version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest released version of Modin.
- I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
import modin.pandas as pd
from modin.pandas.api.extensions import register_dataframe_accessor
from typing import NamedTuple


class CustomTuple(NamedTuple):
    a: str
    b: int


@register_dataframe_accessor("custom_method")
def custom_method(self, custom_arg: CustomTuple):
    print(custom_arg.a + str(custom_arg.b))


pd.DataFrame().custom_method(CustomTuple("a", 1))
Issue Description
The above raises TypeError: CustomTuple.__new__() missing 1 required positional argument: 'b'.
When the query compiler caster walks a function's arguments, it converts each tuple to a list, visits the elements, and then rebuilds the container by calling the original type on that list. However, NamedTuple objects have different constructor behavior from the native tuple object:
tuple(["a", 1]) # ('a', 1)
CustomTuple(["a", 1]) # raises TypeError because it tries to use the whole list as the first field
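The difference can be reproduced without Modin. The following is a minimal sketch that only uses the standard library; the variable names (`values`, `error_message`, `from_make`, `from_unpack`) are illustrative and not part of Modin:

```python
from typing import NamedTuple


class CustomTuple(NamedTuple):
    a: str
    b: int


values = ["a", 1]

# The built-in tuple constructor takes a single iterable.
assert tuple(values) == ("a", 1)

# A NamedTuple constructor takes one positional argument per field, so
# passing the whole list binds it to `a` and leaves `b` missing.
try:
    CustomTuple(values)
except TypeError as exc:
    error_message = str(exc)

# Two ways to build a NamedTuple from an iterable:
from_make = CustomTuple._make(values)  # classmethod generated for namedtuples
from_unpack = CustomTuple(*values)     # unpack into positional fields
```

Both `_make` and star-unpacking yield an equal `CustomTuple`, so either could be used when rebuilding the container.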
To fix this, we need to modify this block of code:
modin/modin/core/storage_formats/pandas/query_compiler_caster.py
Lines 421 to 428 in bf5f344
return (
    # ValuesView, which we might get from dict.values(), is immutable,
    # but not constructable, so we convert it to a tuple. Otherwise,
    # we return an object of the same type as the input.
    tuple
    if issubclass(args_type, ValuesView)
    else args_type
)(visit_nested_args(list(arguments), fn))
to either use NamedTuple._make, or handle passing of collections differently.
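One possible shape for the `_make` option is sketched below. This is a standalone illustration, not Modin's actual code: `rebuild_sequence` and `visit_nested` are hypothetical helpers that mimic the rebuild step in simplified form, and detecting namedtuples by checking for a `_make` attribute on a tuple subclass is an assumption about how the check could be done.

```python
from collections.abc import ValuesView
from typing import Any, Callable, NamedTuple


def rebuild_sequence(original: Any, items: list) -> Any:
    """Hypothetical helper: rebuild a container like `original` from `items`."""
    args_type = type(original)
    if isinstance(original, ValuesView):
        # dict.values() views cannot be constructed directly; fall back to tuple.
        return tuple(items)
    if isinstance(original, tuple) and hasattr(args_type, "_make"):
        # Namedtuple classes expect one argument per field, so use _make,
        # which accepts a single iterable.
        return args_type._make(items)
    return args_type(items)


def visit_nested(arguments: Any, fn: Callable[[Any], Any]) -> Any:
    """Hypothetical, simplified walker: apply `fn` to every leaf value."""
    if isinstance(arguments, (list, tuple, ValuesView)):
        visited = [visit_nested(item, fn) for item in arguments]
        return rebuild_sequence(arguments, visited)
    return fn(arguments)


class CustomTuple(NamedTuple):
    a: str
    b: int


# The namedtuple survives the round trip with its type intact.
result = visit_nested(CustomTuple("a", 1), lambda value: value)
```

A plain `tuple` still goes through `args_type(items)`, so existing behavior for native tuples and lists is unchanged in this sketch.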
Expected Behavior
Should not error.
Error Logs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joshi/code/modin/modin/core/storage_formats/pandas/query_compiler_caster.py", line 986, in f_with_argument_casting
    visit_nested_args(args, register_query_compilers)
  File "/Users/joshi/code/modin/modin/core/storage_formats/pandas/query_compiler_caster.py", line 428, in visit_nested_args
    )(visit_nested_args(list(arguments), fn))
  File "/Users/joshi/code/modin/modin/core/storage_formats/pandas/query_compiler_caster.py", line 436, in visit_nested_args
    visit_nested_args(arguments[i], fn)
  File "/Users/joshi/code/modin/modin/core/storage_formats/pandas/query_compiler_caster.py", line 421, in visit_nested_args
    return (
TypeError: CustomTuple.__new__() missing 1 required positional argument: 'b'
Installed Versions
INSTALLED VERSIONS
commit : 8600760
python : 3.10.13.final.0
python-bits : 64
OS : Darwin
OS-release : 24.5.0
Version : Darwin Kernel Version 24.5.0: Tue Apr 22 19:54:25 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
Modin dependencies
modin : 0.32.0+69.g86007603
ray : 2.34.0
dask : 2024.8.1
distributed : 2024.8.1
pandas dependencies
pandas : 2.2.2
numpy : 2.2.6
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.3
Cython : None
pytest : 8.3.2
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.3.0
html5lib : None
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.4
IPython : 8.17.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : 2024.5.0
fsspec : 2024.6.1
gcsfs : None
matplotlib : 3.9.2
numba : None
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : 0.23.1
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : 2024.6.1
scipy : 1.14.1
sqlalchemy : 2.0.32
tables : 3.10.1
tabulate : 0.9.0
xarray : 2024.7.0
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None