Open
Labels
bug (Something isn't working), good first issue (Good for newcomers)
Description
Describe the bug
In load_dataset_builder(), builder_kwargs and config_kwargs can contain the same keywords, leading to a TypeError("type object got multiple values for keyword argument 'xyz'").
I ran into this problem with the keyword base_path. It might happen with other kwargs as well. I think a quick fix would be:
builder_cls = import_main_class(dataset_module.module_path)
builder_kwargs = dataset_module.builder_kwargs
data_files = builder_kwargs.pop("data_files", data_files)
config_name = builder_kwargs.pop("config_name", name)
hash = builder_kwargs.pop("hash")
base_path = builder_kwargs.pop("base_path")
and then pass base_path into builder_cls.
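One way to generalize the quick fix beyond base_path is to merge the two dictionaries before calling the builder, so each keyword is passed exactly once. The following is only a sketch under assumptions: FakeBuilder and build are hypothetical stand-ins, not part of the datasets library, and letting the caller's value override the module default is a design choice the maintainers may not share.

```python
class FakeBuilder:
    """Hypothetical stand-in for a DatasetBuilder subclass (not the real datasets API)."""

    def __init__(self, base_path=None, **config_kwargs):
        self.base_path = base_path
        self.config_kwargs = config_kwargs


def build(builder_cls, builder_kwargs, **config_kwargs):
    """Hypothetical helper: merge both kwarg sources so no key is passed twice."""
    # Later entries win, so caller-supplied config_kwargs override
    # the defaults that come from the dataset module.
    merged = {**builder_kwargs, **config_kwargs}
    return builder_cls(**merged)


module_defaults = {"base_path": "/module/default", "hash": "abc123"}
builder = build(FakeBuilder, module_defaults, base_path="./sample_data")
print(builder.base_path)
```

Because the merge builds a new dictionary, the module-level defaults are left unmodified, unlike the pop-based approach, which mutates builder_kwargs in place.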
Steps to reproduce the bug
from datasets import load_dataset
load_dataset("rotten_tomatoes", base_path="./sample_data")
Expected results
The docs state: **config_kwargs — Keyword arguments to be passed to the BuilderConfig and used in the DatasetBuilder.
So I would expect to be able to pass the base_path into load_dataset().
Actual results
TypeError("type object got multiple values for keyword argument 'base_path'").
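The error itself can be reproduced without the datasets library, since it comes from Python's call semantics: unpacking two dictionaries that share a key into one call raises a TypeError. The sketch below uses a hypothetical FakeBuilder class and made-up dictionary contents purely for illustration; the prefix of the message (for example "type object") varies between Python versions.

```python
class FakeBuilder:
    """Hypothetical stand-in for a builder class (not the real datasets API)."""

    def __init__(self, base_path=None, **config_kwargs):
        self.base_path = base_path


# Both sources carry the same key, as described in this issue.
builder_kwargs = {"base_path": "/module/default"}
config_kwargs = {"base_path": "./sample_data"}

message = None
try:
    # Unpacking two mappings with a shared key into one call is an error.
    FakeBuilder(**builder_kwargs, **config_kwargs)
except TypeError as err:
    message = str(err)
    print(message)
```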
Environment info
- datasets version: 2.4.0
- Platform: macOS-12.5-arm64-arm-64bit
- Python version: 3.8.9
- PyArrow version: 9.0.0
thepurpleowl