Describe the bug
Since today, Hugging Face's datasets library has been behaving erratically, and in some cases it fails to recognise a valid dataset at all. I suspect a recent change in the fsspec library is the cause, because running this command fixed it for me on one occasion: !pip install -U datasets huggingface_hub fsspec
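To help narrow down which package changed, here is a minimal sketch that prints the installed versions of the three packages from the command above. The helper name installed_versions is hypothetical and not part of any of these libraries; it only relies on the standard-library importlib.metadata module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("datasets", "huggingface_hub", "fsspec")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None.
    (Hypothetical helper, for illustration only.)
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this before and after the pip upgrade should show whether the fsspec version actually changed between a failing and a working environment.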
Steps to reproduce the bug
from datasets import load_dataset

def download_hf():
    dataset_name = input("Enter the dataset name: ")
    subset_name = input("Enter subset name: ")
    ds = load_dataset(dataset_name, name=subset_name)
    for split in ds:
        ds[split].to_pandas().to_csv(f"{subset_name}.csv", index=False)

download_hf()
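A side note on the script itself, unrelated to the errors reported here: every split is written to the same {subset_name}.csv, so a dataset with several splits would keep only the last one. A minimal sketch of a per-split filename, assuming a hypothetical helper named split_csv_path:

```python
from pathlib import Path


def split_csv_path(subset_name: str, split: str, out_dir: str = ".") -> Path:
    """Build a distinct CSV path for each (subset, split) pair.

    Characters other than letters, digits, '-', '_' and '.' are replaced
    with '_' so the result is a safe filename.
    (Hypothetical helper, for illustration only.)
    """
    raw = f"{subset_name}-{split}"
    safe = "".join(c if c.isalnum() or c in "-_." else "_" for c in raw)
    return Path(out_dir) / f"{safe}.csv"
```

Inside the loop, the export line would then become ds[split].to_pandas().to_csv(split_csv_path(subset_name, split), index=False).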
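Expected behavior
The dataset should load and each split should be exported to CSV. Instead, the run fails with one of the following errors: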
Downloading readme: 100% 1.55k/1.55k [00:00<00:00, 121kB/s]
Downloading data files: 100% 1/1 [00:00<00:00, 2.06it/s]
Downloading data: 0%| | 0.00/54.2k [00:00<?, ?B/s]
Downloading data: 100%|██████████| 54.2k/54.2k [00:00<00:00, 121kB/s]
Extracting data files: 100% 1/1 [00:00<00:00, 35.17it/s]
Generating test split: 140/0 [00:00<00:00, 2628.62 examples/s]
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-2-12ab305b0e77> in <cell line: 0>()
8 ds[split].to_pandas().to_csv(f"{subset_name}.csv", index=False)
9
---> 10 download_hf()
2 frames
/usr/local/lib/python3.11/dist-packages/datasets/builder.py in as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
1171 is_local = not is_remote_filesystem(self._fs)
1172 if not is_local:
-> 1173 raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
1174 if not os.path.exists(self._output_dir):
1175 raise FileNotFoundError(
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.
OR
Traceback (most recent call last):
File "e:\Fuck\download-data\mcq_dataset.py", line 10, in <module>
download_hf()
File "e:\Fuck\download-data\mcq_dataset.py", line 6, in download_hf
ds = load_dataset(dataset_name, name=subset_name)
File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 2606, in load_dataset
builder_instance = load_dataset_builder(
File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 2277, in load_dataset_builder
dataset_module = dataset_module_factory(
File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 1917, in dataset_module_factory
raise e1 from None
File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 1867, in dataset_module_factory
raise DatasetNotFoundError(f"Dataset '{path}' doesn't exist on the Hub or cannot be accessed.") from e
datasets.exceptions.DatasetNotFoundError: Dataset 'dataset repo_id' doesn't exist on the Hub or cannot be accessed.
Environment info
Google Colab (Python 3.11, per the first traceback) and a local Windows system with Python 3.10