Datasets lib seems to have broken after fsspec lib update #7570

Open
@sleepingcat4

Describe the bug

Since today, HF's datasets library has been behaving oddly, and in some cases it fails to recognise a valid dataset entirely. I think this is caused by a recent change in the fsspec library, because upgrading with the following command fixed it for me on one occasion: !pip install -U datasets huggingface_hub fsspec
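Because the suspicion is a version mismatch between datasets, huggingface_hub and fsspec, it may help to record the installed versions before and after running the upgrade command. The following is a minimal sketch using only the standard library; the helper name `report_versions` is hypothetical and not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("datasets", "huggingface_hub", "fsspec")):
    """Return a dict mapping each distribution name to its installed version."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in the current environment.
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```

Pasting this output under "Environment info" for both the failing and the working environment would make it easier to tell which package versions differ.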

Steps to reproduce the bug

from datasets import load_dataset

def download_hf():
    dataset_name = input("Enter the dataset name: ")
    subset_name = input("Enter subset name: ")
    ds = load_dataset(dataset_name, name=subset_name)
    for split in ds:
        ds[split].to_pandas().to_csv(f"{subset_name}.csv", index=False)

download_hf()
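Separately from the reported error, the loop above writes every split to the same `{subset_name}.csv`, so with more than one split the later ones would overwrite the earlier ones. A possible variant that gives each split its own file is sketched below; `split_csv_path` and `export_splits` are hypothetical helper names, and `ds` is assumed to be the mapping returned by `load_dataset`, where each value offers `to_pandas()`.

```python
from pathlib import Path


def split_csv_path(subset_name, split, out_dir="."):
    """Build a per-split CSV path of the form <out_dir>/<subset>_<split>.csv."""
    return Path(out_dir) / f"{subset_name}_{split}.csv"


def export_splits(ds, subset_name, out_dir="."):
    """Write each split of a DatasetDict-like mapping to its own CSV file."""
    written = []
    for split in ds:
        path = split_csv_path(subset_name, split, out_dir)
        ds[split].to_pandas().to_csv(path, index=False)
        written.append(path)
    return written
```

This does not address the NotImplementedError or DatasetNotFoundError shown in this report; it only avoids losing data once loading succeeds.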

Expected behavior

Downloading readme: 100%
 1.55k/1.55k [00:00<00:00, 121kB/s]
Downloading data files: 100%
 1/1 [00:00<00:00,  2.06it/s]

Downloading data:   0%|          | 0.00/54.2k [00:00<?, ?B/s]
Downloading data: 100%|██████████| 54.2k/54.2k [00:00<00:00, 121kB/s]
Extracting data files: 100%
 1/1 [00:00<00:00, 35.17it/s]
Generating test split: 
 140/0 [00:00<00:00, 2628.62 examples/s]
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-2-12ab305b0e77> in <cell line: 0>()
      8     ds[split].to_pandas().to_csv(f"{subset_name}.csv", index=False)
      9 
---> 10 download_hf()

2 frames
/usr/local/lib/python3.11/dist-packages/datasets/builder.py in as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
   1171         is_local = not is_remote_filesystem(self._fs)
   1172         if not is_local:
-> 1173             raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
   1174         if not os.path.exists(self._output_dir):
   1175             raise FileNotFoundError(

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

OR

Traceback (most recent call last):
  File "e:\Fuck\download-data\mcq_dataset.py", line 10, in <module>
    download_hf()
  File "e:\Fuck\download-data\mcq_dataset.py", line 6, in download_hf
    ds = load_dataset(dataset_name, name=subset_name)
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 2606, in load_dataset
    builder_instance = load_dataset_builder(
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 2277, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 1917, in dataset_module_factory    
    raise e1 from None
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python310\lib\site-packages\datasets\load.py", line 1867, in dataset_module_factory    
    raise DatasetNotFoundError(f"Dataset '{path}' doesn't exist on the Hub or cannot be accessed.") from e
datasets.exceptions.DatasetNotFoundError: Dataset 'dataset repo_id' doesn't exist on the Hub or cannot be accessed.

Environment info

Google Colab and a local Windows system running Python 3.10
