load_dataset() Fails with NotImplementedError Due to LocalFileSystem Cache in Colab #321

Open
@kartmpk

Description

load_dataset("argilla/synthetic-concise-reasoning-sft-filtered") raises a NotImplementedError in Colab due to incompatible local cache handling.

I'm running into the following issue when trying to load the dataset:

Error:
NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

Reproduction steps:
from datasets import load_dataset
ds = load_dataset("argilla/synthetic-concise-reasoning-sft-filtered")
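To help narrow this down, here is a minimal sketch that reports the installed versions of the packages most likely involved. It rests on an assumption not confirmed in this issue: that this error can appear when an older `datasets` release is paired with a newer `fsspec`. The helper name `installed_versions` and the package list are illustrative only.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list of relevant packages; adjust as needed.
    for name, version in installed_versions(
        ["datasets", "fsspec", "huggingface_hub"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in the same Colab runtime, after the installs below, shows which versions the resolver actually left in place.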

I followed #260 to resolve dependencies.

I also got the following errors from these installs:
! pip3 install ai-edge-torch-nightly==0.6.0.dev20250605
! pip3 install ai-edge-litert==1.3.0
! pip3 install mediapipe==0.10.21

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
ydf 0.12.0 requires protobuf<6.0.0,>=5.29.1, but you have protobuf 4.25.8 which is incompatible.
grpcio-status 1.71.0 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.8 which is incompatible.
tensorflow 2.18.0 requires ml-dtypes<0.5.0,>=0.4.0, but you have ml-dtypes 0.5.1 which is incompatible.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.18.0 requires ml-dtypes<0.5.0,>=0.4.0, but you have ml-dtypes 0.5.1 which is incompatible.
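
The conflict lines above follow a regular shape, so they can be pulled into a structured summary. The following is a rough sketch, assuming the exact wording shown in this output; the names `CONFLICT_RE` and `parse_conflicts` are hypothetical and not part of pip.

```python
import re

# Matches lines such as:
# "thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible."
CONFLICT_RE = re.compile(
    r"^(?P<pkg>\S+) (?P<pkg_version>\S+) requires (?P<requirement>\S+), "
    r"but you have (?P<dep>\S+) (?P<installed>\S+) which is incompatible\.$"
)


def parse_conflicts(pip_output):
    """Return one dict per dependency-conflict line found in pip's output."""
    conflicts = []
    for line in pip_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


if __name__ == "__main__":
    sample = (
        "thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, "
        "but you have numpy 1.26.4 which is incompatible."
    )
    for conflict in parse_conflicts(sample):
        print(conflict)
```

Grouping the parsed entries by `dep` makes it easier to see that the conflicts in this report concern `numpy`, `protobuf`, and `ml-dtypes`.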