HuggingFace not using specified cache_dir #544

Closed
@deependujha

Description

🐛 Bug

To Reproduce

  • Run the code below and observe that the my_cache directory is not used.
import litdata as ld

# Define the Hugging Face dataset URI
hf_dataset_uri = "hf://datasets/leonardPKU/clevr_cogen_a_train/data"

# Create a streaming dataset
# The dataset is 13.2 GB, larger than max_cache_size, so the cache should be cleared by the end of streaming
dataset = ld.StreamingDataset(hf_dataset_uri, cache_dir="my_cache", max_cache_size="10GB")

# Stream the dataset using StreamingDataLoader
dataloader = ld.StreamingDataLoader(dataset, batch_size=4)
for sample in dataloader:
    pass 
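
One way to check whether the directory was used is to inspect it once the loop has finished. The following is a minimal sketch using only the standard library; `summarize_cache_dir` is a hypothetical helper written for illustration and is not part of litdata. It assumes the script is run from the same working directory as the reproduction above, so that the relative path "my_cache" resolves to the same location.

```python
from pathlib import Path


def summarize_cache_dir(cache_dir: str) -> dict:
    """Report whether cache_dir exists, how many files it holds, and their total size.

    Hypothetical helper for illustration; not part of litdata.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        # The directory was never created, so nothing was cached there.
        return {"exists": False, "num_files": 0, "total_bytes": 0}
    # Walk the directory recursively and keep regular files only.
    files = [p for p in root.rglob("*") if p.is_file()]
    return {
        "exists": True,
        "num_files": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
    }


if __name__ == "__main__":
    # "my_cache" is the value passed as cache_dir in the reproduction above.
    print(summarize_cache_dir("my_cache"))
```

If the reported directory is missing or empty after streaming, that would be consistent with the cache being written somewhere else. An empty directory alone is not conclusive, though, since the cache is expected to be cleared by the end of streaming.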

Expected behavior

The dataset should be cached in the specified cache_dir (my_cache).

Additional context



Labels: bug, help wanted
