Chapter 5 - Finetune - Disk occupation space problem #194

@jupiterMJM

Hi everyone!
I ran into a small problem while following this course: https://huggingface.co/learn/audio-course/chapter5/fine-tuning .
I understand all the commands, but I get an error when running this one:

common_voice = common_voice.map(
    prepare_dataset, remove_columns=common_voice.column_names["train"], num_proc=1
)

This command is meant to transform the audio data into log-mel spectrograms.
I get the following error: OSError: [Errno 28] No space left on device
which is quite clear.
After a little investigation, I noticed that temp files are created in my folder when I launch this command. Here is what I think happens: the .map function transforms every single audio sample into its log-mel spectrogram and tries to store the results somewhere (on disk, since there isn't enough RAM). However, these temp files can grow to several hundred GB!
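
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes Whisper-style features (80 mel bins by 3000 frames per clip, since every clip is padded to 30 s) stored as 4-byte floats. None of these numbers are confirmed in this issue, and the dataset sizes and the 40 kB clip size are hypothetical values chosen only to show the arithmetic.

```python
# Back-of-the-envelope estimate of the on-disk size of precomputed features.
# Assumptions (not confirmed in this issue): Whisper-style features with
# 80 mel bins x 3000 frames per clip, stored as 4-byte floats.
N_MELS = 80
N_FRAMES = 3000
BYTES_PER_VALUE = 4


def features_bytes(num_examples: int) -> int:
    """Approximate bytes needed to store log-mel features for num_examples clips."""
    return num_examples * N_MELS * N_FRAMES * BYTES_PER_VALUE


def expansion_ratio(avg_clip_bytes: float) -> float:
    """Ratio of the stored feature size to the original (compressed) clip size."""
    return features_bytes(1) / avg_clip_bytes


if __name__ == "__main__":
    for n in (1_000, 100_000):  # hypothetical dataset sizes
        print(f"{n:>7} clips -> {features_bytes(n) / 1e9:.2f} GB")
    # Hypothetical 40 kB compressed clip, only to show how the ratio is computed.
    print(f"ratio for a 40 kB clip: {expansion_ratio(40_000):.0f}x")
```

Under these assumptions each clip costs a bit under 1 MB regardless of its real duration, so the total grows with the number of clips and not with the size of the compressed audio.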

Therefore, here are my questions:

  • Is there a way to change how the transformation into log-mel is done? For example, not creating all the log-mel spectrograms at once, but computing them in batches when they are needed?
  • If not, can someone tell me the ratio of the space occupied after the .map call to the size of the original dataset?
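
To make the first question concrete, this is the kind of behaviour I am after, written as a library-agnostic sketch in plain Python. The names lazy_batches and toy_transform are hypothetical, and toy_transform is only a stand-in for prepare_dataset (it does not compute any real log-mel features).

```python
from typing import Callable, Iterable, Iterator, List


def lazy_batches(
    examples: Iterable[dict],
    transform: Callable[[dict], dict],
    batch_size: int,
) -> Iterator[List[dict]]:
    """Yield transformed batches one at a time instead of materialising them all."""
    batch: List[dict] = []
    for example in examples:
        # The transform runs only when the consumer asks for the next batch.
        batch.append(transform(example))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


def toy_transform(example: dict) -> dict:
    """Hypothetical stand-in for prepare_dataset: no real log-mel computation."""
    return {"input_length": len(example["audio"])}


if __name__ == "__main__":
    raw = ({"audio": [0.0] * n} for n in range(1, 6))  # 5 fake clips
    for batch in lazy_batches(raw, toy_transform, batch_size=2):
        print(batch)
```

Only one batch of features exists at any time here, so nothing has to be written to disk. As far as I know, the datasets library has set_transform / with_transform for applying a function when examples are accessed, but I have not checked whether that fits the rest of the course code.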

Thanks a lot!
