
Feature Extraction Optimization in Training Process (Benchmark/DASB) #53

Open
@nyd3001

Description


Problem:
During training, a significant amount of CPU resources is consumed while GPU utilization remains relatively low.
This is because the model re-extracts tokens from the given waveforms in every epoch.

Effect: These problems significantly prolong training time, making the evaluation of a custom tokenizer cumbersome and time-consuming.

Suggestions:

  1. Restrict CPU resource usage by adding `torch.set_num_threads(1)`.
  2. Would it be possible to extract the tokens once before training and save them to memory or disk, then load them from that cache whenever they are needed? (A sketch of this approach is given after this list.)
  3. In the speech_enhancement and speech_separation tasks, the extracted tokens are already saved to a dictionary, but the waveform is still loaded during retrieval, which incurs some I/O overhead. Once the tokens are cached, could we skip loading the waveform entirely?
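
Below is a minimal sketch of suggestions 1-3, assuming a generic `tokenizer` object with an `encode(waveform)` method that returns the token tensor. The `precompute_tokens` helper and the `TokenDataset` class are illustrative names, not existing code in the benchmark, and the actual DASB tokenizer interface may differ.

```python
import os

import torch
import torchaudio
from torch.utils.data import Dataset

# Suggestion 1: keep CPU-side work from oversubscribing cores.
torch.set_num_threads(1)


def precompute_tokens(wav_paths, tokenizer, cache_path):
    """Suggestion 2: extract tokens once before training and cache them on disk."""
    if os.path.exists(cache_path):
        return torch.load(cache_path)
    cache = {}
    with torch.no_grad():
        for path in wav_paths:
            wav, _ = torchaudio.load(path)
            cache[path] = tokenizer.encode(wav)  # hypothetical tokenizer API
    torch.save(cache, cache_path)
    return cache


class TokenDataset(Dataset):
    """Suggestion 3: serve cached tokens directly; no waveform is loaded at retrieval time."""

    def __init__(self, wav_paths, token_cache):
        self.wav_paths = wav_paths
        self.token_cache = token_cache

    def __len__(self):
        return len(self.wav_paths)

    def __getitem__(self, idx):
        return self.token_cache[self.wav_paths[idx]]
```

With a setup like this, the tokenizer runs once per dataset instead of once per epoch, and the training dataloader only moves pre-computed token tensors.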

Preliminary Results: When evaluating keyword spotting with one given tokenizer (EnCodec), the time per epoch decreased by 50-80% after implementing the above suggestions.

Others: Would it be acceptable to submit a pull request with the changes related to this feature?
