Description
Problem:
During training, a large amount of CPU resources is consumed while GPU utilization remains relatively low.
The reason is that tokens are re-extracted from the raw waveforms at every epoch.
Effect: These problems significantly prolong training, making the evaluation of a custom tokenizer cumbersome and time-consuming.
Suggestions:
- Restrict CPU usage by adding `torch.set_num_threads(1)`.
- Is it possible to extract the tokens once before training, save them to memory or disk, and then load them when needed (see the sketch after this list)?
- I've noticed that in the speech_enhancement and speech_separation tasks, the extracted tokens are cached in a dictionary, but the waveform is still loaded at retrieval time, which adds I/O overhead. Once the tokens are cached, could we skip loading the waveform entirely?
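A minimal sketch of what I have in mind, assuming the benchmark exposes a waveform dataset that yields `(waveform, label)` pairs and a tokenizer with an `encode()` method (both names are placeholders, not the benchmark's actual API):

```python
import os

import torch
from torch.utils.data import Dataset

# Suggestion 1: limit intra-op threading so token extraction does not
# saturate the CPU; the exact thread count may need tuning per machine.
torch.set_num_threads(1)


class CachedTokenDataset(Dataset):
    """Wraps an existing waveform dataset: tokens are extracted once and
    cached, so later epochs never touch the waveforms again (suggestions 2-3).
    `base_dataset` and `tokenizer` are hypothetical stand-ins for the
    benchmark's actual objects."""

    def __init__(self, base_dataset, tokenizer, cache_path=None):
        self.base_dataset = base_dataset
        self.tokenizer = tokenizer
        self.cache_path = cache_path
        self.cache = {}
        if cache_path is not None and os.path.exists(cache_path):
            # Reuse tokens extracted in a previous run.
            self.cache = torch.load(cache_path)

    def precompute(self):
        # One-off pass over the data: extract tokens for every utterance
        # and optionally persist them to disk.
        with torch.no_grad():
            for idx in range(len(self.base_dataset)):
                if idx not in self.cache:
                    waveform, label = self.base_dataset[idx]
                    self.cache[idx] = (self.tokenizer.encode(waveform), label)
        if self.cache_path is not None:
            torch.save(self.cache, self.cache_path)

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, idx):
        # After precompute(), no waveform I/O happens here.
        return self.cache[idx]
```

Calling `precompute()` once before the training loop means later epochs only read cached tokens, which is where most of the observed speed-up comes from.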
Preliminary results: In the keyword-spotting evaluation with one given tokenizer (encodec), the time per epoch decreases by 50-80% after applying the above suggestions.
Others: Would you accept a merge request with the changes for this feature?