Description
Problem:
During training, a large amount of CPU resources is consumed while GPU utilization remains relatively low.
The reason is that tokens are re-extracted from the raw waveforms at every epoch.
Effect: These problems significantly prolong training, making the evaluation of a custom tokenizer cumbersome and time-consuming.
Suggestions:
- Restrict CPU usage by adding `torch.set_num_threads(1)`.
- Is it possible to extract the tokens once before training, save them to memory or disk, and then load them when needed (see the sketch after this list)?
- I've noticed that in the speech_enhancement and speech_separation tasks, the extracted tokens are cached in a dictionary, but the waveform is still loaded at retrieval time, which adds I/O overhead. Once the tokens are cached, could we skip loading the waveform entirely?
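A minimal sketch of what I have in mind, assuming the benchmark exposes a waveform dataset that yields `(waveform, label)` pairs and a tokenizer with an `encode()` method (both names are placeholders, not the benchmark's actual API):

```python
import os

import torch
from torch.utils.data import Dataset

# Suggestion 1: limit intra-op threading so token extraction does not
# saturate the CPU; the exact thread count may need tuning per machine.
torch.set_num_threads(1)


class CachedTokenDataset(Dataset):
    """Wraps an existing waveform dataset: tokens are extracted once and
    cached, so later epochs never touch the waveforms again (suggestions 2-3).
    `base_dataset` and `tokenizer` are hypothetical stand-ins for the
    benchmark's actual objects."""

    def __init__(self, base_dataset, tokenizer, cache_path=None):
        self.base_dataset = base_dataset
        self.tokenizer = tokenizer
        self.cache_path = cache_path
        self.cache = {}
        if cache_path is not None and os.path.exists(cache_path):
            # Reuse tokens extracted in a previous run.
            self.cache = torch.load(cache_path)

    def precompute(self):
        # One-off pass over the data: extract tokens for every utterance
        # and optionally persist them to disk.
        with torch.no_grad():
            for idx in range(len(self.base_dataset)):
                if idx not in self.cache:
                    waveform, label = self.base_dataset[idx]
                    self.cache[idx] = (self.tokenizer.encode(waveform), label)
        if self.cache_path is not None:
            torch.save(self.cache, self.cache_path)

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, idx):
        # After precompute(), no waveform I/O happens here.
        return self.cache[idx]
```

Calling `precompute()` once before the training loop means later epochs only read cached tokens, which is where most of the observed speed-up comes from.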
Preliminary results: In the keyword-spotting evaluation with one given tokenizer (encodec), the time per epoch decreases by 50-80% after applying the above suggestions.
Others: Would you accept a merge request with the changes for this feature?