Extract prompts from alignments option #1516
base: master
pzelasko
left a comment
Looks like a WIP PR that needs to be cleaned up, but I don't understand the intention behind the new option - can you explain that first?
return None
del self.custom[name]
return self
return self
revert
samples = augment_fn(samples, self.sampling_rate)
return extractor.extract(samples, self.sampling_rate)

def plot_audio(self, ax=None, **kwargs):
revert
# number of channels in underlying tracks.

# Ensure that all supervisions have the same channel.
if len(set(to_hashable(s.channel) for s in trimmed.supervisions)) == 0:
why is this added?
This skips trimmed segments that no longer contain any valid channel information (e.g., cases where the channel becomes empty or None after trimming).
:param keep_all_channels: If ``True``, the output cut will have the same channels as the input cut. By default,
    the trimmed cut will have the same channels as the supervision.
:param num_jobs: Number of parallel workers to process the cuts.
:param get_all_segments: If ``True``, all segments will be returned. If ``False``, only the segment with the longest duration will be returned.
I don't understand this option - can you explain this?
I added this option to preserve the original behavior. When set to True, it returns all chunks from the alignment-based chunking. When set to False, it returns only the first segment, which we use as the prompt. Returning all segments can break the mapping between batch indices and input samples, since the number of chunks per input varies, making it hard to maintain consistent ordering.
@pzelasko The idea is to generate a batch of prompts by selecting audio segments that satisfy minimum silence and maximum duration constraints. Each batch entry corresponds to the input audio it came from. This enables on-the-fly prompt extraction for applications such as speaker cloning in TTS.
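The index-mapping concern raised in this thread can be illustrated with a minimal plain-Python sketch. It does not use the Lhotse API: `Segment`, `select_segments`, and the `None` placeholder are hypothetical names introduced only for illustration. The thread describes the single-segment choice both as the first segment and as the longest one; this sketch assumes the first.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for a trimmed cut: (start_seconds, duration_seconds).
Segment = Tuple[float, float]


def select_segments(
    chunks_per_input: Sequence[Sequence[Segment]],
    get_all_segments: bool,
) -> List[Optional[Segment]]:
    """Illustrative only; not the implementation in this PR."""
    if get_all_segments:
        # Flattened output: its length depends on how many chunks each input
        # produced, so output position i no longer identifies input i.
        return [seg for chunks in chunks_per_input for seg in chunks]
    # One entry per input. None (an assumption of this sketch) marks an input
    # that yielded no chunk, so positions stay aligned with the inputs.
    return [chunks[0] if chunks else None for chunks in chunks_per_input]


# Three inputs producing 3, 1, and 0 chunks respectively.
chunks = [[(0.0, 1.2), (1.9, 0.8), (3.1, 2.0)], [(0.4, 2.5)], []]
all_segments = select_segments(chunks, get_all_segments=True)
prompts = select_segments(chunks, get_all_segments=False)
```

Here `all_segments` has four entries for three inputs, while `prompts` has exactly one entry per input, which is the property a batch of prompts relies on.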
This branch adds an option to get a single segment from trim_to_alignment: