@AmirHussein96
Contributor

This branch adds an option to get a single segment from `trim_to_alignments`:

    # Trim cuts to alignments and get the first segment
    cuts_prompt = cuts.trim_to_alignments(type="word",
                                        max_pause=max_pause,
                                        max_segment_duration=max_segment_duration,
                                        get_all_segments=False)

Collaborator

@pzelasko left a comment

Looks like a WIP PR that needs to be cleaned up, but I don't understand the intention behind the new option - can you explain that first?

return None
del self.custom[name]
return self
return self
Collaborator

revert

samples = augment_fn(samples, self.sampling_rate)
return extractor.extract(samples, self.sampling_rate)

def plot_audio(self, ax=None, **kwargs):
Collaborator

revert

# number of channels in underlying tracks.

# Ensure that all supervisions have the same channel.
if len(set(to_hashable(s.channel) for s in trimmed.supervisions)) == 0:
Collaborator

why is this added?

Contributor Author

This skips trimmed segments that no longer contain any valid channel information (e.g., cases where the channel becomes empty or None after trimming).
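For reference, here is a small standalone sketch of what the `len(set(...)) == 0` condition evaluates to for a few inputs. It does not use lhotse: `Sup` is a hypothetical stand-in for a supervision, and `to_hashable` is re-implemented under the assumption that it only turns lists into tuples so they can be placed in a set.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Sup:
    """Hypothetical stand-in for a supervision; only the channel field matters here."""
    channel: Any


def to_hashable(item: Any) -> Any:
    # Assumption: lists become tuples so they can be stored in a set.
    return tuple(item) if isinstance(item, list) else item


def no_channels(supervisions: List[Sup]) -> bool:
    # Same expression as the condition in the diff, applied to a plain list.
    return len(set(to_hashable(s.channel) for s in supervisions)) == 0


empty = no_channels([])                                  # no supervisions at all
none_channel = no_channels([Sup(channel=None)])          # one supervision, channel is None
mixed = no_channels([Sup(channel=[0, 1]), Sup(channel=0)])
print(empty, none_channel, mixed)
```

In this sketch the condition is true only when there are no supervisions at all; a supervision whose channel is `None` still contributes one element to the set, so it would need a separate check.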

:param keep_all_channels: If ``True``, the output cut will have the same channels as the input cut. By default,
the trimmed cut will have the same channels as the supervision.
:param num_jobs: Number of parallel workers to process the cuts.
:param get_all_segments: If ``True``, all segments will be returned. If ``False``, only the segment with the longest duration will be returned.
Collaborator

I don't understand this option - can you explain this?

Contributor Author

I added this option to preserve the original behavior. When set to True, it returns all chunks from the alignment-based chunking. When set to False, it returns only the first segment, which we use as the prompt. Returning all segments can break the mapping between batch indices and input samples, since the number of chunks per input varies, making it hard to maintain consistent ordering.
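To make the batch-mapping point concrete, here is a minimal pure-Python sketch. It is not the lhotse implementation: `segment_words` and `pick_segments` are hypothetical helpers, words are plain `(start, duration)` tuples, and the grouping rule (start a new segment when the pause exceeds `max_pause` or the segment would exceed `max_segment_duration`) is an assumption about how the alignment-based chunking behaves.

```python
from typing import List, Tuple

Word = Tuple[float, float]      # (start, duration) in seconds
Segment = Tuple[float, float]   # (start, end) in seconds


def segment_words(
    words: List[Word], max_pause: float, max_segment_duration: float
) -> List[Segment]:
    """Group consecutive words into segments (hypothetical chunking rule)."""
    segments: List[Segment] = []
    seg_start = seg_end = None
    for start, duration in sorted(words):
        end = start + duration
        if seg_start is None:
            seg_start, seg_end = start, end
            continue
        too_long_pause = start - seg_end > max_pause
        too_long_segment = end - seg_start > max_segment_duration
        if too_long_pause or too_long_segment:
            # Close the current segment and start a new one at this word.
            segments.append((seg_start, seg_end))
            seg_start, seg_end = start, end
        else:
            seg_end = max(seg_end, end)
    if seg_start is not None:
        segments.append((seg_start, seg_end))
    return segments


def pick_segments(
    words: List[Word],
    max_pause: float,
    max_segment_duration: float,
    get_all_segments: bool = True,
) -> List[Segment]:
    """Return all segments, or only the first one when get_all_segments is False."""
    segments = segment_words(words, max_pause, max_segment_duration)
    return segments if get_all_segments else segments[:1]


batch = [
    [(0.0, 0.5), (0.75, 0.25), (3.0, 0.5)],  # long pause before the last word
    [(0.0, 1.0)],
]
all_segs = [s for w in batch for s in pick_segments(w, 0.5, 10.0, True)]
first_only = [s for w in batch for s in pick_segments(w, 0.5, 10.0, False)]
print(len(batch), len(all_segs), len(first_only))
```

With `get_all_segments=True` the flattened output has more entries than the batch, so index `i` no longer identifies input `i`; with `False`, every input that has at least one word yields exactly one prompt segment.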

@AmirHussein96
Contributor Author

> Looks like a WIP PR that needs to be cleaned up, but I don't understand the intention behind the new option - can you explain that first?

@pzelasko The idea is to generate a batch of prompts by selecting audio segments that satisfy the minimum-silence and maximum-duration constraints. Each batch entry corresponds to the input audio it came from. This enables on-the-fly prompt extraction for applications such as speaker cloning in TTS.
