Extract prompts from alignments option #1516

AmirHussein96 · 2025-09-24T14:56:47Z

This branch adds an option to get a single segment from trim_to_alignments:

    # Trim cuts to alignments and get the first segment
    cuts_prompt = cuts.trim_to_alignments(type="word",
                                        max_pause=max_pause,
                                        max_segment_duration=max_segment_duration,
                                        get_all_segments=False)

pzelasko

Looks like a WIP PR that needs to be cleaned up, but I don't understand the intention behind the new option - can you explain that first?

pzelasko · 2025-10-07T12:13:57Z

lhotse/custom.py

            return None
        del self.custom[name]
-        return self
+        return self


pzelasko · 2025-10-07T12:14:04Z

lhotse/cut/base.py

            samples = augment_fn(samples, self.sampling_rate)
        return extractor.extract(samples, self.sampling_rate)

-    def plot_audio(self, ax=None, **kwargs):


pzelasko · 2025-10-07T12:14:47Z

lhotse/cut/base.py

                # number of channels in underlying tracks.

                # Ensure that all supervisions have the same channel.
+                if len(set(to_hashable(s.channel) for s in trimmed.supervisions)) == 0:


Why is this added?

AmirHussein96 · 2025-11-06

This skips trimmed segments that no longer contain any valid channel information (e.g., cases where the channel becomes empty or None after trimming).
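
For reference when reading the added condition: in plain Python, a set built from an iterable is empty only when the iterable itself is empty. The sketch below mirrors the shape of the check under that reading. It is not lhotse code: `has_no_channels` is a hypothetical helper, `to_hashable` is left out, and the channel values are assumed to be hashable already.

```python
from typing import Hashable, Iterable


def has_no_channels(supervision_channels: Iterable[Hashable]) -> bool:
    """Hypothetical mirror of the added guard: len(set(...)) == 0."""
    return len(set(supervision_channels)) == 0


# No supervisions left after trimming: the guard fires.
print(has_no_channels([]))       # True
# Two supervisions on channel 0: one distinct channel, the guard does not fire.
print(has_no_channels([0, 0]))   # False
# A supervision whose channel is None still contributes one element to the set.
print(has_no_channels([None]))   # False
```

Under these assumptions, the condition is equivalent to "the trimmed cut has no supervisions"; a None channel alone would not trigger it.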

pzelasko · 2025-10-07T12:15:36Z

lhotse/cut/base.py

        :param keep_all_channels: If ``True``, the output cut will have the same channels as the input cut. By default,
            the trimmed cut will have the same channels as the supervision.
        :param num_jobs: Number of parallel workers to process the cuts.
+        :param get_all_segments: If ``True``, all segments will be returned. If ``False``, only the segment with the longest duration will be returned.


I don't understand this option - can you explain this?

AmirHussein96 · 2025-11-06

I added this option to preserve the original behavior. When set to True, it returns all chunks from the alignment-based chunking. When set to False, it returns only the first segment, which we use as the prompt. Returning all segments can break the mapping between batch indices and input samples: the number of chunks per input varies, so it is hard to keep a consistent ordering.
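
The ordering concern can be illustrated with a small, library-free sketch. All names and data here are hypothetical and stand in for cuts and their trimmed chunks; nothing below calls lhotse.

```python
from typing import Dict, List

# Hypothetical data: each input utterance yields a different number of chunks.
chunks_per_input: Dict[str, List[str]] = {
    "utt-a": ["a0", "a1", "a2"],
    "utt-b": ["b0"],
    "utt-c": ["c0", "c1"],
}


def flatten_all(chunks: Dict[str, List[str]]) -> List[str]:
    """Return every chunk: output length no longer matches the number of inputs."""
    return [c for segs in chunks.values() for c in segs]


def first_only(chunks: Dict[str, List[str]]) -> List[str]:
    """Return one chunk per input: output index i still refers to input i."""
    return [segs[0] for segs in chunks.values()]


inputs = list(chunks_per_input)
all_chunks = flatten_all(chunks_per_input)
prompts = first_only(chunks_per_input)

# With all chunks, index 1 holds "a1", which belongs to inputs[0], not inputs[1].
print(len(inputs), len(all_chunks), all_chunks[1])
# With one chunk per input, positions line up one-to-one.
print(list(zip(inputs, prompts)))
```

Keeping exactly one segment per input is what preserves the index alignment; which segment is kept (first or longest) does not affect that property.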

AmirHussein96 · 2025-11-06T23:05:48Z

Looks like a WIP PR that needs to be cleaned up, but I don't understand the intention behind the new option - can you explain that first?

@pzelasko The idea is to generate a batch of prompts by selecting audio segments that satisfy minimum silence and maximum duration constraints. Each batch entry corresponds to the input audio it came from. This enables on-the-fly prompt extraction for applications such as speaker cloning in TTS.
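
A rough sketch of the selection described above, written without lhotse so it can be run standalone. It assumes word alignments are given as (text, start, end) in seconds, splits them wherever the pause exceeds `max_pause` or the running segment would exceed `max_segment_duration`, and then keeps one segment per input. `Word`, `group_words`, and `pick_prompt` are hypothetical names, and the grouping rule is a simplification rather than the library's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def group_words(
    words: List[Word], max_pause: float, max_segment_duration: float
) -> List[List[Word]]:
    """Greedily group consecutive words into segments under both constraints."""
    segments: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current:
            pause = w.start - current[-1].end
            duration = w.end - current[0].start
            # Close the segment on a long pause or when it would grow too long.
            if pause > max_pause or duration > max_segment_duration:
                segments.append(current)
                current = []
        current.append(w)
    if current:
        segments.append(current)
    return segments


def pick_prompt(
    segments: List[List[Word]], longest: bool = False
) -> Optional[List[Word]]:
    """Keep a single segment: the first one, or the longest if requested."""
    if not segments:
        return None
    if longest:
        return max(segments, key=lambda s: s[-1].end - s[0].start)
    return segments[0]


words = [
    Word("hello", 0.0, 0.4),
    Word("world", 0.5, 0.9),
    Word("this", 2.0, 2.3),
    Word("is", 2.35, 2.5),
    Word("lhotse", 2.55, 3.2),
]
segments = group_words(words, max_pause=0.5, max_segment_duration=5.0)
first = pick_prompt(segments)
longest = pick_prompt(segments, longest=True)
print([w.text for w in first])
print([w.text for w in longest])
```

The `longest` flag is included because the thread describes the kept segment both as the first one and as the one with the longest duration; the sketch does not settle which behavior the PR intends.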

Amir Hussein and others added 6 commits July 6, 2025 19:54

add get single segment option to trim_to_alignment

3057611

get segment with max duration

f5514c9

Handle target audio with different duration

a04457a

Merge branch 'lhotse-speech:master' into alignments

8aa3072

collate with taregt audio

93dfaa1

Merge branch 'lhotse-speech:master' into alignments

3d83668

pzelasko reviewed Oct 7, 2025
