If only convert_output.py did paragraphs (for readability) #229

cleesmith · 2024-05-27T15:40:49Z

Wow! Well done, spot on, etc. This worked perfectly on a MacBook Pro M3 after doing this:

```shell
# install
pipx install insanely-fast-whisper
# export it so the setting reaches the insanely-fast-whisper process
export PYTORCH_ENABLE_MPS_FALLBACK=1

insanely-fast-whisper --file-name audio.mp3 --device-id mps --model-name openai/whisper-large-v3 --batch-size 4
```
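To drive the same command from Python (for example, from a script that also runs the later steps), a minimal sketch could pass the MPS fallback flag through the child process's environment. `build_command` and `run_transcription` are hypothetical helper names; the CLI flags are exactly the ones used above.

```python
import os
import subprocess


def build_command(file_name, device_id="mps",
                  model_name="openai/whisper-large-v3", batch_size=4):
    """Assemble the insanely-fast-whisper call as an argument list."""
    return [
        "insanely-fast-whisper",
        "--file-name", file_name,
        "--device-id", device_id,
        "--model-name", model_name,
        "--batch-size", str(batch_size),
    ]


def run_transcription(file_name):
    # Copy the current environment and add the MPS fallback flag so the
    # child process inherits it (a plain shell assignment without `export`
    # would not be passed on to child processes).
    env = dict(os.environ, PYTORCH_ENABLE_MPS_FALLBACK="1")
    # check=True raises CalledProcessError if the CLI exits with an error
    subprocess.run(build_command(file_name), env=env, check=True)
```

Calling `run_transcription("audio.mp3")` should then behave like the shell command, assuming `insanely-fast-whisper` is on `PATH`.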

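Since insanely-fast-whisper only takes audio, pick a YouTube video about whatever (or a reading of a book/chapter) and extract its audio first:

```shell
pip install pytube pydub
```

```python
from pytube import YouTube
from pydub import AudioSegment  # pydub needs ffmpeg installed to decode/encode

video_url = "https://youtu.be/wGdHxSIYEIo?si=1r9krMIwerPusbZW"
yt = YouTube(video_url)

# take the first audio-only stream and download it to a file named "audio"
audio_stream = yt.streams.filter(only_audio=True).first()
audio_file = audio_stream.download(filename="audio")

# re-encode the downloaded stream as MP3 for insanely-fast-whisper
audio = AudioSegment.from_file(audio_file)
audio.export("audio.mp3", format="mp3")
```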
That yields `audio.mp3`, so then run:

```shell
insanely-fast-whisper --file-name audio.mp3 --device-id mps --model-name openai/whisper-large-v3 --batch-size 4
```

That yields `output.json`, so then run convert_output.py (see https://github.com/Vaibhavs10/insanely-fast-whisper/blob/main/convert_output.py):

```shell
python -B convert_output.py output.json -f txt -o .
```

That yields `output.txt`, so the last step is paragraphs, but how?
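One local option that ignores grammar entirely is to break paragraphs at pauses in the speech, using the timestamps in `output.json` instead of the flat `output.txt`. This sketch assumes `output.json` has a top-level `"chunks"` list whose items look like `{"timestamp": [start, end], "text": "..."}` with times in seconds (the shape the Hugging Face ASR pipeline returns with timestamps); please check your own file before relying on it. `paragraphs_from_chunks`, `json_to_paragraph_text`, and the 1.5 s default pause are hypothetical choices.

```python
import json


def paragraphs_from_chunks(chunks, pause_seconds=1.5):
    """Group transcript chunks into paragraphs, breaking at long pauses.

    ASSUMPTION: each chunk is {"timestamp": [start, end], "text": "..."},
    times in seconds; `end` may be None for the final chunk.
    """
    paragraphs, current = [], []
    prev_end = None
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        # start a new paragraph when the silence before this chunk is long
        if (current and prev_end is not None and start is not None
                and start - prev_end >= pause_seconds):
            paragraphs.append(" ".join(current))
            current = []
        if text:
            current.append(text)
        # fall back to the start time when the end time is missing
        prev_end = end if end is not None else start
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def json_to_paragraph_text(json_path, pause_seconds=1.5):
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)  # ASSUMPTION: top-level "chunks" key
    return "\n\n".join(paragraphs_from_chunks(data["chunks"], pause_seconds))
```

Raising `pause_seconds` gives fewer, longer paragraphs; lowering it gives more breaks.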

Now, insanely perfect would be paragraphs; they don't even have to be semantically/grammatically correct.
For now, I send output.txt to ChatGPT 4o or Gemini 1.5 Pro with this prompt:
"without changing any of the words, please make paragraphs where appropriate, just for human readability to this text:"
That also works great, but it would be nice to do this locally.
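Whichever tool adds the paragraphs, the prompt's "without changing any of the words" can be checked locally: if only whitespace changed, splitting both texts on whitespace gives identical word lists. `same_words` and `first_difference` are hypothetical helper names for this sketch.

```python
def words(text):
    # split on any run of whitespace, so line wraps and paragraph breaks vanish
    return text.split()


def same_words(original, reformatted):
    """True if only whitespace (spaces, newlines, paragraph breaks) changed."""
    return words(original) == words(reformatted)


def first_difference(original, reformatted):
    """Return (index, original_word, new_word) for the first mismatch, or None."""
    a, b = words(original), words(reformatted)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    if len(a) != len(b):
        # one text is a prefix of the other; report the first extra/missing word
        i = min(len(a), len(b))
        return i, (a[i] if i < len(a) else None), (b[i] if i < len(b) else None)
    return None
```

For example, `same_words(open("output.txt").read(), llm_output)` flags any word the model silently "fixed".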

I tested with 30+ minute and 1+ hour videos (which transcribed in 4 mins 20 secs = wow).

Thanks ... fingers crossed for paragraphs too.

cleesmith · 2024-05-27T16:12:43Z

I guess I could add something like this to convert_output.py:

```python
import re
import textwrap


def add_paragraphs(transcribed_text, sentences_per_paragraph=5, line_width=70):
    # normalize whitespace (newlines, tabs, repeated spaces) to single spaces
    transcribed_text = re.sub(r'\s+', ' ', transcribed_text).strip()

    # split into sentences at the space after . ? or !, keeping the
    # punctuation with the sentence and skipping abbreviations like
    # "e.g." and "Mr."
    sentence_endings = re.compile(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?|\!)\s')
    sentences = sentence_endings.split(transcribed_text)

    # group every `sentences_per_paragraph` sentences into a wrapped paragraph
    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        paragraph = ' '.join(sentences[i:i + sentences_per_paragraph]).strip()
        wrapped_paragraph = textwrap.fill(paragraph, width=line_width)
        paragraphs.append(wrapped_paragraph)

    formatted_text = '\n\n'.join(paragraphs)

    # ensure the last sentence has punctuation
    if not re.search(r'[.!?]$', formatted_text.strip()):
        formatted_text += '.'

    return formatted_text


with open('output.txt', 'r', encoding='utf-8') as file:
    transcribed_text = file.read()

formatted_text = add_paragraphs(transcribed_text, sentences_per_paragraph=3, line_width=70)

with open('formatted_output.txt', 'w', encoding='utf-8') as file:
    file.write(formatted_text)

print("Formatted text with paragraphs and line wrapping has been saved to 'formatted_output.txt'.")
```

Perhaps a semantic, context-aware split would be better?
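A step toward context-aware splitting that still runs locally without any model is a simplified, purely lexical variant of TextTiling: compare the vocabulary of the sentences just before each gap with the sentences just after it, and start a new paragraph where the overlap drops. This is a stand-in for true semantic splitting (e.g. sentence embeddings), not a replacement for it; `split_by_topic_shift` and its default `window`, `threshold`, and `min_sentences` values are hypothetical and would need tuning on real transcripts.

```python
import re


def _tokens(sentence):
    # lowercase word set; punctuation is dropped, apostrophes kept
    return set(re.findall(r"[a-z']+", sentence.lower()))


def split_by_topic_shift(sentences, window=2, threshold=0.1, min_sentences=2):
    """Break paragraphs where word overlap between neighbouring windows drops.

    At each gap, take the `window` sentences before and after it, compute the
    Jaccard similarity of their word sets, and start a new paragraph when the
    similarity falls below `threshold` (and the current paragraph already has
    at least `min_sentences` sentences).
    """
    paragraphs, current = [], []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        if i == len(sentences) - 1:
            break
        before = set().union(
            *(_tokens(s) for s in sentences[max(0, i - window + 1):i + 1]))
        after = set().union(
            *(_tokens(s) for s in sentences[i + 1:i + 1 + window]))
        union = before | after
        similarity = len(before & after) / len(union) if union else 1.0
        if similarity < threshold and len(current) >= min_sentences:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

The `sentences` list could come from the same `sentence_endings.split(...)` step used in `add_paragraphs` above. Common words such as "the" and "and" inflate the overlap, so filtering a stopword list in `_tokens` would likely make the breaks more meaningful.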
