If only convert_output.py did paragraphs (for readability) #229

Open
cleesmith opened this issue May 27, 2024 · 1 comment

cleesmith commented May 27, 2024

Wow! Well done, spot on, etc. This worked perfectly on a MacBook Pro M3 after doing this:
install:
pipx install insanely-fast-whisper
export PYTORCH_ENABLE_MPS_FALLBACK=1

insanely-fast-whisper --file-name audio.mp3 --device-id mps --model-name openai/whisper-large-v3 --batch-size 4
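
If the transcription is driven from a script instead of the shell, one way to make sure the MPS fallback variable actually reaches the process is to pass it in the child's environment. This is only a sketch, assuming insanely-fast-whisper is on the PATH; the helper names (build_command, build_env, transcribe) are made up for illustration, and the flags are the ones from the command above.

```python
import os
import subprocess


def build_command(audio_path, model="openai/whisper-large-v3", batch_size=4):
    """Return the CLI call from this post as an argument list (hypothetical helper)."""
    return [
        "insanely-fast-whisper",
        "--file-name", audio_path,
        "--device-id", "mps",
        "--model-name", model,
        "--batch-size", str(batch_size),
    ]


def build_env(base_env=None):
    """Copy the environment and switch on PyTorch's CPU fallback for MPS."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


def transcribe(audio_path):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(audio_path), env=build_env(), check=True)
```

Calling transcribe("audio.mp3") should then behave like the shell command, with the variable set only for that one process.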

Since insanely-fast-whisper only does audio:

  1. pick a YouTube video about whatever, or read a book/chapter, then:

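  2. pip install pytube pydub
    # note: pydub needs ffmpeg installed to read and write compressed formats such as mp3
    from pytube import YouTube
    from pydub import AudioSegment
    video_url = "https://youtu.be/wGdHxSIYEIo?si=1r9krMIwerPusbZW"
    yt = YouTube(video_url)
    # take the first audio-only stream and save it locally as "audio"
    audio_stream = yt.streams.filter(only_audio=True).first()
    audio_file = audio_stream.download(filename='audio')
    # re-encode the downloaded audio as mp3
    audio = AudioSegment.from_file(audio_file)
    audio.export("audio.mp3", format="mp3")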

  3. that yields audio.mp3, so then do:
    insanely-fast-whisper --file-name audio.mp3 --device-id mps --model-name openai/whisper-large-v3 --batch-size 4

  4. that yields output.json, so then do:
    python -B convert_output.py output.json -f txt -o .
    ... see:
    https://github.com/Vaibhavs10/insanely-fast-whisper/blob/main/convert_output.py

  5. that yields output.txt, so the last step is:
    paragraphs, but how?

Now, insanely perfect would be paragraphs; they don't even have to be semantically/grammatically correct.
For now, I send output.txt to ChatGPT 4o or Gemini 1.5 Pro with this prompt:
"without changing any of the words, please make paragraphs where appropriate, just for human readability to this text:"
That also works great, but it would be nice to do this locally.
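
One local option that needs no language model is to start a new paragraph wherever the speaker pauses, using the timestamps in output.json. This is a rough sketch, assuming output.json holds a "chunks" list whose items have "text" and a "timestamp" [start, end] pair (the layout convert_output.py appears to read). The function names and the thresholds are hypothetical and would need tuning per recording.

```python
import json


def load_chunks(path="output.json"):
    """Read the chunk list from the transcription output (assumed layout)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["chunks"]


def paragraphs_from_chunks(chunks, pause_seconds=1.5, max_chars=800):
    """Group chunks into paragraphs, breaking on long pauses or long runs."""
    paragraphs, current = [], []
    prev_end = None
    for chunk in chunks:
        text = chunk["text"].strip()
        if not text:
            continue
        start, end = chunk["timestamp"]
        # a gap between the previous chunk's end and this chunk's start
        long_pause = (
            prev_end is not None
            and start is not None
            and start - prev_end >= pause_seconds
        )
        # safety valve so a pause-free monologue still gets broken up
        too_long = sum(len(t) + 1 for t in current) >= max_chars
        if current and (long_pause or too_long):
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        if end is not None:  # an end time can be missing; keep the last known one
            prev_end = end
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Something like paragraphs_from_chunks(load_chunks()) could then be written out in place of the flat output.txt. The words are never changed, only grouped.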

I tested with 30+ minute and 1+ hour videos (which transcribed in 4 mins 20 secs = wow).

Thanks ... fingers crossed for paragraphs too.

@cleesmith

I guess I could add something like this to convert_output.py ...

import re
import textwrap

def add_paragraphs(transcribed_text, sentences_per_paragraph=5, line_width=70):
    # normalize spaces
    transcribed_text = re.sub(r'\s+', ' ', transcribed_text).strip()
    
    # split the text into sentences, keeping the punctuation with the sentence
    sentence_endings = re.compile(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?|\!)\s')
    sentences = sentence_endings.split(transcribed_text)

    paragraphs = []
    for i in range(0, len(sentences), sentences_per_paragraph):
        paragraph = ' '.join(sentences[i:i + sentences_per_paragraph]).strip()
        wrapped_paragraph = textwrap.fill(paragraph, width=line_width)
        paragraphs.append(wrapped_paragraph)

    formatted_text = '\n\n'.join(paragraphs)

    # ensure the last sentence has punctuation
    if not re.search(r'[.!?]$', formatted_text.strip()):
        formatted_text += '.'

    return formatted_text

with open('output.txt', 'r') as file:
    transcribed_text = file.read()

formatted_text = add_paragraphs(transcribed_text, sentences_per_paragraph=3, line_width=70)

with open('formatted_output.txt', 'w') as file:
    file.write(formatted_text)

print("Formatted text with paragraphs and line wrapping has been saved to 'formatted_output.txt'.")

Perhaps semantic, context-aware paragraph breaks would be better?
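
A cheap step toward "semantic" breaks, still fully local, might be to compare the vocabulary on either side of each sentence boundary and break where the overlap drops. This is a crude sketch in the spirit of lexical-cohesion segmentation, not a real topic model. The function names, the word-length filter (a stand-in for a stopword list), and the threshold are all assumptions. It expects a list of sentences, for example from the regex split above.

```python
import math
import re
from collections import Counter


def _bag(sentences):
    """Bag of lowercase words; short words are dropped as a crude stopword filter."""
    words = re.findall(r"[a-z']+", " ".join(sentences).lower())
    return Counter(w for w in words if len(w) > 3)


def _cosine(a, b):
    """Cosine similarity between two word-count bags (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_paragraphs(sentences, window=2, threshold=0.1, min_sentences=2):
    """Start a new paragraph where word overlap across a boundary falls below threshold."""
    paragraphs, start = [], 0
    for i in range(1, len(sentences)):
        if i - start < min_sentences:
            continue  # keep paragraphs from becoming single sentences
        left = _bag(sentences[max(start, i - window):i])
        right = _bag(sentences[i:i + window])
        if _cosine(left, right) < threshold:
            paragraphs.append(" ".join(sentences[start:i]))
            start = i
    paragraphs.append(" ".join(sentences[start:]))
    return "\n\n".join(paragraphs)
```

Sentence embeddings would probably separate topics better than raw word overlap, at the cost of another model download.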
