JermaSites/Jerma-Subtitle-Search

Subtitles

Video Count    : 2519
Word Count     : 28,835,328
Duration       : 6074:45:45
Oldest Video   : 2005-07-16
Latest Video   : 2026-01-11

Subtitles are obtained using a Python script. Audio is downloaded with yt-dlp, transcribed using WhisperX (large-v3 model), and the resulting subtitles are converted to LRC format with ffmpeg.
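To give a rough idea of what the LRC output looks like, here is a minimal sketch that turns transcription segments into LRC lines by hand. It is not the project's script (which uses ffmpeg for the conversion); the function names `lrc_timestamp` and `segments_to_lrc` are hypothetical, and the input is assumed to be a list of dicts with `start` (seconds) and `text` keys.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an LRC time tag, [mm:ss.xx]."""
    centis = round(seconds * 100)           # work in whole centiseconds
    minutes, centis = divmod(centis, 6000)  # 6000 centiseconds per minute
    secs, centis = divmod(centis, 100)
    # Minutes are not capped at 59, so long videos simply get large minute values.
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def segments_to_lrc(segments) -> str:
    """Join segments (assumed dicts with 'start' and 'text') into LRC text."""
    return "\n".join(
        f"{lrc_timestamp(seg['start'])}{seg['text'].strip()}" for seg in segments
    )


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "text": " Hello"},
        {"start": 83.5, "text": "world "},
    ]
    print(segments_to_lrc(demo))
```

Each line of the result carries one time tag followed by the text spoken at that point, which is what makes timestamped search results possible.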

Relevant information gets written to a JSON file, which gets indexed and compressed using a JS script.

The Python script also supports downloading manually added subtitles or YouTube's auto-generated ones, and can optionally transcribe only the videos that have no subtitles available.
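The choice between downloading and transcribing could be sketched roughly as below. This is only an illustration under stated assumptions: the names `Source` and `pick_source`, the `transcribe_only_missing` flag, and the preference for manual subtitles over auto-generated ones are all hypothetical and not taken from the actual script.

```python
from enum import Enum


class Source(Enum):
    MANUAL = "manual"          # subtitles added by the uploader
    AUTO = "auto"              # YouTube's auto-generated subtitles
    TRANSCRIBE = "transcribe"  # run the transcription pipeline


def pick_source(has_manual: bool, has_auto: bool, transcribe_only_missing: bool) -> Source:
    """Decide where a video's subtitles come from (hypothetical policy)."""
    if transcribe_only_missing:
        # Assumed preference order: manual first, then auto-generated.
        if has_manual:
            return Source.MANUAL
        if has_auto:
            return Source.AUTO
    # Either the option is off or nothing is available to download.
    return Source.TRANSCRIBE
```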

Read More

Initially used YouTube's auto-generated subtitles, but far too many videos either didn't have them available or had censored swears.

Tried using OpenAI's Whisper next, but after transcribing a bunch of videos with it I realized it kinda sucks in some aspects. It hallucinated a lot, especially during sections with no speech. Timestamps were incorrect on some transcriptions, and the first timestamp would always start at zero seconds, which was normally wrong. It's also pretty slow, especially if you use some of the bigger models.

Switching to WhisperX mostly solved the aforementioned problems. However, it's still far from perfect and does have some limitations.

Webpage

Uses Mithril, MiniSearch, lite-youtube-embed and fflate.

[Screenshot: webpage search results for the query "on GitHub"]

Primary-click (tap on touchscreen) a timestamp to open it in the embed.
Middle-click (hold on touchscreen) a timestamp to open it in a new tab.
Secondary-click a timestamp to copy its link to the clipboard.

Wildcard characters (*) can be used in searches to match zero or more of any character.
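The wildcard rule can be illustrated with a small sketch that translates a query into a regular expression. It is not how the webpage implements search (that runs in JavaScript); `wildcard_to_regex` and `matches` are hypothetical names, and matching a single word case-insensitively is an assumption made for the example.

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Turn a query with * wildcards into a compiled regex.

    Every * becomes '.*' (zero or more of any character); everything else
    is escaped so it only matches itself.
    """
    literal_parts = (re.escape(part) for part in query.split("*"))
    # Case-insensitive matching is an assumption for this sketch.
    return re.compile(".*".join(literal_parts), re.IGNORECASE)


def matches(query: str, word: str) -> bool:
    """Return True if the whole word matches the wildcard query."""
    return wildcard_to_regex(query).fullmatch(word) is not None


if __name__ == "__main__":
    for word in ("jerm", "jerma", "germ"):
        print(word, matches("jerm*", word))
```

Because `*` also matches zero characters, the query `jerm*` matches `jerm` itself as well as longer words such as `jerma`.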

Running Locally

# feel free to substitute bun with node & npm/yarn/whatever
git clone https://github.com/JermaSites/Jerma-Subtitle-Search.git
cd Jerma-Subtitle-Search
curl -o src/assets/Subtitles.json https://subtitlefiles.jerma.io/file/jerma-subtitles/Subtitles.json
bun install
bun src/scripts/index-subtitles.js
bun run dev

Contributing

If you'd like to contribute, have a look at the contributing guide.

About

Webpage for searching through everything Jerma has verbally said online
