This repository contains a Python script, `ghe_transcribe`, that transcribes audio files into text using Faster Whisper (a fast reimplementation of OpenAI's Whisper model) and Pyannote (for speaker diarization). The tool is especially useful for transcribing long audio recordings, improving transcription accuracy, and separating the audio by speaker.
On the Euler cluster, open https://jupyter.euler.hpc.ethz.ch/ and log in with your @ethz.ch account. Load the required modules, create a virtual environment, install the dependencies, and register the environment as a Jupyter kernel:

```bash
module load stack/2024-06 python/3.11.6
python3.11 -m venv venv3.11_ghe_transcribe --system-site-packages
source venv3.11_ghe_transcribe/bin/activate
pip3.11 install faster-whisper pyannote.audio ffmpeg-python huggingface-hub
ipython kernel install --user --name=venv3.11_ghe_transcribe
```
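Once the kernel is registered, you can select `venv3.11_ghe_transcribe` in JupyterHub. As a quick sanity check (an optional step beyond the documented instructions), confirm that a notebook is really running inside the virtual environment:

```python
# Run in a notebook cell using the venv3.11_ghe_transcribe kernel:
# the interpreter path should point into the virtual environment.
import sys

print(sys.executable)  # expect a path ending in venv3.11_ghe_transcribe/bin/python3.11
```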
To have all new JupyterHub instances start with the `venv3.11_ghe_transcribe` Python environment, edit the JupyterHub configuration file:

```bash
nano .config/euler/jupyterhub/jupyterlabrc
```

and write:

```bash
module load stack/2024-06 python/3.11.6
source venv3.11_ghe_transcribe/bin/activate
```
To install locally on macOS instead, use Homebrew for the system dependencies and set up the environment the same way:

```bash
brew install ffmpeg cmake python3.11
python3.11 -m venv venv3.11_ghe_transcribe --system-site-packages
source venv3.11_ghe_transcribe/bin/activate
pip3.11 install faster-whisper pyannote.audio ffmpeg-python huggingface-hub
ipython kernel install --user --name=venv3.11_ghe_transcribe
```
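As a quick sanity check (again, optional and not part of the repository's documented steps), confirm that the key packages are importable and report their installed versions:

```python
# Verify the main dependencies installed into the virtual environment.
from importlib.metadata import version

import faster_whisper  # transcription backend
import pyannote.audio  # speaker diarization backend

print(version("faster-whisper"))
print(version("pyannote.audio"))
```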
Let's say you have an audio file called `testing_audio_01.mp3` in the `media` folder that you want to transcribe into a `.csv` and a `.md` file.

- Set up a `config.json` file containing your HuggingFace access token (see the sketch after this list for how such a token is typically used):

```json
{
    "HF_TOKEN": "hf_*********************"
}
```

- Run the following command:

```bash
python ghe_transcribe.py media/testing_audio_01.mp3
```
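For context, the token in `config.json` is what grants access to the gated Pyannote model. A minimal sketch of how such a token can be read and handed to `pyannote.audio` (an illustration of the mechanism, not necessarily how `ghe_transcribe.py` is implemented internally):

```python
# Illustration only: load the HuggingFace token from config.json and use it
# to fetch the gated diarization pipeline. ghe_transcribe.py may differ.
import json

from pyannote.audio import Pipeline

with open("config.json") as f:
    hf_token = json.load(f)["HF_TOKEN"]

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=hf_token,
)
```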
Options for `ghe_transcribe`:

```python
ghe_transcribe(
    audio_file,
    device='cpu'|'cuda'|'mps',
    whisper_model='small.en'|'base.en'|'medium.en'|'small'|'base'|'medium'|'large'|'turbo',
    pyannote_model='pyannote/speaker-diarization@2.1'|'pyannote/speaker-diarization-3.1',
    save_output=True|False,
    semicolon=True|False,
    info=True|False,
)
```
- `audio_file`: The path to the audio file you want to transcribe. Accepted formats are `.mp3` and `.wav`.
- `device` (optional): The device on which to run the model (`cpu`|`cuda`|`mps`). By default, the device is automatically detected based on whether CUDA or MPS is available.
- `whisper_model` (optional): The size of the Faster Whisper model to use for transcription. Available options include `small.en`, `base.en`, `medium.en`, `small`, `base`, `medium`, `large`, `turbo`. By default, the English model `medium.en` is used.
- `pyannote_model` (optional): The Pyannote diarization model; defaults to `pyannote/speaker-diarization-3.1`.
- `save_output` (optional): Default is `True`, which creates both `output.csv` and `output.md`. If disabled, the transcription is only returned as a list of strings.
- `semicolon` (optional): Whether to use semicolons or commas as the column separator in the CSV output. The default is commas.
- `info` (optional): If `True`, print additional information about the detected language and its probability.
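For example, the command-line run above can also be done from Python (assuming `ghe_transcribe` is importable from `ghe_transcribe.py`; argument values follow the signature shown above):

```python
# Call ghe_transcribe directly from Python with explicit options.
from ghe_transcribe import ghe_transcribe

result = ghe_transcribe(
    "media/testing_audio_01.mp3",
    device="cpu",               # "cuda"/"mps" also accepted; auto-detected if omitted
    whisper_model="medium.en",  # the default English model
    save_output=True,           # write output.csv and output.md
    semicolon=False,            # comma-separated CSV (the default)
    info=True,                  # print detected language and its probability
)
```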
Timing tests are run using the `timing` function defined in `utils.py` and the audio file `media/testing_audio_01.mp3`:

| Device | Time (sec) |
|---|---|
| Euler Cluster (16 CPU cores, 16GB RAM) - `cpu` | 67.4988 |
| Euler Cluster (32 CPU cores, 16GB RAM) - `cpu` | 44.3622 |
| macOS (Apple M2, 16GB RAM) - `mps` | 41.2122 |
| macOS (Apple M2, 16GB RAM) - `cpu` | 64.7549 |
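The `timing` helper itself lives in `utils.py`; as a rough sketch of what such a helper typically looks like (the actual implementation in the repository may differ):

```python
# Hypothetical sketch of a timing decorator; the real utils.py may differ.
import time
from functools import wraps

def timing(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} sec")
        return result
    return wrapper
```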
Why Whisper? See Whisper, wav2vec2 and Kaldi.

- `faster-whisper` by Guillaume Klein builds on OpenAI's open-source transcription model Whisper.

Why Pyannote? See Pyannote vs NeMo.

- `pyannote.audio` by Hervé Bredin, the open-source diarization model from pyannoteAI, gated behind a HuggingFace access token (https://hf.co/settings/tokens).
- `NeMo` by Nvidia, an open-source diarization model.
Related tools:

- `WhisperX` ← `faster-whisper` + `pyannote.audio`
- `whisper-diarization` ← `faster-whisper` + `NeMo`
- `insanely-fast-whisper` ← `insanely-faster-whisper` + `pyannote.audio`
- `wscribe-editor`, works with word-level timestamps in a `.json` formatted like sample.json.
- `QualCoder`, a qualitative data analysis application written in Python.
- `noScribe` ← `faster-whisper` + `pyannote.audio`
- `TranscriboZH` ← `WhisperX`