Global-Health-Engineering/ghe_transcribe

ghe_transcribe: A Tool to Transcribe Audio Files with Speaker Diarization

This repository contains a Python script, ghe_transcribe, that transcribes audio files to text with Faster Whisper (a fast reimplementation of OpenAI's Whisper model) and separates speakers with Pyannote (speaker diarization). The tool is especially useful for transcribing long audio recordings accurately and splitting the transcript by individual speaker.

Table of Contents

  1. Installation on Euler
  2. Installation on MacOS
  3. How to use ghe_transcribe
  4. Tools for...

Installation on Euler

First-time environment setup:

Open https://jupyter.euler.hpc.ethz.ch/ and log in with your @ethz.ch account. We can load the required modules by running:

module load stack/2024-06 python/3.11.6

Create a Python virtual environment, install the dependencies, and register the environment as a Jupyter kernel:

python3.11 -m venv venv3.11_ghe_transcribe --system-site-packages
source venv3.11_ghe_transcribe/bin/activate
pip3.11 install faster-whisper pyannote.audio ffmpeg-python huggingface-hub
ipython kernel install --user --name=venv3.11_ghe_transcribe
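After installation, you can confirm that the environment is complete. The following is a minimal sketch (not part of ghe_transcribe) that checks whether the packages installed above can be imported under their import names (faster_whisper, pyannote.audio, ffmpeg for ffmpeg-python, huggingface_hub) and whether an ffmpeg executable is on PATH, since ffmpeg-python calls that executable. Whether the executable is actually required on Euler depends on how the script decodes audio, so treat that line of the report as informational.

```python
import importlib.util
import shutil

# Import names of the pip packages installed above
# (ffmpeg-python is imported as "ffmpeg").
REQUIRED_MODULES = ["faster_whisper", "pyannote.audio", "ffmpeg", "huggingface_hub"]


def module_available(name: str) -> bool:
    """Return True if `name` can be imported in the active environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent package first
        # and raises if that parent is missing.
        return False


def check_environment() -> dict[str, bool]:
    """Map each requirement to whether it was found."""
    status = {name: module_available(name) for name in REQUIRED_MODULES}
    # ffmpeg-python shells out to the ffmpeg executable, so report whether it is on PATH.
    status["ffmpeg (executable)"] = shutil.which("ffmpeg") is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{'ok     ' if found else 'MISSING'}  {name}")
```

Run it with venv3.11_ghe_transcribe activated; any MISSING line points to a package that should be reinstalled with pip3.11.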

Set up the JupyterHub startup configuration:

To start every new JupyterHub instance with the venv3.11_ghe_transcribe Python environment, open the configuration file

nano .config/euler/jupyterhub/jupyterlabrc

and add the following lines:

module load stack/2024-06 python/3.11.6
source venv3.11_ghe_transcribe/bin/activate

Installation on MacOS

brew install ffmpeg cmake python3.11
python3.11 -m venv venv3.11_ghe_transcribe --system-site-packages
source venv3.11_ghe_transcribe/bin/activate
pip3.11 install faster-whisper pyannote.audio ffmpeg-python huggingface-hub
ipython kernel install --user --name=venv3.11_ghe_transcribe
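On Apple Silicon Macs, PyTorch (a dependency of pyannote.audio) can use the GPU through its MPS backend, which is presumably what device='mps' refers to. The sketch below illustrates the kind of automatic device detection described for the device option; the helper names pick_device and detect_device and the CUDA-before-MPS preference order are assumptions, not the script's actual code.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an NVIDIA GPU, then Apple's MPS backend, else the CPU (assumed order)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which accelerators exist; fall back to 'cpu' if it is not installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)  # absent in old PyTorch versions
    return pick_device(
        cuda_available=torch.cuda.is_available(),
        mps_available=mps_backend is not None and mps_backend.is_available(),
    )


if __name__ == "__main__":
    print(detect_device())
```

Keeping the decision in a pure function (pick_device) makes the preference order easy to test without a GPU.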

How to use ghe_transcribe

Quick Start

Suppose you want to transcribe the audio file testing_audio_01.mp3 in the media folder into a .csv file and a .md file.

  • Create a config.json file containing your Hugging Face access token (used to access the Pyannote models on Hugging Face):
{
    "HF_TOKEN": "hf_*********************"
}
  • Run the following command:
python ghe_transcribe.py media/testing_audio_01.mp3
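If you want to reuse the same token in your own scripts, a minimal sketch for reading it from config.json could look like this. The function names are hypothetical, and ghe_transcribe's own loading code may differ.

```python
import json
from pathlib import Path


def parse_hf_token(config_text: str) -> str:
    """Extract HF_TOKEN from the JSON text of a config file."""
    token = json.loads(config_text).get("HF_TOKEN")
    if not isinstance(token, str) or not token:
        raise ValueError("config.json has no HF_TOKEN entry")
    return token


def load_hf_token(config_path: str = "config.json") -> str:
    """Read config.json (relative to the current working directory by default) and return the token."""
    return parse_hf_token(Path(config_path).read_text(encoding="utf-8"))
```

Keep config.json out of version control, since the token grants access to your Hugging Face account.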

Options

Options for ghe_transcribe:

ghe_transcribe(audio_file,
               device='cpu'|'cuda'|'mps',
               whisper_model='small.en'|'base.en'|'medium.en'|'small'|'base'|'medium'|'large'|'turbo',
               pyannote_model='pyannote/[email protected]'|'pyannote/speaker-diarization-3.1',
               save_output=True|False,
               semicolon=True|False,
               info=True|False
)
  • audio_file: The path to the audio file you want to transcribe. Accepted formats are .mp3 and .wav.
  • device (optional): The device on which to run the model (cpu|cuda|mps). By default, the device is automatically detected based on whether CUDA or MPS is available.
  • whisper_model (optional): The size of the Faster Whisper model to use for transcription. Available options include small.en, base.en, medium.en, small, base, medium, large, turbo. By default, the English model medium.en is used.
  • pyannote_model (optional): The Pyannote model, defaults to pyannote/speaker-diarization-3.1.
  • save_output (optional): Default is True, which creates both output.csv and output.md. If save_output=False, the transcription is only returned as a list of strings.
  • semicolon (optional): Specify whether to use semicolons or commas as the column separator in the CSV output. The default is commas.
  • info (optional): If True, print additional information about the detected language and its probability.
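To illustrate what save_output and semicolon control, here is a hypothetical sketch that serializes diarized segments to CSV and Markdown text. The Segment layout (start, end, speaker, text), the speaker label and the Markdown line format are assumptions for illustration; the real output.csv and output.md may use different columns.

```python
import csv
import io
from typing import NamedTuple


class Segment(NamedTuple):
    """One diarized utterance (hypothetical layout)."""
    start: float  # seconds
    end: float  # seconds
    speaker: str
    text: str


def to_csv(segments: list[Segment], semicolon: bool = False) -> str:
    """Serialize segments as CSV, using ';' or ',' as the column separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";" if semicolon else ",", lineterminator="\n")
    writer.writerow(Segment._fields)  # header row: start, end, speaker, text
    writer.writerows(segments)
    return buffer.getvalue()


def to_markdown(segments: list[Segment]) -> str:
    """Render one line per segment: **SPEAKER** [start-end]: text."""
    return "\n".join(
        f"**{s.speaker}** [{s.start:.1f}-{s.end:.1f}]: {s.text}" for s in segments
    )


segments = [Segment(0.0, 2.5, "SPEAKER_00", "Hello, world")]
print(to_csv(segments))                  # the text field is quoted because it contains a comma
print(to_csv(segments, semicolon=True))  # no quoting needed with ';' as the separator
print(to_markdown(segments))
```

csv.writer only quotes a field when it contains the separator, so semicolon=True keeps transcript text that contains commas unquoted. With save_output=True, strings like these could be written to output.csv and output.md with Path.write_text.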

Timings

Timings were measured with the timing function defined in utils.py on the audio file media/testing_audio_01.mp3:

Machine                                   Device    Time (sec)
Euler Cluster (16 CPU cores, 16 GB RAM)   cpu       67.4988
Euler Cluster (32 CPU cores, 16 GB RAM)   cpu       44.3622
MacOS (Apple M2, 16 GB RAM)               mps       41.2122
MacOS (Apple M2, 16 GB RAM)               cpu       64.7549
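The code of the timing function in utils.py is not shown here. To reproduce comparable measurements, a wall-clock timer can be sketched as a decorator around time.perf_counter; timed is a hypothetical name, and this is not the utils.py implementation.

```python
import functools
import time


def timed(func):
    """Decorator that prints the wall-clock duration of every call to `func`."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        wrapper.last_elapsed = elapsed  # keep the latest measurement for later inspection
        print(f"{func.__name__} took {elapsed:.4f} sec")
        return result

    wrapper.last_elapsed = None  # no call measured yet
    return wrapper
```

For example, timed(ghe_transcribe)("media/testing_audio_01.mp3") would print one wall-clock figure per run. Whether utils.py measures wall-clock time in the same way is an assumption.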

Tools for...

Transcription

Why Whisper? See Whisper, wav2vec2 and Kaldi.

Diarization

Why Pyannote? See Pyannote vs NeMo.

Transcription + Diarization

GUI

Transcription + Diarization + GUI
