Voice-to-text application with real-time transcription and automatic clipboard integration.
Press Shift+V, speak, press Shift+V again - your text is ready to paste. โจ
- ๐ฏ Voice recording - Simple Shift+V hotkey to start/stop recording
- ๐บ YouTube transcription - Press Shift+Y to transcribe YouTube videos from clipboard
- ๐ Local file transcription - Press Shift+F to transcribe audio/video files from clipboard
- ๐ Batch processing - Select and transcribe multiple files at once, each cached separately
- ๐ File Concatenator - Press Shift+K to concatenate source files for LLM context
- โก Real-time transcription - Using OpenAI Whisper Turbo model
- ๐ GPU acceleration - CUDA support for fast transcription (CPU fallback available)
- ๐ Automatic clipboard - Transcribed text instantly available for pasting
- ๐พ Persistent caching - All transcriptions cached 24h, survives restarts (avoid re-processing)
- ๐ System tray integration - Runs quietly in background with functional menu
- ๐ง Smart memory management - Auto-loads/unloads model to save GPU memory
- ๐ง Virtual audio support - Works with NVIDIA Broadcast, VB-Cable, Krisp, etc.
- ๐ Cross-platform - Windows, Linux, macOS
โญ Recommended: Windows with NVIDIA GPU for best performance
# Check Python version (must be 3.12)
python --version
# Install FFmpeg
winget install ffmpeg
# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python -m venv .venv
.venv\Scripts\activate
# Install dependencies (with CUDA for GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# Run
python main.py# Check Python version (must be 3.12)
python3.12 --version
# Install system dependencies
sudo apt update && sudo apt install ffmpeg portaudio19-dev python3-tk
# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python3.12 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
# Run
python main.py# Check Python version (must be 3.12)
python3.12 --version
# Install system dependencies
brew install ffmpeg portaudio
# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python3.12 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run
python main.pyUsage:
- Press
Shift+Vโ speak โ pressShift+Vโ text in clipboard โ - Copy YouTube URL โ press
Shift+Yโ video transcribed โ text in clipboard โ - Copy file or file path โ press
Shift+Fโ file transcribed โ text in clipboard โ - Copy folder path โ
Shift+Kโ copy file path โShift+Kโ files concatenated โ text in clipboard โ
python main.pyList available devices:
python main.py --list-devicesUse specific device:
python main.py --device 6python main.py --keep-model-loadedChoose quality settings based on your GPU:
# Standard profile (RTX 2080S 8GB) - default
python main.py --gpu-profile standard
# High-End profile (RTX 3090+ 24GB) - maximum quality
python main.py --gpu-profile high_endStandard profile (8GB VRAM):
- beam_size=5, patience=1.0, temperature=0.0
- Good quality, fast transcription
- Recommended for RTX 2060, 2070, 2080, 3060, 3070
High-End profile (24GB VRAM):
- beam_size=10, patience=1.5, temperature=0.0
- Maximum quality, same speed or faster (better beam search)
- Recommended for RTX 3090, 4090, A5000, A6000
You can also change GPU profile from system tray menu.
- Press
Ctrl+Cin terminal - Right-click tray icon โ Exit
- ๐ Start application - tray icon appears in system tray
- โจ๏ธ Press
Shift+V- recording starts (icon turns red) - ๐ค Speak into microphone
- โจ๏ธ Press
Shift+Vagain - recording stops - โณ Wait for transcription (icon turns orange)
- ๐ Text automatically copied to clipboard
- โจ Paste anywhere with
Ctrl+V
- ๐ Copy YouTube video URL to clipboard
- โจ๏ธ Press
Shift+Y- downloading starts (icon turns purple) - โณ Wait for download and transcription (icon shows download arrow)
- ๐ Transcription automatically copied to clipboard
- โจ Paste anywhere with
Ctrl+V - ๐พ Transcription cached for 24h - next use instant, even after restart!
Single file:
- ๐ Copy file from File Explorer (Ctrl+C on file) OR copy file path as text
- โจ๏ธ Press
Shift+F- processing starts (icon turns orange) - โณ Wait for audio extraction and transcription
- ๐ Transcription automatically copied to clipboard
- โจ Paste anywhere with
Ctrl+V - ๐พ Transcription cached for 24h - next use instant, even after restart!
Multiple files:
- ๐ Select and copy multiple files from File Explorer (Ctrl+C on multiple files)
- โจ๏ธ Press
Shift+F- processing starts for all files - โณ Each file processed and transcribed separately
- ๐ All transcriptions concatenated with filename headers
- โจ Paste anywhere with
Ctrl+V - ๐พ Each file cached separately for 24h - can use individually later, survives restarts!
Supported formats:
- Audio:
.mp3,.wav,.m4a,.flac,.ogg,.aac,.wma - Video:
.mp4,.avi,.mkv,.mov,.wmv,.flv,.webm,.m4v
Concatenate source code files from a folder into a single text for LLM context.
Hotkey method (Shift+K):
- ๐ Copy folder path to clipboard (e.g.
C:\project\src) - โจ๏ธ Press
Shift+K- folder is set - ๐ Copy any file path to clipboard (e.g.
C:\anywhere\main.py) - โจ๏ธ Press
Shift+K- extension.pyis added - ๐ (Optional) Copy more files with different extensions โ
Shift+Kfor each - โณ Wait 2 seconds - all files with selected extensions are concatenated
- ๐ Result automatically copied to clipboard
GUI method:
- Right-click tray icon โ "File Concatenator... (Shift+K)"
- Select folder, enter extensions, click Run
CLI method:
python src/file_concatenator.py ./src .py .js .ts
python src/file_concatenator.py ./project .py --exclude dist build --output context.txt
python src/file_concatenator.py --guiOutput format:
################################################################################
# FILE: src/main.py
################################################################################
<file content>
################################################################################
# FILE: src/utils/helper.py
################################################################################
<file content>
Icon colors: ๐ข ready โ ๐ด recording โ ๐ฃ downloading โ ๐ต processing โ ๐ข ready
All transcriptions are automatically cached for 24 hours:
- ๐ Survives restarts - Cache stored in
~/.voicepaste_cache.json - โก Instant results - Re-using cached transcription takes <1ms
- ๐ Auto-cleanup - Expired entries removed automatically
- ๐ All types - YouTube, single files, batch files all cached equally
Example:
Day 1, 9:00 AM โ Process 5 lectures (30 mins total)
Day 1, 3:00 PM โ Reopen lecture 3 โ Instant from cache โ
Day 2, 8:00 AM โ Reboot PC, reopen lecture 1 โ Still cached โ
Day 2, 10:00 AM โ Reopen all 5 โ All instant from cache โ
Day 3, 9:00 AM โ Cache expired, would re-process if needed
Application uses intelligent 3-tier memory management by default:
- โก Preloading: Model starts loading to VRAM when you start recording - ready by the time you finish speaking
- ๐ฎ VRAM (GPU): Model actively used on CUDA for transcription
- ๐พ RAM (CPU): After 1 hour of inactivity, model moves from VRAM to RAM
- ๐ค Disk: After 5 hours of inactivity, model fully unloaded from memory
- ๐ Auto-recovery: Model automatically moves back to GPU when needed
Use --keep-model-loaded flag if:
- ๐ You use the app frequently
- ๐พ You have plenty of GPU memory
- โก Speed is more important than memory usage
Automatically adapts to your microphone:
- ๐ Detects native sample rate (e.g. 48kHz for NVIDIA Broadcast)
- ๐ผ Records at native sample rate for maximum compatibility
- ๐ Auto-resamples to 16kHz for Whisper processing
- ๐๏ธ Works with virtual audio devices (NVIDIA Broadcast, VB-Cable, Krisp, etc.)
Application automatically falls back to CPU if CUDA fails. Slower but works.
python main.py --list-devices
python main.py --device <ID>Speak longer (minimum 1 second) or check if microphone is working.
FFmpeg is a system dependency (not a Python package) and must be installed via system package manager.
Windows:
winget install ffmpegLinux:
sudo apt update && sudo apt install ffmpegmacOS:
brew install ffmpegAlready included in Quick Start for each OS. If still fails:
Windows:
pip install pipwin
pipwin install pyaudioLinux:
sudo apt update && sudo apt install portaudio19-dev python3-tk
pip install PyAudiomacOS:
brew install portaudio
pip install PyAudioFor system tray icon support:
sudo apt-get install gir1.2-appindicator3-0.1 libappindicator3-1- System tray icon requires Pillow with ImageDraw support (included in requirements)
- Global hotkeys work system-wide but may require accessibility permissions
- Go to System Preferences โ Security & Privacy โ Privacy โ Accessibility
- Add Terminal or your Python interpreter to allowed applications
MIT License - see LICENSE file for details