🎤 VoicePaste

Voice-to-text application with real-time transcription and automatic clipboard integration.

Press Shift+V, speak, press Shift+V again - your text is ready to paste. ✨

✨ Features

🎯 Voice recording - Simple Shift+V hotkey to start/stop recording
📺 YouTube transcription - Press Shift+Y to transcribe YouTube videos from clipboard
📁 Local file transcription - Press Shift+F to transcribe audio/video files from clipboard
📚 Batch processing - Select and transcribe multiple files at once, each cached separately
📄 File Concatenator - Press Shift+K to concatenate source files for LLM context
⚡ Real-time transcription - Using OpenAI Whisper Turbo model
🚀 GPU acceleration - CUDA support for fast transcription (CPU fallback available)
📋 Automatic clipboard - Transcribed text instantly available for pasting
💾 Persistent caching - All transcriptions cached 24h, survives restarts (avoid re-processing)
🔔 System tray integration - Runs quietly in background with functional menu
🧠 Smart memory management - Auto-loads/unloads model to save GPU memory
🎧 Virtual audio support - Works with NVIDIA Broadcast, VB-Cable, Krisp, etc.
🌍 Cross-platform - Windows, Linux, macOS

🚀 Quick Start

⭐ Recommended: Windows with NVIDIA GPU for best performance

🪟 Windows

# Check Python version (must be 3.12)
python --version

# Install FFmpeg
winget install ffmpeg

# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python -m venv .venv
.venv\Scripts\activate

# Install dependencies (with CUDA for GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Run
python main.py

🐧 Linux

# Check Python version (must be 3.12)
python3.12 --version

# Install system dependencies
sudo apt update && sudo apt install ffmpeg portaudio19-dev python3-tk

# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Run
python main.py

🍎 macOS

# Check Python version (must be 3.12)
python3.12 --version

# Install system dependencies
brew install ffmpeg portaudio

# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run
python main.py

Usage:

Press Shift+V → speak → press Shift+V → text in clipboard ✅
Copy YouTube URL → press Shift+Y → video transcribed → text in clipboard ✅
Copy file or file path → press Shift+F → file transcribed → text in clipboard ✅
Copy folder path → Shift+K → copy file path → Shift+K → files concatenated → text in clipboard ✅

🎯 Usage

Basic

python main.py

🎤 Select microphone

List available devices:

python main.py --list-devices

Use specific device:

python main.py --device 6

🚀 Keep model in memory

python main.py --keep-model-loaded

🎮 GPU Profile

Choose quality settings based on your GPU:

# Standard profile (RTX 2080S 8GB) - default
python main.py --gpu-profile standard

# High-End profile (RTX 3090+ 24GB) - maximum quality
python main.py --gpu-profile high_end

Standard profile (8GB VRAM):

beam_size=5, patience=1.0, temperature=0.0
Good quality, fast transcription
Recommended for RTX 2060, 2070, 2080, 3060, 3070

High-End profile (24GB VRAM):

beam_size=10, patience=1.5, temperature=0.0
Maximum quality, same speed or faster (better beam search)
Recommended for RTX 3090, 4090, A5000, A6000

You can also change GPU profile from system tray menu.

🚪 Exit

Press Ctrl+C in terminal
Right-click tray icon → Exit

🔄 How it works

🎤 Voice Recording

🚀 Start application - tray icon appears in system tray
⌨️ Press Shift+V - recording starts (icon turns red)
🎤 Speak into microphone
⌨️ Press Shift+V again - recording stops
⏳ Wait for transcription (icon turns orange)
📋 Text automatically copied to clipboard
✨ Paste anywhere with Ctrl+V

📺 YouTube Transcription

📋 Copy YouTube video URL to clipboard
⌨️ Press Shift+Y - downloading starts (icon turns purple)
⏳ Wait for download and transcription (icon shows download arrow)
📝 Transcription automatically copied to clipboard
✨ Paste anywhere with Ctrl+V
💾 Transcription cached for 24h - next use instant, even after restart!

📁 Local File Transcription

Single file:

📋 Copy file from File Explorer (Ctrl+C on file) OR copy file path as text
⌨️ Press Shift+F - processing starts (icon turns orange)
⏳ Wait for audio extraction and transcription
📝 Transcription automatically copied to clipboard
✨ Paste anywhere with Ctrl+V
💾 Transcription cached for 24h - next use instant, even after restart!

Multiple files:

📋 Select and copy multiple files from File Explorer (Ctrl+C on multiple files)
⌨️ Press Shift+F - processing starts for all files
⏳ Each file processed and transcribed separately
📝 All transcriptions concatenated with filename headers
✨ Paste anywhere with Ctrl+V
💾 Each file cached separately for 24h - can use individually later, survives restarts!

Supported formats:

Audio: .mp3, .wav, .m4a, .flac, .ogg, .aac, .wma
Video: .mp4, .avi, .mkv, .mov, .wmv, .flv, .webm, .m4v

📄 File Concatenator (for LLM context)

Concatenate source code files from a folder into a single text for LLM context.

Hotkey method (Shift+K):

📋 Copy folder path to clipboard (e.g. C:\project\src)
⌨️ Press Shift+K - folder is set
📋 Copy any file path to clipboard (e.g. C:\anywhere\main.py)
⌨️ Press Shift+K - extension .py is added
📋 (Optional) Copy more files with different extensions → Shift+K for each
⏳ Wait 2 seconds - all files with selected extensions are concatenated
📋 Result automatically copied to clipboard

GUI method:

Right-click tray icon → "File Concatenator... (Shift+K)"
Select folder, enter extensions, click Run

CLI method:

python src/file_concatenator.py ./src .py .js .ts
python src/file_concatenator.py ./project .py --exclude dist build --output context.txt
python src/file_concatenator.py --gui

Output format:

################################################################################
# FILE: src/main.py
################################################################################
<file content>

################################################################################
# FILE: src/utils/helper.py
################################################################################
<file content>

Icon colors: 🟢 ready → 🔴 recording → 🟣 downloading → 🔵 processing → 🟢 ready

🎛️ Advanced Features

💾 Persistent Caching (24h)

All transcriptions are automatically cached for 24 hours:

📝 Survives restarts - Cache stored in ~/.voicepaste_cache.json
⚡ Instant results - Re-using cached transcription takes <1ms
🔄 Auto-cleanup - Expired entries removed automatically
📊 All types - YouTube, single files, batch files all cached equally

Example:

Day 1, 9:00 AM  → Process 5 lectures (30 mins total)
Day 1, 3:00 PM  → Reopen lecture 3 → Instant from cache ✓
Day 2, 8:00 AM  → Reboot PC, reopen lecture 1 → Still cached ✓
Day 2, 10:00 AM → Reopen all 5 → All instant from cache ✓
Day 3, 9:00 AM  → Cache expired, would re-process if needed

🧠 Smart Memory Management

Application uses intelligent 3-tier memory management by default:

⚡ Preloading: Model starts loading to VRAM when you start recording - ready by the time you finish speaking
🎮 VRAM (GPU): Model actively used on CUDA for transcription
💾 RAM (CPU): After 1 hour of inactivity, model moves from VRAM to RAM
💤 Disk: After 5 hours of inactivity, model fully unloaded from memory
🔄 Auto-recovery: Model automatically moves back to GPU when needed

Use --keep-model-loaded flag if:

🔁 You use the app frequently
💾 You have plenty of GPU memory
⚡ Speed is more important than memory usage

🎧 Audio Compatibility

Automatically adapts to your microphone:

🔍 Detects native sample rate (e.g. 48kHz for NVIDIA Broadcast)
📼 Records at native sample rate for maximum compatibility
🔄 Auto-resamples to 16kHz for Whisper processing
🎛️ Works with virtual audio devices (NVIDIA Broadcast, VB-Cable, Krisp, etc.)

🔧 Troubleshooting

⚠️ cuDNN / CUDA errors

Application automatically falls back to CPU if CUDA fails. Slower but works.

🎤 Microphone issues

python main.py --list-devices
python main.py --device <ID>

⏱️ "Recording too short, ignoring..."

Speak longer (minimum 1 second) or check if microphone is working.

🎬 FFmpeg not found (for YouTube transcription)

FFmpeg is a system dependency (not a Python package) and must be installed via system package manager.

Windows:

winget install ffmpeg

Linux:

sudo apt update && sudo apt install ffmpeg

macOS:

brew install ffmpeg

📦 PyAudio installation fails

Already included in Quick Start for each OS. If still fails:

Windows:

pip install pipwin
pipwin install pyaudio

Linux:

sudo apt update && sudo apt install portaudio19-dev python3-tk
pip install PyAudio

macOS:

brew install portaudio
pip install PyAudio

🐧 Linux Additional Requirements

For system tray icon support:

sudo apt-get install gir1.2-appindicator3-0.1 libappindicator3-1

🍎 macOS Additional Notes

System tray icon requires Pillow with ImageDraw support (included in requirements)
Global hotkeys work system-wide but may require accessibility permissions
Go to System Preferences → Security & Privacy → Privacy → Accessibility
Add Terminal or your Python interpreter to allowed applications

📄 License

MIT License - see LICENSE file for details

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
src		src
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
main.py		main.py
pylintrc		pylintrc
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎤 VoicePaste

✨ Features

🚀 Quick Start

🪟 Windows

🐧 Linux

🍎 macOS

🎯 Usage

Basic

🎤 Select microphone

🚀 Keep model in memory

🎮 GPU Profile

🚪 Exit

🔄 How it works

🎤 Voice Recording

📺 YouTube Transcription

📁 Local File Transcription

📄 File Concatenator (for LLM context)

🎛️ Advanced Features

💾 Persistent Caching (24h)

🧠 Smart Memory Management

🎧 Audio Compatibility

🔧 Troubleshooting

⚠️ cuDNN / CUDA errors

🎤 Microphone issues

⏱️ "Recording too short, ignoring..."

🎬 FFmpeg not found (for YouTube transcription)

📦 PyAudio installation fails

🐧 Linux Additional Requirements

🍎 macOS Additional Notes

📄 License

About

Uh oh!

Releases

Packages

Languages

License

dam2452/VoicePaste

Folders and files

Latest commit

History

Repository files navigation

🎤 VoicePaste

✨ Features

🚀 Quick Start

🪟 Windows

🐧 Linux

🍎 macOS

🎯 Usage

Basic

🎤 Select microphone

🚀 Keep model in memory

🎮 GPU Profile

🚪 Exit

🔄 How it works

🎤 Voice Recording

📺 YouTube Transcription

📁 Local File Transcription

📄 File Concatenator (for LLM context)

🎛️ Advanced Features

💾 Persistent Caching (24h)

🧠 Smart Memory Management

🎧 Audio Compatibility

🔧 Troubleshooting

⚠️ cuDNN / CUDA errors

🎤 Microphone issues

⏱️ "Recording too short, ignoring..."

🎬 FFmpeg not found (for YouTube transcription)

📦 PyAudio installation fails

🐧 Linux Additional Requirements

🍎 macOS Additional Notes

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages