Skip to content

dam2452/VoicePaste

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

20 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

๐ŸŽค VoicePaste

logo

Voice-to-text application with real-time transcription and automatic clipboard integration.

Press Shift+V, speak, press Shift+V again - your text is ready to paste. โœจ

โœจ Features

  • ๐ŸŽฏ Voice recording - Simple Shift+V hotkey to start/stop recording
  • ๐Ÿ“บ YouTube transcription - Press Shift+Y to transcribe YouTube videos from clipboard
  • ๐Ÿ“ Local file transcription - Press Shift+F to transcribe audio/video files from clipboard
  • ๐Ÿ“š Batch processing - Select and transcribe multiple files at once, each cached separately
  • ๐Ÿ“„ File Concatenator - Press Shift+K to concatenate source files for LLM context
  • โšก Real-time transcription - Using OpenAI Whisper Turbo model
  • ๐Ÿš€ GPU acceleration - CUDA support for fast transcription (CPU fallback available)
  • ๐Ÿ“‹ Automatic clipboard - Transcribed text instantly available for pasting
  • ๐Ÿ’พ Persistent caching - All transcriptions cached 24h, survives restarts (avoid re-processing)
  • ๐Ÿ”” System tray integration - Runs quietly in background with functional menu
  • ๐Ÿง  Smart memory management - Auto-loads/unloads model to save GPU memory
  • ๐ŸŽง Virtual audio support - Works with NVIDIA Broadcast, VB-Cable, Krisp, etc.
  • ๐ŸŒ Cross-platform - Windows, Linux, macOS

๐Ÿš€ Quick Start

โญ Recommended: Windows with NVIDIA GPU for best performance

๐ŸชŸ Windows

# Check Python version (must be 3.12)
python --version

# Install FFmpeg
winget install ffmpeg

# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python -m venv .venv
.venv\Scripts\activate

# Install dependencies (with CUDA for GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Run
python main.py

๐Ÿง Linux

# Check Python version (must be 3.12)
python3.12 --version

# Install system dependencies
sudo apt update && sudo apt install ffmpeg portaudio19-dev python3-tk

# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Run
python main.py

๐ŸŽ macOS

# Check Python version (must be 3.12)
python3.12 --version

# Install system dependencies
brew install ffmpeg portaudio

# Setup project
git clone https://github.com/yourusername/VoicePaste.git
cd VoicePaste
python3.12 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Run
python main.py

Usage:

  • Press Shift+V โ†’ speak โ†’ press Shift+V โ†’ text in clipboard โœ…
  • Copy YouTube URL โ†’ press Shift+Y โ†’ video transcribed โ†’ text in clipboard โœ…
  • Copy file or file path โ†’ press Shift+F โ†’ file transcribed โ†’ text in clipboard โœ…
  • Copy folder path โ†’ Shift+K โ†’ copy file path โ†’ Shift+K โ†’ files concatenated โ†’ text in clipboard โœ…

๐ŸŽฏ Usage

Basic

python main.py

๐ŸŽค Select microphone

List available devices:

python main.py --list-devices

Use specific device:

python main.py --device 6

๐Ÿš€ Keep model in memory

python main.py --keep-model-loaded

๐ŸŽฎ GPU Profile

Choose quality settings based on your GPU:

# Standard profile (RTX 2080S 8GB) - default
python main.py --gpu-profile standard

# High-End profile (RTX 3090+ 24GB) - maximum quality
python main.py --gpu-profile high_end

Standard profile (8GB VRAM):

  • beam_size=5, patience=1.0, temperature=0.0
  • Good quality, fast transcription
  • Recommended for RTX 2060, 2070, 2080, 3060, 3070

High-End profile (24GB VRAM):

  • beam_size=10, patience=1.5, temperature=0.0
  • Maximum quality, same speed or faster (better beam search)
  • Recommended for RTX 3090, 4090, A5000, A6000

You can also change GPU profile from system tray menu.

๐Ÿšช Exit

  • Press Ctrl+C in terminal
  • Right-click tray icon โ†’ Exit

๐Ÿ”„ How it works

๐ŸŽค Voice Recording

  1. ๐Ÿš€ Start application - tray icon appears in system tray
  2. โŒจ๏ธ Press Shift+V - recording starts (icon turns red)
  3. ๐ŸŽค Speak into microphone
  4. โŒจ๏ธ Press Shift+V again - recording stops
  5. โณ Wait for transcription (icon turns orange)
  6. ๐Ÿ“‹ Text automatically copied to clipboard
  7. โœจ Paste anywhere with Ctrl+V

๐Ÿ“บ YouTube Transcription

  1. ๐Ÿ“‹ Copy YouTube video URL to clipboard
  2. โŒจ๏ธ Press Shift+Y - downloading starts (icon turns purple)
  3. โณ Wait for download and transcription (icon shows download arrow)
  4. ๐Ÿ“ Transcription automatically copied to clipboard
  5. โœจ Paste anywhere with Ctrl+V
  6. ๐Ÿ’พ Transcription cached for 24h - next use instant, even after restart!

๐Ÿ“ Local File Transcription

Single file:

  1. ๐Ÿ“‹ Copy file from File Explorer (Ctrl+C on file) OR copy file path as text
  2. โŒจ๏ธ Press Shift+F - processing starts (icon turns orange)
  3. โณ Wait for audio extraction and transcription
  4. ๐Ÿ“ Transcription automatically copied to clipboard
  5. โœจ Paste anywhere with Ctrl+V
  6. ๐Ÿ’พ Transcription cached for 24h - next use instant, even after restart!

Multiple files:

  1. ๐Ÿ“‹ Select and copy multiple files from File Explorer (Ctrl+C on multiple files)
  2. โŒจ๏ธ Press Shift+F - processing starts for all files
  3. โณ Each file processed and transcribed separately
  4. ๐Ÿ“ All transcriptions concatenated with filename headers
  5. โœจ Paste anywhere with Ctrl+V
  6. ๐Ÿ’พ Each file cached separately for 24h - can use individually later, survives restarts!

Supported formats:

  • Audio: .mp3, .wav, .m4a, .flac, .ogg, .aac, .wma
  • Video: .mp4, .avi, .mkv, .mov, .wmv, .flv, .webm, .m4v

๐Ÿ“„ File Concatenator (for LLM context)

Concatenate source code files from a folder into a single text for LLM context.

Hotkey method (Shift+K):

  1. ๐Ÿ“‹ Copy folder path to clipboard (e.g. C:\project\src)
  2. โŒจ๏ธ Press Shift+K - folder is set
  3. ๐Ÿ“‹ Copy any file path to clipboard (e.g. C:\anywhere\main.py)
  4. โŒจ๏ธ Press Shift+K - extension .py is added
  5. ๐Ÿ“‹ (Optional) Copy more files with different extensions โ†’ Shift+K for each
  6. โณ Wait 2 seconds - all files with selected extensions are concatenated
  7. ๐Ÿ“‹ Result automatically copied to clipboard

GUI method:

  • Right-click tray icon โ†’ "File Concatenator... (Shift+K)"
  • Select folder, enter extensions, click Run

CLI method:

python src/file_concatenator.py ./src .py .js .ts
python src/file_concatenator.py ./project .py --exclude dist build --output context.txt
python src/file_concatenator.py --gui

Output format:

################################################################################
# FILE: src/main.py
################################################################################
<file content>

################################################################################
# FILE: src/utils/helper.py
################################################################################
<file content>

Icon colors: ๐ŸŸข ready โ†’ ๐Ÿ”ด recording โ†’ ๐ŸŸฃ downloading โ†’ ๐Ÿ”ต processing โ†’ ๐ŸŸข ready

๐ŸŽ›๏ธ Advanced Features

๐Ÿ’พ Persistent Caching (24h)

All transcriptions are automatically cached for 24 hours:

  • ๐Ÿ“ Survives restarts - Cache stored in ~/.voicepaste_cache.json
  • โšก Instant results - Re-using cached transcription takes <1ms
  • ๐Ÿ”„ Auto-cleanup - Expired entries removed automatically
  • ๐Ÿ“Š All types - YouTube, single files, batch files all cached equally

Example:

Day 1, 9:00 AM  โ†’ Process 5 lectures (30 mins total)
Day 1, 3:00 PM  โ†’ Reopen lecture 3 โ†’ Instant from cache โœ“
Day 2, 8:00 AM  โ†’ Reboot PC, reopen lecture 1 โ†’ Still cached โœ“
Day 2, 10:00 AM โ†’ Reopen all 5 โ†’ All instant from cache โœ“
Day 3, 9:00 AM  โ†’ Cache expired, would re-process if needed

๐Ÿง  Smart Memory Management

Application uses intelligent 3-tier memory management by default:

  • โšก Preloading: Model starts loading to VRAM when you start recording - ready by the time you finish speaking
  • ๐ŸŽฎ VRAM (GPU): Model actively used on CUDA for transcription
  • ๐Ÿ’พ RAM (CPU): After 1 hour of inactivity, model moves from VRAM to RAM
  • ๐Ÿ’ค Disk: After 5 hours of inactivity, model fully unloaded from memory
  • ๐Ÿ”„ Auto-recovery: Model automatically moves back to GPU when needed

Use --keep-model-loaded flag if:

  • ๐Ÿ” You use the app frequently
  • ๐Ÿ’พ You have plenty of GPU memory
  • โšก Speed is more important than memory usage

๐ŸŽง Audio Compatibility

Automatically adapts to your microphone:

  • ๐Ÿ” Detects native sample rate (e.g. 48kHz for NVIDIA Broadcast)
  • ๐Ÿ“ผ Records at native sample rate for maximum compatibility
  • ๐Ÿ”„ Auto-resamples to 16kHz for Whisper processing
  • ๐ŸŽ›๏ธ Works with virtual audio devices (NVIDIA Broadcast, VB-Cable, Krisp, etc.)

๐Ÿ”ง Troubleshooting

โš ๏ธ cuDNN / CUDA errors

Application automatically falls back to CPU if CUDA fails. Slower but works.

๐ŸŽค Microphone issues

python main.py --list-devices
python main.py --device <ID>

โฑ๏ธ "Recording too short, ignoring..."

Speak longer (minimum 1 second) or check if microphone is working.

๐ŸŽฌ FFmpeg not found (for YouTube transcription)

FFmpeg is a system dependency (not a Python package) and must be installed via system package manager.

Windows:

winget install ffmpeg

Linux:

sudo apt update && sudo apt install ffmpeg

macOS:

brew install ffmpeg

๐Ÿ“ฆ PyAudio installation fails

Already included in Quick Start for each OS. If still fails:

Windows:

pip install pipwin
pipwin install pyaudio

Linux:

sudo apt update && sudo apt install portaudio19-dev python3-tk
pip install PyAudio

macOS:

brew install portaudio
pip install PyAudio

๐Ÿง Linux Additional Requirements

For system tray icon support:

sudo apt-get install gir1.2-appindicator3-0.1 libappindicator3-1

๐ŸŽ macOS Additional Notes

  • System tray icon requires Pillow with ImageDraw support (included in requirements)
  • Global hotkeys work system-wide but may require accessibility permissions
  • Go to System Preferences โ†’ Security & Privacy โ†’ Privacy โ†’ Accessibility
  • Add Terminal or your Python interpreter to allowed applications

๐Ÿ“„ License

MIT License - see LICENSE file for details

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages