A macOS menu bar application for AI-powered speech-to-text dictation and multimodal AI queries.
- Global hotkey (Option+Space) for instant voice-to-text
- Local processing using Whisper.cpp - no data sent to cloud
- Auto-paste transcription at cursor position
- Persistent history of all transcriptions
- Fast and accurate using optimized Whisper models
- Screenshot + Voice (Control+Space) - capture screen and ask questions
- Vision-language AI powered by Ollama (Llama 3.2 Vision, Qwen, Gemma, etc.)
- Multiple model support - choose from 11+ vision models
- Context-aware prompts for better accuracy
- Local inference - privacy-first design
- Menu bar integration with status indicators
- Launch at login option
- Searchable transcription history
- Configurable AI models
- Permission management UI
To install:

- Clone the repository:

  ```bash
  git clone https://github.com/YOUR_USERNAME/whisp.git
  cd whisp
  ```

- Run the installation script:

  ```bash
  chmod +x install.sh
  ./install.sh
  ```
  The script will:

  - Check for dependencies (Xcode, Homebrew)
  - Install Ollama (optional, for AI features)
  - Build whisper.cpp
  - Download Whisper models
  - Build and install the app
- Grant permissions: on first launch, macOS will ask for:
  - Microphone access (required for recording)
  - Accessibility access (required for auto-paste)
  - Screen Recording (required for AI query screenshots)
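
These prompts map to standard macOS permission APIs. A minimal sketch of how an app can check all three (illustrative, not necessarily Whisp's exact code):

```swift
import AVFoundation
import ApplicationServices
import CoreGraphics

/// Check (and, for the microphone, prompt for) the three permissions Whisp needs.
func checkPermissions() {
    // Microphone: prompts the user on first call.
    AVCaptureDevice.requestAccess(for: .audio) { granted in
        print("Microphone:", granted ? "granted" : "denied")
    }

    // Accessibility: required to synthesize the Cmd+V paste event.
    print("Accessibility:", AXIsProcessTrusted() ? "granted" : "denied")

    // Screen Recording: required for AI-query screenshots.
    print("Screen Recording:", CGPreflightScreenCaptureAccess() ? "granted" : "denied")
}
```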
Manual installation steps:

Requirements:

- macOS 11.0 or later
- Xcode Command Line Tools
- Homebrew (recommended)
- Install Xcode Command Line Tools:

  ```bash
  xcode-select --install
  ```
- Install Ollama (optional, for AI queries):

  ```bash
  brew install ollama
  brew services start ollama

  # Download a vision model
  ollama pull llama3.2-vision:11b
  ```
- Build whisper.cpp:

  ```bash
  cd whisper.cpp
  mkdir build
  cd build
  cmake .. -DGGML_METAL=ON
  cmake --build . --config Release
  cd ../..
  ```
- Download the Whisper model:

  ```bash
  cd whisper.cpp/models
  bash ./download-ggml-model.sh base.en
  cd ../..
  ```
- Build the macOS app:

  ```bash
  cd SpeechToTextApp
  xcodebuild -scheme SpeechToTextApp -configuration Release -derivedDataPath build
  ```
- Install to Applications:

  ```bash
  cp -R build/Build/Products/Release/SpeechToTextApp.app /Applications/Whisp.app
  open /Applications/Whisp.app
  ```
To dictate:

- Press Option+Space to start recording
- Speak your text
- Press Option+Space again to stop
- Text is automatically transcribed and pasted at your cursor
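
Auto-paste works by placing the transcription on the pasteboard and synthesizing Cmd+V, which is why Accessibility access is required. A minimal sketch of that technique (illustrative; Whisp's implementation may differ):

```swift
import AppKit
import CoreGraphics

/// Paste `text` at the current cursor position by setting the pasteboard
/// and synthesizing Cmd+V. Requires Accessibility permission.
func pasteAtCursor(_ text: String) {
    let pasteboard = NSPasteboard.general
    pasteboard.clearContents()
    pasteboard.setString(text, forType: .string)

    let source = CGEventSource(stateID: .combinedSessionState)
    let vKey: CGKeyCode = 9 // kVK_ANSI_V

    let keyDown = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: true)
    keyDown?.flags = .maskCommand
    let keyUp = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: false)
    keyUp?.flags = .maskCommand

    keyDown?.post(tap: .cghidEventTap)
    keyUp?.post(tap: .cghidEventTap)
}
```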
To run an AI query:

- Press Control+Space - a screenshot is captured
- Speak your question about the screenshot
- Press Control+Space again to stop
- AI processes your question and displays the answer
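
A full-screen capture takes only a few lines with CGDisplayCreateImage (recent macOS versions prefer ScreenCaptureKit). A hedged sketch, not necessarily how ScreenshotCapturer does it:

```swift
import AppKit
import CoreGraphics

/// Capture the main display as PNG data. Requires Screen Recording permission.
func captureMainDisplayPNG() -> Data? {
    guard let image = CGDisplayCreateImage(CGMainDisplayID()) else { return nil }
    let rep = NSBitmapImageRep(cgImage: image)
    return rep.representation(using: .png, properties: [:])
}
```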
Click the 🎤 icon in your menu bar to access:
- Show History - View all transcriptions and AI queries
- AI Settings - Configure which vision model to use
- Launch at Login - Auto-start Whisp
- Quit
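
A status item like this takes only a few lines of AppKit; a minimal sketch (illustrative, not Whisp's actual code):

```swift
import AppKit

// Minimal menu bar status item sketch.
let statusItem = NSStatusBar.system.statusItem(withLength: NSStatusItem.variableLength)
statusItem.button?.title = "🎤"

let menu = NSMenu()
// Real items would target AppDelegate actions for history, settings, etc.
menu.addItem(NSMenuItem(title: "Quit",
                        action: #selector(NSApplication.terminate(_:)),
                        keyEquivalent: "q"))
statusItem.menu = menu
```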
Whisp supports multiple vision-language models via Ollama:
| Model | Size | Speed | Quality |
|---|---|---|---|
| llama3.2-vision:11b | ~7GB | Fast | Excellent |
| llama3.2-vision:90b | ~55GB | Slow | Best |
| qwen2-vl:2b | ~2GB | Very Fast | Good |
| qwen2-vl:7b | ~5GB | Fast | Excellent |
| minicpm-v:8b | ~5GB | Fast | Very Good |
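
Under the hood, queries go to Ollama's documented local HTTP API: a POST to /api/generate on localhost:11434, with the screenshot passed as a base64-encoded image. A minimal sketch of such a request (the JSON fields follow Ollama's API; the surrounding function is illustrative):

```swift
import Foundation

/// Ask a local Ollama vision model a question about an image.
func queryOllama(prompt: String, imagePNG: Data,
                 model: String = "llama3.2-vision:11b") async throws -> String {
    var request = URLRequest(url: URL(string: "http://localhost:11434/api/generate")!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")

    let body: [String: Any] = [
        "model": model,
        "prompt": prompt,
        "images": [imagePNG.base64EncodedString()], // vision input
        "stream": false                             // single JSON response
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: body)

    let (data, _) = try await URLSession.shared.data(for: request)
    let json = try JSONSerialization.jsonObject(with: data) as? [String: Any]
    return json?["response"] as? String ?? ""
}
```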
To download additional models:

```bash
ollama pull <model-name>
```

Whisp is built with:

- Swift 6.1 - Modern Swift with concurrency support
- SwiftUI & AppKit - Native macOS UI
- whisper.cpp - High-performance speech recognition (C++)
- Ollama - Local LLM server for vision models
- Core Data - Persistent storage
- AVFoundation - Audio recording
- Carbon Events - Global hotkey registration
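
As a taste of the Carbon piece, registering Option+Space as a system-wide hotkey looks roughly like this (a simplified sketch; GlobalHotKey.swift may structure it differently):

```swift
import Carbon.HIToolbox

// Stored globally because Carbon callbacks are C function pointers
// and cannot capture Swift context. Simplified for illustration.
private var hotKeyHandler: (() -> Void)?

func registerOptionSpaceHotKey(_ handler: @escaping () -> Void) {
    hotKeyHandler = handler

    // Fire the handler whenever a registered hot key is pressed.
    var eventType = EventTypeSpec(eventClass: OSType(kEventClassKeyboard),
                                  eventKind: UInt32(kEventHotKeyPressed))
    InstallEventHandler(GetApplicationEventTarget(), { _, _, _ in
        hotKeyHandler?()
        return noErr
    }, 1, &eventType, nil, nil)

    // Option+Space: kVK_Space with the Option modifier.
    var hotKeyRef: EventHotKeyRef?
    let hotKeyID = EventHotKeyID(signature: OSType(0x5748_5350), id: 1) // 'WHSP'
    RegisterEventHotKey(UInt32(kVK_Space), UInt32(optionKey), hotKeyID,
                        GetApplicationEventTarget(), 0, &hotKeyRef)
}
```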
Project layout:

```
whisp/
├── SpeechToTextApp/            # Main macOS application
│   ├── AppDelegate.swift       # App coordinator
│   ├── AudioRecorder.swift     # Audio recording
│   ├── Transcriber.swift       # Whisper integration
│   ├── GlobalHotKey.swift      # Hotkey management
│   ├── AIQuery/                # AI query feature
│   │   ├── AIQueryManager.swift
│   │   ├── ScreenshotCapturer.swift
│   │   ├── OllamaProvider.swift
│   │   └── ModelManager.swift
│   └── History/                # History management
│       ├── HistoryManager.swift
│       └── HistoryView.swift
├── whisper.cpp/                # Embedded ASR library
├── install.sh                  # Installation script
└── README.md
```
Privacy:

- Local-first: Speech recognition runs entirely on-device using whisper.cpp
- No cloud: Dictation works completely offline
- Optional AI: Ollama runs locally on your Mac (localhost:11434)
- No telemetry: No usage data is collected or sent anywhere
- Open source: Full source code available for audit
Troubleshooting:

If the microphone isn't working:

- Go to System Settings → Privacy & Security → Microphone
- Enable access for Whisp

If auto-paste isn't working:

- Go to System Settings → Privacy & Security → Accessibility
- Enable access for Whisp
If AI queries fail:

- Check that Ollama is running:

  ```bash
  brew services list | grep ollama
  ```

- Restart Ollama:

  ```bash
  brew services restart ollama
  ```

- Verify the model is installed:

  ```bash
  ollama list
  ```

- Grant Screen Recording permission in System Settings
If transcription fails:

- Ensure whisper.cpp is built:

  ```bash
  ls whisper.cpp/build/bin/whisper-cli
  ```

- Check that the model exists:

  ```bash
  ls whisper.cpp/models/ggml-base.en.bin
  ```

- Rebuild if needed by following the manual installation steps
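
As a further sanity check, you can drive whisper-cli directly; a minimal Swift sketch of shelling out to it (illustrative; Whisp's Transcriber may integrate differently):

```swift
import Foundation

/// Transcribe a 16 kHz WAV file by running whisper-cli as a subprocess.
func transcribe(wavPath: String) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "whisper.cpp/build/bin/whisper-cli")
    process.arguments = [
        "-m", "whisper.cpp/models/ggml-base.en.bin", // model from the install steps
        "-f", wavPath,
        "-nt"                                        // plain text, no timestamps
    ]

    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    process.waitUntilExit()

    let output = pipe.fileHandleForReading.readDataToEndOfFile()
    return String(decoding: output, as: UTF8.self)
        .trimmingCharacters(in: .whitespacesAndNewlines)
}
```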
Contributions are welcome! Please feel free to submit pull requests or open issues.
To set up for development:

- Clone the repository
- Open SpeechToTextApp/SpeechToTextApp.xcodeproj in Xcode
- Build and run (Cmd+R)
MIT License - see LICENSE file for details
If you find this project useful, please consider:
- Starring the repository ⭐
- Reporting bugs or suggesting features via Issues
- Contributing code via Pull Requests
- Sharing with others who might find it useful
Made with ❤️