
Free, open‑source, offline speech-to-text and on-device assistant. Converts audio to text and answers questions about your current view by inspecting a screenshot locally. Your data never leaves your device.

Whisp 🎤

A macOS menu bar application for AI-powered speech-to-text dictation and multimodal AI queries.


Features

🎙️ Voice Dictation

  • Global hotkey (Option+Space) for instant voice-to-text
  • Local processing using Whisper.cpp - no data sent to cloud
  • Auto-paste transcription at cursor position
  • Persistent history of all transcriptions
  • Fast and accurate using optimized Whisper models

🤖 AI Multimodal Query

  • Screenshot + Voice (Control+Space) - capture screen and ask questions
  • Vision-language AI powered by Ollama (Llama 3.2 Vision, Qwen, Gemma, etc.)
  • Multiple model support - choose from 11+ vision models
  • Context-aware prompts for better accuracy
  • Local inference - privacy-first design

📋 Additional Features

  • Menu bar integration with status indicators
  • Launch at login option
  • Searchable transcription history
  • Configurable AI models
  • Permission management UI

Installation

Quick Install (Automated)

  1. Clone the repository:

    git clone https://github.com/deeptivchopra/whisp.git
    cd whisp
  2. Run the installation script:

    chmod +x install.sh
    ./install.sh

    The script will:

    • Check for dependencies (Xcode, Homebrew)
    • Install Ollama (optional, for AI features)
    • Build whisper.cpp
    • Download Whisper models
    • Build and install the app
  3. Grant permissions: On first launch, macOS will ask for:

    • Microphone access (required for recording)
    • Accessibility access (required for auto-paste)
    • Screen Recording (required for AI query screenshots)
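
Each of these prompts maps to a standard macOS authorization API. As a rough illustration (not necessarily how Whisp structures its own checks), the status of all three permissions can be queried like this:

import AVFoundation
import ApplicationServices
import CoreGraphics

// Microphone: current authorization status for audio capture.
let micStatus = AVCaptureDevice.authorizationStatus(for: .audio)

// Accessibility: required before the app may synthesize paste keystrokes.
let axTrusted = AXIsProcessTrusted()

// Screen Recording: preflight check that does not trigger the system prompt.
let screenOK = CGPreflightScreenCaptureAccess()

print("mic: \(micStatus.rawValue), accessibility: \(axTrusted), screen: \(screenOK)")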

Manual Install

Click to expand manual installation steps

Prerequisites

  • macOS 11.0 or later
  • Xcode Command Line Tools
  • Homebrew (recommended)

Steps

  1. Install Xcode Command Line Tools:

    xcode-select --install
  2. Install Ollama (optional, for AI queries):

    brew install ollama
    brew services start ollama
    
    # Download a vision model
    ollama pull llama3.2-vision:11b
  3. Build whisper.cpp:

    cd whisper.cpp
    mkdir build
    cd build
    cmake .. -DGGML_METAL=ON
    cmake --build . --config Release
    cd ../..
  4. Download Whisper model:

    cd whisper.cpp/models
    bash ./download-ggml-model.sh base.en
    cd ../..
  5. Build the macOS app:

    cd SpeechToTextApp
    xcodebuild -scheme SpeechToTextApp -configuration Release -derivedDataPath build
  6. Install to Applications:

    cp -R build/Build/Products/Release/SpeechToTextApp.app /Applications/Whisp.app
    open /Applications/Whisp.app

Usage

Voice Dictation

  1. Press Option+Space to start recording
  2. Speak your text
  3. Press Option+Space again to stop
  4. Text is automatically transcribed and pasted at your cursor
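
Under the hood, transcription runs through whisper.cpp. To sanity-check the same pipeline from a terminal (assuming a 16 kHz mono WAV named recording.wav), you can call the bundled CLI directly:

./whisper.cpp/build/bin/whisper-cli -m whisper.cpp/models/ggml-base.en.bin -f recording.wav --no-timestamps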

AI Multimodal Query

  1. Press Control+Space - screenshot is captured
  2. Speak your question about the screenshot
  3. Press Control+Space again to stop
  4. AI processes your question and displays the answer
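
Step 4 can also be exercised against Ollama's local HTTP API directly. This minimal curl example (the base64 image string is a placeholder) mirrors what a vision query looks like:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2-vision:11b",
  "prompt": "What is shown in this screenshot?",
  "images": ["<base64-encoded PNG data>"],
  "stream": false
}'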

Menu Bar Options

Click the 🎤 icon in your menu bar to access:

  • Show History - View all transcriptions and AI queries
  • AI Settings - Configure which vision model to use
  • Launch at Login - Auto-start Whisp
  • Quit

Supported AI Models

Whisp supports multiple vision-language models via Ollama:

Model               | Size  | Speed     | Quality
--------------------|-------|-----------|----------
llama3.2-vision:11b | ~7GB  | Fast      | Excellent
llama3.2-vision:90b | ~55GB | Slow      | Best
qwen2-vl:2b         | ~2GB  | Very Fast | Good
qwen2-vl:7b         | ~5GB  | Fast      | Excellent
minicpm-v:8b        | ~5GB  | Fast      | Very Good

To download additional models:

ollama pull <model-name>

Architecture

Technologies

  • Swift 6.1 - Modern Swift with concurrency support
  • SwiftUI & AppKit - Native macOS UI
  • whisper.cpp - High-performance speech recognition (C++)
  • Ollama - Local LLM server for vision models
  • Core Data - Persistent storage
  • AVFoundation - Audio recording
  • Carbon Events - Global hotkey registration
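
For reference, a Carbon global hotkey comes down to a single RegisterEventHotKey call. The sketch below is illustrative, not Whisp's actual code (the signature and ID are made up, and real handling also needs InstallEventHandler to receive the key events):

import Carbon.HIToolbox

var hotKeyRef: EventHotKeyRef?
// Illustrative four-char signature ('WHSP') and ID for this hotkey.
let hotKeyID = EventHotKeyID(signature: OSType(0x57485350), id: 1)

// Option+Space: kVK_Space with the Option modifier.
let status = RegisterEventHotKey(UInt32(kVK_Space), UInt32(optionKey),
                                 hotKeyID, GetApplicationEventTarget(),
                                 0, &hotKeyRef)
assert(status == noErr, "hotkey registration failed")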

Project Structure

whisp/
├── SpeechToTextApp/           # Main macOS application
│   ├── AppDelegate.swift      # App coordinator
│   ├── AudioRecorder.swift    # Audio recording
│   ├── Transcriber.swift      # Whisper integration
│   ├── GlobalHotKey.swift     # Hotkey management
│   ├── AIQuery/               # AI query feature
│   │   ├── AIQueryManager.swift
│   │   ├── ScreenshotCapturer.swift
│   │   ├── OllamaProvider.swift
│   │   └── ModelManager.swift
│   └── History/               # History management
│       ├── HistoryManager.swift
│       └── HistoryView.swift
├── whisper.cpp/               # Embedded ASR library
├── install.sh                 # Installation script
└── README.md

Privacy & Security

  • Local-first: Speech recognition runs entirely on-device using whisper.cpp
  • No cloud: Dictation works completely offline
  • Optional AI: Ollama runs locally on your Mac (localhost:11434)
  • No telemetry: No usage data is collected or sent anywhere
  • Open source: Full source code available for audit

Troubleshooting

Microphone not working

  1. Go to System Settings → Privacy & Security → Microphone
  2. Enable access for Whisp

Auto-paste not working

  1. Go to System Settings → Privacy & Security → Accessibility
  2. Enable access for Whisp
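
For context: auto-paste is typically done by synthesizing a Cmd+V key event, which macOS only honors for apps with Accessibility access. A minimal sketch of that approach (not necessarily Whisp's exact implementation):

import CoreGraphics

// Synthesize Cmd+V (virtual key 0x09 is kVK_ANSI_V).
let source = CGEventSource(stateID: .combinedSessionState)
let keyDown = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: true)
keyDown?.flags = .maskCommand
let keyUp = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: false)
keyUp?.flags = .maskCommand
keyDown?.post(tap: .cghidEventTap)
keyUp?.post(tap: .cghidEventTap)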

AI queries failing

  1. Check Ollama is running: brew services list | grep ollama
  2. Restart Ollama: brew services restart ollama
  3. Verify model is installed: ollama list
  4. Grant Screen Recording permission in System Settings
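
If those commands look healthy but queries still fail, confirm the API itself is reachable; Ollama's /api/tags endpoint lists the installed models:

curl http://localhost:11434/api/tags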

Whisper transcription errors

  1. Ensure whisper.cpp is built: ls whisper.cpp/build/bin/whisper-cli
  2. Check model exists: ls whisper.cpp/models/ggml-base.en.bin
  3. Rebuild if needed: Follow manual installation steps
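
For a quick end-to-end check, transcribe the sample audio that ships with whisper.cpp; if this prints a transcript, both the binary and the model are working:

cd whisper.cpp
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav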

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

Development Setup

  1. Clone the repository
  2. Open SpeechToTextApp/SpeechToTextApp.xcodeproj in Xcode
  3. Build and run (Cmd+R)

License

MIT License - see LICENSE file for details

Support

If you find this project useful, please consider:

  • Starring the repository ⭐
  • Reporting bugs or suggesting features via Issues
  • Contributing code via Pull Requests
  • Sharing with others who might find it useful

Made with ❤️
