
Free, open‑source, offline speech-to-text and on-device assistant. Converts audio to text and answers questions about your current view by inspecting a screenshot locally. Your data never leaves your device.

Whisp 🎤

A macOS menu bar application for AI-powered speech-to-text dictation and multimodal AI queries.


Features

🎙️ Voice Dictation

  • Global hotkey (Option+Space) for instant voice-to-text
  • Local processing using Whisper.cpp - no data sent to cloud
  • Auto-paste transcription at cursor position
  • Persistent history of all transcriptions
  • Fast and accurate using optimized Whisper models

🤖 AI Multimodal Query

  • Screenshot + Voice (Control+Space) - capture screen and ask questions
  • Vision-language AI powered by Ollama (Llama 3.2 Vision, Qwen, Gemma, etc.)
  • Multiple model support - choose from 11+ vision models
  • Context-aware prompts for better accuracy
  • Local inference - privacy-first design

📋 Additional Features

  • Menu bar integration with status indicators
  • Launch at login option
  • Searchable transcription history
  • Configurable AI models
  • Permission management UI

Installation

Quick Install (Automated)

  1. Clone the repository:

    git clone https://github.com/deeptivchopra/whisp.git
    cd whisp
  2. Run the installation script:

    chmod +x install.sh
    ./install.sh

    The script will:

    • Check for dependencies (Xcode, Homebrew)
    • Install Ollama (optional, for AI features)
    • Build whisper.cpp
    • Download Whisper models
    • Build and install the app
  3. Grant permissions: On first launch, macOS will ask for:

    • Microphone access (required for recording)
    • Accessibility access (required for auto-paste)
    • Screen Recording (required for AI query screenshots)
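
Each of these prompts maps to a standard macOS authorization API. As a rough illustration (not necessarily how Whisp structures its own checks), the status of all three permissions can be queried like this:

import AVFoundation
import ApplicationServices
import CoreGraphics

// Microphone: current authorization status for audio capture.
let micStatus = AVCaptureDevice.authorizationStatus(for: .audio)

// Accessibility: required before the app may synthesize paste keystrokes.
let axTrusted = AXIsProcessTrusted()

// Screen Recording: preflight check that does not trigger the system prompt.
let screenOK = CGPreflightScreenCaptureAccess()

print("mic: \(micStatus.rawValue), accessibility: \(axTrusted), screen: \(screenOK)")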

Manual Install

Click to expand manual installation steps

Prerequisites

  • macOS 11.0 or later
  • Xcode Command Line Tools
  • Homebrew (recommended)

Steps

  1. Install Xcode Command Line Tools:

    xcode-select --install
  2. Install Ollama (optional, for AI queries):

    brew install ollama
    brew services start ollama
    
    # Download a vision model
    ollama pull llama3.2-vision:11b
  3. Build whisper.cpp:

    cd whisper.cpp
    mkdir build
    cd build
    cmake .. -DGGML_METAL=ON
    cmake --build . --config Release
    cd ../..
  4. Download Whisper model:

    cd whisper.cpp/models
    bash ./download-ggml-model.sh base.en
    cd ../..
  5. Build the macOS app:

    cd SpeechToTextApp
    xcodebuild -scheme SpeechToTextApp -configuration Release -derivedDataPath build
  6. Install to Applications:

    cp -R build/Build/Products/Release/SpeechToTextApp.app /Applications/Whisp.app
    open /Applications/Whisp.app

Usage

Voice Dictation

  1. Press Option+Space to start recording
  2. Speak your text
  3. Press Option+Space again to stop
  4. Text is automatically transcribed and pasted at your cursor
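
Under the hood, transcription runs through whisper.cpp. To sanity-check the same pipeline from a terminal (assuming a 16 kHz mono WAV named recording.wav), you can call the bundled CLI directly:

./whisper.cpp/build/bin/whisper-cli -m whisper.cpp/models/ggml-base.en.bin -f recording.wav --no-timestamps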

AI Multimodal Query

  1. Press Control+Space - screenshot is captured
  2. Speak your question about the screenshot
  3. Press Control+Space again to stop
  4. AI processes your question and displays the answer
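
Step 4 can also be exercised against Ollama's local HTTP API directly. This minimal curl example (the base64 image string is a placeholder) mirrors what a vision query looks like:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2-vision:11b",
  "prompt": "What is shown in this screenshot?",
  "images": ["<base64-encoded PNG data>"],
  "stream": false
}'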

Menu Bar Options

Click the 🎤 icon in your menu bar to access:

  • Show History - View all transcriptions and AI queries
  • AI Settings - Configure which vision model to use
  • Launch at Login - Auto-start Whisp
  • Quit

Supported AI Models

Whisp supports multiple vision-language models via Ollama:

Model               | Size  | Speed     | Quality
--------------------|-------|-----------|----------
llama3.2-vision:11b | ~7GB  | Fast      | Excellent
llama3.2-vision:90b | ~55GB | Slow      | Best
qwen2-vl:2b         | ~2GB  | Very Fast | Good
qwen2-vl:7b         | ~5GB  | Fast      | Excellent
minicpm-v:8b        | ~5GB  | Fast      | Very Good

To download additional models:

ollama pull <model-name>

Architecture

Technologies

  • Swift 6.1 - Modern Swift with concurrency support
  • SwiftUI & AppKit - Native macOS UI
  • whisper.cpp - High-performance speech recognition (C++)
  • Ollama - Local LLM server for vision models
  • Core Data - Persistent storage
  • AVFoundation - Audio recording
  • Carbon Events - Global hotkey registration
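
For reference, a Carbon global hotkey comes down to a single RegisterEventHotKey call. The sketch below is illustrative, not Whisp's actual code (the signature and ID are made up, and real handling also needs InstallEventHandler to receive the key events):

import Carbon.HIToolbox

var hotKeyRef: EventHotKeyRef?
// Illustrative four-char signature ('WHSP') and ID for this hotkey.
let hotKeyID = EventHotKeyID(signature: OSType(0x57485350), id: 1)

// Option+Space: kVK_Space with the Option modifier.
let status = RegisterEventHotKey(UInt32(kVK_Space), UInt32(optionKey),
                                 hotKeyID, GetApplicationEventTarget(),
                                 0, &hotKeyRef)
assert(status == noErr, "hotkey registration failed")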

Project Structure

whisp/
├── SpeechToTextApp/           # Main macOS application
│   ├── AppDelegate.swift      # App coordinator
│   ├── AudioRecorder.swift    # Audio recording
│   ├── Transcriber.swift      # Whisper integration
│   ├── GlobalHotKey.swift     # Hotkey management
│   ├── AIQuery/               # AI query feature
│   │   ├── AIQueryManager.swift
│   │   ├── ScreenshotCapturer.swift
│   │   ├── OllamaProvider.swift
│   │   └── ModelManager.swift
│   └── History/               # History management
│       ├── HistoryManager.swift
│       └── HistoryView.swift
├── whisper.cpp/               # Embedded ASR library
├── install.sh                 # Installation script
└── README.md

Privacy & Security

  • Local-first: Speech recognition runs entirely on-device using whisper.cpp
  • No cloud: Dictation works completely offline
  • Optional AI: Ollama runs locally on your Mac (localhost:11434)
  • No telemetry: No usage data is collected or sent anywhere
  • Open source: Full source code available for audit

Troubleshooting

Microphone not working

  1. Go to System Settings → Privacy & Security → Microphone
  2. Enable access for Whisp

Auto-paste not working

  1. Go to System Settings → Privacy & Security → Accessibility
  2. Enable access for Whisp
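
For context: auto-paste is typically done by synthesizing a Cmd+V key event, which macOS only honors for apps with Accessibility access. A minimal sketch of that approach (not necessarily Whisp's exact implementation):

import CoreGraphics

// Synthesize Cmd+V (virtual key 0x09 is kVK_ANSI_V).
let source = CGEventSource(stateID: .combinedSessionState)
let keyDown = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: true)
keyDown?.flags = .maskCommand
let keyUp = CGEvent(keyboardEventSource: source, virtualKey: 0x09, keyDown: false)
keyUp?.flags = .maskCommand
keyDown?.post(tap: .cghidEventTap)
keyUp?.post(tap: .cghidEventTap)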

AI queries failing

  1. Check Ollama is running: brew services list | grep ollama
  2. Restart Ollama: brew services restart ollama
  3. Verify model is installed: ollama list
  4. Grant Screen Recording permission in System Settings
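
If those commands look healthy but queries still fail, confirm the API itself is reachable; Ollama's /api/tags endpoint lists the installed models:

curl http://localhost:11434/api/tags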

Whisper transcription errors

  1. Ensure whisper.cpp is built: ls whisper.cpp/build/bin/whisper-cli
  2. Check model exists: ls whisper.cpp/models/ggml-base.en.bin
  3. Rebuild if needed: Follow manual installation steps
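
For a quick end-to-end check, transcribe the sample audio that ships with whisper.cpp; if this prints a transcript, both the binary and the model are working:

cd whisper.cpp
./build/bin/whisper-cli -m models/ggml-base.en.bin -f samples/jfk.wav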

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues.

Development Setup

  1. Clone the repository
  2. Open SpeechToTextApp/SpeechToTextApp.xcodeproj in Xcode
  3. Build and run (Cmd+R)

License

MIT License - see LICENSE file for details

Support

If you find this project useful, please consider:

  • Starring the repository ⭐
  • Reporting bugs or suggesting features via Issues
  • Contributing code via Pull Requests
  • Sharing with others who might find it useful

Made with ❤️
