

WhisperKit Android


WhisperKit Android brings Foundation Models On Device for Automatic Speech Recognition. It extends the performance and feature set of WhisperKit from Apple platforms to Android and Linux. The current feature set is a subset of the iOS counterpart, but we are continuing to invest in Android and now welcome contributions from the community.

[Example App] [Blog Post] [Python Tools Repo]


Getting Started

The WhisperKit API is currently experimental and requires an explicit opt-in using @OptIn(ExperimentalWhisperKit::class). This signals that the API may change in future releases, so use it with caution in production code.
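
If you prefer to opt in once per module instead of annotating every call site, Kotlin's standard -opt-in compiler flag supports this. Below is a minimal sketch for build.gradle.kts; the fully qualified annotation name used here is an assumption, so verify it against the import in your own code:

// Module-wide opt-in (sketch). The annotation's package is assumed here;
// match it to the actual import you use in your sources.
tasks.withType<org.jetbrains.kotlin.gradle.tasks.KotlinCompile>().configureEach {
   compilerOptions {
      freeCompilerArgs.add("-opt-in=com.argmaxinc.whisperkit.ExperimentalWhisperKit")
   }
}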

To use WhisperKit in your Android app, you need to:

  1. Add the following dependencies to your app's build.gradle.kts:
dependencies {
   // 1. WhisperKit SDK
   implementation("com.argmaxinc:whisperkit:0.3.2")  // Check badge above for latest version

   // 2. QNN dependencies for hardware acceleration
   implementation("com.qualcomm.qnn:qnn-runtime:2.34.0")
   implementation("com.qualcomm.qnn:qnn-litert-delegate:2.34.0")
}
  2. Configure JNI library packaging in your app's build.gradle.kts:
android {
   // ...
   packaging {
      jniLibs {
         // Extract native .so files to disk at install time instead of loading
         // them from the APK; native runtimes such as QNN expect them on disk
         useLegacyPackaging = true
      }
   }
}
  3. Use WhisperKit in your code:
@OptIn(ExperimentalWhisperKit::class)
class YourActivity : AppCompatActivity() {
   private lateinit var whisperKit: WhisperKit

   override fun onCreate(savedInstanceState: Bundle?) {
      super.onCreate(savedInstanceState)

      // Initialize WhisperKit
      // Note: Always use applicationContext to avoid memory leaks and ensure proper lifecycle management
      whisperKit = WhisperKit.Builder()
         .setModel(WhisperKit.OPENAI_TINY_EN)
         .setApplicationContext(applicationContext)
         .setCallback { what, result ->
            // Handle transcription output
            when (what) {
               WhisperKit.TextOutputCallback.MSG_INIT -> {
                  // Model initialized successfully
               }
               WhisperKit.TextOutputCallback.MSG_TEXT_OUT -> {
                  // New transcription available
                  val fullText = result.text
                  val segments = result.segments
                  // Process the transcribed text as it becomes available
                  // This callback will be called multiple times as more audio is processed
                  segments.forEach { segment ->
                     // Process each segment
                     val segmentText = segment.text
                  }
               }
               WhisperKit.TextOutputCallback.MSG_CLOSE -> {
                  // Cleanup complete
               }
            }
         }
         .build()

      // Load the model
      lifecycleScope.launch {
         whisperKit.loadModel().collect { progress ->
            // Handle download progress
         }

         // Initialize with audio parameters
         whisperKit.init(frequency = 16000, channels = 1, duration = 0)

         // Transcribe audio data in chunks
         // You can call transcribe() multiple times with different chunks of audio data
         // Results will be delivered through the callback as they become available
         val audioChunk1: ByteArray = TODO("First chunk of audio data")
         whisperKit.transcribe(audioChunk1)

         val audioChunk2: ByteArray = TODO("Second chunk of audio data")
         whisperKit.transcribe(audioChunk2)

         // Continue processing more chunks as needed...
      }
   }

   override fun onDestroy() {
      super.onDestroy()
      whisperKit.deinitialize()
   }
}

Note: The WhisperKit API is currently experimental and may change in future releases. Make sure to handle the @OptIn(ExperimentalWhisperKit::class) annotation appropriately in your code.
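
For a concrete picture of the chunked flow above, here is a minimal sketch that streams live microphone audio into transcribe(). It assumes transcribe() accepts raw 16-bit mono PCM matching the parameters passed to init(), that it may be a suspend function (so the sketch is one too), and that the RECORD_AUDIO permission has already been granted; verify these assumptions against the SDK. In real code, run the blocking read() loop on Dispatchers.IO.

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

// Sketch: capture 16 kHz mono 16-bit PCM and hand each chunk to WhisperKit.
// Assumes whisperKit.init(frequency = 16000, channels = 1, duration = 0) has
// completed and RECORD_AUDIO has been granted (AudioRecord requires it).
@OptIn(ExperimentalWhisperKit::class)
suspend fun streamMicrophone(whisperKit: WhisperKit) {
   val sampleRate = 16000
   val minBuf = AudioRecord.getMinBufferSize(
      sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT
   )
   val recorder = AudioRecord(
      MediaRecorder.AudioSource.MIC, sampleRate,
      AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf
   )
   val chunk = ByteArray(minBuf)
   recorder.startRecording()
   try {
      repeat(100) { // bounded loop for this sketch; use your own stop condition
         val read = recorder.read(chunk, 0, chunk.size) // blocking read
         if (read > 0) {
            // Results arrive asynchronously via the callback set in the Builder
            whisperKit.transcribe(chunk.copyOf(read))
         }
      }
   } finally {
      recorder.stop()
      recorder.release()
   }
}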

Build From Source


The following setup was tested on macOS 15.1.

Common Setup Steps

These steps are required for both the Android app and CLI development paths:

  1. Install required build tools:
make setup
  2. Build the development environment in Docker with all development tools:
make env

The first run of make env will take several minutes while the Docker image builds. Once the image exists, subsequent runs of make env drop you into the Docker container right away.

If you need to rebuild the Docker image:

make rebuild-env

Android App Development Path

  1. Build and enter the Docker environment:
make env
  2. Build the required native libraries:
make build jni
  3. Open the Android project in Android Studio:
    • Open the root project in Android Studio
    • Navigate to android/examples/WhisperAX
    • Build and run the app

CLI Development Path

  1. Build and enter the Docker environment:
make env
  2. Build the CLI app:
make build [linux | qnn | gpu]
  • linux: CPU-only build for Linux
  • qnn: Android build with Qualcomm NPU support
  • gpu: Android build with GPU support
  3. Push dependencies to the Android device (skip for Linux):
make adb-push
  4. Run the CLI app:

For Android:

adb shell
cd /sdcard/argmax/tflite
export PATH=/data/local/tmp/bin:$PATH
export LD_LIBRARY_PATH=/data/local/tmp/lib
whisperkit-cli transcribe --model-path /path/to/openai_whisper-base --audio-path /path/to/inputs/jfk_441khz.m4a

For Linux:

./build/linux/whisperkit-cli transcribe --model-path /path/to/my/whisper_model --audio-path /path/to/my/audio_file.m4a --report --report-path /path/to/dump/report.json

For all options, run whisperkit-cli --help

  5. Clean build files when needed:
make clean [all]

With the all option, it performs a deep clean that also removes the open-source components.

Contributing

WhisperKit Android is currently in the beta stage. We are actively developing the project and welcome contributions from the community.

Development Setup

Installing clang-format

The project uses clang-format 14 for C++ code formatting. You'll need to install it based on your operating system:

macOS

# Install LLVM 14
brew install llvm@14

# Create symlink for clang-format
sudo ln -sf /opt/homebrew/opt/llvm@14/bin/clang-format /opt/homebrew/bin/clang-format

Linux (Ubuntu/Debian)

sudo apt update
sudo apt install -y clang-format-14
sudo ln -sf /usr/bin/clang-format-14 /usr/local/bin/clang-format

Verify the installation:

clang-format --version

To check C++ code formatting:

./gradlew spotlessCppCheck

Git Hooks

This project uses Git hooks to maintain code quality. These hooks help ensure consistent code formatting and quality standards.

Setup

To use the Git hooks, run the following command in your repository root:

git config core.hooksPath .githooks

Available Hooks

pre-commit

  • Runs spotlessApply to automatically fix code formatting issues
  • If formatting fixes are applied, they are automatically staged
  • The commit will be blocked if spotlessApply fails

Troubleshooting

If you need to bypass the hooks temporarily (not recommended), you can use:

git commit --no-verify

License

  • We release WhisperKit Android under the MIT License.
  • FFmpeg (used for audio decoding) is released under the LGPL.
  • OpenAI's open-source Whisper model checkpoints are released under the MIT License.
  • Qualcomm AI Hub .tflite models and QNN libraries for NPU deployment are released under the Qualcomm AI Model & Software License.

Citation

If you use WhisperKit for something cool or just find it useful, please drop us a note at [email protected]!

If you are looking for managed enterprise deployment with Argmax, please drop us a note at [email protected].

If you use WhisperKit for academic work, here is the BibTeX:

@misc{whisperkit-argmax,
   title = {WhisperKit},
   author = {Argmax, Inc.},
   year = {2024},
   URL = {https://github.com/argmaxinc/WhisperKitAndroid}
}
