| 1 | +--- |
| 2 | +title: "Transcribe Your Zoom Meetings" |
| 3 | +--- |
| 4 | + |
| 5 | +This guide walks through a Node.js service that captures audio from Zoom Real-Time Media Streams (RTMS) and transcribes it with AssemblyAI, both in real time during the meeting and asynchronously after it ends. |
| 6 | + |
| 7 | +<Note title="Zoom RTMS Documentation"> |
| 8 | + For complete Zoom RTMS documentation, visit |
| 9 | + https://developers.zoom.us/docs/rtms/ |
| 10 | +</Note> |
| 11 | + |
| 12 | +## Features |
| 13 | + |
| 14 | +- **Real-time Transcription**: Live transcription during meetings using AssemblyAI's streaming API |
| 15 | +- **Asynchronous Transcription**: Complete post-meeting transcription with advanced features |
| 16 | +- **Flexible Audio Modes**: |
| 17 | + - Mixed stream (all participants combined) |
| 18 | + - Individual streams (one per participant, each transcribed separately) |
| 19 | +- **Multichannel Audio Support**: Separate channels for different participants |
| 20 | +- **Configurable Processing**: Enable/disable real-time or async transcription independently |
| 21 | + |
| 22 | +## Setup |
| 23 | + |
| 24 | +### Prerequisites |
| 25 | + |
| 26 | +- Node.js 16+ |
| 27 | +- FFmpeg installed on your system |
| 28 | +- Zoom RTMS Developer Preview access |
| 29 | +- AssemblyAI API key |
| 30 | +- ngrok (for local development and testing) |
| 31 | + |
| 32 | +### Installation |
| 33 | + |
| 34 | +1. **Clone the example repository and install dependencies**: |
| 35 | + |
| 36 | +```bash |
| 37 | +git clone https://github.com/zkleb-aai/assemblyai-zoom-rtms.git |
| 38 | +cd assemblyai-zoom-rtms |
| 39 | +npm install |
| 40 | +``` |
| 41 | + |
| 42 | +2. **Configure environment variables**: |
| 43 | + |
| 44 | +```bash |
| 45 | +cp .env.example .env |
| 46 | +``` |
| 47 | + |
| 48 | +Fill in your `.env` file: |
| 49 | + |
| 50 | +```env |
| 51 | +# Zoom Configuration |
| 52 | +ZM_CLIENT_ID=your_zoom_client_id |
| 53 | +ZM_CLIENT_SECRET=your_zoom_client_secret |
| 54 | +ZOOM_SECRET_TOKEN=your_webhook_secret_token |
| 55 | + |
| 56 | +# AssemblyAI Configuration |
| 57 | +ASSEMBLYAI_API_KEY=your_assemblyai_api_key |
| 58 | + |
| 59 | +# Service Configuration |
| 60 | +PORT=8080 |
| 61 | +REALTIME_ENABLED=true |
| 62 | +REALTIME_MODE=mixed |
| 63 | +ASYNC_ENABLED=true |
| 64 | +AUDIO_CHANNELS=mono |
| 65 | +AUDIO_SAMPLE_RATE=16000 |
| 66 | +TARGET_CHUNK_DURATION_MS=100 |
| 67 | +``` |
| 68 | + |
| 69 | +### Local development with ngrok |
| 70 | + |
| 71 | +For testing and development, you can use ngrok to expose your local server to the internet: |
| 72 | + |
| 73 | +1. **Install ngrok**: Download from [ngrok.com](https://ngrok.com/) or install via package manager: |
| 74 | + |
| 75 | + ```bash |
| 76 | + # macOS |
| 77 | + brew install ngrok |
| 78 | + |
| 79 | + # Windows (chocolatey) |
| 80 | + choco install ngrok |
| 81 | + |
| 82 | + # Or download directly from ngrok.com |
| 83 | + ``` |
| 84 | + |
| 85 | +2. **Start your local server**: |
| 86 | + |
| 87 | + ```bash |
| 88 | + npm start |
| 89 | + ``` |
| 90 | + |
| 91 | +3. **In a separate terminal, start ngrok**: |
| 92 | + |
| 93 | + ```bash |
| 94 | + ngrok http 8080 |
| 95 | + ``` |
| 96 | + |
| 97 | +4. **Copy the ngrok URL**: ngrok will display a forwarding URL like: |
| 98 | + |
| 99 | + ``` |
| 100 | + Forwarding https://example-abc123.ngrok-free.app -> http://localhost:8080 |
| 101 | + ``` |
| 102 | + |
| 103 | +5. **Use the ngrok URL in your Zoom app webhook configuration**: |
| 104 | + ``` |
| 105 | + https://example-abc123.ngrok-free.app/webhook |
| 106 | + ``` |
| 107 | + |
| 108 | +### Configuration options |
| 109 | + |
| 110 | +#### Real-time transcription |
| 111 | + |
| 112 | +- `REALTIME_ENABLED`: Enable/disable live transcription (default: `true`) |
| 113 | +- `REALTIME_MODE`: |
| 114 | + - `mixed`: Single stream with all participants combined |
| 115 | + - `individual`: Separate streams per participant |
| 116 | + |
| 117 | +#### Audio settings |
| 118 | + |
| 119 | +- `AUDIO_CHANNELS`: `mono` or `multichannel` |
| 120 | +- `AUDIO_SAMPLE_RATE`: Audio sample rate in Hz (default: `16000`) |
| 121 | +- `TARGET_CHUNK_DURATION_MS`: Audio chunk duration for streaming (default: `100`) |
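
The relationship between `AUDIO_SAMPLE_RATE` and `TARGET_CHUNK_DURATION_MS` can be made concrete with a little arithmetic. The sketch below assumes 16-bit signed PCM (2 bytes per sample), which the settings above do not state explicitly; `chunkSizeBytes` and `AudioChunker` are hypothetical names used for illustration and are not taken from the example repository.

```javascript
// Sketch only: assumes 16-bit PCM, i.e. 2 bytes per sample per channel.
function chunkSizeBytes(sampleRate, durationMs, channels = 1, bytesPerSample = 2) {
  const samplesPerChunk = Math.round((sampleRate * durationMs) / 1000);
  return samplesPerChunk * channels * bytesPerSample;
}

// Hypothetical helper: accumulates incoming audio and emits fixed-size chunks.
class AudioChunker {
  constructor(chunkBytes) {
    this.chunkBytes = chunkBytes;
    this.pending = Buffer.alloc(0);
  }

  // Append new audio and return every complete chunk that is now available.
  push(data) {
    this.pending = Buffer.concat([this.pending, data]);
    const chunks = [];
    while (this.pending.length >= this.chunkBytes) {
      chunks.push(this.pending.subarray(0, this.chunkBytes));
      this.pending = this.pending.subarray(this.chunkBytes);
    }
    return chunks;
  }

  // Return the remaining audio (possibly shorter than a full chunk) and reset.
  flush() {
    const rest = this.pending;
    this.pending = Buffer.alloc(0);
    return rest;
  }
}
```

Under these assumptions, the defaults (16000 Hz, 100 ms, mono) work out to 1600 samples, or 3200 bytes, per chunk.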
| 122 | + |
| 123 | +#### Async transcription |
| 124 | + |
| 125 | +- `ASYNC_ENABLED`: Enable/disable post-meeting transcription (default: `true`) |
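
One way to read these options with the documented defaults is sketched below. It is not the repository's implementation: the helper names and the camelCase property names are assumptions, and the `mono` default for `AUDIO_CHANNELS` is taken from the sample `.env` file rather than from a stated default.

```javascript
// Illustrative config loader; names and structure are assumptions.
function readBool(value, fallback) {
  if (value === undefined || value === "") return fallback;
  return String(value).trim().toLowerCase() === "true";
}

function readPositiveInt(name, value, fallback) {
  if (value === undefined || value === "") return fallback;
  const parsed = Number(value);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`${name} must be a positive integer, got "${value}"`);
  }
  return parsed;
}

function readChoice(name, value, allowed, fallback) {
  const choice = value === undefined || value === "" ? fallback : value;
  if (!allowed.includes(choice)) {
    throw new Error(`${name} must be one of ${allowed.join(", ")}, got "${value}"`);
  }
  return choice;
}

// Build a validated settings object from an env-like map of strings.
function loadConfig(env = process.env) {
  return {
    port: readPositiveInt("PORT", env.PORT, 8080),
    realtimeEnabled: readBool(env.REALTIME_ENABLED, true),
    realtimeMode: readChoice("REALTIME_MODE", env.REALTIME_MODE, ["mixed", "individual"], "mixed"),
    asyncEnabled: readBool(env.ASYNC_ENABLED, true),
    audioChannels: readChoice("AUDIO_CHANNELS", env.AUDIO_CHANNELS, ["mono", "multichannel"], "mono"),
    audioSampleRate: readPositiveInt("AUDIO_SAMPLE_RATE", env.AUDIO_SAMPLE_RATE, 16000),
    targetChunkDurationMs: readPositiveInt("TARGET_CHUNK_DURATION_MS", env.TARGET_CHUNK_DURATION_MS, 100),
  };
}
```

Validating at startup means a typo such as `REALTIME_MODE=indvidual` fails immediately with a clear message instead of surfacing mid-meeting.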
| 126 | + |
| 127 | +## Usage |
| 128 | + |
| 129 | +### Start the service |
| 130 | + |
| 131 | +```bash |
| 132 | +npm start |
| 133 | +``` |
| 134 | + |
| 135 | +The service will start on the configured port (default: 8080) and display: |
| 136 | + |
| 137 | +``` |
| 138 | +🎧 Zoom RTMS to AssemblyAI Transcription Service |
| 139 | +📋 Configuration: |
| 140 | + Real-time: ✅ (mixed) |
| 141 | + Audio: mono @ 16000Hz |
| 142 | + Async: ✅ |
| 143 | +🚀 Server running on port 8080 |
| 144 | +📡 Webhook endpoint: http://localhost:8080/webhook |
| 145 | +``` |
| 146 | + |
| 147 | +### Configure Zoom webhook |
| 148 | + |
| 149 | +1. In your Zoom App configuration, set the webhook endpoint to: |
| 150 | + |
| 151 | + ``` |
| 152 | + # For production |
| 153 | + https://your-domain.com/webhook |
| 154 | + |
| 155 | + # For local development with ngrok |
| 156 | + https://example-abc123.ngrok-free.app/webhook |
| 157 | + ``` |
| 158 | + |
| 159 | +2. Subscribe to these events: |
| 160 | + - `meeting.rtms_started` |
| 161 | + - `meeting.rtms_stopped` |
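
The two subscribed events can be dispatched with a small pure function, sketched below. The payload field names (`meeting_uuid`, `rtms_stream_id`, `server_urls`) are assumptions based on Zoom's RTMS samples and should be checked against the Zoom RTMS documentation; `routeZoomEvent` is a hypothetical helper, not part of the example repository.

```javascript
// Sketch: map a parsed webhook body to the action the service should take.
// Payload field names are assumptions; verify them against Zoom's RTMS docs.
function routeZoomEvent(body) {
  const { event, payload = {} } = body || {};
  switch (event) {
    case "meeting.rtms_started":
      return {
        action: "start",
        meetingUuid: payload.meeting_uuid,
        streamId: payload.rtms_stream_id,
        serverUrls: payload.server_urls,
      };
    case "meeting.rtms_stopped":
      return { action: "stop", meetingUuid: payload.meeting_uuid };
    default:
      // Events the service has not subscribed to are ignored.
      return { action: "ignore" };
  }
}
```

Keeping the routing separate from the HTTP handler makes it straightforward to unit-test with recorded webhook bodies.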
| 162 | + |
| 163 | +### Testing with ngrok |
| 164 | + |
| 165 | +When using ngrok for testing: |
| 166 | + |
| 167 | +1. **Keep ngrok running**: The ngrok tunnel must remain active during testing |
| 168 | +2. **Update webhook URL**: If you restart ngrok, you'll get a new URL that needs to be updated in your Zoom app configuration |
| 169 | +3. **Monitor ngrok logs**: ngrok shows incoming webhook requests in its terminal output |
| 170 | +4. **Free tier limitations**: The free ngrok tier has some limitations; consider upgrading for heavy testing |
| 171 | + |
| 172 | +### Real-time output |
| 173 | + |
| 174 | +During meetings, you'll see live transcription: |
| 175 | + |
| 176 | +``` |
| 177 | +🚀 AssemblyAI session started: [abc12345] |
| 178 | +🎙️ [abc12345] Hello everyone, welcome to the meeting |
| 179 | +📝 [abc12345] FINAL: Hello everyone, welcome to the meeting. |
| 180 | +``` |
| 181 | + |
| 182 | +### Post-meeting files |
| 183 | + |
| 184 | +After each meeting, the service generates: |
| 185 | + |
| 186 | +- `transcript_[meeting_uuid].json` - Full AssemblyAI response with metadata |
| 187 | +- `transcript_[meeting_uuid].txt` - Plain text transcript |
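
A minimal sketch of how these two files might be written follows. It assumes the transcript object exposes its plain text as `text`, as AssemblyAI transcript responses do, and it replaces characters such as `/`, `+`, and `=` that can appear in Zoom meeting UUIDs and are unsafe in file names. `safeFileStem` and `writeTranscriptFiles` are hypothetical helpers, and the repository's own sanitizing rule may differ.

```javascript
const fs = require("fs");
const path = require("path");

// Replace anything outside [A-Za-z0-9_-] so the UUID is safe as a file name.
function safeFileStem(meetingUuid) {
  return String(meetingUuid).replace(/[^A-Za-z0-9_-]/g, "_");
}

// Write transcript_<uuid>.json (full response) and transcript_<uuid>.txt (text only).
function writeTranscriptFiles(transcript, meetingUuid, outDir = ".") {
  const stem = `transcript_${safeFileStem(meetingUuid)}`;
  const jsonPath = path.join(outDir, `${stem}.json`);
  const textPath = path.join(outDir, `${stem}.txt`);

  fs.mkdirSync(outDir, { recursive: true });
  fs.writeFileSync(jsonPath, JSON.stringify(transcript, null, 2));
  fs.writeFileSync(textPath, transcript.text ?? "");

  return { jsonPath, textPath };
}
```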
| 188 | + |
| 189 | +## Advanced configuration |
| 190 | + |
| 191 | +### AssemblyAI features |
| 192 | + |
| 193 | +Modify the `ASYNC_CONFIG` object in the code to enable additional features: |
| 194 | + |
| 195 | +```javascript |
| 196 | +const ASYNC_CONFIG = { |
| 197 | + speaker_labels: true, // Speaker identification |
| 198 | + auto_chapters: true, // Automatic chapter detection |
| 199 | + sentiment_analysis: true, // Sentiment analysis |
| 200 | + entity_detection: true, // Named entity recognition |
| 201 | + redact_pii: true, // PII redaction |
| 202 | + summarization: false, // Auto-summarization; AssemblyAI does not allow this together with auto_chapters |
| 203 | + auto_highlights: true, // Key highlights |
| 204 | +}; |
| 205 | +``` |
| 206 | + |
| 207 | +See [AssemblyAI's API documentation](https://www.assemblyai.com/docs/api-reference/transcripts/submit) for all available options. |
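
As a sketch of how a config object like `ASYNC_CONFIG` could be turned into request parameters, the hypothetical helper below drops disabled flags and merges the rest with the audio source. The mutual-exclusion check reflects our reading of AssemblyAI's documentation, which allows only one of `auto_chapters` and `summarization` per request; treat it as an assumption and confirm it against the API reference.

```javascript
// Hypothetical helper: build transcription parameters from a feature-flag object.
function buildTranscriptParams(audio, config = {}) {
  // Keep only options that are actually switched on or carry a value.
  const enabled = Object.fromEntries(
    Object.entries(config).filter(([, value]) => value !== false && value !== undefined)
  );

  // Assumption: AssemblyAI rejects requests that enable both of these models.
  if (enabled.auto_chapters && enabled.summarization) {
    throw new Error("Enable either auto_chapters or summarization, not both");
  }

  return { audio, ...enabled };
}

// With the official `assemblyai` SDK the result could be submitted like this:
//   const { AssemblyAI } = require("assemblyai");
//   const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
//   const transcript = await client.transcripts.transcribe(params);
```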
| 208 | + |
| 209 | +### Audio processing modes |
| 210 | + |
| 211 | +#### Mixed mode (default) |
| 212 | + |
| 213 | +- Single audio stream combining all participants |
| 214 | +- Most efficient for general transcription |
| 215 | +- Best for meetings with clear speakers |
| 216 | + |
| 217 | +#### Individual mode |
| 218 | + |
| 219 | +- Separate transcription stream per participant |
| 220 | +- Better speaker attribution |
| 221 | +- Higher resource usage |
| 222 | + |
| 223 | +#### Multichannel audio |
| 224 | + |
| 225 | +- Separate audio channels for different participants |
| 226 | +- Enables advanced speaker separation |
| 227 | +- Requires `AUDIO_CHANNELS=multichannel` |
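
To illustrate what "separate audio channels" means at the byte level, the sketch below interleaves per-participant mono buffers into a single multichannel buffer. It assumes 16-bit little-endian PCM at a shared sample rate and pads shorter inputs with silence. The service lists FFmpeg as a prerequisite and may well do this step there instead; `interleavePcm16` is a hypothetical name used only for this example.

```javascript
// Sketch: interleave mono 16-bit LE PCM buffers (one per participant)
// into a single multichannel buffer. Shorter inputs are padded with silence.
function interleavePcm16(channels) {
  if (channels.length === 0) return Buffer.alloc(0);

  const frameCount = Math.max(...channels.map((buf) => Math.floor(buf.length / 2)));
  // Buffer.alloc zero-fills, and zero samples are silence in signed PCM.
  const out = Buffer.alloc(frameCount * channels.length * 2);

  channels.forEach((buf, channelIndex) => {
    const samples = Math.floor(buf.length / 2);
    for (let i = 0; i < samples; i++) {
      const outOffset = (i * channels.length + channelIndex) * 2;
      out.writeInt16LE(buf.readInt16LE(i * 2), outOffset);
    }
  });

  return out;
}
```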
| 228 | + |
| 229 | +## API endpoints |
| 230 | + |
| 231 | +### `POST /webhook` |
| 232 | + |
| 233 | +Handles Zoom RTMS webhook events: |
| 234 | + |
| 235 | +- URL validation |
| 236 | +- Meeting start/stop events |
| 237 | +- Automatic RTMS connection setup |
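
The URL validation step can be sketched as follows. Zoom sends an `endpoint.url_validation` event containing a `plainToken`, and the endpoint replies with that token plus its HMAC-SHA256 digest keyed with the webhook secret token (`ZOOM_SECRET_TOKEN`). The second helper checks the `x-zm-signature` header on regular events. The function names are hypothetical, and the repository may structure this differently.

```javascript
const crypto = require("crypto");

function hmacHex(secret, message) {
  return crypto.createHmac("sha256", secret).update(message).digest("hex");
}

// Response body for Zoom's endpoint.url_validation challenge.
function buildUrlValidationResponse(plainToken, secretToken) {
  return { plainToken, encryptedToken: hmacHex(secretToken, plainToken) };
}

// Verify the x-zm-signature header using the x-zm-request-timestamp header
// and the raw (unparsed) request body.
function verifyZoomSignature(rawBody, timestamp, signature, secretToken) {
  const expected = Buffer.from("v0=" + hmacHex(secretToken, `v0:${timestamp}:${rawBody}`));
  const received = Buffer.from(String(signature || ""));
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

Signature verification needs the raw request body, so capture it before any JSON body parser rewrites it.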
| 238 | + |
| 239 | +## Error handling |
| 240 | + |
| 241 | +The service handles the following failure and shutdown cases: |
| 242 | + |
| 243 | +- Automatic reconnection for dropped connections |
| 244 | +- Graceful cleanup on meeting end |
| 245 | +- Audio buffer flushing to prevent data loss |
| 246 | +- Temporary file cleanup |
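
Automatic reconnection is commonly built on retries with exponential backoff. The sketch below shows that general pattern and is not the repository's actual logic; `backoffDelayMs`, `withRetry`, and their default values are assumptions chosen for illustration.

```javascript
// Delay before retry number `attempt` (0-based), doubling each time up to maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Call `connect` until it succeeds or maxAttempts is reached.
// `sleep` is injectable so tests can run without real waiting.
async function withRetry(connect, options = {}) {
  const {
    maxAttempts = 5,
    baseMs = 500,
    maxMs = 30000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect(attempt);
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

A caller could wrap its WebSocket setup in `withRetry` so that a dropped connection is retried a bounded number of times before the error is surfaced.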
| 247 | + |
| 248 | +## Monitoring |
| 249 | + |
| 250 | +### Real-time logs |
| 251 | + |
| 252 | +- Connection status updates |
| 253 | +- Audio processing statistics |
| 254 | +- Transcription progress |
| 255 | +- Error notifications |
| 256 | + |
| 257 | +### Example log output |
| 258 | + |
| 259 | +``` |
| 260 | +📡 Connecting to Zoom signaling for meeting abc123 |
| 261 | +✅ Zoom signaling connected for meeting abc123 |
| 262 | +🎵 Connecting to Zoom media for meeting abc123 |
| 263 | +✅ Zoom media connected for meeting abc123 |
| 264 | +🚀 Started audio streaming for meeting abc123 |
| 265 | +🎵 [abc12345] 100 chunks, 32768 bytes, 10.2s |
| 266 | +📝 [abc12345] FINAL: This is the final transcription. |
| 267 | +``` |
| 268 | + |
| 269 | +## Development workflow |
| 270 | + |
| 271 | +1. Start your local server: `npm start` |
| 272 | +2. Start ngrok in another terminal: `ngrok http 8080` |
| 273 | +3. Update your Zoom app webhook URL with the ngrok URL |
| 274 | +4. Test with Zoom meetings |
| 275 | +5. Monitor logs in both your app and ngrok terminals |