Update the Slam-1 page with information on the prompt param #232

Merged
merged 5 commits into from
Apr 22, 2025
195 changes: 28 additions & 167 deletions fern/pages/01-getting-started/slam-1.mdx
@@ -1,19 +1,19 @@
---
title: "How to get started using Slam-1"
subtitle: "Learn how to transcribe using Slam-1."
subtitle: "Learn how to transcribe pre-recorded audio using Slam-1."
hide-nav-links: true
description: "Learn how to transcribe prerecorded audio using Slam-1."
---

## Overview

Slam-1 represents a fundamental shift in speech recognition technology. By combining LLM architecture with our best-in-class ASR encoders, we've created the world's first Speech Language Model optimized explicitly for speech-to-text tasks.
Slam-1 is our new Speech Language Model that combines LLM architecture with ASR encoders for superior speech-to-text transcription. This model delivers unprecedented accuracy through its understanding of context and semantic meaning. Check out our [Slam-1 blog post](https://www.assemblyai.com/blog/slam-1-public-beta) to learn more about this new model!

This innovative approach moves beyond traditional speech recognition to deliver LLM-powered transcription with unprecedented accuracy and capabilities.
## Quick Start

Check warning on line 12 in fern/pages/01-getting-started/slam-1.mdx

GitHub Actions / lint

[vale] reported by reviewdog 🐶 [AssemblyAI.Headings] Use sentence-style capitalization for 'Quick Start'. Raw Output: {"message": "[AssemblyAI.Headings] Use sentence-style capitalization for 'Quick Start'. ", "location": {"path": "fern/pages/01-getting-started/slam-1.mdx", "range": {"start": {"line": 12, "column": 4}}}, "severity": "WARNING"}

## How to get started

A beta version of Slam-1 is currently live in Production so you will make requests to the `https://api.assemblyai.com/v2/transcript` endpoint using your current API key similar to how you currently use the API. The only change you need to make is to include the `speech_model` parameter with a value of `"slam-1"` as shown in the code examples below.
Slam-1 is available in beta through our standard API endpoint. To use it:
1. Make requests to https://api.assemblyai.com/v2/transcript with your API key
2. Add the `speech_model` parameter with value "slam-1"

<Tabs groupId="language">
<Tab language="python" title="Python" default>
@@ -94,154 +94,18 @@

<Note title="Local audio files">
The above code example shows how to transcribe a file that is available via
URL. If you would like to work with local files see our [API
URL. If you would like to work with local files, see our [API
Reference](https://www.assemblyai.com/docs/api-reference/files/upload) for
more information on transcribing local files.
</Note>

## Fine-tuning Slam-1

Check warning on line 102 in fern/pages/01-getting-started/slam-1.mdx

GitHub Actions / lint

[vale] reported by reviewdog 🐶 [AssemblyAI.Headings] Use sentence-style capitalization for 'Fine-tuning Slam-1'. Raw Output: {"message": "[AssemblyAI.Headings] Use sentence-style capitalization for 'Fine-tuning Slam-1'. ", "location": {"path": "fern/pages/01-getting-started/slam-1.mdx", "range": {"start": {"line": 102, "column": 4}}}, "severity": "WARNING"}

What truly sets Slam-1 apart is its ability to be customized for specific industries and use cases with minimal effort. Rather than spending months developing custom models or implementing complex post-processing rules, Slam-1 offers two distinct customization approaches that give you unprecedented control over your transcription results.

One approach is to prompt the model with key terminology in the form of a list of words and phrases. The other approach is to prompt the model with contextual information about the audio in your file. Continue reading for more information on these two approaches.

### Contextual prompting of words and phrases

One way to improve transcription accuracy is to leverage Slam-1's contextual understanding capabilities by prompting the model with certain words or phrases that are likely to appear frequently in your audio file. Slam-1 goes far beyond traditional "custom vocabulary" or "word boost" features found in other speech recognition providers. Rather than simply increasing the likelihood of detecting specific words, Slam-1's multi-modal architecture actually understands the semantic meaning and context of the terminology you provide, enhancing transcription quality not just of the exact terms you specify, but also related terminology, variations, and contextually similar phrases. Prompt the model with up to 1000 unique keywords. These keywords can be individual words or phrases of up to six words.

<Note title="Keyword count limits">
While we support up to 1000 Word Boost keywords, actual capacity may be lower due to internal tokenization and implementation constraints.
Key points to remember:
- Each word in a multi-word phrase counts towards the 1000 keyword limit
- Capitalization affects capacity (uppercase tokens consume more than lowercase)
- Longer words consume more capacity than shorter words

For optimal results, use shorter phrases when possible and be mindful of your total token count when approaching the keyword limit.
</Note>
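
The limits in the note above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the AssemblyAI SDK: `validate_keyterms` and both constants are hypothetical names, and counting whitespace-separated words is only an approximation of the internal tokenization the note mentions, so the API may still reject a list that passes this check.

```python
from typing import Iterable, List

# Limits taken from the documentation text; the real capacity may be lower.
MAX_KEYTERMS = 1000
MAX_WORDS_PER_PHRASE = 6


def validate_keyterms(keyterms: Iterable[str]) -> List[str]:
    """Return de-duplicated key terms, or raise ValueError if a documented limit is exceeded.

    Hypothetical helper: words are counted by splitting on whitespace,
    which only approximates the service's internal tokenization.
    """
    # Strip whitespace, drop empty entries, and de-duplicate while keeping order.
    cleaned = (term.strip() for term in keyterms)
    unique_terms = list(dict.fromkeys(term for term in cleaned if term))

    total_words = 0
    for term in unique_terms:
        word_count = len(term.split())
        if word_count > MAX_WORDS_PER_PHRASE:
            raise ValueError(
                f"{term!r} has {word_count} words; the limit is {MAX_WORDS_PER_PHRASE}"
            )
        # Each word in a multi-word phrase counts towards the keyword limit.
        total_words += word_count

    if total_words > MAX_KEYTERMS:
        raise ValueError(
            f"{total_words} words in total; the limit is {MAX_KEYTERMS}"
        )
    return unique_terms
```

The returned list can be passed as the value of `keyterms_prompt`. Failing early on the client avoids a round trip for lists that are clearly too long.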

A common use case for these contextual prompts is to leverage known context about the audio to create a list of key words and phrases. Here are some examples of items to consider when creating the list of key words or phrases for different industries:

- Virtual Sales/Support Call: Product terminology, company names, people names, locations, amounts
- Medical: Condition names, prescription names, doctor names, patient names, diagnostic terms
- Legal: Cases, law firms, people involved in the case

To prompt the model with words or phrases, include the `keyterms_prompt` parameter in your request as shown in the code example below. Words and phrases should be formatted in the way that you would like to see them returned in your transcripts.

<Tabs groupId="language">
<Tab language="python" title="Python" default>
```python
import requests
import time

base_url = "https://api.assemblyai.com"
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
    "audio_url": "https://assembly.ai/sports_injuries.mp3",
    "speech_model": "slam-1",
    "keyterms_prompt": ["foo", "bar", "baz"]
}

# Submit the transcription request.
response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

if response.status_code != 200:
    print(f"Error: {response.status_code}, Response: {response.text}")
    response.raise_for_status()

transcript_response = response.json()
transcript_id = transcript_response["id"]
polling_endpoint = f"{base_url}/v2/transcript/{transcript_id}"

# Poll every 3 seconds until the transcript completes or fails.
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        print(transcript["text"])
        break
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)
```

</Tab>
<Tab language="typescript" title="Typescript">
```ts
import axios from 'axios'

const baseUrl = 'https://api.assemblyai.com'

const headers = {
  authorization: '<YOUR_API_KEY>'
}
Improve transcription accuracy by leveraging Slam-1's contextual understanding capabilities by prompting the model with certain words or phrases that are likely to appear frequently in your audio file.

const data = {
  audio_url: 'https://assembly.ai/sports_injuries.mp3',
  speech_model: 'slam-1',
  keyterms_prompt: ['foo', 'bar', 'baz']
}
Rather than simply increasing the likelihood of detecting specific words, Slam-1's multi-modal architecture actually understands the semantic meaning and context of the terminology you provide, enhancing transcription quality not just of the exact terms you specify, but also related terminology, variations, and contextually similar phrases.

const url = `${baseUrl}/v2/transcript`
const response = await axios.post(url, data, { headers: headers })

const transcriptId = response.data.id
const pollingEndpoint = `${baseUrl}/v2/transcript/${transcriptId}`

while (true) {
  const pollingResponse = await axios.get(pollingEndpoint, {
    headers: headers
  })
  const transcriptionResult = pollingResponse.data

  if (transcriptionResult.status === 'completed') {
    console.log(transcriptionResult.text)
    break
  } else if (transcriptionResult.status === 'error') {
    throw new Error(`Transcription failed: ${transcriptionResult.error}`)
  } else {
    await new Promise((resolve) => setTimeout(resolve, 3000))
  }
}
```

</Tab>
</Tabs>

Here are some examples of what a list of key words and phrases might look like for some specific industries:
<Tabs>
<Tab language="python" title="Medical consultation">
```python
"keyterms_prompt": [
"differential diagnosis", "myocardial infarction", "hypertension", "Wellbutrin XL 150mg", "lumbar radiculopathy", "bilateral paresthesia", "metastatic adenocarcinoma", "idiopathic thrombocytopenic purpura"
]
```
</Tab>
<Tab language="python" title="Professional therapy session">
```python
"keyterms_prompt": [
"cognitive behavioral therapy", "major depressive disorder", "generalized anxiety disorder", "ADHD", "trauma-informed care", "Lexapro 10mg", "psychosocial assessment", "therapeutic alliance"
]
```
</Tab>
<Tab language="python" title="Veterinary exam">
```python
"keyterms_prompt": [
"Caslick's procedure", "otitis media", "degenerative myelopathy", "feline immunodeficiency virus", "cruciate ligament tear", "Rimadyl 75mg", "subcutaneous fluid therapy", "tarsorrhaphy"
]
```
</Tab>
</Tabs>

### Contextual prompting with natural language

Another way to improve transcription accuracy is to leverage Slam-1's contextual understanding capabilities by prompting the model with a description of your audio in plain English. This allows the model to understand the broader context of your audio file and make more intelligent transcription decisions. You can provide up to 1,500 words of contextual information, giving the model rich background knowledge about the content, participants, domain, and purpose of the audio.

A common use case for these contextual prompts is to leverage known context about the call and provide it with the transcription. This might look like:

- Legal Deposition: "This is a deposition in the case of Smith v. Acme Corporation, a product liability lawsuit involving an alleged defect in the XJ-5000 power tool that resulted in severe lacerations to the plaintiff's right hand. The deposition will involve questioning of Dr. Elizabeth Chen, an orthopedic surgeon who treated the plaintiff's injuries."
- Veterinary Consultation: "This is a veterinary consultation about a dog with hip dysplasia."
- Medical Consultation: "This is a medical consultation between Dr. Jones, a pulmonologist, and Dan Rayman who is being evaluated for recurring pneumonia."
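
The 1,500-word limit on contextual descriptions can be checked before a request is made. This is a rough sketch under the assumption that words are separated by whitespace; `check_prompt_length` and `MAX_PROMPT_WORDS` are hypothetical names, and the service may count words differently.

```python
# Limit taken from the documentation text above.
MAX_PROMPT_WORDS = 1500


def check_prompt_length(prompt: str, max_words: int = MAX_PROMPT_WORDS) -> int:
    """Return the whitespace-separated word count, or raise ValueError if it is out of range.

    Hypothetical helper: the API's own word counting may differ.
    """
    word_count = len(prompt.split())
    if word_count == 0:
        raise ValueError("prompt is empty")
    if word_count > max_words:
        raise ValueError(f"prompt has {word_count} words; the limit is {max_words}")
    return word_count
```

For example, `check_prompt_length("This is a veterinary consultation about a dog with hip dysplasia.")` returns 11.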

To prompt the model with contextual language, include the `prompt` parameter in your request as shown in the code example below.
Provide up to 1000 domain-specific words or phrases (maximum 6 words per phrase) that may appear in your audio using the optional `keyterms_prompt` parameter:

<Tabs groupId="language">
<Tab language="python" title="Python" default>
@@ -253,10 +117,10 @@
headers = {"authorization": "<YOUR_API_KEY>"}

data = {
"audio_url": "https://assembly.ai/sports_injuries.mp3",
"speech_model": "slam-1",
"prompt": "This is a shareholder meeting for the ACME Corporation to discuss Q3 financial results and expectations for Q4 and beyond."
}
"audio_url": "https://assembly.ai/sports_injuries.mp3",
"speech_model": "slam-1",
"keyterms_prompt": ['differential diagnosis', 'hypertension', 'Wellbutrin XL 150mg']
}

response = requests.post(base_url + "/v2/transcript", headers=headers, json=data)

@@ -293,7 +157,7 @@
const data = {
audio_url: 'https://assembly.ai/sports_injuries.mp3',
speech_model: 'slam-1',
prompt: 'This is a shareholder meeting for the ACME Corporation to discuss Q3 financial results and expectations for Q4 and beyond.'
keyterms_prompt: ['differential diagnosis', 'hypertension', 'Wellbutrin XL 150mg']
}

const url = `${baseUrl}/v2/transcript`
@@ -322,24 +186,21 @@
</Tab>
</Tabs>

<Warning>
Contextual prompting via the `prompt` feature is currently experimental and in an early form of release, so your mileage may vary when using this parameter.

Please email our Support team at [email protected] if you would like help iterating on the description you are using in the `prompt` field.
</Warning>

### Should I use `keyterms_prompt` or `prompt`?

Prompting with the `keyterms_prompt` parameter is a good option when you have specific terminology or vocabulary that you would like the model to recognize. `keyterms_prompt` works best with targeted use cases where specific words and phrases can be identified.

Prompting with the `prompt` parameter is a good option when you know the context of the audio, perhaps derived from some metadata, but the exact terminology may be too broad or numerous to capture in a list. `prompt` works best when you would rather pass a high-level description than specific words or phrases.
<Note title="Keyword count limits">
While we support up to 1000 Word Boost keywords, actual capacity may be lower due to internal tokenization and implementation constraints.
Key points to remember:
- Each word in a multi-word phrase counts towards the 1000 keyword limit
- Capitalization affects capacity (uppercase tokens consume more than lowercase)
- Longer words consume more capacity than shorter words

<Note>
The `keyterms_prompt` and `prompt` parameters cannot both be used in the same request.
For optimal results, use shorter phrases when possible and be mindful of your total token count when approaching the keyword limit.
</Note>
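
Because `keyterms_prompt` and `prompt` cannot be combined, the request body can be assembled through a small helper that enforces the rule. The sketch below is illustrative only: `build_transcript_request` is a hypothetical function, not part of any AssemblyAI SDK, and it only builds the JSON payload that would be posted to `/v2/transcript`.

```python
from typing import Any, Dict, Optional, Sequence


def build_transcript_request(
    audio_url: str,
    keyterms_prompt: Optional[Sequence[str]] = None,
    prompt: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Slam-1 request payload with at most one prompting parameter.

    Hypothetical helper: raises ValueError if both prompting options are given.
    """
    if keyterms_prompt and prompt:
        raise ValueError("Use either keyterms_prompt or prompt, not both")

    data: Dict[str, Any] = {"audio_url": audio_url, "speech_model": "slam-1"}
    if keyterms_prompt:
        data["keyterms_prompt"] = list(keyterms_prompt)
    elif prompt:
        data["prompt"] = prompt
    return data
```

The returned dictionary can be passed as the `json` argument of `requests.post`, in the same way as the `data` dictionary in the examples on this page.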

## Feedback
Here is an example of what a `keyterms_prompt` list might look like for a transcription of a professional therapy session for a patient named Jane Doe, who is being treated for anxiety and depression:
```txt wordWrap
["Jane Doe", "cognitive behavioral therapy", "major depressive disorder", "generalized anxiety disorder", "ADHD", "trauma-informed care", "Lexapro 10mg", "psychosocial assessment", "therapeutic alliance", "emotional dysregulation", "GAD-7", "PHQ-9", "Citalopram 20mg", "Lorazepam 2mg"]
```

We would appreciate any feedback that you have as you test this groundbreaking technology. Your insights will be invaluable in shaping the future of Slam-1 before its public release.
## Feedback

If you have a shared Slack channel with us, please share any feedback there directly in real-time. Otherwise, feel free to email any feedback to our Support team at [email protected].
We welcome your feedback on Slam-1 during this beta period. Share thoughts by emailing our Support team at [email protected].