diff --git a/fern/docs.yml b/fern/docs.yml
index 725884e7..41681461 100644
--- a/fern/docs.yml
+++ b/fern/docs.yml
@@ -371,6 +371,10 @@ navigation:
path: pages/05-guides/cookbooks/streaming-stt/real-time.mdx
slug: real-time
hidden: true
+ - page: Redact PII from Text Using LeMUR
+ path: pages/05-guides/cookbooks/lemur/lemur-pii-redaction.mdx
+ slug: lemur-pii-redaction
+ hidden: true
- section: SDK References
icon: duotone cubes
contents:
diff --git a/fern/pages/03-audio-intelligence/pii-redaction.mdx b/fern/pages/03-audio-intelligence/pii-redaction.mdx
index 99fb1ec1..1af3d214 100644
--- a/fern/pages/03-audio-intelligence/pii-redaction.mdx
+++ b/fern/pages/03-audio-intelligence/pii-redaction.mdx
@@ -560,6 +560,10 @@ of things. The season has been pretty dry already, and then the fact that we're
getting hit in the US. Is because there's a couple of weather systems that ...
```
+
+ To use LeMUR for custom PII redaction, see the guide [Redact PII from Text Using LeMUR](/docs/lemur/lemur-pii-redaction).
+
+
## Create redacted audio files
In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII "beeped" out.
diff --git a/fern/pages/05-guides/cookbooks/core-transcription/detecting-low-confidence-words.mdx b/fern/pages/05-guides/cookbooks/core-transcription/detecting-low-confidence-words.mdx
index 97430c41..8e4884c4 100644
--- a/fern/pages/05-guides/cookbooks/core-transcription/detecting-low-confidence-words.mdx
+++ b/fern/pages/05-guides/cookbooks/core-transcription/detecting-low-confidence-words.mdx
@@ -25,7 +25,7 @@ const client = new AssemblyAI({
Next create the transcript with your audio file, either via local audio file or URL (AssemblyAI's servers need to be able to access the URL, make sure the URL links to a downloadable file).
```javascript
-const transcript = await client.transcripts.create({
+const transcript = await client.transcripts.transcribe({
audio_url: "./sample.mp4",
});
```
diff --git a/fern/pages/05-guides/cookbooks/lemur/lemur-pii-redaction.mdx b/fern/pages/05-guides/cookbooks/lemur/lemur-pii-redaction.mdx
new file mode 100644
index 00000000..c667b058
--- /dev/null
+++ b/fern/pages/05-guides/cookbooks/lemur/lemur-pii-redaction.mdx
@@ -0,0 +1,175 @@
+---
+title: "Redact PII from Text Using LeMUR"
+---
+
+This guide shows how to use AssemblyAI's LeMUR framework to redact personally identifiable information (PII) from text.
+
+## Quickstart
+
+```python
+import assemblyai as aai
+import json
+import os
+
+aai.settings.api_key = 'YOUR API KEY'
+
+def generate_ner(transcript_text):
+    prompt = '''
+    You will be given a transcript of a conversation or text. Your task is to generate named entities from the given transcript text.
+
+    Please identify and extract the following named entities from the transcript:
+
+    1. Person names
+    2. Organization names
+    3. Email addresses
+    4. Phone numbers
+    5. Full addresses
+
+    When extracting these entities, make sure to return the exact spelling and formatting as they appear in the transcript. Do not modify or standardize the entities in any way.
+
+    Present your results in a JSON format with a single field named "named_entities". This field should contain an array of strings, where each string is a named entity you've identified. For example:
+    {
+        "named_entities": ["John Doe", "Acme Corp", "john.doe@example.com", "123-456-7890", "123 Main St, Anytown, USA 12345"]
+    }
+
+    Important: Do not include any other information, explanations, or text in your response. Your output should consist solely of the JSON object containing the named entities.
+
+    If you do not find any named entities of a particular type, simply return an empty array for the "named_entities" field.
+    '''
+
+    response = aai.Lemur().task(
+        prompt=prompt,
+        input_text=transcript_text,
+        max_output_size=4000,
+        temperature=0.0,
+        final_model=aai.LemurModel.claude3_5_sonnet
+    ).response
+
+    try:
+        res_json = json.loads(response)
+    except json.JSONDecodeError:  # the response was not valid JSON
+        res_json = {'named_entities': []}
+
+    named_entities = res_json.get('named_entities', [])
+
+    return named_entities
+
+transcriber = aai.Transcriber(config=aai.TranscriptionConfig(language_code='en'))
+transcript = transcriber.transcribe('YOUR_AUDIO_URL')
+
+redacted_transcript = ''
+
+for sentence in transcript.get_sentences():
+    generated_entities = generate_ner(sentence.text)
+
+    redacted_sentence = sentence.text
+
+    for entity in generated_entities:
+        redacted_sentence = redacted_sentence.replace(entity, '#' * len(entity))
+
+    redacted_transcript += redacted_sentence + ' '
+    print(redacted_sentence)
+
+print('Full redacted transcript:')
+print(redacted_transcript)
+```
+
+## Get Started
+
+Before we begin, make sure you have an AssemblyAI account and an API key. You can [sign up](https://assemblyai.com/dashboard/signup) for an account and get your API key from your dashboard.
+
+For information about LeMUR pricing, see our [pricing page](https://www.assemblyai.com/pricing).
+
+## Step-by-Step Instructions
+
+Install the SDK.
+
+```bash
+pip install assemblyai
+```
+
+Import the `assemblyai` package and set your API key.
+
+```python
+import assemblyai as aai
+import json
+import os
+
+aai.settings.api_key = 'YOUR API KEY'
+```
+
+Define a function `generate_ner` that uses LeMUR to identify named entities (person names, organizations, emails, phone numbers, addresses) in a given text.
+
+```python
+def generate_ner(transcript_text):
+    prompt = '''
+    You will be given a transcript of a conversation or text. Your task is to generate named entities from the given transcript text.
+
+    Please identify and extract the following named entities from the transcript:
+
+    1. Person names
+    2. Organization names
+    3. Email addresses
+    4. Phone numbers
+    5. Full addresses
+
+    When extracting these entities, make sure to return the exact spelling and formatting as they appear in the transcript. Do not modify or standardize the entities in any way.
+
+    Present your results in a JSON format with a single field named "named_entities". This field should contain an array of strings, where each string is a named entity you've identified. For example:
+    {
+        "named_entities": ["John Doe", "Acme Corp", "john.doe@example.com", "123-456-7890", "123 Main St, Anytown, USA 12345"]
+    }
+
+    Important: Do not include any other information, explanations, or text in your response. Your output should consist solely of the JSON object containing the named entities.
+
+    If you do not find any named entities of a particular type, simply return an empty array for the "named_entities" field.
+    '''
+
+    response = aai.Lemur().task(
+        prompt=prompt,
+        input_text=transcript_text,
+        max_output_size=4000,
+        temperature=0.0,
+        final_model=aai.LemurModel.claude3_5_sonnet
+    ).response
+
+    try:
+        res_json = json.loads(response)
+    except json.JSONDecodeError:  # the response was not valid JSON
+        res_json = {'named_entities': []}
+
+    named_entities = res_json.get('named_entities', [])
+
+    return named_entities
+```
+
+Transcribe an audio file using the AssemblyAI Transcriber.
+
+```python
+transcriber = aai.Transcriber(config=aai.TranscriptionConfig(language_code='en'))
+transcript = transcriber.transcribe('YOUR_AUDIO_URL')
+```
+
+Iterate through each sentence in the transcript, identify named entities using `generate_ner`, and replace each one with `#` characters of the same length.
+
+```python
+redacted_transcript = ''
+
+for sentence in transcript.get_sentences():
+    generated_entities = generate_ner(sentence.text)
+
+    redacted_sentence = sentence.text
+
+    for entity in generated_entities:
+        redacted_sentence = redacted_sentence.replace(entity, '#' * len(entity))
+
+    redacted_transcript += redacted_sentence + ' '
+    print(redacted_sentence)
+```
+
+Print the redacted transcript.
+
+```python
+print('Full redacted transcript:')
+print(redacted_transcript)
+```
\ No newline at end of file
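
Note on `lemur-pii-redaction.mdx`: the parse-and-mask steps of the guide can be exercised offline, without calling LeMUR. The following is a minimal sketch under stated assumptions and is not part of this change: `parse_named_entities` and `redact` are hypothetical helper names, and the sample response string is made up for illustration. It masks longer entities first, so that a short entity such as "John" does not break up a longer one such as "John Doe" before the longer one is replaced.

```python
import json


def parse_named_entities(response_text):
    """Return the "named_entities" list from a JSON string, or [] if it is unusable.

    Hypothetical helper: mirrors the try/except fallback used in the guide.
    """
    try:
        data = json.loads(response_text)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(data, dict):
        return []
    entities = data.get("named_entities", [])
    if not isinstance(entities, list):
        return []
    # Keep only non-empty strings.
    return [e for e in entities if isinstance(e, str) and e]


def redact(text, entities, mask_char="#"):
    """Mask every occurrence of each entity with mask_char, preserving length."""
    # Longest first, so "John Doe" is masked before the shorter, overlapping "John".
    for entity in sorted(set(entities), key=len, reverse=True):
        text = text.replace(entity, mask_char * len(entity))
    return text


# Made-up stand-in for a LeMUR response (assumption, for illustration only).
sample_response = '{"named_entities": ["John Doe", "John", "Acme Corp"]}'
sample_sentence = "John Doe from Acme Corp called John."

redacted = redact(sample_sentence, parse_named_entities(sample_response))
print(redacted)
```

Sorting by length is a design choice of this sketch; the loop in the guide applies entities in whatever order LeMUR returns them.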
diff --git a/fern/pages/05-guides/index.mdx b/fern/pages/05-guides/index.mdx
index 84ea0bc8..5341f37b 100644
--- a/fern/pages/05-guides/index.mdx
+++ b/fern/pages/05-guides/index.mdx
@@ -1016,6 +1016,18 @@ For examples using the API without SDKs see [API guides](#api-guides).
/>
+
+
+ Redact PII from Text Using LeMUR{" "}
+
+
+