AssemblyAI · m-ods · Apr 7, 2025
diff --git a/fern/docs.yml b/fern/docs.yml
@@ -371,6 +371,10 @@ navigation:
                 path: pages/05-guides/cookbooks/streaming-stt/real-time.mdx
                 slug: real-time
                 hidden: true
+              - page: Redact PII from Text Using LeMUR
+                path: pages/05-guides/cookbooks/lemur/lemur-pii-redaction.mdx
+                slug: lemur-pii-redaction
+                hidden: true
           - section: SDK References
             icon: duotone cubes
             contents:

diff --git a/fern/pages/03-audio-intelligence/pii-redaction.mdx b/fern/pages/03-audio-intelligence/pii-redaction.mdx
@@ -560,6 +560,10 @@ of things. The season has been pretty dry already, and then the fact that we're
 getting hit in the US. Is because there's a couple of weather systems that ...
 ```
 
+<Tip title="PII Redaction Using LeMUR">
+  To use LeMUR for custom PII redaction, see the guide [Redact PII from Text Using LeMUR](/docs/lemur/lemur-pii-redaction).
+</Tip>
+
 ## Create redacted audio files
 
 In addition to redacting sensitive information from the transcription text, you can also generate a version of the original audio file with the PII "beeped" out.

diff --git a/...pages/05-guides/cookbooks/core-transcription/detecting-low-confidence-words.mdx b/...pages/05-guides/cookbooks/core-transcription/detecting-low-confidence-words.mdx
@@ -25,7 +25,7 @@ const client = new AssemblyAI({
 Next create the transcript with your audio file, either via local audio file or URL (AssemblyAI's servers need to be able to access the URL, make sure the URL links to a downloadable file).
 
 ```javascript
-const transcript = await client.transcripts.create({
+const transcript = await client.transcripts.transcribe({
   audio_url: "./sample.mp4",
 });
 ```

diff --git a/fern/pages/05-guides/cookbooks/lemur/lemur-pii-redaction.mdx b/fern/pages/05-guides/cookbooks/lemur/lemur-pii-redaction.mdx
@@ -0,0 +1,175 @@
+---
+title: "Redact PII from Text Using LeMUR"
+---
+
+This guide shows how to use AssemblyAI's LeMUR framework to redact personally identifiable information (PII) from text.
+
+## Quickstart
+
+```python
+import assemblyai as aai
+import json
+import os
+
+aai.settings.api_key = 'YOUR API KEY'
+
+def generate_ner(transcript_text):
+    prompt = '''
+    You will be given a transcript of a conversation or text. Your task is to generate named entities from the given transcript text.
+
+    Please identify and extract the following named entities from the transcript:
+
+    1. Person names
+    2. Organization names
+    3. Email addresses
+    4. Phone numbers
+    5. Full addresses
+
+    When extracting these entities, make sure to return the exact spelling and formatting as they appear in the transcript. Do not modify or standardize the entities in any way.
+
+    Present your results in a JSON format with a single field named "named_entities". This field should contain an array of strings, where each string is a named entity you've identified. For example:
+    {
+      "named_entities": ["John Doe", "Acme Corp", "john.doe@example.com", "123-456-7890", "123 Main St, Anytown, USA 12345"]
+    }
+
+    Important: Do not include any other information, explanations, or text in your response. Your output should consist solely of the JSON object containing the named entities.
+
+    If you do not find any named entities of a particular type, simply return an empty array for the "named_entities" field.
+    '''
+
+    response = aai.Lemur().task(
+        prompt=prompt,
+        input_text=transcript_text,
+        max_output_size=4000,
+        temperature=0.0,
+        final_model=aai.LemurModel.claude3_5_sonnet
+    ).response
+
+    try:
+        res_json = json.loads(response)
+    except json.JSONDecodeError:  # LeMUR returned something that is not valid JSON
+        res_json = {'named_entities': []}
+
+    named_entities = res_json.get('named_entities', [])
+
+    return named_entities
+
+transcriber = aai.Transcriber(config=aai.TranscriptionConfig(language_code='en'))
+transcript = transcriber.transcribe('YOUR_AUDIO_URL')
+
+redacted_transcript = ''
+
+for sentence in transcript.get_sentences():
+    generated_entities = generate_ner(sentence.text)
+
+    redacted_sentence = sentence.text
+
+    for entity in generated_entities:
+        redacted_sentence = redacted_sentence.replace(entity, '#' * len(entity))
+
+    redacted_transcript += redacted_sentence + ' '
+    print(redacted_sentence)
+
+print('Full redacted transcript:')
+print(redacted_transcript)
+```
+
+## Get Started
+
+Before we begin, make sure you have an AssemblyAI account and an API key. You can [sign up](https://assemblyai.com/dashboard/signup) for an account and get your API key from your dashboard.
+
+For information about LeMUR pricing, see our [pricing page](https://www.assemblyai.com/pricing).
+
+## Step-by-Step Instructions
+
+Install the SDK.
+
+```bash
+pip install assemblyai
+```
+
+Import the `assemblyai` package and set your API key.
+
+```python
+import assemblyai as aai
+import json
+import os
+
+aai.settings.api_key = 'YOUR API KEY'
+```
+
+Define a function `generate_ner` that uses LeMUR to identify named entities (person names, organizations, emails, phone numbers, addresses) in a given text.
+
+```python
+def generate_ner(transcript_text):
+    prompt = '''
+    You will be given a transcript of a conversation or text. Your task is to generate named entities from the given transcript text.
+
+    Please identify and extract the following named entities from the transcript:
+
+    1. Person names
+    2. Organization names
+    3. Email addresses
+    4. Phone numbers
+    5. Full addresses
+
+    When extracting these entities, make sure to return the exact spelling and formatting as they appear in the transcript. Do not modify or standardize the entities in any way.
+
+    Present your results in a JSON format with a single field named "named_entities". This field should contain an array of strings, where each string is a named entity you've identified. For example:
+    {
+      "named_entities": ["John Doe", "Acme Corp", "john.doe@example.com", "123-456-7890", "123 Main St, Anytown, USA 12345"]
+    }
+
+    Important: Do not include any other information, explanations, or text in your response. Your output should consist solely of the JSON object containing the named entities.
+
+    If you do not find any named entities of a particular type, simply return an empty array for the "named_entities" field.
+    '''
+
+    response = aai.Lemur().task(
+        prompt=prompt,
+        input_text=transcript_text,
+        max_output_size=4000,
+        temperature=0.0,
+        final_model=aai.LemurModel.claude3_5_sonnet
+    ).response
+
+    try:
+        res_json = json.loads(response)
+    except json.JSONDecodeError:  # LeMUR returned something that is not valid JSON
+        res_json = {'named_entities': []}
+
+    named_entities = res_json.get('named_entities', [])
+
+    return named_entities
+```
+
+Transcribe an audio file using the AssemblyAI Transcriber.
+
+```python
+transcriber = aai.Transcriber(config=aai.TranscriptionConfig(language_code='en'))
+transcript = transcriber.transcribe('YOUR_AUDIO_URL')
+```
+
+Iterate through each sentence in the transcript, identify named entities using `generate_ner`, and replace them with `#` characters.
+
+```python
+redacted_transcript = ''
+
+for sentence in transcript.get_sentences():
+    generated_entities = generate_ner(sentence.text)
+
+    redacted_sentence = sentence.text
+
+    for entity in generated_entities:
+        redacted_sentence = redacted_sentence.replace(entity, '#' * len(entity))
+
+    redacted_transcript += redacted_sentence + ' '
+    print(redacted_sentence)
+```
+
+Print the redacted transcript.
+
+```python
+print('Full redacted transcript:')
+print(redacted_transcript)
+```
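
The parsing and replacement steps in the new guide can be checked offline, without a LeMUR call. The following is a minimal sketch and is not part of this PR: `parse_named_entities` and `redact_entities` are hypothetical helper names, and the sample response string is made up for illustration. It assumes the model returns the JSON shape requested in the prompt, and it masks longer entities first so that a short entity such as "John" cannot break a later match on "John Doe".

```python
import json


def parse_named_entities(response_text):
    """Return the entity strings from a JSON response, or [] if it is unusable."""
    try:
        parsed = json.loads(response_text)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON (or not a string at all): treat as "no entities found".
        return []
    if not isinstance(parsed, dict):
        return []
    entities = parsed.get("named_entities", [])
    if not isinstance(entities, list):
        return []
    # Keep only non-empty strings; anything else cannot be matched in the text.
    return [e for e in entities if isinstance(e, str) and e]


def redact_entities(text, entities, mask_char="#"):
    """Replace every occurrence of each entity with mask_char, preserving length."""
    # Longest first, so "John Doe" is masked before the shorter "John".
    for entity in sorted(set(entities), key=len, reverse=True):
        text = text.replace(entity, mask_char * len(entity))
    return text


# Hypothetical LeMUR response and sentence, for illustration only.
sample_response = '{"named_entities": ["John Doe", "Acme Corp", "123-456-7890"]}'
sample_sentence = "John Doe from Acme Corp can be reached at 123-456-7890."

entities = parse_named_entities(sample_response)
print(redact_entities(sample_sentence, entities))
```

In the guide's loop, `redact_entities(sentence.text, generate_ner(sentence.text))` would take the place of the inner `for entity in ...` block. Exact string matching is still case-sensitive here, as in the original code.
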
diff --git a/fern/pages/05-guides/index.mdx b/fern/pages/05-guides/index.mdx
@@ -1016,6 +1016,18 @@ For examples using the API without SDKs see [API guides](#api-guides).
         />
       </a>
     </li>
+    <li>
+      <a
+        href="guides/lemur-pii-redaction"
+        className="link-cta rounded-lg flex items-center gap-2"
+      >
+        Redact PII from Text Using LeMUR{" "}
+        <Icon
+          icon="duotone arrow-right"
+          color="rgba(var(--accent-aaa),var(--tw-text-opacity,1))"
+        />
+      </a>
+    </li>
   </ul>
 </div>