[DOC] Add Contextual AI to Chroma Integration #5746

Jinash-Rouniyar · 2025-10-28T06:05:56Z

Description of changes

Showcases a complete RAG pipeline example using Chroma and Contextual AI's RAG tools.

Feat: Added Contextual AI Documentation

github-actions · 2025-10-28T06:06:14Z

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

propel-code-bot · 2025-10-29T20:24:26Z

Documentation: Add Contextual AI RAG integration guide

Adds a new Markdoc page that demonstrates a full Retrieval-Augmented Generation (RAG) workflow using Chroma together with Contextual AI’s Parse, Rerank, Generate, and LMUnit APIs. Provided in both Python and TypeScript, the guide covers document parsing, async job polling, vector storage in Chroma, reranking with custom instructions, grounded response generation, and quality evaluation. The global integrations index table is also updated to list Contextual AI under framework integrations.
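The async job polling step mentioned in the summary can be sketched generically. This is a minimal sketch under stated assumptions, not the guide's code: `wait_for_job`, `get_status`, and the status strings `"completed"` and `"failed"` are hypothetical names chosen for illustration, and the status itself would come from whatever job status check the guide performs.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job completes, fails, or the timeout elapses.

    The status strings "completed" and "failed" are assumed for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("Parse job failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        # Wait before asking again so the API is not hammered with requests
        sleep(poll_interval)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays; a bounded timeout avoids a tutorial snippet that can hang forever on a stuck job.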

Key Changes

• New file docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md (≈395 LOC) containing step-by-step code examples, API explanations, and links to external resources.
• Updated docs/docs.trychroma.com/markdoc/content/integrations/chroma-integrations.md to include a Contextual AI entry in the frameworks matrix.

Affected Areas

• Documentation site (Markdoc) – framework integrations section
• No application/runtime code touched

This summary was automatically generated by @propel-code-bot

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

…s/contextual-ai.md Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

propel-code-bot · 2025-10-30T05:04:06Z

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

+results = contextual_client.parse.job_results(
+    parse_response.job_id,
+    output_types=['blocks-per-page']
+)


[BestPractice]

Missing error handling for contextual_client.parse.job_results() API call. If this API call fails after successfully completing the job status check, users will get an unhandled exception. The Contextual AI SDK raises specific exceptions that should be handled appropriately. Add error handling:

```python
import contextual

try:
    results = contextual_client.parse.job_results(
        parse_response.job_id,
        output_types=['blocks-per-page']
    )
except contextual.APIConnectionError as e:
    raise Exception(f"Network error retrieving parse results: {e}")
except contextual.APIStatusError as e:
    raise Exception(f"API error retrieving parse results: {e.status_code} - {e.response}")
except Exception as e:
    raise Exception(f"Failed to retrieve parse results: {e}")
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md Line: 62

For documentation, I think it is appropriate to avoid the added complexity of handling network and connection failure exceptions.
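One possible middle ground between the two positions, sketched under assumptions: a single small helper keeps each tutorial call to one statement while still preserving the original exception through chaining. `call_with_context` is a hypothetical name, and `fake_job_results` only stands in for the SDK call so the sketch runs on its own.

```python
from typing import Any, Callable


def call_with_context(description: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn and re-raise any failure with a short description attached.

    `raise ... from exc` keeps the original exception as __cause__, so the
    underlying error is still visible in the traceback.
    """
    try:
        return fn(*args, **kwargs)
    except Exception as exc:
        raise RuntimeError(f"{description} failed: {exc}") from exc


# Hypothetical stand-in for an SDK call, used only to make the sketch runnable
def fake_job_results(job_id: str, output_types: list) -> dict:
    if not job_id:
        raise ValueError("job_id is required")
    return {"job_id": job_id, "output_types": output_types}


results = call_with_context(
    "Retrieving parse results",
    fake_job_results,
    "job-123",
    output_types=["blocks-per-page"],
)
```

Whether this belongs in the guide is a judgment call; it adds one helper but avoids repeating a `try`/`except` ladder around every API call.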

propel-code-bot · 2025-10-30T05:04:07Z

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

+# Add parsed content to Chroma
+texts, metadatas, ids = [], [], []
+
+for page in results.pages:


[BestPractice]

Potential attribute access error: The code assumes results.pages exists and that each page has a blocks attribute, but there's no validation that the API response structure matches expectations. If the API response format changes or is malformed, this will raise AttributeError. Add validation:

```python
if not hasattr(results, 'pages') or not results.pages:
    raise Exception("No pages found in parse results")

for page in results.pages:
    if not hasattr(page, 'blocks') or not page.blocks:
        continue  # Skip pages without blocks
    for block in page.blocks:
        # ... existing code
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md Line: 80

The API response contains the results.pages and blocks attributes.
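For reference, the page and block traversal under discussion can be sketched so that missing or empty attributes are simply skipped. This is an illustrative sketch, not the guide's code: `flatten_blocks` is a hypothetical helper, the block attributes `type` and `markdown` are assumptions about the response shape, and the `SimpleNamespace` objects are stand-ins for a real parse result.

```python
from types import SimpleNamespace


def flatten_blocks(results):
    """Collect block text, metadata, and IDs from a parse result.

    Assumes (for illustration) that pages expose `blocks` and that blocks
    expose `type` and `markdown`; missing or empty values are skipped.
    """
    texts, metadatas, ids = [], [], []
    for page_number, page in enumerate(getattr(results, "pages", None) or []):
        for block in getattr(page, "blocks", None) or []:
            text = getattr(block, "markdown", "")
            if not text:
                continue  # Skip blocks with no text content
            texts.append(text)
            metadatas.append({"page": page_number, "block_type": getattr(block, "type", "unknown")})
            ids.append(f"page-{page_number}-block-{len(ids)}")
    return texts, metadatas, ids


# Stand-in objects shaped like the assumed response
sample = SimpleNamespace(pages=[
    SimpleNamespace(blocks=[SimpleNamespace(type="text", markdown="Hello")]),
    SimpleNamespace(blocks=None),
    SimpleNamespace(blocks=[SimpleNamespace(type="table", markdown="")]),
])
texts, metadatas, ids = flatten_blocks(sample)
```

Using `getattr` with a default keeps the loop short while tolerating pages that have no blocks.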

propel-code-bot · 2025-10-30T05:04:07Z

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

+collection.add(
+    documents=texts,
+    metadatas=metadatas,
+    ids=ids
+)


[BestPractice]

Missing error handling for ChromaDB operations. If collection.add() fails (due to duplicate IDs, connection issues, dimension mismatches, or validation errors), this will raise an unhandled exception. ChromaDB can raise various exceptions including dimension mismatches when embeddings don't match collection dimensions, and validation errors for malformed data. Add error handling:

```python
try:
    collection.add(
        documents=texts,
        metadatas=metadatas,
        ids=ids
    )
except ValueError as e:
    # Handles duplicate IDs and validation errors
    raise Exception(f"ChromaDB validation error: {e}")
except Exception as e:
    raise Exception(f"Failed to add documents to Chroma collection: {e}")
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md Line: 94
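An alternative to catching exceptions after the fact is to validate the inputs before calling `collection.add()`. The sketch below is a minimal illustration under assumptions: `add_documents` is a hypothetical helper, and `FakeCollection` is a stand-in that records calls instead of talking to Chroma, so the example runs without a database.

```python
def add_documents(collection, texts, metadatas, ids):
    """Validate inputs, then call collection.add(documents=..., metadatas=..., ids=...)."""
    if not texts:
        raise ValueError("No documents to add")
    if not (len(texts) == len(metadatas) == len(ids)):
        raise ValueError("documents, metadatas, and ids must have the same length")
    if len(set(ids)) != len(ids):
        raise ValueError("ids must be unique")
    collection.add(documents=texts, metadatas=metadatas, ids=ids)
    return len(ids)


class FakeCollection:
    """Hypothetical stand-in that records calls instead of talking to Chroma."""

    def __init__(self):
        self.calls = []

    def add(self, documents, metadatas, ids):
        self.calls.append((documents, metadatas, ids))


fake = FakeCollection()
added = add_documents(fake, ["first chunk"], [{"page": 0}], ["doc-0"])
```

Checking up front gives the reader a specific message for the most common mistakes (mismatched lengths, duplicate IDs) without wrapping the call in a broad `except`.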

propel-code-bot · 2025-10-30T05:04:08Z

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

+rerank_response = contextual_client.rerank.create(
+    query=query,
+    documents=results['documents'][0],


[CriticalError]

Potential index out of bounds error: The code assumes results['documents'][0] and results['metadatas'][0] exist, but if the Chroma query returns no results, this will raise IndexError. ChromaDB query results are structured as arrays that may be empty. Add validation:

```python
if not results['documents'] or not results['documents'][0]:
    raise Exception("No documents found for the query")
if not results['metadatas'] or not results['metadatas'][0]:
    raise Exception("No metadata found for the query")

rerank_response = contextual_client.rerank.create(
    query=query,
    documents=results['documents'][0],
    metadata=[str(m) for m in results['metadatas'][0]],
    # ... rest of parameters
)
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md Line: 110
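The empty-result case can also be handled by normalizing the query result before reranking. A Chroma query result groups hits per query, so for a single query the hits sit at index 0 and that inner list may be empty. The helper below is a hedged sketch: `first_query_hits` is a hypothetical name, and the plain dictionaries stand in for a real query result.

```python
def first_query_hits(results):
    """Return (documents, metadatas) for the first query, or two empty lists."""
    documents = (results.get("documents") or [[]])[0] or []
    metadatas = (results.get("metadatas") or [[]])[0] or []
    return documents, metadatas


# Stand-in for a query result with two hits for one query
sample_results = {
    "documents": [["chunk a", "chunk b"]],
    "metadatas": [[{"page": 1}, {"page": 2}]],
}
documents, metadatas = first_query_hits(sample_results)

if not documents:
    print("No documents found for the query; skipping rerank")
```

Returning empty lists lets the caller decide whether an empty result is an error or simply a reason to skip the rerank step.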

propel-code-bot · 2025-10-30T05:04:09Z

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

+top_docs = [
+    results['documents'][0][r.index]
+    for r in rerank_response.results[:5]
+]


[CriticalError]

Potential index out of bounds error: The code assumes rerank_response.results has items and that each result has a valid index attribute. If there are no rerank results or if r.index is out of bounds for the original results, this could cause IndexError. The Contextual AI rerank API returns results with index references that may not align with the original query results. Add bounds checking:

```python
if not hasattr(rerank_response, 'results') or not rerank_response.results:
    raise Exception("No rerank results returned")

top_docs = []
for r in rerank_response.results[:5]:
    if hasattr(r, 'index') and r.index < len(results['documents'][0]):
        top_docs.append(results['documents'][0][r.index])
    else:
        print(f"Warning: Invalid index {r.index} in rerank results")

if not top_docs:
    raise Exception("No valid documents found after reranking")
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md Line: 120
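A compact variant of this bounds check can be written as a small function. This is a sketch under assumptions: `select_top_docs` is a hypothetical helper, and the `SimpleNamespace` objects stand in for rerank results that carry an `index` attribute. It also guards the lower bound, since a negative index would silently wrap around in Python rather than raise.

```python
from types import SimpleNamespace


def select_top_docs(documents, rerank_results, k=5):
    """Map reranked results back to documents, dropping out-of-range indices."""
    top_docs = []
    for r in rerank_results[:k]:
        index = getattr(r, "index", None)
        # Require 0 <= index < len(documents); negative values would wrap around
        if isinstance(index, int) and 0 <= index < len(documents):
            top_docs.append(documents[index])
    return top_docs


docs = ["alpha", "beta", "gamma"]
reranked = [
    SimpleNamespace(index=2),
    SimpleNamespace(index=7),   # out of range, dropped
    SimpleNamespace(index=-1),  # negative, dropped
    SimpleNamespace(index=0),
]
top_docs = select_top_docs(docs, reranked)
```

The function returns an empty list when nothing is valid, leaving it to the caller to raise or fall back to the original ordering.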


Jinash Rouniyar and others added 4 commits October 23, 2025 10:31

Feat: Added Contextual AI Documentation

5259108

Addressed PR comments

fbbf782

Merge pull request #1 from Jinash-Rouniyar/feature/contextualai

609365c

Feat: Added Contextual AI Documentation

Merge branch 'chroma-core:main' into main

cb86a3b

Jinash-Rouniyar commented Oct 28, 2025

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md

itaismith changed the title from "Feat: Add Contextual AI to Chroma Integration" to "[DOC] Add Contextual AI to Chroma Integration" on Oct 29, 2025

itaismith marked this pull request as ready for review October 29, 2025 20:23

propel-code-bot reviewed Oct 29, 2025

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md (outdated)

propel-code-bot reviewed Oct 29, 2025

docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md (outdated)

Addressed identified issues

e9cacc0