@Jinash-Rouniyar

Description of changes

  • Adds a complete RAG pipeline example that uses Chroma with Contextual AI's RAG tools

@github-actions

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@itaismith changed the title from "Feat: Add Contextual AI to Chroma Integration" to "[DOC] Add Contextual AI to Chroma Integration" on Oct 29, 2025
@itaismith marked this pull request as ready for review on October 29, 2025 20:23
@propel-code-bot
Contributor

propel-code-bot bot commented Oct 29, 2025

Documentation: Add Contextual AI RAG integration guide

Adds a new Markdoc page that demonstrates a full Retrieval-Augmented Generation (RAG) workflow using Chroma together with Contextual AI’s Parse, Rerank, Generate, and LMUnit APIs. The guide is provided in both Python and TypeScript and covers document parsing, async job polling, vector storage in Chroma, reranking with custom instructions, grounded response generation, and quality evaluation. In addition, the global integrations index table is updated to list Contextual AI under framework integrations.
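The async job polling step mentioned in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from the guide: `wait_for_job`, its parameters, and the status strings `"completed"` and `"failed"` are assumptions made for the example, and `get_status` stands in for whatever SDK call returns the current job status.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[], str],
    *,
    terminal_states: Iterable[str] = ("completed", "failed"),  # assumed status names
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() repeatedly until it returns a terminal state.

    Raises TimeoutError if no terminal state is seen before the deadline.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    terminal = set(terminal_states)
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(poll_interval)
```

A caller would pass a small function or lambda that asks the parse API for the job's current status string, then branch on the returned value before fetching results.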

Key Changes

• New file docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md (≈395 LOC) containing step-by-step code examples, API explanations, and links to external resources.
• Updated docs/docs.trychroma.com/markdoc/content/integrations/chroma-integrations.md to include Contextual AI entry in the frameworks matrix.

Affected Areas

• Documentation site (Markdoc) – framework integrations section
• No application/runtime code touched

This summary was automatically generated by @propel-code-bot

…s/contextual-ai.md

Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>
Comment on lines +59 to +62
results = contextual_client.parse.job_results(
    parse_response.job_id,
    output_types=['blocks-per-page']
)
Contributor

[BestPractice]

Missing error handling for the `contextual_client.parse.job_results()` API call. If the call fails after the job status check has succeeded, users get an unhandled exception. The Contextual AI SDK raises specific exceptions that should be handled appropriately. Add error handling:

```python
import contextual

try:
    results = contextual_client.parse.job_results(
        parse_response.job_id,
        output_types=['blocks-per-page']
    )
except contextual.APIConnectionError as e:
    # Chain with "from e" so the original traceback is preserved
    raise Exception(f"Network error retrieving parse results: {e}") from e
except contextual.APIStatusError as e:
    raise Exception(f"API error retrieving parse results: {e.status_code} - {e.response}") from e
except Exception as e:
    raise Exception(f"Failed to retrieve parse results: {e}") from e
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 62

Author

@Jinash-Rouniyar Oct 30, 2025

For documentation, I think it is appropriate to avoid the additional complexity of handling network and connection failure exceptions.

# Add parsed content to Chroma
texts, metadatas, ids = [], [], []

for page in results.pages:
Contributor

[BestPractice]

Potential attribute access error: the code assumes `results.pages` exists and that each page has a `blocks` attribute, but nothing validates that the API response matches that structure. If the response format changes or is malformed, this raises `AttributeError`. Add validation:

```python
if not hasattr(results, 'pages') or not results.pages:
    raise Exception("No pages found in parse results")

for page in results.pages:
    if not hasattr(page, 'blocks') or not page.blocks:
        continue  # Skip pages without blocks

    for block in page.blocks:
        ...  # existing per-block code goes here
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 80

Author

The API response contains `results.pages`, and each page has a `blocks` attribute.
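For readers who still want a defensive version of the page and block loop discussed in this thread, one possible shape is sketched below. It is not the code from the guide: `flatten_blocks`, the `source` prefix, the ID format, and the `text_attr` default of `"markdown"` are all assumptions made for illustration, and the demo objects are plain `SimpleNamespace` stand-ins rather than real SDK responses.

```python
from types import SimpleNamespace


def flatten_blocks(results, source="document", text_attr="markdown"):
    """Collect (texts, metadatas, ids) from a parse result, skipping gaps.

    Missing or empty `pages` / `blocks` are treated as "nothing to add"
    instead of raising AttributeError. `text_attr` names the block field
    that holds the text (assumed here, not confirmed by the thread).
    """
    texts, metadatas, ids = [], [], []
    for page_no, page in enumerate(getattr(results, "pages", None) or []):
        for block_no, block in enumerate(getattr(page, "blocks", None) or []):
            text = getattr(block, text_attr, None)
            if not text:
                continue  # skip blocks without usable text
            texts.append(text)
            metadatas.append({"page": page_no, "block": block_no})
            ids.append(f"{source}-p{page_no}-b{block_no}")
    return texts, metadatas, ids


# Stand-in response: one page with two blocks, one page with no blocks.
demo = SimpleNamespace(
    pages=[
        SimpleNamespace(blocks=[SimpleNamespace(markdown="Intro"), SimpleNamespace(markdown="")]),
        SimpleNamespace(blocks=None),
    ]
)
demo_texts, demo_metadatas, demo_ids = flatten_blocks(demo)
```

Whether this belongs in a docs example is the judgment call debated above; the sketch only shows that the guard can stay small.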

Comment on lines +90 to +94
collection.add(
    documents=texts,
    metadatas=metadatas,
    ids=ids
)
Contributor

[BestPractice]

Missing error handling for ChromaDB operations. If `collection.add()` fails (due to duplicate IDs, connection issues, dimension mismatches, or validation errors), it raises an unhandled exception. Dimension mismatches occur when embeddings don't match the collection's dimensions, and validation errors occur for malformed data. Add error handling:

```python
try:
    collection.add(
        documents=texts,
        metadatas=metadatas,
        ids=ids
    )
except ValueError as e:
    # Handles duplicate IDs and validation errors
    raise Exception(f"ChromaDB validation error: {e}") from e
except Exception as e:
    raise Exception(f"Failed to add documents to Chroma collection: {e}") from e
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 94
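Some of the failure modes named in this comment can be caught before `collection.add()` is ever called. The sketch below is one hedged way to do that with plain Python; `find_duplicate_ids` and `check_batch` are hypothetical helper names, and the check only covers problems inside a single batch, not IDs that already exist in the collection or any server-side error.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return the IDs that appear more than once, in first-seen order."""
    counts = Counter(ids)
    return [doc_id for doc_id in counts if counts[doc_id] > 1]


def check_batch(texts, metadatas, ids):
    """Raise ValueError if the three parallel lists cannot form a valid batch."""
    if not (len(texts) == len(metadatas) == len(ids)):
        raise ValueError(
            f"length mismatch: {len(texts)} texts, "
            f"{len(metadatas)} metadatas, {len(ids)} ids"
        )
    dupes = find_duplicate_ids(ids)
    if dupes:
        raise ValueError(f"duplicate ids in batch: {dupes}")
```

Calling `check_batch(texts, metadatas, ids)` right before the add keeps the error message specific, while the `try`/`except` suggested above would still be needed for failures that only the database can detect.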

Comment on lines +108 to +110
rerank_response = contextual_client.rerank.create(
    query=query,
    documents=results['documents'][0],
Contributor

[CriticalError]

Potential index out of bounds error: the code assumes `results['documents'][0]` and `results['metadatas'][0]` exist, but if the Chroma query returns no results, this will raise `IndexError`. ChromaDB query results are structured as arrays that may be empty. Add validation:

```python
if not results['documents'] or not results['documents'][0]:
    raise Exception("No documents found for the query")

if not results['metadatas'] or not results['metadatas'][0]:
    raise Exception("No metadata found for the query")

rerank_response = contextual_client.rerank.create(
    query=query,
    documents=results['documents'][0],
    metadata=[str(m) for m in results['metadatas'][0]],
    # ... rest of parameters
)
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 110
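The validation suggested here can also be folded into a small helper. Chroma query results are nested lists with one inner list per query, and a query with no matches typically appears as an empty inner list rather than a missing one, so the sketch below treats both shapes (and a `None` field) as "no hits". `first_query_hits` is a hypothetical name, and the dictionaries in the usage note are hand-written stand-ins, not real query output.

```python
def first_query_hits(results):
    """Return (documents, metadatas) for the first query in a Chroma-style result.

    Tolerates a missing key, a None value, an empty outer list, or an empty
    inner list. Metadatas are padded with None so both lists stay aligned.
    """
    docs = (results.get("documents") or [[]])[0] or []
    metas = (results.get("metadatas") or [[]])[0] or []
    docs = list(docs)
    metas = list(metas)[: len(docs)]
    metas += [None] * (len(docs) - len(metas))
    return docs, metas
```

For example, `first_query_hits({"documents": [[]], "metadatas": [[]]})` yields two empty lists, so the caller can skip the rerank call with a clear message instead of sending an empty document list.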

Comment on lines +117 to +120
top_docs = [
    results['documents'][0][r.index]
    for r in rerank_response.results[:5]
]
Contributor

[CriticalError]

Potential index out of bounds error: the code assumes `rerank_response.results` has items and that each result has a valid `index` attribute. If there are no rerank results, or if `r.index` is out of bounds for the original results, this could cause `IndexError`. The Contextual AI rerank API returns results with index references that may not align with the original query results. Add bounds checking:

```python
if not hasattr(rerank_response, 'results') or not rerank_response.results:
    raise Exception("No rerank results returned")

documents = results['documents'][0]
top_docs = []
for r in rerank_response.results[:5]:
    idx = getattr(r, 'index', None)  # None when the result has no index
    if idx is not None and 0 <= idx < len(documents):
        top_docs.append(documents[idx])
    else:
        print(f"Warning: Invalid index {idx} in rerank results")

if not top_docs:
    raise Exception("No valid documents found after reranking")
```

File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 120
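The same bounds check can be written as a reusable function, which also makes it easy to exercise without calling the rerank API. This is a sketch under assumptions: `select_top_docs` is a hypothetical helper, and the only thing it relies on from the thread is that each rerank result exposes an integer `index` into the original document list.

```python
from types import SimpleNamespace


def select_top_docs(documents, rerank_results, k=5):
    """Map the first k rerank results back to documents, dropping bad indices.

    A result is skipped when it has no `index`, when the index is not an
    int, or when it falls outside 0..len(documents)-1 (negative included,
    so Python's wrap-around indexing cannot pick the wrong document).
    """
    top_docs = []
    for result in list(rerank_results)[:k]:
        idx = getattr(result, "index", None)
        if isinstance(idx, int) and 0 <= idx < len(documents):
            top_docs.append(documents[idx])
    return top_docs


# Stand-in rerank results: two valid, one too large, one negative, one missing.
demo_results = [
    SimpleNamespace(index=2),
    SimpleNamespace(index=0),
    SimpleNamespace(index=7),
    SimpleNamespace(index=-1),
    SimpleNamespace(),
]
demo_top = select_top_docs(["a", "b", "c"], demo_results)
```

Returning an empty list instead of raising leaves the caller free to decide whether "nothing survived reranking" is an error or just an empty answer.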

jinashrouniyar-268 and others added 2 commits November 8, 2025 12:13
…s/contextual-ai.md

Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>