[DOC] Add Contextual AI to Chroma Integration #5746
Feat: Added Contextual AI Documentation
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving:
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Documentation: Add
Adds a new Markdoc page that demonstrates a full Retrieval-Augmented Generation (RAG) workflow using Chroma together with Contextual AI’s
Key Changes
• New file
Affected Areas
• Documentation site (Markdoc) – framework integrations section
This summary was automatically generated by @propel-code-bot
…s/contextual-ai.md Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>
results = contextual_client.parse.job_results(
    parse_response.job_id,
    output_types=['blocks-per-page']
)
[**BestPractice**]
Missing error handling for the `contextual_client.parse.job_results()` API call. If this call fails after the job status check has succeeded, users will get an unhandled exception. The Contextual AI SDK raises specific exceptions that should be handled appropriately. Add error handling:
```python
import contextual

try:
    results = contextual_client.parse.job_results(
        parse_response.job_id,
        output_types=['blocks-per-page']
    )
except contextual.APIConnectionError as e:
    raise Exception(f"Network error retrieving parse results: {e}") from e
except contextual.APIStatusError as e:
    raise Exception(f"API error retrieving parse results: {e.status_code} - {e.response}") from e
except Exception as e:
    raise Exception(f"Failed to retrieve parse results: {e}") from e
```
File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 62
For documentation, I think it is appropriate to avoid the additional complexity of handling network and connection-failure exceptions.
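One possible middle ground, sketched here under assumptions, is to keep the docs snippet short by routing the call through a single small helper that chains the original exception. `with_context` and `flaky_job_results` are hypothetical names used only for illustration; neither is part of the Contextual AI SDK.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_context(action: str, call: Callable[[], T]) -> T:
    """Run `call`; on failure, re-raise with a short description of the step."""
    try:
        return call()
    except Exception as exc:
        # `from exc` keeps the original exception attached as __cause__.
        raise RuntimeError(f"Failed to {action}: {exc}") from exc


def flaky_job_results() -> dict:
    """Hypothetical stand-in for an SDK call that fails."""
    raise ConnectionError("connection reset")


try:
    with_context("retrieve parse results", flaky_job_results)
except RuntimeError as err:
    message = str(err)
    cause = err.__cause__
```

In the docs page this would reduce each guarded call to one line, at the cost of introducing a helper that readers have to look up.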
# Add parsed content to Chroma
texts, metadatas, ids = [], [], []

for page in results.pages:
[**BestPractice**]
Potential attribute access error: The code assumes `results.pages` exists and that each page has a `blocks` attribute, but there's no validation that the API response structure matches expectations. If the API response format changes or is malformed, this will raise `AttributeError`. Add validation:
```python
if not hasattr(results, 'pages') or not results.pages:
    raise Exception("No pages found in parse results")
for page in results.pages:
    if not hasattr(page, 'blocks') or not page.blocks:
        continue  # Skip pages without blocks
    for block in page.blocks:
        # ... existing code
```
File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 80
The API response contains the `results.pages` attribute, and each page has a `blocks` attribute.
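For readers who still want a defensive version, the loop can tolerate missing attributes without explicit `hasattr` checks. The sketch below is illustrative only: `collect_blocks` is a hypothetical helper, the stub objects stand in for a real parse response, and the `text` field on a block is an assumed name rather than a confirmed SDK attribute.

```python
from types import SimpleNamespace


def collect_blocks(results):
    """Flatten parse results into parallel lists, skipping pages with no blocks."""
    texts, metadatas, ids = [], [], []
    # getattr with a default tolerates a missing or None attribute.
    pages = getattr(results, "pages", None) or []
    for page_number, page in enumerate(pages, start=1):
        blocks = getattr(page, "blocks", None) or []
        for block_number, block in enumerate(blocks, start=1):
            texts.append(block.text)  # `text` is an assumed field name
            metadatas.append({"page": page_number})
            ids.append(f"page{page_number}-block{block_number}")
    return texts, metadatas, ids


# Stub response: the second page has no `blocks` attribute at all.
stub = SimpleNamespace(pages=[
    SimpleNamespace(blocks=[SimpleNamespace(text="Intro"), SimpleNamespace(text="Body")]),
    SimpleNamespace(),
])
texts, metadatas, ids = collect_blocks(stub)
```

Deriving IDs from the page and block positions keeps them unique within one document, which matters for the later `collection.add` call.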
collection.add(
    documents=texts,
    metadatas=metadatas,
    ids=ids
)
[**BestPractice**]
Missing error handling for ChromaDB operations. If `collection.add()` fails (due to duplicate IDs, connection issues, dimension mismatches, or validation errors), this will raise an unhandled exception. ChromaDB can raise various exceptions, including dimension mismatches when embeddings don't match the collection's dimensions and validation errors for malformed data. Add error handling:
```python
try:
    collection.add(
        documents=texts,
        metadatas=metadatas,
        ids=ids
    )
except ValueError as e:
    # Handles duplicate IDs and validation errors
    raise Exception(f"ChromaDB validation error: {e}") from e
except Exception as e:
    raise Exception(f"Failed to add documents to Chroma collection: {e}") from e
```
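An alternative to catching the exception is to remove one of its causes before the call. The sketch below assumes that keeping the first entry for each ID is acceptable; `dedupe_by_id` is a hypothetical helper and not part of Chroma.

```python
def dedupe_by_id(texts, metadatas, ids):
    """Keep the first entry for each ID so a batch contains no repeated IDs."""
    seen = set()
    kept_texts, kept_metadatas, kept_ids = [], [], []
    for text, metadata, doc_id in zip(texts, metadatas, ids):
        if doc_id in seen:
            continue  # a later entry reusing an ID is dropped
        seen.add(doc_id)
        kept_texts.append(text)
        kept_metadatas.append(metadata)
        kept_ids.append(doc_id)
    return kept_texts, kept_metadatas, kept_ids


texts, metadatas, ids = dedupe_by_id(
    ["alpha", "beta", "alpha again"],
    [{"page": 1}, {"page": 1}, {"page": 2}],
    ["a", "b", "a"],
)
```

The filtered lists can then be passed to `collection.add(documents=texts, metadatas=metadatas, ids=ids)` exactly as in the snippet under review.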
File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 94
rerank_response = contextual_client.rerank.create(
    query=query,
    documents=results['documents'][0],
[**CriticalError**]
Potential index out of bounds error: The code assumes `results['documents'][0]` and `results['metadatas'][0]` exist, but if the Chroma query returns no results, this will raise `IndexError`. ChromaDB query results are structured as arrays that may be empty. Add validation:
```python
if not results['documents'] or not results['documents'][0]:
    raise Exception("No documents found for the query")
if not results['metadatas'] or not results['metadatas'][0]:
    raise Exception("No metadata found for the query")
rerank_response = contextual_client.rerank.create(
    query=query,
    documents=results['documents'][0],
    metadata=[str(m) for m in results['metadatas'][0]],
    # ... rest of parameters
)
```
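The same guard can be written as a small function that returns empty lists instead of raising, so the caller decides what an empty result means. This is a sketch that assumes the nested per-query list shape the snippet indexes with `[0]`; `first_query_hits` is a hypothetical helper, and the dictionaries below are hand-written stand-ins for a query result.

```python
def first_query_hits(results):
    """Return (documents, metadatas) for the first query, or empty lists."""
    # `or [[]]` covers a missing key, None, or an empty outer list.
    documents = (results.get("documents") or [[]])[0] or []
    metadatas = (results.get("metadatas") or [[]])[0] or []
    return list(documents), list(metadatas)


empty = {"documents": [[]], "metadatas": [[]]}
found = {
    "documents": [["doc one", "doc two"]],
    "metadatas": [[{"page": 1}, {"page": 2}]],
}

no_docs, no_metas = first_query_hits(empty)
docs, metas = first_query_hits(found)
```

When `docs` comes back empty, the rerank call can simply be skipped rather than sent with an empty document list.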
File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 110
top_docs = [
    results['documents'][0][r.index]
    for r in rerank_response.results[:5]
]
[**CriticalError**]
Potential index out of bounds error: The code assumes `rerank_response.results` has items and that each result has a valid `index` attribute. If there are no rerank results or if `r.index` is out of bounds for the original results, this could cause `IndexError`. The Contextual AI rerank API returns results with index references that may not align with the original query results. Add bounds checking:
```python
if not getattr(rerank_response, 'results', None):
    raise Exception("No rerank results returned")

docs = results['documents'][0]
top_docs = []
for r in rerank_response.results[:5]:
    idx = getattr(r, 'index', None)
    # Reject missing, negative, or out-of-range indices
    if isinstance(idx, int) and 0 <= idx < len(docs):
        top_docs.append(docs[idx])
    else:
        print(f"Warning: Invalid index {idx} in rerank results")

if not top_docs:
    raise Exception("No valid documents found after reranking")
```
File: docs/docs.trychroma.com/markdoc/content/integrations/frameworks/contextual-ai.md
Line: 120
…s/contextual-ai.md Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>
Description of changes