This is a python client for the Duohub API.
Duohub is a blazing fast graph RAG service designed for voice AI and other low-latency applications. It is used to retrieve memory from your knowledege graph in under 50ms.
You will need an API key to use the client. You can get one by signing up on the Duohub app. For more information, visit our website: duohub.ai.
pip install duohub
or
poetry add duohub
Basic usage is as follows:
from duohub import Duohub
client = Duohub(api_key="your_api_key")
response = client.query(query="What is the capital of France?", memoryID="your_memory_id")
print(response)
Output schema is as follows:
{
"payload": [
{
"content": "string",
"score": 1
}
],
"facts": [
{
"content": "string"
}
],
"sources": [
{
"id": "string",
"name": "string",
"url": "string",
"score": 1
}
]
}
facts
: Whether to return facts in the response. Defaults toFalse
.assisted
: Whether to return an answer in the response. Defaults toFalse
.query
: The query to search the graph with.memoryID
: The memory ID to isolate your search results to.top_k
: Number of top memories to return. Defaults to 5.
When you only pass a query and memory ID, you are using default mode. This is the fastest option, and most single sentence queries will get a response in under 50ms.
from duohub import Duohub
client = Duohub(api_key="your_api_key")
response = client.query(query="What is the capital of France?", memoryID="your_memory_id")
print(response)
Your response (located in payload[0].content
) is a string representation of a subgraph that is relevant to your query returned as the payload. You can pass this to your context window using a system message and user message template.
If you pass the assisted=True
parameter to the client, the API will add reasoning to your query and uses the graph context to returns the answer. Assisted mode will add some latency to your query, though it should still be under 250ms.
Using assisted mode will improve the results of your chatbot as it will eliminate any irrelevant information before being passed to your context window, preventing your LLM from assigning attention to noise in your graph results.
from duohub import Duohub
client = Duohub(api_key="your_api_key")
response = client.query(query="What is the capital of France?", memoryID="your_memory_id", assisted=True)
print(response)
Assisted mode results will be a JSON object with the following structure:
{
"payload": [
{
"content": "The capital of France is Paris.",
"score": 1
}
],
"facts": [],
"sources": []
}
If you pass facts=True
to the client, the API will return a list of facts that are relevant to your query. This is useful if you want to pass the results to another model for deeper reasoning.
Because the latency for a fact query is higher than default or assisted mode, we recommend not using these in voice AI or other low-latency applications.
It is more suitable for chatbot workflows or other applications that do not require real-time responses.
from duohub import Duohub
client = Duohub(api_key="your_api_key")
response = client.query(query="What is the capital of France?", memoryID="your_memory_id", facts=True)
print(response)
Your response will include both payload and facts:
{
"payload": [
{
"content": "Paris is the capital of France.",
"score": 1
}
],
"facts": [
{
"content": "Paris is the capital of France."
},
{
"content": "Paris is a city in France."
},
{
"content": "France is a country in Europe."
}
],
"sources": [
{
"id": "123",
"name": "Wikipedia",
"url": "https://wikipedia.org/wiki/Paris",
"score": 1
}
]
}
You can combine the options to get a more tailored response. For example, you can get facts and a payload:
from duohub import Duohub
client = Duohub(api_key="your_api_key")
response = client.query(query="What is the capital of France?", memoryID="your_memory_id", facts=True, assisted=True)
print(response)
Your response will be a JSON object with the following structure:
{
"payload": [
{
"content": "Paris is the capital of France.",
"score": 1
}
],
"facts": [
{
"content": "Paris is the capital of France."
},
{
"content": "Paris is a city in France."
},
{
"content": "France is a country in Europe."
}
],
"sources": [
{
"id": "123",
"name": "Wikipedia",
"url": "https://wikipedia.org/wiki/Paris",
"score": 1
}
]
}
You can add files to Duohub using either local files or external URIs:
# Add a local file
response = client.add_file(file_path="path/to/your/file.txt")
# Add an external website or sitemap
response = client.add_file(
external_uri="https://example.com",
file_type="website" # Options: 'website', 'sitemap', or 'website_bulk'
)
Create a new memory (graph or vector storage):
response = client.create_memory(
name="My Memory",
memory_type="graph", # or "vector"
description="Optional description",
ontology="culture", # Required for graph type. Options: culture, essays, support_requests
chunk_size=250, # Only for vector type
chunk_overlap=10, # Only for vector type (1-50)
webhook_url="https://your-webhook.com", # Optional
acceleration=False # Optional
)
Add files to an existing memory:
response = client.add_files_to_memory(
memory_id="your_memory_id",
files=["file_id_1", "file_id_2"]
)
Remove a file from memory:
response = client.delete_file_from_memory(
memory_id="your_memory_id",
file_id="file_id_to_remove"
)
After adding files, start the ingestion process:
response = client.start_ingestion(
memory_id="your_memory_id"
)
Note: The file management endpoints can only be used with memories created on or after 17-Dec-2024.
We welcome contributions to this client! Please feel free to submit a PR. If you encounter any issues, please open an issue.