Commit 66e5656

cin-klein and taprosoft authored
feat: integrate nano-graphrag (#433)
* add nano graph-rag
* ignore entities for relevant context reference
* refactor and add local model as default nano-graphrag
* feat: add kotaemon llm & embedding integration with nanographrag
* fix: add env var for nano GraphRAG

---------

Co-authored-by: Tadashi <[email protected]>
1 parent 19b386b commit 66e5656

7 files changed, with 465 additions and 13 deletions

.env.example

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ COHERE_API_KEY=<COHERE_API_KEY>
 # settings for local models
 LOCAL_MODEL=llama3.1:8b
 LOCAL_MODEL_EMBEDDINGS=nomic-embed-text
+LOCAL_EMBEDDING_MODEL_DIM = 768
+LOCAL_EMBEDDING_MODEL_MAX_TOKENS = 8192

 # settings for GraphRAG
 GRAPHRAG_API_KEY=<YOUR_OPENAI_KEY>
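
The hunk above only declares the two new embedding variables; the code that consumes them is not shown in this part of the commit. As a rough sketch of how such values might be read and validated, the following uses only the standard library. The names `LocalEmbeddingSettings` and `load_local_embedding_settings` are hypothetical and do not appear in the commit; the fallback values simply mirror `.env.example`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalEmbeddingSettings:
    """Hypothetical container for the two new settings (not part of the commit)."""

    dim: int
    max_tokens: int


def load_local_embedding_settings(env=None) -> LocalEmbeddingSettings:
    """Read the two variables, falling back to the values in .env.example."""
    env = os.environ if env is None else env
    # Strip whitespace because the example file writes `KEY = value` with spaces.
    dim = int(str(env.get("LOCAL_EMBEDDING_MODEL_DIM", "768")).strip())
    max_tokens = int(str(env.get("LOCAL_EMBEDDING_MODEL_MAX_TOKENS", "8192")).strip())
    if dim <= 0 or max_tokens <= 0:
        raise ValueError("embedding dimension and max tokens must be positive")
    return LocalEmbeddingSettings(dim=dim, max_tokens=max_tokens)
```

Passing a plain dict as `env` keeps the helper easy to exercise without touching the real process environment.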

README.md

Lines changed: 18 additions & 1 deletion
@@ -170,7 +170,22 @@ documents and developers who want to build their own RAG pipeline.
 ### Setup GraphRAG

 > [!NOTE]
-> Currently GraphRAG feature only works with OpenAI or Ollama API.
+> Official MS GraphRAG indexing only works with OpenAI or Ollama API.
+> We recommend most users to use NanoGraphRAG implementation for straightforward integration with Kotaemon.
+
+<details>
+
+<summary>Setup Nano GRAPHRAG</summary>
+
+- Install nano-GraphRAG: `pip install nano-graphrag`
+- Launch Kotaemon with `USE_NANO_GRAPHRAG=true` environment variable.
+- Set your default LLM & Embedding models in Resources setting and it will be recognized automatically from NanoGraphRAG.
+
+</details>
+
+<details>
+
+<summary>Setup MS GRAPHRAG</summary>

 - **Non-Docker Installation**: If you are not using Docker, install GraphRAG with the following command:

@@ -181,6 +196,8 @@ documents and developers who want to build their own RAG pipeline.
 - **Setting Up API KEY**: To use the GraphRAG retriever feature, ensure you set the `GRAPHRAG_API_KEY` environment variable. You can do this directly in your environment or by adding it to a `.env` file.
 - **Using Local Models and Custom Settings**: If you want to use GraphRAG with local models (like `Ollama`) or customize the default LLM and other configurations, set the `USE_CUSTOMIZED_GRAPHRAG_SETTING` environment variable to true. Then, adjust your settings in the `settings.yaml.example` file.

+</details>
+
 ### Setup Local Models (for local/private RAG)

 See [Local model setup](docs/local_model.md).

flowsettings.py

Lines changed: 30 additions & 8 deletions
@@ -284,32 +284,54 @@
     },
 }

-
+USE_NANO_GRAPHRAG = config("USE_NANO_GRAPHRAG", default=False, cast=bool)
+GRAPHRAG_INDEX_TYPE = (
+    "ktem.index.file.graph.GraphRAGIndex"
+    if not USE_NANO_GRAPHRAG
+    else "ktem.index.file.graph.NanoGraphRAGIndex"
+)
 KH_INDEX_TYPES = [
     "ktem.index.file.FileIndex",
-    "ktem.index.file.graph.GraphRAGIndex",
+    GRAPHRAG_INDEX_TYPE,
 ]
-KH_INDICES = [
+
+GRAPHRAG_INDEX = (
     {
-        "name": "File",
+        "name": "GraphRAG",
         "config": {
             "supported_file_types": (
                 ".png, .jpeg, .jpg, .tiff, .tif, .pdf, .xls, .xlsx, .doc, .docx, "
                 ".pptx, .csv, .html, .mhtml, .txt, .md, .zip"
             ),
             "private": False,
         },
-        "index_type": "ktem.index.file.FileIndex",
-    },
+        "index_type": "ktem.index.file.graph.GraphRAGIndex",
+    }
+    if not USE_NANO_GRAPHRAG
+    else {
+        "name": "NanoGraphRAG",
+        "config": {
+            "supported_file_types": (
+                ".png, .jpeg, .jpg, .tiff, .tif, .pdf, .xls, .xlsx, .doc, .docx, "
+                ".pptx, .csv, .html, .mhtml, .txt, .md, .zip"
+            ),
+            "private": False,
+        },
+        "index_type": "ktem.index.file.graph.NanoGraphRAGIndex",
+    }
+)
+
+KH_INDICES = [
     {
-        "name": "GraphRAG",
+        "name": "File",
         "config": {
             "supported_file_types": (
                 ".png, .jpeg, .jpg, .tiff, .tif, .pdf, .xls, .xlsx, .doc, .docx, "
                 ".pptx, .csv, .html, .mhtml, .txt, .md, .zip"
             ),
             "private": False,
         },
-        "index_type": "ktem.index.file.graph.GraphRAGIndex",
+        "index_type": "ktem.index.file.FileIndex",
     },
+    GRAPHRAG_INDEX,
 ]
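
The `flowsettings.py` change boils down to one boolean flag choosing between two dotted class paths. The sketch below reproduces that selection with the standard library only, so it can be run outside Kotaemon. `env_flag`, `graphrag_index_type`, and `build_index_types` are hypothetical names; `env_flag` is a simplified stand-in for the `config(..., cast=bool)` call in the diff, and the set of accepted truthy and falsy strings is an assumption.

```python
import os

# Assumed spellings for boolean flags; the real `config(..., cast=bool)` may differ.
_TRUE = {"1", "true", "yes", "on", "y", "t"}
_FALSE = {"0", "false", "no", "off", "n", "f", ""}


def env_flag(name: str, default: bool = False, env=None) -> bool:
    """Parse a boolean environment variable (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognized boolean")


def graphrag_index_type(use_nano: bool) -> str:
    """Return the dotted path of the GraphRAG index class to register."""
    if use_nano:
        return "ktem.index.file.graph.NanoGraphRAGIndex"
    return "ktem.index.file.graph.GraphRAGIndex"


def build_index_types(env=None) -> list[str]:
    """Mirror KH_INDEX_TYPES: the file index plus exactly one GraphRAG variant."""
    use_nano = env_flag("USE_NANO_GRAPHRAG", env=env)
    return ["ktem.index.file.FileIndex", graphrag_index_type(use_nano)]
```

With this shape only one GraphRAG variant is registered per launch, which matches the diff: the flag swaps implementations rather than adding a third index.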
Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
 from .graph_index import GraphRAGIndex
+from .nano_graph_index import NanoGraphRAGIndex

-__all__ = ["GraphRAGIndex"]
+__all__ = ["GraphRAGIndex", "NanoGraphRAGIndex"]
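
Exporting `NanoGraphRAGIndex` from the package matters because the settings refer to index classes by dotted path, for example `"ktem.index.file.graph.NanoGraphRAGIndex"`. How Kotaemon resolves those strings is not shown in this commit; the following is a minimal sketch of one common approach using `importlib`, with `import_dotted_name` as a hypothetical helper name.

```python
from importlib import import_module


def import_dotted_name(dotted_path: str):
    """Resolve 'package.module.Attr' to the attribute it names (hypothetical helper)."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"{dotted_path!r} is not a dotted path")
    module = import_module(module_path)
    try:
        return getattr(module, attr)
    except AttributeError as exc:
        # The module imported fine, but it does not expose the requested name.
        raise ImportError(f"{module_path!r} has no attribute {attr!r}") from exc
```

Under this approach, a class that is defined in a submodule but not re-exported from the package would fail at the `getattr` step, which is the situation the added import line avoids.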
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+from typing import Any
+
+from ..base import BaseFileIndexRetriever
+from .graph_index import GraphRAGIndex
+from .nano_pipelines import NanoGraphRAGIndexingPipeline, NanoGraphRAGRetrieverPipeline
+
+
+class NanoGraphRAGIndex(GraphRAGIndex):
+    def _setup_indexing_cls(self):
+        self._indexing_pipeline_cls = NanoGraphRAGIndexingPipeline
+
+    def _setup_retriever_cls(self):
+        self._retriever_pipeline_cls = [NanoGraphRAGRetrieverPipeline]
+
+    def get_retriever_pipelines(
+        self, settings: dict, user_id: int, selected: Any = None
+    ) -> list["BaseFileIndexRetriever"]:
+        _, file_ids, _ = selected
+        retrievers = [
+            NanoGraphRAGRetrieverPipeline(
+                file_ids=file_ids,
+                Index=self._resources["Index"],
+            )
+        ]
+
+        return retrievers
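
The new class reuses `GraphRAGIndex` and overrides only the hooks that pick the pipeline classes. The sketch below imitates that pattern with stub classes so it runs without Kotaemon installed. `StubRetriever`, `BaseIndexSketch`, and `NanoIndexSketch` are hypothetical stand-ins, not the real classes. One deliberate difference from the diff: `selected` defaults to `None` there yet is unpacked unconditionally, so this sketch returns an empty list in that case, which is an assumption about the desired behavior rather than what the commit does.

```python
from typing import Any


class StubRetriever:
    """Hypothetical stand-in for NanoGraphRAGRetrieverPipeline."""

    def __init__(self, file_ids, Index):
        self.file_ids = list(file_ids)
        self.Index = Index


class BaseIndexSketch:
    """Hypothetical stand-in for GraphRAGIndex: stores resources, calls the hook."""

    def __init__(self, resources: dict):
        self._resources = resources
        self._retriever_pipeline_cls: list = []
        self._setup_retriever_cls()

    def _setup_retriever_cls(self):
        raise NotImplementedError


class NanoIndexSketch(BaseIndexSketch):
    def _setup_retriever_cls(self):
        # Swap in the nano retriever, as the subclass in the diff does.
        self._retriever_pipeline_cls = [StubRetriever]

    def get_retriever_pipelines(
        self, settings: dict, user_id: int, selected: Any = None
    ) -> list:
        # Assumption: no selection means there is nothing to retrieve from.
        if selected is None:
            return []
        # `selected` is a 3-tuple in the diff; only the middle item is used.
        _, file_ids, _ = selected
        return [StubRetriever(file_ids=file_ids, Index=self._resources["Index"])]
```

A call such as `NanoIndexSketch({"Index": table}).get_retriever_pipelines({}, 1, ("select", ["f1"], None))` yields a single retriever bound to the chosen file IDs.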
