Add retrieval RequestProcessor and end-to-end RAG examples #148

frreiss · 2025-04-16T02:03:12Z

This PR adds a RequestProcessor that performs the retrieval phase of the RAG pattern. There is a basic implementation that uses an in-memory vector database and an extension point for adding support for other vector databases in the future.
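
For readers unfamiliar with the pattern, here is a minimal sketch of what an in-memory vector store with an extension point can look like. It is not the code from this PR: the names `VectorStore`, `InMemoryVectorStore`, `add`, and `search` are hypothetical, the vectors are toy values, and cosine similarity is an arbitrary choice of scoring function.

```python
from __future__ import annotations

import math
from abc import ABC, abstractmethod


class VectorStore(ABC):
    """Hypothetical extension point: one subclass per vector database."""

    @abstractmethod
    def search(self, query_vector: list[float], top_k: int) -> list[tuple[str, float]]:
        """Return up to top_k (text, score) pairs, best match first."""


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore(VectorStore):
    """Brute-force store that keeps every embedding in a Python list."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._rows.append((text, vector))

    def search(self, query_vector: list[float], top_k: int) -> list[tuple[str, float]]:
        # Score every stored row, then keep the top_k highest scores.
        scored = [(text, _cosine(query_vector, vec)) for text, vec in self._rows]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Toy two-dimensional "embeddings", for illustration only.
store = InMemoryVectorStore()
store.add("document about retrieval", [1.0, 0.0])
store.add("document about bananas", [0.0, 1.0])
hits = store.search([0.9, 0.1], top_k=1)
```

Under this sketch, support for another database would be a second `VectorStore` subclass that forwards `search` to that database's client.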

The PR also includes tests for the new functionality.
I have refactored the RequestProcessor for hallucinations so that it can also be used to perform query rewrite.

This PR also includes a notebook that shows several end-to-end RAG examples that use different combinations of intrinsics.

Signed-off-by: Fred Reiss <[email protected]>

hickeyma

@frreiss Do you mind fixing the gate issues?

frreiss · 2025-04-16T17:26:52Z

Linter issues fixed.

@hickeyma and @markstur I'm seeing test failures because the test cases can't download the data files they need. Is there a way to get around that limitation?

frreiss · 2025-04-16T18:29:47Z

CI problems with base data fixed now, but am seeing issues with using an embedding model from SentenceTransformers:

    @staticmethod
    def load(input_path) -> Pooling:
>       with open(os.path.join(input_path, "config.json")) as fIn:
E       FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/.cache/huggingface/hub/models--sentence-transformers--multi-qa-mpnet-base-dot-v1/snapshots/4633e80e17ea975bc090c97b049da26062b054d3/1_Pooling/config.json'

@hickeyma @markstur is there a special trick to get models into the Hugging Face cache directory for CI?
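
One common way to populate the cache before tests run is to download the whole model repository up front. A later commit in this PR uses the HF CLI for that; the Python sketch below is an alternative illustration, not what the PR does. The helper names `hub_cache_folder` and `prefetch_model` are hypothetical, the model ID is inferred from the cache path in the traceback above, and the cache location shown is the default one (it moves if `HF_HOME` or a related environment variable is set).

```python
import os


def hub_cache_folder(repo_id: str) -> str:
    """Folder name the Hugging Face hub cache uses for a model repo."""
    return "models--" + repo_id.replace("/", "--")


def prefetch_model(repo_id: str) -> str:
    """Download all files of repo_id into the local cache; return the snapshot path."""
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


# Model ID inferred from the cache path in the traceback (an assumption).
MODEL_ID = "sentence-transformers/multi-qa-mpnet-base-dot-v1"

# Where the files are expected to land with the default cache settings.
expected_dir = os.path.join(
    os.path.expanduser("~/.cache/huggingface/hub"), hub_cache_folder(MODEL_ID)
)
```

Calling `prefetch_model(MODEL_ID)` in a CI setup step (network access required) would fetch subfolder files such as `1_Pooling/config.json` along with the rest of the repository.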

hickeyma · 2025-04-17T13:53:50Z

@frreiss I pushed commit ce00ef3, which fixes the embedding model download issue.

There are now the following issues:

Test failure: AssertionError: assert [-0.110380716...87530518, ...] == approx([-0.11...17 ± 1.2e-08])
There are also failures because NVIDIA GPUs are not available: RuntimeError: Found no NVIDIA driver on your system. The following check could potentially be used: if torch.cuda.is_available():

Do you mind addressing those issues?
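
Both issues have well-known generic remedies, sketched below. This is not the fix that was applied in the PR. The helper names `pick_device` and `embeddings_close` are hypothetical, the tolerance values are arbitrary, and the numbers are illustrative rather than taken from the failing test. The underlying assumption is that the embedding values differ slightly between machines, so the comparison needs a looser tolerance than the default.

```python
import math


def pick_device() -> str:
    """Return "cuda" only when PyTorch is installed and sees a usable GPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def embeddings_close(actual, expected, rel_tol=1e-4, abs_tol=1e-6) -> bool:
    """Element-wise comparison of two vectors with explicit tolerances."""
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


device = pick_device()

# Illustrative numbers only, not values from the failing test.
same = embeddings_close([0.1100001, -0.25], [0.11, -0.25])
different = embeddings_close([0.12, -0.25], [0.11, -0.25])
```

In a pytest suite the same idea is usually expressed by passing explicit `rel` or `abs` arguments to `pytest.approx`.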

frreiss · 2025-04-17T20:41:16Z

Tests are passing now.

This morning I did an internal review of the notebook rag.ipynb with the researchers who created the models involved; they recommended some additional changes before we merge this PR.

hickeyma

Thanks @frreiss for the PR.

Overall, it looks good. There is a small nit inline, and the 2 notebooks could perhaps be squashed into 1, since rag.ipynb incorporates retrieval.ipynb.

You mentioned that you wanted to update the notebooks. I am going to merge for now, and let's do that in a follow-up PR.

hickeyma · 2025-04-18T10:18:02Z

src/granite_io/io/retrieval/util.py

+        os.makedirs(target_root)
+
+    part_num = 1
+    repo_root = "https://github.com/frreiss/mt-rag-embeddings"


OK for the moment. However, we need to find a repo that is part of the community.

Add retrieval and end-to-end RAG examples

6dcf4cd

Signed-off-by: Fred Reiss <[email protected]>

frreiss requested review from hickeyma and markstur April 16, 2025 02:03

hickeyma reviewed Apr 16, 2025

frreiss added 2 commits April 16, 2025 09:56

Make linter happy

c7695d2

Signed-off-by: Fred Reiss <[email protected]>

More linter stuff

1deeef1

Signed-off-by: Fred Reiss <[email protected]>

Shrink data for tests and check into repo

f42de70

Signed-off-by: Fred Reiss <[email protected]>

frreiss and others added 3 commits April 16, 2025 13:22

Download embedding model

8cffc2a

Signed-off-by: Fred Reiss <[email protected]>

Another attempt at downloading

2e6a27f

Signed-off-by: Fred Reiss <[email protected]>

Use HF CLI to download embedding model for tests

ce00ef3

Move to test workflow as it's not Ollama specific.

Signed-off-by: Martin Hickey <[email protected]>

Fix tests

ed1c1ae

Signed-off-by: Fred Reiss <[email protected]>

hickeyma approved these changes Apr 18, 2025

hickeyma merged commit 18b663a into ibm-granite:main Apr 18, 2025
12 checks passed
