Commit
Update README
This commit updates the README file to include a step-by-step guide to
using this repository to generate the RAG for the OpenShift
documentation, with version 4.15 as an example.

Signed-off-by: Lucas Alvares Gomes <[email protected]>
umago committed Jan 14, 2025
1 parent 36dd15d commit 7f9f72d
Showing 1 changed file with 54 additions and 1 deletion.
55 changes: 54 additions & 1 deletion README.md
@@ -10,8 +10,61 @@ requirements-build.txt
requirements.txt
```

it is neede to start the following command:
The following command must be executed:

```bash
scripts/generate_packages_to_prefetch.py
```

# Generating the RAG for OpenShift

This guide outlines the steps for generating the OpenShift Lightspeed RAG.

Install the dependencies and activate the virtualenv:

```bash
pdm install
source .venv/bin/activate
```

## Download the OCP documentation

The command below downloads the OCP documentation version 4.15 and
converts it to plain text:

```bash
./scripts/get_ocp_plaintext_docs.sh 4.15
```

Note: this step requires the `asciidoctor` command to be installed. See
https://docs.asciidoctor.org/asciidoctor/latest/install for installation
instructions.
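
A pre-flight check can catch a missing `asciidoctor` before the download starts. The sketch below is not part of this repository; the `require_cmd` helper name is hypothetical and relies only on the standard `command -v` shell builtin.

```bash
# Hypothetical helper (not part of this repository): return non-zero and
# print an error when a required command is not available on PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' is not installed or not on PATH" >&2
        return 1
    fi
}
```

With the helper defined, the download step could be guarded as `require_cmd asciidoctor && ./scripts/get_ocp_plaintext_docs.sh 4.15`.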

## Download the embedding model

OpenShift Lightspeed uses the
**sentence-transformers/all-mpnet-base-v2** embedding model. To download
it, run the following command:

```bash
python scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2
```

## Generate the RAG vector database

To generate the RAG vector database using the
**sentence-transformers/all-mpnet-base-v2** embedding model and the
OpenShift documentation version 4.15, run the following commands:

```bash
mkdir -p vector_db/ocp_product_docs/4.15
python scripts/generate_embeddings.py -o ./vector_db/ocp_product_docs/4.15 -f ocp-product-docs-plaintext/4.15/ -r runbooks/ -md embeddings_model/ -mn sentence-transformers/all-mpnet-base-v2 -v 4.15 -i ocp-product-docs-4_15
```

Once the commands finish, the vector database is in **vector_db/**, the
embedding model is in **embeddings_model/**, and the index ID is
**ocp-product-docs-4_15**.
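
In this example the index ID mirrors the documentation version with the dot replaced by an underscore. When repeating the steps for another version, a small helper can keep the two in sync. This is only a sketch: the `index_id_for` name is hypothetical, and it assumes the `ocp-product-docs-<major>_<minor>` pattern seen here is the one you want for other versions too.

```bash
# Hypothetical helper (not part of this repository): build an index ID from a
# version string by replacing dots with underscores. The naming pattern is an
# assumption generalized from the single 4.15 example above.
index_id_for() {
    echo "ocp-product-docs-$(printf '%s' "$1" | tr '.' '_')"
}

index_id_for 4.15   # prints ocp-product-docs-4_15
```

The result could then be passed to the `-i` option of `scripts/generate_embeddings.py`, alongside the matching `-v` value.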

These directories and the index ID can now be used to configure
OpenShift Lightspeed.
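
Before pointing OpenShift Lightspeed at these paths, it may be worth confirming that both directories exist and actually contain files. The following is a minimal sketch, not part of this repository; `dir_has_files` is a hypothetical helper, and the paths are the ones used in the commands above.

```bash
# Hypothetical helper (not part of this repository): succeed only if the
# given path is a directory that contains at least one entry.
dir_has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Report the state of the two output directories from the steps above.
for dir in vector_db/ocp_product_docs/4.15 embeddings_model; do
    if dir_has_files "$dir"; then
        echo "ok: $dir"
    else
        echo "missing or empty: $dir" >&2
    fi
done
```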