This project is a sophisticated chatbot built with Retrieval-Augmented Generation (RAG) to assist researchers and developers working with the NOMAD platform. It can answer questions about documentation, features, and best practices by retrieving relevant information from a knowledge base and generating concise, accurate answers.
The project features a standalone FastAPI backend for the RAG pipeline, a Gradio web interface for user interaction, and a complete evaluation suite.
This project follows a standard `src` layout to separate source code from project configuration and data. The core logic is located within the `src/nomad_ragbot` package.
```
nomad-bot-rag-docs-discord/
├── .env                 # Local environment variables (ignored by Git)
├── .env.example         # Template for environment variables
├── pyproject.toml       # Project metadata and dependencies
├── uv.lock              # Pinned versions for reproducible builds
├── data/                # Holds the input data for the knowledge base
├── chroma_store/        # Local vector database storage (ignored by Git)
└── src/
    └── nomad_ragbot/
        ├── api/             # FastAPI backend
        │   ├── main.py
        │   ├── config.py
        │   └── ...
        ├── query/           # Core RAG logic and query engine
        │   └── query.py
        ├── gradio_app.py    # Standalone Gradio web UI
        ├── llm_client.py    # Client for interacting with the LLM
        └── eval/            # Evaluation scripts and dashboard logic
```
- `src/nomad_ragbot/api/`: A self-contained FastAPI application that serves the RAG pipeline. It handles indexing the data into a ChromaDB vector store and exposing an `/ask` endpoint.
- `src/nomad_ragbot/query/`: The heart of the RAG system. It contains the `RAGQueryEngine`, which manages retrieving context, reranking results, and generating answers.
- `src/nomad_ragbot/gradio_app.py`: A standalone Gradio web interface for easy interaction with the chatbot. It calls the RAG logic directly.
- `data/`: Your source documents (e.g., `docs.dynamic.jsonl`) that will be indexed into the vector database.
- `chroma_store/`: The directory where the Chroma vector database is persisted locally. This is automatically generated.
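The knowledge base in `data/` is stored as JSON Lines, one chunk per line. Purely as an illustration (the real field names in `docs.dynamic.jsonl` may differ), a minimal sketch of loading such a file, assuming each record carries `id` and `text` fields:

```python
import json
import tempfile
from pathlib import Path

def load_chunks(path):
    """Yield one parsed record per non-empty line of a JSONL file."""
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            yield json.loads(line)

# Tiny self-contained example; field names here are assumptions,
# not the project's actual chunk schema.
sample = Path(tempfile.mkdtemp()) / "sample_chunks.jsonl"
sample.write_text(
    '{"id": "doc-1", "text": "NOMAD stores materials-science data."}\n'
    '{"id": "doc-2", "text": "Uploads are organized into entries."}\n',
    encoding="utf-8",
)
chunks = list(load_chunks(sample))
print(len(chunks), chunks[0]["id"])
```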
Follow these steps to set up your local environment. This project uses uv for fast package and environment management.
First, clone the repository to your local machine.
```bash
git clone https://github.com/FAIRmat-NFDI/nomad-bot-rag-docs-discord.git
cd nomad-bot-rag-docs-discord
```

Next, create a virtual environment and install all necessary dependencies using uv.
```bash
# Create a virtual environment named .venv
uv venv

# Activate the environment (on macOS/Linux)
source .venv/bin/activate

# Install all packages from pyproject.toml
uv sync
```

Copy the example environment file and edit it with your local settings.
```bash
cp .env.example .env
```

Now, open the `.env` file and configure the paths and model endpoints. The defaults should work for a local setup.
```bash
# .env file

# --- Paths ---
JSONL_PATH="data/chunks/docs.dynamic.jsonl"
CHROMA_DIR="chroma_store"

# --- Model Endpoints ---
EMBED_BASE_URL="http://127.0.0.1:11434"
GENERATOR_BASE_URL="http://127.0.0.1:11434/v1"

# You can also customize the models used by the RAG engine here
# EMBED_MODEL_NAME="nomic-embed-text"
# GENERATOR_MODEL="gpt-oss:20b"
```

The API server and the Gradio UI are two separate applications. You must run them in two separate terminals.
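These settings are plain environment variables, so you can sanity-check how they resolve before starting anything. A minimal stdlib-only sketch (variable names and defaults come from the `.env` example above; the project's actual `config.py` will differ in its details):

```python
import os

def setting(name: str, default: str) -> str:
    """Return the environment variable if set, else the documented default."""
    return os.environ.get(name, default)

# Defaults mirror the .env example above; values set in the
# environment take precedence.
config = {
    "JSONL_PATH": setting("JSONL_PATH", "data/chunks/docs.dynamic.jsonl"),
    "CHROMA_DIR": setting("CHROMA_DIR", "chroma_store"),
    "EMBED_BASE_URL": setting("EMBED_BASE_URL", "http://127.0.0.1:11434"),
    "GENERATOR_BASE_URL": setting("GENERATOR_BASE_URL", "http://127.0.0.1:11434/v1"),
}

for key, value in config.items():
    print(f"{key}={value}")
```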
This server handles the RAG logic and indexing. The first time you run it, it will build the ChromaDB vector store, which may take a few minutes.
```bash
uvicorn src.nomad_ragbot.api.main:app --reload
```

The API will be available at http://127.0.0.1:8000.
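Once the server is up, you can query it from any HTTP client. A sketch using only the standard library — the `/ask` path is documented above, but the request and response field names (`question` here) are assumptions; check `src/nomad_ragbot/api/` for the real schema:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/ask"

def build_request(question: str) -> urllib.request.Request:
    """Build a JSON POST request for the /ask endpoint (payload shape assumed)."""
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("How do I upload data to NOMAD?")
print(req.full_url)

# To actually send it, the FastAPI server from the step above must be running:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```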
This command launches the user-friendly web interface for asking questions.
```bash
uv run python -m src.nomad_ragbot.gradio_app
```

You can now open your browser and navigate to http://127.0.0.1:7860 to interact with the chatbot!
The project includes a suite for evaluating the performance of the RAG pipeline.
Install the project in editable mode with the optional `[eval]` dependencies.
```bash
pip install -e ".[eval]"
```

Execute the evaluation script against a "golden dataset" of questions and answers.
```bash
ragbot-eval --data_path data/evaluation/gold_all.jsonl --out_dir runs/your-run-name --use_llm_judge
```

Launch the evaluation dashboard to visualize the results from your run.
```bash
ragbot-eval-dash --results_path runs/your-run-name/eval_results.parquet
```
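The concrete metrics live in the `eval` package; purely as an illustration of the idea behind such a suite, here is a minimal exact-match scorer over a toy gold set (the `question`/`answer` field names are assumptions, not the project's real schema):

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())

def exact_match_rate(gold, predictions):
    """Fraction of gold answers reproduced verbatim (after normalization)."""
    hits = sum(
        normalize(g["answer"]) == normalize(p)
        for g, p in zip(gold, predictions)
    )
    return hits / len(gold)

# Toy data only -- the real golden dataset lives in data/evaluation/.
gold = [
    {"question": "What stores vectors?", "answer": "ChromaDB"},
    {"question": "Which UI is used?", "answer": "Gradio"},
]
predictions = ["chromadb", "Streamlit"]
print(exact_match_rate(gold, predictions))  # 0.5 on this toy set
```

A real run would also track retrieval metrics and, with `--use_llm_judge`, LLM-graded answer quality rather than string matching alone.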