feat(rag): secret vault injection #64
base: main
Conversation
- repaired nilql pip import
- updating to nilrag that pins to nilql@alpha10
- added support for git branches in pyproject requirements
- removed gpu affinity for api service
…tadata based on arbitrary values; TODO: add retry mechanism to the llm transform so that it retries until getting the schema format right
…ul; added output to model response
…at the nildb save always uses a worker/tools model
…ons for templates
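The retry mechanism mentioned in the TODO above could look roughly like the following. This is a minimal sketch, assuming a hypothetical `llm_transform(prompt, feedback)` callable that accepts the previous attempt's validation errors as feedback; it is not code from this PR:

```python
from jsonschema import Draft7Validator


def transform_with_retry(llm_transform, prompt: str, schema: dict, max_tries: int = 3) -> dict:
    """Re-run the LLM transform until its output validates against the target schema."""
    validator = Draft7Validator(schema)
    feedback: list[str] = []
    for _ in range(max_tries):
        candidate = llm_transform(prompt, feedback)  # hypothetical callable
        feedback = [err.message for err in validator.iter_errors(candidate)]
        if not feedback:
            return candidate  # output conforms to the schema
    raise ValueError(f"LLM output failed schema validation after {max_tries} attempts")
```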
Does this differ significantly in behaviour from the original Deepseek 14B docker compose that we had before? If the behaviour is better (which I assume, given your improvements), would it make sense to merge both into a single Deepseek 14B file? That would avoid having to maintain multiple source files going forward.
I committed the new one so that we can discuss the differences and see whether we want to integrate the new points.

- I noticed that starting two models simultaneously on the H100 often failed, so I put in a `depends_on` for the worker model (see the sketch after this list). I propose we always have a worker model available, but it doesn't need to live on the same server...
- I didn't include `tensor-parallel-size`; it's left at the default value.
- I added the "reasoning" MODEL_ROLE flag.

wdyt?
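For reference, a minimal sketch of the `depends_on` wiring described above; the service names and images here are hypothetical placeholders, not the actual compose file from this PR:

```yaml
services:
  worker-model:
    image: vllm/vllm-openai:latest   # hypothetical image
    environment:
      - MODEL_ROLE=worker

  reasoning-model:
    image: vllm/vllm-openai:latest   # hypothetical image
    environment:
      - MODEL_ROLE=reasoning
    depends_on:
      - worker-model   # stagger startup so both models don't initialize on the H100 at once
```

Note that a bare `depends_on` only orders container startup; actually waiting for the worker to be ready would additionally need a healthcheck and `condition: service_healthy`.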
I don't really know. I'm having a discussion with ecosystem and product to decide which models we include by default. I also wonder whether the worker model preserves privacy and whether that's something we are willing to live with.
How do we use this `MODEL_ROLE` in practice? Is this something standardized that LangChain or autogen use in any way? If this is the case, and given we've got it as a model parameter description, I would add it to the rest of the docker-compose files.
From the client, I needed a way to distinguish between "this is a reasoning model" and "this is a worker model". Rather than the client needing knowledge of the model itself, as a developer I simply want a "tag" that informs me, so that I can automatically select the kind of model I want. wdyt?
That sounds perfect. I agree. As I commented in the review, I think we should extend it to all models, potentially even making it an `enum` or a list of allowed values in a Pydantic model.
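To make that suggestion concrete, here is a sketch of what it could look like. The class and field names are hypothetical; only the "reasoning" and "worker" values come from this thread:

```python
from enum import Enum

from pydantic import BaseModel


class ModelRole(str, Enum):
    """Allowed MODEL_ROLE values; extend as new roles are agreed on."""
    REASONING = "reasoning"
    WORKER = "worker"


class ModelMetadata(BaseModel):
    """Hypothetical per-model descriptor, validated at load time."""
    name: str
    role: ModelRole
```

With this, `ModelMetadata(name="deepseek-14b", role="reasoning")` parses cleanly, while an unrecognized role string raises a `ValidationError` instead of silently passing through.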
docker/compose/tool_chat_template_DeepSeek-R1-Distill-Llama-70B.jinja
docker/compose/tool_chat_template_DeepSeek-R1-Distill-Qwen-32B-AWQ.jinja
I think the PR is mostly ready. However, I wonder whether we should handle the cases where the functionality depends on the Watt model or a "worker" model. If neither exists and users try to use NilDB, it won't work. We should probably add a check that either disables NilDB or returns an error when no worker model is available, before we even try to connect to nilDB, run inference, etc.
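A minimal sketch of such a guard, assuming a hypothetical registry of model descriptors; the real lookup in this repo may look different:

```python
def ensure_worker_model(available_models: list[dict]) -> None:
    """Fail fast, before any nilDB connection or inference is attempted."""
    if not any(m.get("MODEL_ROLE") == "worker" for m in available_models):
        raise RuntimeError(
            "NilDB functionality requires a worker model, but none is registered."
        )
```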
"""Build a validator to validate the candidate document against loaded schema.""" | ||
return validators.extend(Draft7Validator) | ||
|
||
def data_reveal(self, filter: dict = {}) -> list[dict]: |
Same comment as above for `filter: dict = {}`. Can you make it into a Pydantic model?
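A sketch of the suggested change, with hypothetical field names (the real filter keys depend on the nilDB query schema):

```python
from pydantic import BaseModel


class RevealFilter(BaseModel):
    """Typed replacement for the bare `filter: dict = {}` argument."""
    collection: str | None = None
    limit: int = 100


def data_reveal(self, filter: RevealFilter | None = None) -> list[dict]:
    query = filter.model_dump(exclude_none=True) if filter else {}  # Pydantic v2
    ...
```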
```python
    logger.info(f"Error retrieving records in node: {e!r}")
    return []


def post(self, data_to_store: list) -> list:
```
Same comment for `data_to_store`.
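And correspondingly for `post`, a hypothetical record model instead of an untyped list; the actual fields depend on the stored schema:

```python
from pydantic import BaseModel


class VaultRecord(BaseModel):
    """Hypothetical shape of one record to store."""
    key: str
    value: str


def post(self, data_to_store: list[VaultRecord]) -> list:
    ...
```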
this is not yet ready for review