feat(rag): secret vault injection #64
base: main
Conversation
- repaired nilql pip import
- updating to nilrag that pins to nilql@alpha10
- added support for git branches in pyproject requirements
- removed gpu affinity for api service
…tadata based on arbitrary values; TODO: add retry mechanism to the llm transform so that it retries until getting the schema format right
…ul; added output to model response
…at the nildb save always uses a worker/tools model
…ons for templates
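The retry mechanism mentioned in the TODO above could look roughly like the following. This is a minimal sketch, assuming a hypothetical `llm_transform(prompt, feedback)` callable that accepts the previous attempt's validation errors as feedback; it is not code from this PR:

```python
from jsonschema import Draft7Validator


def transform_with_retry(llm_transform, prompt: str, schema: dict, max_tries: int = 3) -> dict:
    """Re-run the LLM transform until its output validates against the target schema."""
    validator = Draft7Validator(schema)
    feedback: list[str] = []
    for _ in range(max_tries):
        candidate = llm_transform(prompt, feedback)  # hypothetical callable
        feedback = [err.message for err in validator.iter_errors(candidate)]
        if not feedback:
            return candidate  # output conforms to the schema
    raise ValueError(f"LLM output failed schema validation after {max_tries} attempts")
```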
Does this differ significantly in behaviour from the original Deepseek 14B docker compose that we had before? If the behaviour is better (which I assume, given your improvements), would it make sense to merge both into a single Deepseek 14B file? That would avoid having to maintain multiple source files going forward.
I committed the new one so that we can discuss the differences and see whether we want to integrate the new points.

- I noticed that starting two models simultaneously on the H100 often failed, so I put in a `depends_on` for the worker model (see the sketch after this list). I propose we always have a worker model available, but it doesn't need to live on the same server...
- I didn't include `tensor-parallel-size`; it's left at the default value.
- I added the "reasoning" MODEL_ROLE flag.

wdyt?
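For reference, a minimal sketch of the `depends_on` wiring described above; the service names and images here are hypothetical placeholders, not the actual compose file from this PR:

```yaml
services:
  worker-model:
    image: vllm/vllm-openai:latest   # hypothetical image
    environment:
      - MODEL_ROLE=worker

  reasoning-model:
    image: vllm/vllm-openai:latest   # hypothetical image
    environment:
      - MODEL_ROLE=reasoning
    depends_on:
      - worker-model   # stagger startup so both models don't initialize on the H100 at once
```

Note that a bare `depends_on` only orders container startup; actually waiting for the worker to be ready would additionally need a healthcheck and `condition: service_healthy`.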
I don't really know. I'm having a discussion with ecosystem and product to decide which models we include by default. I also wonder whether the worker model preserves privacy and whether that's something we are willing to live with.
How do we use this `MODEL_ROLE` in practice? Is this something standardized that LangChain or autogen use in any way? If this is the case, and given we've got it as a model parameter description, I would add it to the rest of the docker-compose files.
From the client, I needed a way to distinguish between "this is a reasoning model" and "this is a worker model". Rather than the client needing knowledge of the model itself, as a developer I simply want a "tag" that informs me, so that I can automatically select the kind of model I want. wdyt?
That sounds perfect. I agree. As I commented in the review, I think we should extend it to all models, potentially even making it an `enum` or a list of allowed values in a Pydantic model.
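To make that suggestion concrete, here is a sketch of what it could look like. The class and field names are hypothetical; only the "reasoning" and "worker" values come from this thread:

```python
from enum import Enum

from pydantic import BaseModel


class ModelRole(str, Enum):
    """Allowed MODEL_ROLE values; extend as new roles are agreed on."""
    REASONING = "reasoning"
    WORKER = "worker"


class ModelMetadata(BaseModel):
    """Hypothetical per-model descriptor, validated at load time."""
    name: str
    role: ModelRole
```

With this, `ModelMetadata(name="deepseek-14b", role="reasoning")` parses cleanly, while an unrecognized role string raises a `ValidationError` instead of silently passing through.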
docker/compose/tool_chat_template_DeepSeek-R1-Distill-Llama-70B.jinja
docker/compose/tool_chat_template_DeepSeek-R1-Distill-Qwen-32B-AWQ.jinja
I think the PR is mostly ready. However, I wonder whether we should handle the cases where the functionality depends on the Watt model or a "worker" model. If neither exists and users try to use NilDB, it won't work. We should probably add a check that either disables NilDB or returns an error when no worker model is available, before we even try to connect to nilDB, run inference, etc.
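A minimal sketch of such a guard, assuming a hypothetical registry of model descriptors; the real lookup in this repo may look different:

```python
def ensure_worker_model(available_models: list[dict]) -> None:
    """Fail fast, before any nilDB connection or inference is attempted."""
    if not any(m.get("MODEL_ROLE") == "worker" for m in available_models):
        raise RuntimeError(
            "NilDB functionality requires a worker model, but none is registered."
        )
```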
"""Build a validator to validate the candidate document against loaded schema.""" | ||
return validators.extend(Draft7Validator) | ||
|
||
def data_reveal(self, filter: dict = {}) -> list[dict]: |
Same comment as above for `filter: dict = {}`. Can you make it into a Pydantic model?
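A sketch of the suggested change, with hypothetical field names (the real filter keys depend on the nilDB query schema):

```python
from pydantic import BaseModel


class RevealFilter(BaseModel):
    """Typed replacement for the bare `filter: dict = {}` argument."""
    collection: str | None = None
    limit: int = 100


def data_reveal(self, filter: RevealFilter | None = None) -> list[dict]:
    query = filter.model_dump(exclude_none=True) if filter else {}  # Pydantic v2
    ...
```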
```python
    logger.info(f"Error retrieving records in node: {e!r}")
    return []


def post(self, data_to_store: list) -> list:
```
Same comment for `data_to_store`.
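And correspondingly for `post`, a hypothetical record model instead of an untyped list; the actual fields depend on the stored schema:

```python
from pydantic import BaseModel


class VaultRecord(BaseModel):
    """Hypothetical shape of one record to store."""
    key: str
    value: str


def post(self, data_to_store: list[VaultRecord]) -> list:
    ...
```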
this is not yet ready for review