
[Feature] Add support for RAG specialized model PleIAs_Pleias-Nano #3343

Open
ThiloteE opened this issue Dec 21, 2024 · 3 comments
Labels: enhancement (New feature or request)

Comments

ThiloteE (Collaborator) commented Dec 21, 2024

Feature Request

Add support for https://huggingface.co/PleIAs/Pleias-Nano

ThiloteE added the enhancement label on Dec 21, 2024
manyoso (Collaborator) commented Dec 22, 2024

Looking at the model, it seems to have a specialized chat template for RAG that we'd need to adapt. Fortunately, the new Jinja system will make this considerably easier.
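Roughly speaking, a RAG-style template wraps the retrieved sources and the user query in dedicated special tokens instead of plain user/assistant turns. Something along these lines, where the token names and the `sources` variable are illustrative placeholders and not the model's confirmed format:

```jinja
{#- Hypothetical sketch only: the special tokens and the `sources` variable
    are illustrative placeholders, not Pleias-Nano's confirmed format. -#}
{%- set query = (messages | last)['content'] -%}
{{ bos_token }}<|query_start|>{{ query }}<|query_end|>
{%- for source in sources %}
<|source_start|>{{ loop.index }}. {{ source }}<|source_end|>
{%- endfor %}
<|answer_start|>
```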

The truth is we would need the community to step up and do the coding to get this supported in the short term. The coding would be a combination of C++/QML and Jinja.

I am happy to help and can offer advice, guidance, and even screen shares or tutorials for a dedicated member of the community who wants to step up and implement it.

If no one from the community steps up, it will be a medium- to long-term effort and will have to be triaged for priority.

ThiloteE (Collaborator, Author) commented Dec 22, 2024

Thank you for your offer. The benchmarks sound really good (the model seems to be the best in its parameter class, and it is small enough to be really fast in terms of t/s), and the licensing is fantastic, but it only supports a 2k context window. I can fiddle with Jinja, but C++/QML would be a first for me; you would need to help me a lot.

If it is difficult to implement, let's postpone and wait for a version that supports a longer context. Meanwhile, users can use Llama-3.2-3B or Qwen 3B. IMHO, fixing the Jinja stuff is more important right now.

manyoso (Collaborator) commented Dec 22, 2024

The Jinja stuff can be fixed with some combination of four possible approaches:

  1. Make jinja2cpp more compatible with the Python Jinja parser
  2. Add more built-in compat templates to swap in for the ones detected in sideloaded GGUFs
  3. Add more curated models to our model list
  4. Educate the user base on how to modify and amend the Jinja templates (a sketch of what such a template looks like follows below)

#1 is obviously the best solution, but perhaps also the one that takes the most time. #2 is a good stopgap, but there will always be models missing. #3 we should do anyway. #4 is also a great idea, as modified templates will be necessary to support all the internal tools we plan on adding.
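To illustrate #2 and #4: a compat or user-modified template is just a short piece of Jinja that gets swapped in for the one shipped in the GGUF metadata. For example, a generic ChatML-style template (illustrative only, not tied to this particular model) looks like this:

```jinja
{#- Illustrative ChatML-style replacement template. `messages` and
    `add_generation_prompt` are the standard chat-template inputs;
    the <|im_start|>/<|im_end|> tokens follow the ChatML convention. -#}
{%- for message in messages %}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{%- endfor %}
{%- if add_generation_prompt %}
<|im_start|>assistant
{%- endif %}
```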

Keep in mind that the majority of users use the curated models. Those who sideload are a minority and are also the ones who should be most capable of #4. Hopefully, now with the new reasoning capability, people will sympathize with the reason we made this change; it unlocks a whole lot of future features and functionality.
