[Feature] Add support for RAG specialized model PleIAs_Pleias-Nano #3343
Looking at the model, it seems to have a specialized chat template for RAG that we'd need to adapt. Fortunately, the new Jinja system will make this considerably easier. The truth is we would need more help from the community to step up and do the coding to get this supported in the short term. The coding would be a combination of C++/QML and Jinja. I am happy to help and can offer advice/guidance and even screen sharing/tutorials for a dedicated member of the community who wanted to step up and implement it. If no one from the community steps up, it will be a medium-to-long-term situation and would have to be triaged for priority.
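To make the template question concrete, here is a minimal sketch of how a RAG-style chat template can be rendered with Jinja. This is illustrative only: the `<|source_start|>`/`<|query_start|>` markers and field names below are hypothetical placeholders, not the actual Pleias-Nano template (that lives in the model's `tokenizer_config.json`), and it uses Python's `jinja2` library rather than the app's C++ Jinja engine.

```python
# Hypothetical sketch of a RAG-style chat template, rendered with jinja2.
# The special tokens and variable names here are illustrative assumptions,
# NOT the real Pleias-Nano template.
from jinja2 import Environment

# Retrieved sources are injected ahead of the user's query, each wrapped
# in delimiter tokens the model was (hypothetically) trained to expect.
template_src = (
    "{% for source in sources %}"
    "<|source_start|>{{ source }}<|source_end|>\n"
    "{% endfor %}"
    "<|query_start|>{{ query }}<|query_end|>"
)

env = Environment()
template = env.from_string(template_src)

prompt = template.render(
    sources=["Doc A text", "Doc B text"],
    query="What does Doc A say?",
)
print(prompt)
```

The point is that once the app's Jinja engine can evaluate a template like this, supporting a RAG model mostly reduces to wiring the retrieved documents into the template's input variables.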
Thank you for your offer. The benchmarks sound really good (the model seems to be the best in its parameter class, and it is so small that it is really fast in terms of t/s), and the licensing is fantastic, but it only supports a 2k context window. I can fiddle with Jinja, but C++/QML would be a first for me; you would need to help me a lot. If it is difficult to implement, let's postpone and wait for a version that supports longer context. Meanwhile, users can use Llama-3.2-3B or Qwen 3B. IMHO, fixing the Jinja stuff is more important right now.
The Jinja stuff can be fixed in some combination of four possible ways:
#1 is obviously the best solution, but perhaps also the one that takes the most time. #2 is a good stopgap, but there will always be models missing. #3 we should do anyway. #4 is also a great idea, as modified templates will be necessary to support all the internal tools we plan on adding. Keep in mind that the majority of users use the curated models. The ones who are sideloading are the minority, and also the ones who should be most capable of #4. Hopefully, now with the new reasoning capability, people will sympathize with the reason we made this change. It unlocks a whole lot of future features and functionality.
Feature Request
Add support for https://huggingface.co/PleIAs/Pleias-Nano