Shark May-25 release #951
Observations from llama serving responses:
CC: @pdhirajkumarprasad @kumardeepakamd @rsuderman @stbaione. We still have answer repetition; the prompt and output (o/p) were attached as screenshots.
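For anyone reproducing this, a request along the following lines can be sent to a running shortfin llama server and the response inspected for repeated answer text. This is a minimal sketch only; the port, the `/generate` endpoint, and the request shape are assumptions taken from the shark-ai llama serving docs, and the prompt is a placeholder, not the one from the screenshots.

```bash
# Sketch: query a running shortfin llama server and check the response
# for answer repetition. Port, endpoint, and prompt are assumptions
# based on the shark-ai llama serving docs, not this thread.
curl http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "Name the capital of the United States.", "sampling_params": {"max_completion_tokens": 50}}'
```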
In the image below (screenshot omitted), the first invocation is the response from Llama-3.1-70B-Instruct and the second is from Llama-3.1-70B. The Llama-3.1-70B response has additional unrelated text following the correct answer. The GGUF was generated with fp16 output type using llama.cpp.
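For context, the conversion step referenced above typically looks like the following. The model path and output filename are placeholders, and the exact script invocation may differ by llama.cpp version.

```bash
# Sketch of the llama.cpp HF-to-GGUF conversion with fp16 output type.
# Paths are hypothetical; run from a llama.cpp checkout.
python convert_hf_to_gguf.py /models/Llama-3.1-70B \
  --outtype f16 \
  --outfile llama3.1-70b-f16.gguf
```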
I tested SDXL, FLUX-Dev, and FLUX-Schnell with the SharkUI. All three are working fine, but both FLUX models have to be run with
For the llama 3.1 405B variant, I am running into the following error. Is this because of missing flags or optimizations while compiling the model? I followed the commands specified here to generate the MLIR and VMFB.
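For reference, the export-then-compile flow generally follows the shark-ai llama docs. The sketch below uses placeholder paths, placeholder batch sizes, and assumes a gfx942 (MI300X) target; the actual commands used here are not shown in the thread, and flag names may differ across shark-ai and IREE versions.

```bash
# Sketch: export a llama model to MLIR with sharktank, then compile it
# to a VMFB with iree-compile. Paths, batch sizes, and the gfx942
# target are assumptions; verify flags against the current docs.
python -m sharktank.examples.export_paged_llm_v1 \
  --irpa-file=/models/llama3.1-405b.irpa \
  --output-mlir=llama3.1-405b.mlir \
  --output-config=llama3.1-405b.json \
  --bs=1,4

iree-compile llama3.1-405b.mlir \
  --iree-hal-target-device=hip \
  --iree-hip-target=gfx942 \
  -o llama3.1-405b.vmfb
```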
@PhaneeshB has merged his PR nod-ai/shark-ai#1375 to fix the precompiled issue, and he has uploaded all the required MLIR and VMFB files. I've tested it locally and it works fine on my end. @pdhirajkumarprasad, we need Phaneesh's changes in the release, so can you please suggest who will take care of it?
@pravg-amd @IanNod @pdhirajkumarprasad: tested Llama-3.1-8B f16 without the server on 05/01; the result looks good. Script: llama_8b_f16_0501.sh (a rough sketch of such a run follows).
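The attached script itself is not reproduced in this thread. As a rough sketch, a serverless (offline) check of the f16 8B model might invoke sharktank's eager example directly; the module path, flag names, and file paths below are assumptions and may differ by version.

```bash
# Hypothetical offline smoke test of Llama-3.1-8B f16, bypassing the
# shortfin server. Module path, flags, and file names are assumed;
# consult the shark-ai repo for the exact invocation.
python -m sharktank.examples.paged_llm_v1 \
  --gguf-file=/models/llama3.1-8b-f16.gguf \
  --tokenizer-config-json=/models/tokenizer_config.json \
  "Name the capital of the United States."
```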
The llama 3.1 405B-Instruct variant works using the commands suggested by @AmosLewis. We need to update the docs with these options for tp8 (a sketch of what the tp8 export step might look like follows).
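The original commands are not preserved in this thread. As an illustration only: tensor parallelism is typically requested at export time, so a tp8 run would plausibly add a sharding flag to the export step. The flag name below is an assumption based on sharktank's export options and may not match what @AmosLewis suggested.

```bash
# Hypothetical tp8 export: shard the 405B model across 8 devices at
# export time. --tensor-parallelism-size is assumed from sharktank's
# export options; verify against the current shark-ai docs.
python -m sharktank.examples.export_paged_llm_v1 \
  --irpa-file=/models/llama3.1-405b-instruct.irpa \
  --output-mlir=llama3.1-405b-instruct-tp8.mlir \
  --output-config=llama3.1-405b-instruct-tp8.json \
  --bs=1,4 \
  --tensor-parallelism-size=8
```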
Tested flux (both dev and schnell) with
@pravg-amd, can you please create a PR to update the steps for 405B, as well as the README with the expected output in JSON form?
Created a PR: nod-ai/shark-ai#1386.
Version used: `pip freeze | grep -E 'iree|shark|shortfin'`
QA status:
P: working fine, no issues.
F: failed; add details of the issue.