Why does memory usage not change with different inputs when using the GGML format? #566
Unanswered
SiraHaruethaipree asked this question in Q&A
Replies: 0 comments
I don't know much about the GGML format. What I do know is that with the load_in_8bit or load_in_4bit options from Hugging Face, GPU VRAM usage changes with the input sequence: a longer input sequence increases memory usage. With a GGML model on the GPU, however, memory usage always stays at the same value no matter what input I give it. Could someone please explain why?
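To make the comparison concrete, here is a rough sketch of how the attention key/value cache could account for both behaviours. It rests on assumptions that are not confirmed in this thread: that the growth seen with Hugging Face comes mostly from a cache sized to the current input, and that a GGML runtime reserves the cache for the whole context window when the model is loaded. The function names, the `n_ctx` value of 2048, and the 7B-style model dimensions are all hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope key/value cache sizes. All dimensions are assumed
# (roughly a 7B LLaMA-style model); they are not taken from any real run.

MIB = 1024 ** 2


def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=32, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for n_tokens positions.

    The leading factor 2 covers the two tensors (keys and values) stored
    per layer; bytes_per_value=2 assumes 16-bit cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


def cache_sized_to_input(prompt_lengths):
    """Hypothetical policy A: the cache grows with each input's length."""
    return [kv_cache_bytes(n) for n in prompt_lengths]


def cache_preallocated(prompt_lengths, n_ctx=2048):
    """Hypothetical policy B: the cache is reserved once for n_ctx tokens."""
    reserved = kv_cache_bytes(n_ctx)
    return [reserved for _ in prompt_lengths]


if __name__ == "__main__":
    lengths = [128, 512, 2048]
    grown = cache_sized_to_input(lengths)
    fixed = cache_preallocated(lengths)
    for n, g, f in zip(lengths, grown, fixed):
        print(f"{n:>5} tokens: sized-to-input {g // MIB} MiB, "
              f"preallocated {f // MIB} MiB")
```

If this assumption holds, the constant reading with GGML would be the reserved cache plus the model weights, and it would change with the configured context size instead of with the prompt length.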

Thanks