Cache the model’s state to pre-eval large responses #11022
Unanswered · CodeDruidX asked this question in Q&A · Replies: 1 comment
Is it possible to partially cache the model's response in the same way as `--prompt-cache`? The issue is that I need to re-generate very large responses to the same short prompt, varying only the seed, in order to produce just the very last token of the response differently each time. I understand that this task would be better handled by a classic GPT-style architecture without internal state, but I would like to implement something similar with llama.cpp. It seems to me that for this I need to somehow learn how to save the internal state of the model for later reuse.
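Roughly, I am imagining something like the untested sketch below, assuming the `llama_state_get_size` / `llama_state_get_data` / `llama_state_set_data` functions in `llama.h` are the right tool for this: evaluate the shared prompt plus the fixed part of the response once, snapshot the context state into a buffer, and then restore that snapshot before each re-generation instead of re-evaluating everything.

```cpp
// Untested sketch of what I have in mind, assuming the llama_state_* API from
// llama.h: snapshot the whole serialized context state (KV cache, etc.) after
// the shared prefix has been decoded, and restore it before every re-generation.

#include "llama.h"

#include <cstdint>
#include <vector>

// Copy the full serialized context state into an in-memory buffer.
static std::vector<uint8_t> snapshot_state(llama_context * ctx) {
    std::vector<uint8_t> buf(llama_state_get_size(ctx));
    llama_state_get_data(ctx, buf.data(), buf.size());
    return buf;
}

// Restore a previously captured snapshot into the same context, so that the
// next llama_decode() continues as if the shared prefix had just been evaluated.
static void restore_state(llama_context * ctx, const std::vector<uint8_t> & buf) {
    llama_state_set_data(ctx, buf.data(), buf.size());
}
```

The idea would then be: decode the prompt and the fixed part of the response once, call `snapshot_state()`, and in a loop call `restore_state()`, re-seed the sampler, and sample only the final token. Or is the file-based `llama_state_save_file` / `llama_state_load_file` pair (which, as far as I can tell, is what `--prompt-cache` builds on) the intended way to do this?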