Replies: 1 comment
This will be implemented eventually: #64
I noticed that applications based on llama use different long prompts to pre-condition the model. With the 7B and 13B model weights, the model usually takes a while to read the prompt before it can process user input, so there is a waiting time before the first response. For example, when I use chat-13B.sh, it takes about 1 minute before the first response to my input, yet response times after that are fairly fast.
If I understand correctly, the long prompt puts the model into a certain internal state, and this internal state makes the model process user input the way the user expects. Once the model reaches this internal state, most of the original prompt does not need to be read and processed again.
Is it theoretically possible to store this internal state in a file on disk, then read it back in a new session instead of processing the initial prompt text again? This could save a lot of the initial waiting time, as well as the energy used to recompute the internal state.
I know there might be a lot of engineering implications, but I just want to know whether this is a feasible idea, or whether there are other things blocking it.
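To make the idea concrete, here is a minimal sketch of what saving and restoring that state could look like. It assumes the library exposes the context's internal state (largely the KV cache built up while evaluating the prompt) as an opaque byte buffer through three functions, `llama_get_state_size`, `llama_copy_state_data`, and `llama_set_state_data`; these names are illustrative assumptions about what such an API might look like, not a confirmed part of llama.h:

```cpp
// Sketch: persist a llama context's internal state (mostly the KV cache)
// after the initial prompt has been evaluated once, then restore it in a
// new session so the long prompt never has to be re-processed.
//
// NOTE: the llama_* state functions below are ASSUMED for illustration;
// check the current llama.h for what the library actually exposes.

#include <cstdint>
#include <cstdio>
#include <vector>

struct llama_context; // opaque handle, as in llama.h

// Assumed API surface (illustrative only):
size_t llama_get_state_size(const llama_context * ctx);
size_t llama_copy_state_data(llama_context * ctx, uint8_t * dst);
size_t llama_set_state_data(llama_context * ctx, const uint8_t * src);

// After evaluating the long pre-conditioning prompt once, dump the state.
bool save_state(llama_context * ctx, const char * path) {
    std::vector<uint8_t> buf(llama_get_state_size(ctx));
    const size_t n = llama_copy_state_data(ctx, buf.data());

    FILE * f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(buf.data(), 1, n, f) == n;
    std::fclose(f);
    return ok;
}

// In a new session, load the blob back instead of re-reading the prompt.
bool load_state(llama_context * ctx, const char * path) {
    FILE * f = std::fopen(path, "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    const long n = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    std::vector<uint8_t> buf((size_t) n);
    const bool ok = std::fread(buf.data(), 1, (size_t) n, f) == (size_t) n;
    std::fclose(f);

    if (ok) llama_set_state_data(ctx, buf.data());
    return ok;
}
```

One practical caveat: the saved session would presumably also need to record the prompt tokens themselves (so a changed prompt can be detected and re-evaluated), and the KV cache is large, roughly proportional to layers × context length × embedding size, so the state file could run to hundreds of megabytes. It trades disk space and I/O for the much more expensive recomputation.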