Conversation
| data["input_extra"] = input_extra; // default to empty array if it's not exist | ||
|
|
std::string prompt = json_value(data, "prompt", std::string());
std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.ctx, prompt, true, true);
We should probably return an error if there is more than 1 resulting prompt?
Because above we already checked that data["prompt"] is a string, here we can be sure that we only have a single prompt to deal with. A GGML_ASSERT here would probably make more sense?
(The expected behavior of tokenize_input_prompts is that if prompt is a string, the output vector has exactly one entry.)
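A minimal sketch of the suggested check, reusing the names from the diff above (`json_value`, `tokenize_input_prompts`, and `llama_tokens` are assumed to be the server helpers shown in this PR):

```cpp
// sketch: data["prompt"] has already been validated to be a string earlier in the handler,
// so tokenize_input_prompts is expected to return exactly one tokenized prompt here
std::string prompt = json_value(data, "prompt", std::string());
std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.ctx, prompt, true, true);
GGML_ASSERT(tokenized_prompts.size() == 1 && "a string prompt must tokenize to exactly one prompt");
```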
@ggerganov I added a test using Feel free to add more complicated test case(s) if you need!

Ah, the infill endpoint should be used only with the So the tests should be updated to use the

OK so I've tried the non-instruction model, but I think the problem is related to the placement of
If the prompt is placed at the beginning, it should work:
We can fix this in another PR; for now I'm gonna comment out the

FIM should not add instructions such as "Complete this", as can be seen in the technical report. The
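For context, FIM models are driven purely by special tokens rather than natural-language instructions. Below is a minimal sketch of a generic prefix/suffix/middle (PSM) layout; the helper and its parameters are illustrative and not the exact code in `format_infill` (`llama_token` and `llama_tokens` are assumed to be the usual llama.cpp typedefs):

```cpp
// generic PSM-style FIM layout (illustrative, not the exact layout built by format_infill):
// <FIM_PRE> {code before cursor} <FIM_SUF> {code after cursor} <FIM_MID> -> model completes here
llama_tokens build_fim_prompt(llama_token fim_pre, llama_token fim_suf, llama_token fim_mid,
                              const llama_tokens & prefix, const llama_tokens & suffix) {
    llama_tokens out;
    out.push_back(fim_pre);                              // FIM prefix token
    out.insert(out.end(), prefix.begin(), prefix.end()); // code before the cursor
    out.push_back(fim_suf);                              // FIM suffix token
    out.insert(out.end(), suffix.begin(), suffix.end()); // code after the cursor
    out.push_back(fim_mid);                              // generation continues from here
    return out;
}
```

Note that no natural-language instruction appears anywhere in the sequence, which is why prompts like "Complete this" are out of place here.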
OK, thanks for the clarification. I updated the test to reflect this. The

Yes, perfect. The
* server : fix format_infill
* fix
* rename
* update test
* use another model
* update test
* update test
* test_invalid_input_extra_req

Should fix #10691 (comment). I removed the `format_chat`/`_infill`/`_rerank` calls from `handle_completions_generic` but forgot to put it back in `handle_infill`.
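To make the description concrete, here is a minimal sketch of the gist of the fix inside `handle_infill`; the argument list of `format_infill` is abbreviated and illustrative, not the exact signature in the repository:

```cpp
// inside handle_infill (sketch; format_infill's real parameter list is longer than shown)
// default "input_extra" to an empty array if the client did not send it
data["input_extra"] = json_value(data, "input_extra", json::array());

std::string prompt = json_value(data, "prompt", std::string());
std::vector<llama_tokens> tokenized_prompts = tokenize_input_prompts(ctx_server.ctx, prompt, true, true);

// the fix: build the FIM prompt here again, since handle_completions_generic no longer calls format_infill
data["prompt"] = format_infill(data.at("input_prefix"), data.at("input_suffix"),
                               data.at("input_extra"), tokenized_prompts[0]);
```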