
cli : auto activate conversation mode if chat template is available #11214

Merged
merged 6 commits into from
Jan 13, 2025

Conversation

@ngxson (Collaborator) commented Jan 13, 2025

The goal of this PR is to automatically enable `-cnv` whenever a chat template is available (either built-in or supplied via `--chat-template`).

Please note that this is not related to the recent discussion on chat UX (#11203), but rather focuses on the out-of-the-box (OOTB) experience.

With this PR, the OOTB experience will become:

# install
brew install llama.cpp

# use local gguf file
llama-cli -m Qwen2.5-7B-Instruct-IQ2_M.gguf

# use HF hosted model
llama-cli -hf bartowski/Llama-3.2-1B-Instruct-GGUF

Users can still force-disable it via `-no-cnv`.

This also fixes #11157, since it's not obvious that some models need a system message.

@ngxson ngxson requested a review from ggerganov January 13, 2025 12:44
@ngxson ngxson merged commit 84a4481 into ggerganov:master Jan 13, 2025
48 checks passed
@ggerganov (Owner)

I think ggml-ci needs to be updated after this change: https://github.com/ggml-org/ci/tree/results/llama.cpp/84/a44815f704aaed8e8edec7a39e846a975c7ba9/ggml-2-x86-cpu

ngxson added a commit to huggingface/huggingface.js that referenced this pull request Jan 17, 2025
This change is related to these upstream PRs:
- ggerganov/llama.cpp#11195 allows using tag-based repo names like on ollama
- ggerganov/llama.cpp#11214 automatically turns on `--conversation` mode for models that have a chat template

Example:

```sh
# for "instruct" model, conversation mode is enabled automatically
llama-cli -hf bartowski/Llama-3.2-1B-Instruct-GGUF

# for non-instruct model, it runs as completion
llama-cli -hf TheBloke/Llama-2-7B-GGUF -p "Once upon a time,"
```
@strawberrymelonpanda (Contributor) commented Jan 19, 2025

For what it's worth, this was a fairly problematic change for me: my CLI scripts suddenly broke because llama-cli now treats prompts passed with `-p` as system prompts under the auto-enabled conversation mode.

I just have to pass `-no-cnv` to resume script usage, which isn't a big deal, but it was quite confusing out of the blue. As a suggestion, perhaps a message under `== Running in interactive mode. ==` along the lines of "Conversation is now the default mode; pass `--no-conversation` for scripted usage" would be prudent?
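For scripted pipelines, one way to guard against the new default is to pin `-no-cnv` explicitly in a small wrapper. A minimal sketch, assuming a POSIX shell; the `build_completion_args` helper, the model filename, and the `-n 128` default are illustrative and not part of llama.cpp — only the `-no-cnv` flag comes from this thread:

```sh
#!/bin/sh
# build_completion_args: assemble llama-cli flags for scripted,
# non-conversation (one-shot completion) use. Pinning -no-cnv keeps
# the behaviour stable even when the model ships a chat template,
# which would otherwise auto-enable conversation mode.
build_completion_args() {
  # $1 = model path, $2 = prompt (both placeholders here)
  printf '%s\n' -m "$1" -no-cnv -p "$2" -n 128
}

# A real script would pass these flags directly to llama-cli;
# here we just print the assembled argument list, one per line.
build_completion_args your_model.gguf "Once upon a time,"
```
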

@strawberrymelonpanda (Contributor) commented Jan 19, 2025

Also, it may be worth mentioning that the "Example usage" section at the bottom of `llama-cli --help` could use updating:

```
example usage:

  text generation:     ./build/bin/llama-cli -m your_model.gguf -p "I believe the meaning of life is" -n 128

  chat (conversation): ./build/bin/llama-cli -m your_model.gguf -p "You are a helpful assistant" -cnv
```

The first example is no longer text generation but conversation; the README adds `-no-cnv` to it. The second example would still work, but `-cnv` is now redundant, and per the README change a chat template would need to be specified manually for models without a built-in one.
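One possible shape for the updated help text, reflecting the new default — a sketch only, not the actual output of any llama.cpp release:

```
example usage:

  text generation:     ./build/bin/llama-cli -m your_model.gguf -p "I believe the meaning of life is" -n 128 -no-cnv

  chat (conversation): ./build/bin/llama-cli -m your_model.gguf
```

The completion example pins `-no-cnv` so it keeps producing raw text, and the chat example drops the now-redundant `-cnv` (conversation mode is auto-enabled when the model ships a chat template).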

aykutkardas pushed a commit to gokayfem/huggingface.js that referenced this pull request Jan 20, 2025
Successfully merging this pull request may close these issues.

Eval bug: phi 4 - input is empty