Misc. bug: Decreased success rate for tool calling #13769

Open
@jean-rl

Description

Name and Version

version: 5478 (f5cd27b)

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Command line

.\llama-server.exe -m "C:\Users\Jean\AppData\Local\llama.cpp\bartowski_Meta-Llama-3.1-8B-Instruct-GGUF_Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf" --port 8080 -v --verbose-prompt --log-colors --special --jinja

Problem description & steps to reproduce

While rerunning a small function-calling test that I had previously run against release b5142, I noticed that the release I'm testing now (b5478) is not parsing tool calls correctly. Here is the code I used for the test:

import openai

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": [
                "location"
            ]
        }
    }
}]

client = openai.OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="sk-no-key-required",
)

for i in range(10):
    print(f"Tool call {i+1}:")
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "user", "content": "What is the weather in Bogotá, Colombia?"},
        ],
        tools=tools,
        tool_choice="auto",
        parallel_tool_calls=False,
    )
    print("Completion: ", completion.__verbose['content'])
    if completion.choices[0].message.tool_calls is None:
        print("No tool calls made.")
        print(completion.choices[0].message.content)
    else:
        print("Tool call made!")
        print(completion.choices[0].message.tool_calls)
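
For comparing releases, the loop above can be condensed into a single success count. The following is only a minimal sketch and was not part of the original test: `count_tool_calls` and its `make_request` parameter are hypothetical names, and `make_request` is assumed to return an object shaped like `completion` above.

```python
from typing import Any, Callable


def count_tool_calls(make_request: Callable[[], Any], runs: int = 10) -> int:
    """Return how many of `runs` requests came back with parsed tool calls.

    `make_request` is a hypothetical zero-argument callable, assumed to
    return an object shaped like `completion` in the test above.
    """
    successes = 0
    for _ in range(runs):
        completion = make_request()
        # Treat both None and an empty list as "no tool call".
        if completion.choices[0].message.tool_calls:
            successes += 1
    return successes
```

Passing a lambda that wraps the `client.chat.completions.create(...)` call from the test would give one number per release to compare.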

The previous version, b5142, returned a parsed tool call in 10 out of 10 runs, while the current version does so in only 2 out of 10. When the current version returns no tool call, completion.__verbose['content'] is:

<|python_tag|>{"name": "get_weather", "parameters": {"location": "Bogot\u00f1a, Colombia"}}

On the previous version it was:

<|python_tag|>{"type": "function", "name": "get_weather", "parameters": {"location": "Bogot\u00f1a, Colombia"}}
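
The only visible difference between the two raw outputs quoted above is the `"type": "function"` field. The sketch below makes that difference easy to check on any raw `content` string. It is a debugging aid under stated assumptions: `inspect_raw_output` is a hypothetical helper, it does not reproduce the parsing that llama-server performs, and it does not establish that the missing field is the cause of the failure.

```python
import json

PYTHON_TAG = "<|python_tag|>"


def inspect_raw_output(content: str) -> dict:
    """Parse a raw completion string and summarise its tool-call shape.

    Hypothetical helper for debugging only; it is not llama-server's parser.
    """
    text = content.strip()
    # Drop the leading special token if the model emitted it.
    if text.startswith(PYTHON_TAG):
        text = text[len(PYTHON_TAG):]
    payload = json.loads(text)
    return {
        "name": payload.get("name"),
        "has_type_field": "type" in payload,
        "parameters": payload.get("parameters"),
    }


# The two raw outputs quoted in this report (raw strings keep the \u escape intact).
current = r'<|python_tag|>{"name": "get_weather", "parameters": {"location": "Bogot\u00f1a, Colombia"}}'
previous = r'<|python_tag|>{"type": "function", "name": "get_weather", "parameters": {"location": "Bogot\u00f1a, Colombia"}}'

print(inspect_raw_output(current)["has_type_field"])   # False
print(inspect_raw_output(previous)["has_type_field"])  # True
```

Both strings decode to the same function name and parameters, so the model's intent appears identical in the two cases.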

First Bad Commit

No response

Relevant log output
