Arch-Function is a research and development initiative focused on state-of-the-art function calling in large language models. Our goal is to build models that can understand, interpret, and execute complex function calls accurately and reliably.
The project encompasses multiple model families engineered for function calling tasks: they are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs from natural language prompts. The current release includes three collections, each available in multiple sizes, with additional models planned for future releases.
- [2025-06]: Arch-Agent collection released for advanced multi-turn, multi-step workflow automation, achieving Top-3 performance on the BFCL Leaderboard!
- [2025-02]: Arch-Function-Chat collection launched with conversational function calling capabilities!
- [2024-12]: Complete model suite updated with the latest improvements across all sizes of the Arch-Function collection!
- [2024-09]: Arch-Function collection officially launched on Hugging Face, achieving Top-7 performance on the BFCL Leaderboard!
Hugging Face Collection: Arch-Function
Model Name | Size | Key Features | Downloads |
---|---|---|---|
Arch-Function-1.5B | 1.5B | • Compact size for edge deployment • Efficient function calling • Low resource requirements | 🤗 HuggingFace |
Arch-Function-3B | 3B | • Balanced performance and efficiency • High accuracy function calling • Production-ready | 🤗 HuggingFace |
Arch-Function-7B | 7B | • Maximum performance • Complex function handling • Enterprise-grade capabilities | 🤗 HuggingFace |
Hugging Face Collection: Arch-Function-Chat
Model Name | Size | Key Features | Downloads |
---|---|---|---|
Arch-Function-Chat-1.5B | 1.5B | • Conversational function calling • Interactive agent capabilities • Lightweight deployment | 🤗 HuggingFace |
Arch-Function-Chat-3B | 3B | • Advanced dialogue management • Context-aware function usage • Multi-turn conversations | 🤗 HuggingFace |
Arch-Function-Chat-7B | 7B | • Sophisticated reasoning • Complex multi-step workflows • Premium chat experience | 🤗 HuggingFace |
Hugging Face Collection: Arch-Agent
Model Name | Size | Key Features | Downloads |
---|---|---|---|
Arch-Agent-1.5B | 1.5B | • Lightweight autonomous workflows • Edge-optimized performance • Low resource requirements | 🤗 HuggingFace |
Arch-Agent-3B | 3B | • Balanced autonomous performance • Multi-step task execution • High accuracy workflows | 🤗 HuggingFace |
Arch-Agent-7B | 7B | • Advanced autonomous behavior • Complex workflow orchestration • Maximum performance | 🤗 HuggingFace |
Arch-Agent-32B | 32B | • Premium autonomous systems • Sophisticated multi-step workflows • Superior capabilities | 🤗 HuggingFace |
Here we provide a recipe for fine-tuning Arch-Function models with LLaMA-Factory:
- Create the environment following the LLaMA-Factory installation instructions.
- If you would like to use DeepSpeed and FlashAttention, install the packages with the following commands:
pip install deepspeed
pip install flash-attn --no-build-isolation
LLaMA-Factory supports datasets in the `alpaca` and `sharegpt` formats. We recommend the `sharegpt` format for function calling tasks. Below is an example of a dataset in the `sharegpt` format:
[
{
"conversations": [
{
"from": "human",
"value": "user instruction"
},
{
"from": "function_call",
"value": "tool arguments"
},
{
"from": "observation",
"value": "tool result"
},
{
"from": "gpt",
"value": "model response"
}
],
"system": "system prompt (optional)",
"tools": "tool description (optional)"
}
]
Next, update `data/dataset_info.json` with the dataset description below:
"dataset_name": {
"file_name": "data.json",
"formatting": "sharegpt",
"columns": {
"messages": "conversations",
"system": "system",
"tools": "tools"
}
}
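For illustration, the sketch below builds a single record in this format and writes it to `data.json` (the file referenced by the dataset description above). The conversation content and tool definition are hypothetical and only meant to show the expected structure.

```python
import json

# Hypothetical example record in the sharegpt format described above.
record = {
    "conversations": [
        {"from": "human", "value": "What is the weather in Seattle?"},
        {
            "from": "function_call",
            "value": json.dumps(
                {"name": "get_weather", "arguments": {"location": "Seattle, WA"}}
            ),
        },
        {
            "from": "observation",
            "value": json.dumps({"temperature": 18, "unit": "celsius"}),
        },
        {"from": "gpt", "value": "It is currently 18°C in Seattle."},
    ],
    "system": "You are a helpful assistant that can call tools when needed.",
    "tools": json.dumps(
        [
            {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "str"}},
                    "required": ["location"],
                },
            }
        ]
    ),
}

# Write the dataset file registered in data/dataset_info.json.
with open("data.json", "w", encoding="utf-8") as f:
    json.dump([record], f, ensure_ascii=False, indent=2)
```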
LLaMA-Factory provides diverse training examples for LLMs under its `examples` directory. You can follow these examples and create a training script for your purpose. To kick off training, run the following command:
CUDA_VISIBLE_DEVICES={YOUR_DEVICE_IDS} llamafactory-cli train {PATH_TO_YOUR_TRAINING_SCRIPT}
Key considerations for fine-tuning:
- Prepare high-quality function calling examples in the proper format (a validation sketch follows this list)
- Use gradient accumulation for larger effective batch sizes
- Monitor validation loss to prevent overfitting
- Consider using LoRA for parameter-efficient fine-tuning
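As a sanity check on the first point, here is a minimal validation sketch. It assumes the sharegpt-format `data.json` shown above and only verifies that every `function_call` turn carries parseable JSON with `name` and `arguments` keys; adapt it to your own dataset.

```python
import json

def validate_function_calls(path: str = "data.json") -> None:
    """Check that each function_call turn holds valid JSON with the expected keys."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for idx, record in enumerate(records):
        for turn in record["conversations"]:
            if turn["from"] != "function_call":
                continue
            call = json.loads(turn["value"])  # raises ValueError on malformed JSON
            missing = {"name", "arguments"} - call.keys()
            if missing:
                raise ValueError(f"record {idx}: function_call missing {missing}")

validate_function_calls()
```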
To run inference with Arch-Function models for function calling tasks, follow the steps below:
Arch-Function models are supported in the Hugging Face transformers library, and we advise you to install the latest version with the following command:
pip install "transformers>=4.51.0"
Below is a script demonstrating how to use Arch-Function models for function calling tasks. First, specify the desired model name and create the model and its corresponding tokenizer:
import json
from typing import Any, Dict, List
from transformers import AutoModelForCausalLM, AutoTokenizer
# Specify the desired model name here
model_name = "katanemo/Arch-Agent-7B"
model = AutoModelForCausalLM.from_pretrained(
model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
Our models perform best when using the recommended prompt format, which can be found in the corresponding model cards on Hugging Face. You can run the following script to format prompts:
# Please use the recommended prompt for each model.
TASK_PROMPT = (
"You are a helpful assistant designed to assist with the user query by making one or more function calls if needed."
"\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\n"
"You are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{tool_text}"
"\n</tools>\n\nFor each function call, return a json object with function name and arguments within "
"""<tool_call></tool_call> XML tags:\n<tool_call>\n{{"name": <function-name>, """
""""arguments": <args-json-object>}}\n</tool_call>"""
)
# Define available tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "str",
"description": "The city and state, e.g. San Francisco, New York",
},
"unit": {
"type": "str",
"enum": ["celsius", "fahrenheit"],
"description": "The unit of temperature to return",
},
},
"required": ["location"],
},
},
}
]
# Helper function to create the system prompt for our model
def format_prompt(tools: List[Dict[str, Any]]):
tool_text = "\n".join(
[json.dumps(tool["function"], ensure_ascii=False) for tool in tools]
)
return TASK_PROMPT.format(tool_text=tool_text)
system_prompt = format_prompt(tools)
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "What is the weather in Seattle?"},
]
Now, you can run the following script to do inference with Arch-Function models.
model_inputs = tokenizer.apply_chat_template(
messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
generated_ids = [
output_ids[len(input_ids) :]
for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
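Since the TASK_PROMPT above asks the model to wrap each call in `<tool_call></tool_call>` XML tags, you typically need to extract the JSON payload from `response` before executing anything. Below is a minimal parsing sketch, not an official API; it simply pulls out whatever well-formed calls the model emitted.

```python
import json
import re
from typing import Any, Dict, List

def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract JSON objects wrapped in <tool_call></tool_call> tags."""
    calls = []
    for payload in re.findall(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of failing the whole response
    return calls

for call in parse_tool_calls(response):
    print(call["name"], call["arguments"])
```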
Inference optimization tips:
- Use appropriate temperature settings (0.0 - 0.1 for function calling)
- Use proper prompt formatting for best results
- Consider batching for multiple requests (see the sketch after this list)
- Use quantized models for faster inference
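Regarding batching, one straightforward approach with the `transformers` setup above is to render each conversation to text with the chat template and tokenize the prompts together with left padding. A minimal sketch, reusing `model`, `tokenizer`, and `system_prompt` from the earlier script; the example queries are hypothetical:

```python
# Hypothetical batch of user queries; each reuses the tool-aware system prompt.
queries = ["What is the weather in Seattle?", "What is the weather in Boston?"]
conversations = [
    [{"role": "system", "content": system_prompt}, {"role": "user", "content": q}]
    for q in queries
]

# Render prompts as text, then tokenize them as one left-padded batch.
prompts = [
    tokenizer.apply_chat_template(conv, add_generation_prompt=True, tokenize=False)
    for conv in conversations
]
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)

outputs = model.generate(**batch, max_new_tokens=512)
# Strip the prompt tokens; with left padding all rows share the padded prompt length.
new_tokens = outputs[:, batch.input_ids.shape[1]:]
for text in tokenizer.batch_decode(new_tokens, skip_special_tokens=True):
    print(text)
```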
Below we show how to deploy Arch-Function models using popular model hosting frameworks.
vLLM provides high-throughput serving with advanced optimizations. Follow the steps below to deploy Arch-Function models with vLLM.
# Install vLLM
pip install vllm
vllm serve katanemo/Arch-Agent-7B \
--host 127.0.0.1 \
--port 8000 \
--tensor-parallel-size 1
To get responses from the vLLM server for function calling, first format the prompts as described in the inference section above. Then replace `messages` in the script below with the formatted prompts and run the script (an example of formatted messages follows the script).
from openai import OpenAI
# Point to the local server
client = OpenAI(
api_key="EMPTY",
base_url="http://127.0.0.1:8000/v1",
)
# Send requests and get responses from the server
completion = client.chat.completions.create(
model="katanemo/Arch-Agent-7B",
messages=[
{"role": "user", "content": "Get the current temperature in San Francisco"}
],
temperature=0.01,
max_tokens=1024
)
print(completion.choices[0].message.content)
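As an illustration, formatted `messages` would prepend the tool-aware system prompt built with `format_prompt(tools)` from the inference section above. This is a sketch only; adapt the tool list to your use case.

```python
# Sketch only: format_prompt and tools come from the inference section above.
messages = [
    {"role": "system", "content": format_prompt(tools)},
    {"role": "user", "content": "Get the current temperature in San Francisco"},
]
```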
ollama provides easy local deployment with automatic model management. Below we provide scripts showing how to use ollama for deployment.
Please see the ollama documentation for installation. If necessary, use the following command to install the ollama Python library:
pip install ollama
Specify your desired model name below and run the following command to start the ollama server. Note that ollama only supports the `gguf` format.
ollama run hf.co/katanemo/Arch-Agent-7B.gguf
Format the prompts as described in the inference section above, then replace the messages in the script below with the formatted prompts and run the script to get responses.
from ollama import Client
# Point to the local server. By default, it uses port 11434.
client = Client(host="http://127.0.0.1:11434")
# Send requests and get responses from the server
completion = client.chat(
model="hf.co/katanemo/Arch-Agent-1.5B.gguf",
messages=[
{"role": "user", "content": "Get the current temperature in San Francisco"}
],
options={"temperature": 0.01, "num_ctx": 1024}
)
print(completion.message.content)
SGLang offers structured generation capabilities with high performance. To use SGLang for deployment, follow the steps below.
# Install SGLang
pip install "sglang[all]"
python -m sglang.launch_server \
--model-path katanemo/Arch-Agent-7B \
--host 127.0.0.1 \
--port 8000 \
--tp 1 \
--trust-remote-code
As SGLang provides OpenAI-compatible APIs, you can get responses from the server in the same way as with vLLM. First format the prompts as described in the inference section above, then replace `messages` in the script below with the formatted prompts and run the script.
# Client code for the SGLang server (OpenAI-compatible)
from openai import OpenAI
# Point to the local server
client = OpenAI(
api_key="EMPTY",
base_url="http://127.0.0.1:8000/v1",
)
# Send requests and get responses from the server
completion = client.chat.completions.create(
model="katanemo/Arch-Agent-7B",
messages=[
{"role": "user", "content": "Get the current temperature in San Francisco"}
],
temperature=0.01,
max_tokens=1024
)
print(completion.choices[0].message.content)
The Arch-Function project is actively developing next-generation models that will:
- Further advance function calling accuracy beyond current SOTA
- Introduce novel architectures optimized for tool usage
- Expand to multimodal function calling capabilities
- Support more complex reasoning patterns in function selection
Please refer to the individual model pages on Hugging Face for specific licensing information.
We welcome contributions to improve the Arch-Function tutorials and documentation! You can help by:
- Fixing errors or improving existing tutorials
- Adding new deployment examples or use cases
- Suggesting additional framework integrations
- Improving documentation clarity
Feel free to open an issue or submit a pull request with your improvements.
For questions and support:
- Open an issue in this repository
- Visit our Hugging Face Hub
- Check the Katanemo organization on GitHub