Arch-Function: Advanced Function Calling Models

Hugging Face Spaces Discord License

Arch-Function represents a comprehensive research and development initiative focused on creating state-of-the-art function calling capabilities in large language models. Our mission is to build AI systems that can seamlessly understand, interpret, and execute complex function calls with unprecedented accuracy and reliability.

This project encompasses multiple model families engineered specifically for function calling: they are designed to understand complex function signatures, identify required parameters, and produce accurate function call outputs from natural language prompts. The current release includes three major collections, each available in multiple sizes, with additional models planned for future releases to further advance the state of the art in function calling.

📰 News & Updates

🚀 Current Model Collections

Collection 1: Base Function Calling Models

Hugging Face Collection: Arch-Function

| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Function-1.5B | 1.5B | • Compact size for edge deployment<br>• Efficient function calling<br>• Low resource requirements | 🤗 HuggingFace |
| Arch-Function-3B | 3B | • Balanced performance and efficiency<br>• High accuracy function calling<br>• Production-ready | 🤗 HuggingFace |
| Arch-Function-7B | 7B | • Maximum performance<br>• Complex function handling<br>• Enterprise-grade capabilities | 🤗 HuggingFace |

Collection 2: Chat-Optimized Models

Hugging Face Collection: Arch-Function-Chat

| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Function-Chat-1.5B | 1.5B | • Conversational function calling<br>• Interactive agent capabilities<br>• Lightweight deployment | 🤗 HuggingFace |
| Arch-Function-Chat-3B | 3B | • Advanced dialogue management<br>• Context-aware function usage<br>• Multi-turn conversations | 🤗 HuggingFace |
| Arch-Function-Chat-7B | 7B | • Sophisticated reasoning<br>• Complex multi-step workflows<br>• Premium chat experience | 🤗 HuggingFace |

Collection 3: Agentic Models

Hugging Face Collection: Arch-Agent

| Model Name | Size | Key Features | Downloads |
|---|---|---|---|
| Arch-Agent-1.5B | 1.5B | • Lightweight autonomous workflows<br>• Edge-optimized performance<br>• Low resource requirements | 🤗 HuggingFace |
| Arch-Agent-3B | 3B | • Balanced autonomous performance<br>• Multi-step task execution<br>• High accuracy workflows | 🤗 HuggingFace |
| Arch-Agent-7B | 7B | • Advanced autonomous behavior<br>• Complex workflow orchestration<br>• Maximum performance | 🤗 HuggingFace |
| Arch-Agent-32B | 32B | • Premium autonomous systems<br>• Sophisticated multi-step workflows<br>• Superior capabilities | 🤗 HuggingFace |

📚 1. Fine-tuning Arch-Function Models

Here we show how to fine-tune Arch-Function models with LLaMA-Factory:

1.1 Set up environment

  • Create the environment following the LLaMA-Factory installation instructions
  • If you would like to use deepspeed and flash-attn, install them with the following commands:
pip install deepspeed
pip install flash-attn --no-build-isolation

1.2 Prepare training data

LLaMA-Factory supports datasets in alpaca and sharegpt format. We recommend using the sharegpt format for function calling tasks. Below is an example of a dataset in the sharegpt format:

[
	{
		"conversations": [
			{
				"from": "human",
				"value": "user instruction"
			},
			{
				"from": "function_call",
				"value": "tool arguments"
			},
			{
				"from": "observation",
				"value": "tool result"
			},
			{
				"from": "gpt",
				"value": "model response"
			}
		],
		"system": "system prompt (optional)",
		"tools": "tool description (optional)"
	}
]

Next, update data/dataset_info.json with the dataset description below:

"dataset_name": {
	"file_name": "data.json",
	"formatting": "sharegpt",
	"columns": {
		"messages": "conversations",
		"system": "system",
		"tools": "tools"
	}
}

1.3 Training

LLaMA-Factory provides diverse training examples for LLMs under examples. You can follow these examples to create a training script for your purpose. To kick off training, run the following command:

CUDA_VISIBLE_DEVICES={YOUR_DEVICE_IDS} llamafactory-cli train {PATH_TO_YOUR_TRAINING_SCRIPT}

Key considerations for fine-tuning:

  • Prepare high-quality function calling examples with proper format
  • Use gradient accumulation for larger effective batch sizes
  • Monitor validation loss to prevent overfitting
  • Consider using LoRA for parameter-efficient fine-tuning (see the sketch after this list)
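
If you prefer to set up LoRA and gradient accumulation directly with Hugging Face peft and transformers rather than through LLaMA-Factory, a minimal sketch might look like the following. The base model, LoRA targets, and hyperparameters here are illustrative assumptions, not tuned or official values:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Illustrative base model; swap in the Arch-Function size you are fine-tuning
model = AutoModelForCausalLM.from_pretrained(
    "katanemo/Arch-Function-1.5B", torch_dtype="auto", trust_remote_code=True
)

# Parameter-efficient fine-tuning: only the small LoRA adapters are trained
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Gradient accumulation raises the effective batch size (here 2 x 8 = 16);
# periodic evaluation lets you watch validation loss for signs of overfitting.
training_args = TrainingArguments(
    output_dir="arch-function-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=1e-4,
    num_train_epochs=2,
    eval_strategy="steps",
    eval_steps=200,
    logging_steps=20,
    bf16=True,
)

# Pass `model`, `training_args`, and your tokenized dataset to a transformers
# Trainer (or TRL's SFTTrainer) to run training.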

📚 2. Inference with Arch-Function Models

To run inference with Arch-Function models for function calling tasks, follow the steps below:

2.1 Set up environment

Arch-Function models are supported in the Hugging Face transformers library, and we advise you to install the latest version with the following command:

pip install "transformers>=4.51.0"

2.2 Inference

Below is a script demonstrating how to use Arch-Function models for function calling tasks.

2.2.1 Create models and tokenizers

You can specify the desired model name and create models and corresponding tokenizers with the following script:

import json
from typing import Any, Dict, List
from transformers import AutoModelForCausalLM, AutoTokenizer

# Specify the desired model name here
model_name = "katanemo/Arch-Agent-7B"

model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

2.2.2 Format prompts

Our models perform best when using the recommended prompt format, which can be found in the corresponding model cards on Hugging Face. You can run the following script to format prompts:

# Please use the recommended prompt for each model.
TASK_PROMPT = (
    "You are a helpful assistant designed to assist with the user query by making one or more function calls if needed."
    "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\n"
    "You are provided with function signatures within <tools></tools> XML tags:\n<tools>\n{tool_text}"
    "\n</tools>\n\nFor each function call, return a json object with function name and arguments within "
    """<tool_call></tool_call> XML tags:\n<tool_call>\n{{"name": <function-name>, """
    """"arguments": <args-json-object>}}\n</tool_call>"""
)

# Define available tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "str",
                        "description": "The city and state, e.g. San Francisco, New York",
                    },
                    "unit": {
                        "type": "str",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The unit of temperature to return",
                    },
                },
                "required": ["location"],
            },
        },
    }
]


# Helper function to create the system prompt for our model
def format_prompt(tools: List[Dict[str, Any]]):
    tool_text = "\n".join(
        [json.dumps(tool["function"], ensure_ascii=False) for tool in tools]
    )
    return TASK_PROMPT.format(tool_text=tool_text)


system_prompt = format_prompt(tools)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What is the weather in Seattle?"},
]

2.2.3 Run inference

Now, run the following script to perform inference with Arch-Function models.

model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=32768)
generated_ids = [
    output_ids[len(input_ids) :]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
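
With the prompt format above, the model emits each function call as a JSON object inside <tool_call></tool_call> tags. Below is a minimal sketch for pulling those calls out of response; it assumes the model followed the recommended format, and malformed JSON is skipped:

import json
import re
from typing import Any, Dict, List


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract the JSON payloads inside <tool_call>...</tool_call> tags."""
    calls = []
    for payload in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed calls rather than crashing
    return calls


# `response` comes from the inference script above
tool_calls = parse_tool_calls(response)
print(tool_calls)  # e.g. [{"name": "get_weather", "arguments": {"location": "Seattle"}}]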

Inference optimization tips:

  • Use appropriate temperature settings (0.0 - 0.1 for function calling)
  • Use proper prompt formatting for best results
  • Consider batching for multiple requests
  • Use quantized models for faster inference (see the sketch after this list)
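
For the quantization tip above, one option (an assumption on our part, not an official Arch-Function recipe) is 4-bit loading through bitsandbytes, which transformers exposes via BitsAndBytesConfig. This requires a CUDA GPU and pip install bitsandbytes:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "katanemo/Arch-Agent-7B"

# 4-bit NF4 quantization to cut memory use and speed up inference on smaller GPUs
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quant_config,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)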

📚 3. Deployment with Popular Hosting Frameworks

Below we show how to deploy Arch-Function models using popular model hosting frameworks.

3.1 vLLM Deployment

vLLM provides high-throughput serving with advanced optimizations. Follow the steps below to deploy Arch-Function models with vLLM.

3.1.1 Set up environment

# Install vLLM
pip install vllm

3.1.2 Start vLLM server

vllm serve katanemo/Arch-Agent-7B \
    --host 127.0.0.1 \
    --port 8000 \
    --tensor-parallel-size 1

3.1.3 Get responses

To get responses from the vLLM server for function calling, first format prompts as described in Section 2.2.2. Then replace messages in the script below with the formatted prompts and run the script.

from openai import OpenAI

# Point to the local server
client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:8000/v1",
)

# Send requests and get responses from the server
completion = client.chat.completions.create(
    model="katanemo/Arch-Agent-7B",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    temperature=0.01,
    max_tokens=1024
)

print(completion.choices[0].message.content)
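
For example, reusing the format_prompt helper and tools list from Section 2.2.2 (assuming they are defined in the same script), a function-calling request might look like the sketch below; the same substitution applies to the SGLang client in Section 3.3.3:

# Build the same function-calling system prompt used in Section 2.2.2
completion = client.chat.completions.create(
    model="katanemo/Arch-Agent-7B",
    messages=[
        {"role": "system", "content": format_prompt(tools)},
        {"role": "user", "content": "What is the weather in Seattle?"},
    ],
    temperature=0.01,
    max_tokens=1024,
)

print(completion.choices[0].message.content)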

3.2 ollama Deployment

ollama provides easy local deployment with automatic model management. Below we provide scripts to show how to use ollama for deployment.

3.2.1 Install ollama

Please see ollama for installation instructions. If necessary, use the following command to install the ollama Python library.

pip install ollama

3.2.2 Start ollama server

Specify your desired model name below and run the following command to start the ollama server. Note that ollama only supports models in GGUF format.

ollama run hf.co/katanemo/Arch-Agent-7B.gguf

3.2.3 Get responses

Format prompts as described in Section 2.2.2, then replace messages in the script below with the formatted prompts and run the script to get responses.

from ollama import Client

# Point to the local server. By default, it uses port 11434.
client = Client(host="http://127.0.0.1:11434")

# Send requests and get responses from the server
completion = client.chat(
    model="hf.co/katanemo/Arch-Agent-1.5B.gguf",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    options={"temperature": 0.01, "num_ctx": 1024}
)

print(completion.message.content)

3.3 SGLang Deployment

SGLang offers structured generation capabilities with high performance. To use SGLang for deployment, follow the steps below.

3.3.1 Set up environment

# Install SGLang
pip install "sglang[all]"

3.3.2 Start SGLang server

python -m sglang.launch_server \
    --model-path katanemo/Arch-Agent-7B \
    --host 127.0.0.1 \
    --port 8000 \
    --tp 1 \
    --trust-remote-code

3.3.3 Get responses

As SGLang provides OpenAI-compatible APIs, you can get responses from the server in the same way as with vLLM. First, format prompts as described in Section 2.2.2. Then replace messages in the script below with the formatted prompts and run the script.

# Client code for SGLang (OpenAI-compatible API)
from openai import OpenAI

# Point to the local server
client = OpenAI(
    api_key="EMPTY",
    base_url="http://127.0.0.1:8000/v1",
)

# Send requests and get responses from the server
completion = client.chat.completions.create(
    model="katanemo/Arch-Agent-7B",
    messages=[
        {"role": "user", "content": "Get the current temperature in San Francisco"}
    ],
    temperature=0.01,
    max_tokens=1024
)

print(completion.choices[0].message.content)

🔬 Research & Development

The Arch-Function project is actively developing next-generation models that will:

  • Further advance function calling accuracy beyond current SOTA
  • Introduce novel architectures optimized for tool usage
  • Expand to multimodal function calling capabilities
  • Support more complex reasoning patterns in function selection

📄 License

Please refer to the individual model pages on Hugging Face for specific licensing information.

🤝 Contributing

We welcome contributions to improve the Arch-Function tutorials and documentation! You can help by:

  • Fixing errors or improving existing tutorials
  • Adding new deployment examples or use cases
  • Suggesting additional framework integrations
  • Improving documentation clarity

Feel free to open an issue or submit a pull request with your improvements.

📞 Support

For questions and support, please open a GitHub issue or join our Discord community.
