LLamaWorker is an HTTP API server based on the LLamaSharp project. It provides an OpenAI-compatible API, making it easy for developers to integrate Large Language Models (LLMs) into their applications.
English | 中文
- OpenAI API Compatible: Offers an API similar to OpenAI's / Azure OpenAI, making migration and integration easy.
- Multi-Model Support: Supports configuring and switching between different models to meet the needs of various scenarios.
- Streaming Response: Supports streaming responses to improve the efficiency of processing large responses.
- Embedding Support: Provides text embedding functionality with support for various embedding models.
- Chat Templates: Provides several common chat templates.
- Auto-Release: Supports automatic release of loaded models.
- Function Call: Supports function calls.
- API Key Authentication: Supports API Key authentication.
- Gradio UI Demo: Provides a UI demo based on Gradio.NET.
Precompiled Vulkan backend builds are provided; you can download the corresponding package from Releases:

- LLamaWorker-Vulkan-win-x64.zip
- LLamaWorker-Vulkan-linux-x64.zip

After downloading and unzipping, modify the configuration in the `appsettings.json` file, then run the software and start using it.
For other backends, you can still download the Vulkan version, then go to llama.cpp to download the corresponding compiled build and replace the relevant libraries. You can also compile the llama.cpp project yourself to obtain the required libraries.
LLamaWorker supports function calls. The configuration file currently provides three templates, and function calling has been tested with Phi-3, Qwen2, and Llama3.1.
Function calls are compatible with OpenAI's API. You can test the feature with the following JSON request:
POST `/v1/chat/completions`

```json
{
  "model": "default",
  "messages": [
    {
      "role": "user",
      "content": "Where is the temperature high between Beijing and Shanghai?"
    }
  ],
  "tools": [
    {
      "function": {
        "name": "GetWeatherPlugin-GetCurrentTemperature",
        "description": "Get the current temperature of the specified city.",
        "parameters": {
          "type": "object",
          "required": ["city"],
          "properties": {
            "city": {
              "type": "string",
              "description": "City Name"
            }
          }
        }
      },
      "type": "function"
    },
    {
      "function": {
        "name": "EmailPlugin-SendEmail",
        "description": "Send an email to the recipient.",
        "parameters": {
          "type": "object",
          "required": ["recipientEmails", "subject", "body"],
          "properties": {
            "recipientEmails": {
              "type": "string",
              "description": "A recipient email list separated by semicolons"
            },
            "subject": { "type": "string" },
            "body": { "type": "string" }
          }
        }
      },
      "type": "function"
    }
  ],
  "tool_choice": "auto"
}
```
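When the model decides to call a tool, an OpenAI-compatible server returns a `tool_calls` array in the assistant message instead of plain text content. The sketch below shows how a client might parse such a response in Python; the response fragment is hypothetical and its field values are illustrative, not actual LLamaWorker output.

```python
import json

# Hypothetical response fragment following the OpenAI-compatible
# "tool_calls" shape; the actual values depend on the model.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_0",
                        "type": "function",
                        "function": {
                            "name": "GetWeatherPlugin-GetCurrentTemperature",
                            "arguments": "{\"city\": \"Beijing\"}",
                        },
                    }
                ],
            },
            "finish_reason": "tool_calls",
        }
    ]
}

# Extract each requested call; "arguments" is a JSON-encoded string.
for call in response["choices"][0]["message"]["tool_calls"]:
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])
    print(name, args)
# → GetWeatherPlugin-GetCurrentTemperature {'city': 'Beijing'}
```

The client is expected to run the named function with the decoded arguments, append the result as a `tool` role message, and send the conversation back to the server for the final answer.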
- Clone the repository locally:

  ```bash
  git clone https://github.com/sangyuxiaowu/LLamaWorker.git
  ```

- Enter the project directory:

  ```bash
  cd LLamaWorker
  ```

- Choose the project file according to your needs. The project provides four backend project files:

  - `LLamaWorker.Backend.Cpu`: for CPU environments.
  - `LLamaWorker.Backend.Cuda11`: for GPU environments with CUDA 11.
  - `LLamaWorker.Backend.Cuda12`: for GPU environments with CUDA 12.
  - `LLamaWorker.Backend.Vulkan`: for Vulkan environments.

  Select the project file that suits your environment for the next steps.

- Install dependencies:

  ```bash
  dotnet restore LLamaWorker.Backend.Cpu\LLamaWorker.Backend.Cpu.csproj
  ```

  If you are using a CUDA or Vulkan version, replace the project file name accordingly.

- Modify the configuration file `appsettings.json`. The default configuration includes some common open-source model configurations; you only need to adjust the model file path (`ModelPath`) as needed.

- Start the server:

  ```bash
  dotnet run --project LLamaWorker.Backend.Cpu\LLamaWorker.Backend.Cpu.csproj
  ```

  If you are using a CUDA or Vulkan version, replace the project file name accordingly.
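Once the server is running, you can exercise it with any OpenAI-compatible client. The following minimal Python sketch builds a chat completion request and shows how it would be posted; the base URL is an assumption — adjust the host and port to match your configuration.

```python
import json
import urllib.request

# Assumed base URL; change to match your server's actual address/port.
BASE_URL = "http://localhost:5000"

def build_chat_request(content: str, model: str = "default") -> dict:
    """Build a minimal OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }

def post_chat(payload: dict) -> dict:
    """POST the payload to /v1/chat/completions and decode the JSON reply.

    Requires a running LLamaWorker instance; not invoked here.
    """
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = build_chat_request("Hello!")
print(payload["messages"][0]["content"])  # → Hello!
```

With the server up, `post_chat(payload)` returns the standard OpenAI-style response with the model's reply in `choices[0].message.content`.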
LLamaWorker offers the following API endpoints:

- `/v1/chat/completions`: chat completion requests
- `/v1/completions`: prompt completion requests
- `/v1/embeddings`: create embeddings
- `/models/info`: returns basic information about the model
- `/models/config`: returns information about configured models
- `/models/{modelId}/switch`: switch to a specified model
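As a sketch, the model-management endpoints above can be driven with plain HTTP requests. The snippet below builds the request paths and a simple GET helper; the base URL is an assumption, and `qwen2` is only an illustrative model ID — use the IDs from your own `appsettings.json`.

```python
import json
import urllib.request

# Assumed base URL; change to match your server's actual address/port.
BASE_URL = "http://localhost:5000"

def get_json(path: str) -> dict:
    """GET a JSON document from the server (requires a running instance)."""
    with urllib.request.urlopen(f"{BASE_URL}{path}") as resp:
        return json.loads(resp.read())

def switch_path(model_id: str) -> str:
    """Build the path for switching to a configured model."""
    return f"/models/{model_id}/switch"

# With the server running:
#   get_json("/models/info")    -> basic info about the current model
#   get_json("/models/config")  -> the configured models
print(switch_path("qwen2"))  # → /models/qwen2/switch
```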
This UI is based on Gradio.NET.
You can also try the Gradio UI demo by running the following command:
dotnet restore ChatUI\ChatUI.csproj
dotnet run --project ChatUI\ChatUI.csproj
Then open your browser and visit the Gradio UI demo.