Serge - LLaMA made easy 🦙

Serge is a chat interface crafted with llama.cpp for running GGUF models. No API keys, entirely self-hosted!

🌐 SvelteKit frontend
💾 Redis for storing chat history & parameters
⚙️ FastAPI + LangChain for the API, wrapping calls to llama.cpp using the python bindings

🎥 Demo:

demo.webm

⚡️ Quick start

🐳 Docker:

docker run -d \
    --name serge \
    -v weights:/usr/src/app/weights \
    -v datadb:/data/db/ \
    -p 8008:8008 \
    ghcr.io/serge-chat/serge:latest

🐙 Docker Compose:

services:
  serge:
    image: ghcr.io/serge-chat/serge:latest
    container_name: serge
    restart: unless-stopped
    ports:
      - 8008:8008
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/

volumes:
  weights:
  datadb:

Then visit http://localhost:8008. The API documentation is available at http://localhost:8008/api/docs.
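The container can take a moment to start, so a script that depends on Serge may want to wait until the port answers. The following is a minimal sketch, not part of Serge itself: the function name `wait_for_serge` and its parameters are hypothetical, and it assumes the default port mapping `8008:8008` shown above.

```python
import http.client
import time
import urllib.error
import urllib.request

# Opener that ignores proxy environment variables, since the target is local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_serge(url="http://localhost:8008", timeout=60.0, interval=2.0):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Not reachable yet (container may still be starting); retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_serge()` with no arguments checks the default address for up to a minute. Pass a different `url` if you mapped the container to another host port.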

🌍 Environment Variables

The following environment variables are available:

Variable Name	Description	Default Value
`SERGE_DATABASE_URL`	Database connection string	`sqlite:////data/db/sql_app.db`
`SERGE_JWT_SECRET`	Key for auth token encryption. Use a random string	`uF7FGN5uzfGdFiPzR`
`SERGE_SESSION_EXPIRY`	Duration in minutes before a user must reauthenticate	`60`
`NODE_ENV`	Node.js running environment	`production`
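Because the default `SERGE_JWT_SECRET` is printed in this README, it is worth replacing with your own random value. One possible way to generate it is sketched below. The helper name `make_env_file` is hypothetical, and only the variable names come from the table above.

```python
import secrets


def make_env_file(session_expiry_minutes=60):
    """Return env-file text with a freshly generated random JWT secret."""
    # token_urlsafe uses only letters, digits, '-' and '_',
    # so the value needs no quoting in an env file.
    jwt_secret = secrets.token_urlsafe(32)
    lines = [
        f"SERGE_JWT_SECRET={jwt_secret}",
        f"SERGE_SESSION_EXPIRY={session_expiry_minutes}",
    ]
    return "\n".join(lines) + "\n"


# Print the env-file contents so they can be redirected to a file.
print(make_env_file(), end="")
```

Assuming the output is saved to a file such as `serge.env` (an illustrative name), it can be passed to the container with Docker's `--env-file serge.env` option on `docker run`.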

🖥️ Windows

Ensure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models.

☁️ Kubernetes

Instructions for setting up Serge on Kubernetes can be found in the wiki.

🧠 Supported Models

Category	Models
Alfred	40B-1023
BioMistral	7B
Code	13B, 33B
CodeLLaMA	7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python
Codestral	22B v0.1
Gemma	2B, 1.1-2B-Instruct, 7B, 1.1-7B-Instruct, 2-9B, 2-9B-Instruct, 2-27B, 2-27B-Instruct
Gorilla	Falcon-7B-HF-v0, 7B-HF-v1, Openfunctions-v1, Openfunctions-v2
Falcon	7B, 7B-Instruct, 11B, 40B, 40B-Instruct
LLaMA 2	7B, 7B-Chat, 7B-Coder, 13B, 13B-Chat, 70B, 70B-Chat, 70B-OASST
LLaMA 3	11B-Instruct, 13B-Instruct, 16B-Instruct
LLaMA Pro	8B, 8B-Instruct
Mathstral	7B
Med42	70B, v2-8B, v2-70B
Medalpaca	13B
Medicine	Chat, LLM
Meditron	7B, 7B-Chat, 70B, 3-8B
Meta-LLaMA-3	3-8B, 3.1-8B, 3.2-1B-Instruct, 3-8B-Instruct, 3.1-8B-Instruct, 3.2-3B-Instruct, 3-70B, 3.1-70B, 3-70B-Instruct, 3.1-70B-Instruct
Mistral	7B-V0.1, 7B-Instruct-v0.2, 7B-OpenOrca, Nemo-Instruct
MistralLite	7B
Mixtral	8x7B-v0.1, 8x7B-Dolphin-2.7, 8x7B-Instruct-v0.1
Neural-Chat	7B-v3.3
Notus	7B-v1
Notux	8x7B-v1
Nous-Hermes 2	Mistral-7B-DPO, Mixtral-8x7B-DPO, Mixtral-8x7B-SFT
OpenChat	7B-v3.5-1210, 8B-v3.6-20240522
OpenCodeInterpreter	DS-6.7B, DS-33B, CL-7B, CL-13B, CL-70B
OpenLLaMA	3B-v2, 7B-v2, 13B-v2
Orca 2	7B, 13B
Phi	2-2.7B, 3-mini-4k-instruct, 3.1-mini-4k-instruct, 3.1-mini-128k-instruct, 3.5-mini-instruct, 3-medium-4k-instruct, 3-medium-128k-instruct
Python Code	13B, 33B
PsyMedRP	13B-v1, 20B-v1
Starling LM	7B-Alpha
SOLAR	10.7B-v1.0, 10.7B-instruct-v1.0
TinyLlama	1.1B
Vicuna	7B-v1.5, 13B-v1.5, 33B-v1.3, 33B-Coder
WizardLM	2-7B, 13B-v1.2, 70B-v1.0
Zephyr	3B, 7B-Alpha, 7B-Beta

Additional models can be requested by opening a GitHub issue. Other models are also available at Serge Models.

⚠️ Memory Usage

LLaMA will crash if you don't have enough available memory for the model.
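A quick pre-flight check can catch the most obvious cases before a model is loaded. The sketch below is a rough heuristic under stated assumptions: it is Linux-only (it reads `/proc/meminfo`), it assumes a model needs about its file size in memory plus some headroom, and the `1.2` headroom factor is an illustrative guess rather than a measured value. All function names are hypothetical.

```python
import os


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo text, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:   12345678 kB"
            return int(line.split()[1]) * 1024
    return None


def model_fits(model_bytes, available_bytes, headroom=1.2):
    """Rough check: model size times a headroom factor must fit in free memory.

    The default headroom of 1.2 is an assumption, not a documented figure.
    """
    return model_bytes * headroom <= available_bytes


def check_model(path, meminfo_path="/proc/meminfo"):
    """Compare the size of the model file at `path` with available memory."""
    with open(meminfo_path) as f:
        available = parse_mem_available(f.read())
    if available is None:
        raise RuntimeError("MemAvailable not found in " + meminfo_path)
    return model_fits(os.path.getsize(path), available)
```

For example, `check_model("weights/model.gguf")` (an illustrative path) returns `False` when the file is unlikely to fit. Run it inside the container or VM that will load the model, since that is where the memory limit applies.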

💬 Support

Need help? Join our Discord

🧾 License

Nathan Sarrazin and Contributors. Serge is free and open-source software licensed under the MIT License and Apache-2.0.

🤝 Contributing

If you discover a bug or have a feature idea, feel free to open an issue or PR.

To run Serge in development mode:

git clone https://github.com/serge-chat/serge.git
cd serge/
docker compose -f docker-compose.dev.yml up --build

The development stack accepts a Python debugger session on port 5678. Example launch.json for VS Code:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Remote Debug",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}/api",
                    "remoteRoot": "/usr/src/app/api/"
                }
            ],
            "justMyCode": false
        }
    ]
}

