ParrotPark is an Infrastructure as Code solution for self-hosting an LLM assistant with all required components, including inference servers and a chat interface. It was developed by CorrelAid with the support of D64. The project is no longer under active development and ended with an evaluation (see the evaluation directory).
- LiteLLM: api.parrotpark.correlaid.org
- Admin UI: api.parrotpark.correlaid.org/ui
- API Doc: api.parrotpark.correlaid.org/docs
- LibreChat: parrotpark.correlaid.org
- Find instructions on how to use ParrotPark here
```mermaid
---
config:
  flowchart:
    htmlLabels: false
---
%%{ init: { 'flowchart': { 'curve': '' } } }%%
flowchart LR
    subgraph proxy["`**Entrance Server**`"]
        direction TB
        Caddy
        Databases
        Scheduler
    end
    subgraph gpu["`**Ephemeral GPU Server**`"]
        direction TB
        vLLM
        LiteLLM
        LibreChat
    end
    LibreChat ---|SSO Auth| Keycloak
    buckets["`S3 Buckets`"]
    LibreChat ---|Storage| buckets
    LiteLLM & LibreChat --- Databases
    Caddy -->|Proxy| LiteLLM & LibreChat
    Scheduler -->|Creates Periodically| gpu
```
- Across servers, the services are connected via a Netbird VPN
- SSH access is restricted to the VPN
- Internal networking is configured with Docker networks
- The entrance server additionally runs a Telegraf agent for scraping metrics from services and host systems
- The GPU server hosts an nvidia-smi exporter container for scraping metrics from the GPU
- Metrics are sent to a TimescaleDB database, which a Metabase instance can access
- The scheduler is a Python script that runs on the entrance server. It executes OpenTofu and Ansible commands to create and destroy the GPU server. It is packaged as a systemd service.
- IaC code refers to OpenTofu and Ansible scripts.
- This is a nested Infrastructure as Code project, because the initial IaC script creates an entrance server which will contain IaC code to automatically create a second ephemeral GPU server.
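The scheduler's create/destroy cycle can be sketched as a thin wrapper around OpenTofu and Ansible. This is a hypothetical illustration, not the actual `scripts/scheduler.py`: the `build_steps` helper and the `-auto-approve` flag are assumptions, while the directory paths and the apply/playbook sequence are taken from the manual-run instructions further down.

```python
import subprocess

# Paths from the manual-run instructions; the nested IaC lives on the
# entrance server under the scheduler's home directory.
OPENTOFU_DIR = "/home/correlaid/scheduler/opentofu"
SCHEDULER_DIR = "/home/correlaid/scheduler"


def build_steps(action: str) -> list[tuple[str, list[str]]]:
    """Return (working directory, command) pairs for one scheduler run."""
    if action == "create":
        return [
            (OPENTOFU_DIR, ["tofu", "apply", "-auto-approve"]),
            (SCHEDULER_DIR, ["ansible-playbook", "ansible/playbook.yml", "-vv"]),
        ]
    if action == "destroy":
        return [(OPENTOFU_DIR, ["tofu", "destroy", "-auto-approve"])]
    raise ValueError(f"unknown action: {action!r}")


def run(action: str) -> None:
    # check=True aborts the run if tofu or ansible-playbook fails.
    for cwd, cmd in build_steps(action):
        subprocess.run(cmd, cwd=cwd, check=True)
```

A systemd service or timer would then invoke `run("create")` and later `run("destroy")` on whatever schedule the project defines.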
Because this project grew dynamically, the infrastructure is a bit all over the place (spread across multiple cloud providers). Theoretically, most components can be swapped out, as long as the replacement is the same type of service and offers the same functionality. For example, Hetzner offers object storage (buckets) as well, and you could use Tailscale instead of Netbird or Azure Entra instead of Keycloak.
- Hetzner account and Cloud project
  - A domain in the Hetzner DNS console
  - A mailbox in a Hetzner KonsoleH webhosting account
- Digital Ocean account for LibreChat asset and OpenTofu state storage
- Scaleway account with access to creating L4 GPU instances (if you open a new account, you will have to contact support to request access)
- Netbird for connection via VPN
  - Set up one or multiple groups with access configured so that you can connect to the servers via SSH and the servers can connect to each other
- Keycloak instance with a configured realm, e.g. on cloud-iam, for user management
- A domain configured on Hetzner DNS
- Infisical instance for secret management for the IaC code
  - Contains secrets for accessing some of the existing infrastructure programmatically
  - Is used by the IaC code to create managed secrets that can be read automatically at other places in the code
- `uv sync --all-groups` to install other dependencies
- Set up pre-commit: `uv run pre-commit install`
- Install Ansible requirements: `uv run ansible-galaxy install -r ansible/requirements.yml`
- Once you have been added as a peer to the required Netbird organisation, run `netbird up`
- Create a new set of SSH keys and adjust the `entrance_server_settings.public_ssh_key` variable in the `opentofu/variables.tf` file. Also adjust `ansible/files/ansible.cfg` depending on where you saved the private key.
- Adjust the `netbird_vps_group`, `infisical_workspace_id` and `scaleway_project` variables in the `opentofu/variables.tf` file.
- Manually create a bucket on Scaleway and adjust the S3 information in `opentofu/meta.tf`. Create access credentials and save them as described in the next step.
- In your Infisical project, create all vars you see at the top of `ansible/playbook.yml` and in the nested IaC in `ansible/files/ansible/group_vars/unmanaged.yml`
- Save these environment variables somewhere for easy copy and paste (but do not hard-code them locally):

  ```shell
  export AWS_ACCESS_KEY_ID=""
  export AWS_SECRET_ACCESS_KEY=""
  export TF_VAR_infisical_client_secret=""
  #### MAKE SURE YOU HAVE DISABLED HISTORY FOR THESE VARS ###
  ```

- Decide whether you want to scrape metrics with Telegraf. Metric visualisation and alerting is not part of this IaC project, but you can set up a Telegraf agent that sends metrics to a TimescaleDB. If you do not want this, simply do not include `deploy_telegraf.yml` in `ansible/playbook.yml`. If you do, you need to have a TimescaleDB available and must adjust the vars in the main playbook.
- For the Keycloak setup, follow this tutorial: https://www.librechat.ai/docs/configuration/authentication/OAuth2-OIDC/keycloak
- To initialize the OpenTofu backend, run `tofu init`
- For both OpenTofu and Ansible, you need to have the environment variables set as described in the environment-variables step above
- While in the `opentofu` directory, run `tofu apply` to create the entrance server
- Because Ansible requires SSH access to the servers, you need to have the Netbird client running
- Run Ansible with `uv run ansible-playbook ansible/playbook.yml -vv`
While the scheduler takes care of creating the GPU instances as specified in `scripts/scheduler.py`, you can also run the nested IaC code manually. For this, first SSH into the entrance server and set the same environment variables as described in the setup steps above. Then, while in the `/home/correlaid/scheduler/opentofu` directory, run `tofu apply`, followed by `ansible-playbook ansible/playbook.yml -vv` while in the `/home/correlaid/scheduler` directory.
