A Talos Linux system extension that automatically bootstraps Kubernetes clusters without manual intervention.
EXPERIMENTAL: This project is in an early stage of development and is not yet ready for production use. APIs may break between minor releases. The project does however adhere to semantic versioning, so patch releases will never break the API.
The extension runs as a Talos system extension service on control plane nodes. It:
- Detects control plane nodes by checking for the presence of
/system/secrets/etcd - Generates admin credentials by reading the machine CA from the STATE partition
- Connects to the local Talos API (apid) on port 50000
- Discovers peer nodes by scanning the local network CIDR
- Elects a leader deterministically based on boot time
- Bootstraps the cluster when quorum is reached
┌──────────────────────────────────────────────────────────────────────────┐
│ TALOS AUTO-BOOTSTRAP │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ STARTUP PHASE │ │
│ │ │ │
│ │ 1. Check /system/secrets/etcd (control plane detection) │ │
│ │ 2. Exit if worker node (etcd secrets don't exist) │ │
│ │ 3. Get network info (local IP, CIDR, gateway) │ │
│ │ 4. Read machine CA from STATE partition (/dev/disk/by-partlabel) │ │
│ │ 5. Generate admin TLS credentials from machine CA │ │
│ │ 6. Connect to local apid on port 50000 │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ MAIN BOOTSTRAP LOOP │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │ │
│ │ │ Bootstrap │ │ CIDR │ │ Node Detection │ │ │
│ │ │ Check │───▶│ Scanner │───▶│ & Validation │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ etcd members │ │ Probe :50000 │ │ Get machine type │ │ │
│ │ │ (5s timeout) │ │ on each IP │ │ Get boot time │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌────────────────────────────────────────────────────────────────┐│ │
│ │ │ LEADER ELECTION ││ │
│ │ │ ││ │
│ │ │ 1. Collect all control plane nodes (including self) ││ │
│ │ │ 2. Wait for quorum (configurable number of nodes) ││ │
│ │ │ 3. Sort by boot time (ascending) ││ │
│ │ │ 4. Tie-break by IP address (lowest wins) ││ │
│ │ │ 5. First node in sorted list = LEADER ││ │
│ │ └────────────────────────────────────────────────────────────────┘│ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┴───────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────────────────┐ ┌─────────────────────────────────┐ │
│ │ LEADER PATH │ │ FOLLOWER PATH │ │
│ │ │ │ │ │
│ │ 1. Pre-bootstrap delay │ │ 1. Wait for check interval │ │
│ │ 2. Check not bootstrapped │ │ 2. Check if bootstrapped │ │
│ │ 3. Execute Bootstrap() │ │ 3. If yes: exit success │ │
│ │ 4. Wait for etcd ready │ │ 4. If no: re-run discovery │ │
│ │ 5. Exit success │ │ │ │
│ └─────────────────────────────┘ └─────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
Eliminates the need for operators to manually run talosctl bootstrap. The extension automatically discovers peers, elects a leader, and bootstraps the cluster when quorum is reached.
The extension generates its own admin credentials by:
- Mounting the STATE partition read-only
- Reading the machine CA certificate and key from the machine config
- Generating a short-lived (24h) client certificate with
os:adminrole - Using these credentials to authenticate with the local Talos API
This means the extension works without any external secrets or pre-configured credentials.
Discovers peer Talos nodes via CIDR network scanning:
- Reads network configuration from
/proc/net/routeand network interfaces - Scans the local CIDR range for other Talos nodes on port 50000
- Uses insecure TLS for discovery (required for unknown nodes)
- Identifies control plane vs worker nodes via machine type
- Retrieves boot time for leader election
Implements a deterministic leader election algorithm:
- Collect all control plane nodes (including self)
- Wait until quorum is reached (configurable)
- Sort by boot time (oldest first)
- Tie-break by IP address (lowest wins)
- First node in sorted list becomes leader
This ensures the same leader is elected given the same conditions, preventing race conditions.
The leader performs multiple safety checks before bootstrapping:
- Pre-bootstrap delay to allow late-joining nodes to participate
- Final verification that cluster hasn't already been bootstrapped
- Waits for etcd to become ready after bootstrap
- Retries on transient failures with exponential backoff
- Configurable maximum backoff duration
- Graceful handling of worker nodes (exits cleanly)
- Continues operation if some peers are unreachable
- Short timeout (5s) on bootstrap status checks to prevent hanging
The extension is configured via ExtensionServiceConfig in the Talos machine config. Environment variables are passed through the environment field:
| Environment Variable | Description | Default |
|---|---|---|
LOG_LEVEL |
Logging verbosity: debug, info, warn, error | info |
TALOS_AUTO_BOOTSTRAP_SCAN_INTERVAL |
Interval between network discovery scans | 30s |
TALOS_AUTO_BOOTSTRAP_FOLLOWER_CHECK_INTERVAL |
How often followers check bootstrap status | 15s |
TALOS_AUTO_BOOTSTRAP_QUORUM_NODES |
Number of control plane nodes required before bootstrapping | 1 |
TALOS_AUTO_BOOTSTRAP_PRE_BOOTSTRAP_DELAY |
Leader wait time before executing bootstrap | 10s |
TALOS_AUTO_BOOTSTRAP_MAX_BACKOFF |
Maximum retry backoff duration | 2m |
TALOS_AUTO_BOOTSTRAP_SCAN_TIMEOUT |
Timeout for probing each node during discovery | 2s |
TALOS_AUTO_BOOTSTRAP_SCAN_CONCURRENCY |
Maximum concurrent node probes | 50 |
The extension must be included in your Talos image. Use the Talos imager:
# Using docker
docker run --rm -t -v $PWD/_out:/out ghcr.io/siderolabs/imager:v1.11.3 \
--system-extension-image ghcr.io/kommodity-io/kommodity-autobootstrap-extension:latest \
metalOr with talosctl:
talosctl image default \
--image-extension ghcr.io/kommodity-io/kommodity-autobootstrap-extension:latestConfigure the extension using Talos strategic patches (works with CAPI and standalone):
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: kommodity-autobootstrap
environment:
- LOG_LEVEL=info
- TALOS_AUTO_BOOTSTRAP_SCAN_INTERVAL=30s
- TALOS_AUTO_BOOTSTRAP_QUORUM_NODES=3
- TALOS_AUTO_BOOTSTRAP_PRE_BOOTSTRAP_DELAY=15s
- TALOS_AUTO_BOOTSTRAP_MAX_BACKOFF=2m
- TALOS_AUTO_BOOTSTRAP_SCAN_TIMEOUT=2sFor a 3-node HA control plane cluster:
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: kommodity-autobootstrap
environment:
- LOG_LEVEL=info
- TALOS_AUTO_BOOTSTRAP_QUORUM_NODES=3
- TALOS_AUTO_BOOTSTRAP_PRE_BOOTSTRAP_DELAY=20sThis ensures:
- All 3 nodes must be discovered before bootstrap
- 20 second delay gives time for late-joining nodes
- Leader election picks the oldest node
For development or single-node clusters:
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: kommodity-autobootstrap
environment:
- LOG_LEVEL=info
- TALOS_AUTO_BOOTSTRAP_QUORUM_NODES=1
- TALOS_AUTO_BOOTSTRAP_PRE_BOOTSTRAP_DELAY=5sThe extension runs as a Talos extension service with the following characteristics:
- Depends on:
apidservice and network connectivity - Restart policy:
untilSuccess(keeps trying until bootstrap succeeds) - Mounts:
/system/secrets(read-only) - for control plane detection/procas/host/proc(read-only) - for boot time and network routes/etc(read-only) - for hostname/dev(read-only) - for STATE partition access/run(read-write) - for temporary mount points
- Go 1.24+
- Make
- Docker or Podman (for container builds)
# Build binary
make build
# Build for local development (native OS/arch)
make build-local
# Run tests
make test
# Run linter
make lint
# Build container image
make container VERSION=1.0.0
# Push to registry
make container-push VERSION=1.0.0The extension can be tested in a local Talos cluster using talosctl cluster create with a custom image that includes the extension.
- Only runs on control plane nodes (exits gracefully on workers)
- Requires all control plane nodes to be on the same network segment/CIDR
- Does not support multi-cluster coordination
- Does not integrate with external service discovery (Consul, etc.)
- TLS verification is disabled during peer discovery (required for unknown nodes)
- Network interface selection prefers the first interface with a valid IPv4 address
talosctl logs ext-kommodity-autobootstrap{"level":"info","msg":"starting kommodity-autobootstrap-extension","version":"..."}
{"level":"info","msg":"control plane node detected, starting bootstrap process"}
{"level":"info","msg":"resolved apid endpoint","endpoint":"10.0.0.5:50000"}
{"level":"info","msg":"reading machine CA from STATE partition"}
{"level":"info","msg":"generating admin TLS credentials from machine CA"}
{"level":"info","msg":"connected to apid with admin credentials"}
{"level":"info","msg":"network discovered","localIP":"10.0.0.5","cidr":"10.0.0.0/24","gateway":"10.0.0.1"}
{"level":"info","msg":"peer discovery complete","peers_found":2}
{"level":"info","msg":"leader election complete","leader":"10.0.0.5","is_leader":true,"candidates":3}
{"level":"info","msg":"elected as leader, initiating bootstrap"}
{"level":"info","msg":"waiting before bootstrap","delay":10}
{"level":"info","msg":"executing bootstrap"}
{"level":"info","msg":"waiting for etcd to become ready"}
{"level":"info","msg":"bootstrap successful"}
{"level":"info","msg":"bootstrap service completed successfully"}
| Issue | Cause | Solution |
|---|---|---|
| Extension exits immediately | Worker node detected | Expected behavior - extension only runs on control plane |
| "failed to read machine CA" | STATE partition not accessible | Check /dev/disk/by-partlabel/STATE exists |
| "failed to connect to apid" | apid not ready | Extension will retry automatically |
| No peers discovered | Network segmentation | Ensure all nodes are on same CIDR |
| Bootstrap hangs | etcd not starting | Check etcd service logs |
| Component | Version |
|---|---|
| Talos Linux | >= v1.6.0 |
| Go | 1.24+ |
Talos Auto-Bootstrap Extension is licensed under the Apache License 2.0.