doc: complete fab.yaml file #87

Merged · 1 commit · Apr 3, 2025
15 changes: 8 additions & 7 deletions docs/concepts/overview.md
@@ -51,16 +51,17 @@ Wiring Diagram consists of the following resources:

## Fabricator

Creates installation media.

* Features of fabricator:
    * Inputs: [Wiring Diagram](../install-upgrade/build-wiring.md) and [Config](../install-upgrade/config.md)
    * All input artifacts delivered via OCI registry
    * Capable of full airgap installation (everything running from private registry)
    * Flatcar Linux for Control Node, generated `ignition.json`
    * Automatic K3s installation and private registry setup
    * All components and their dependencies running in Kubernetes
    * Integrated Virtual Lab (VLAB) management
* Future:
    * In-cluster (control) Operator to manage all components
    * Upgrade handling for everything, starting with the Control Node OS
61 changes: 12 additions & 49 deletions docs/install-upgrade/build-wiring.md
@@ -3,55 +3,18 @@

## Overview

To see a working wiring diagram for the Hedgehog Fabric, run the sample generator, which produces working wiring diagrams:

```console
ubuntu@sl-dev:~$ hhfab sample -h

NAME:
hhfab sample - generate sample wiring diagram

USAGE:
hhfab sample command [command options]

COMMANDS:
spine-leaf, sl generate sample spine-leaf wiring diagram
collapsed-core, cc generate sample collapsed-core wiring diagram
help, h Shows a list of commands or help for one command

OPTIONS:
--help, -h show help
```

Or you can generate a wiring diagram for a VLAB environment, with flags to customize the number of switches, links, servers, etc.:

```console
ubuntu@sl-dev:~$ hhfab vlab gen --help
NAME:
hhfab vlab generate - generate VLAB wiring diagram

USAGE:
hhfab vlab generate [command options]

OPTIONS:
--bundled-servers value number of bundled servers to generate for switches (only for one of the second switch in the redundancy group or orphan switch) (default: 1)
--eslag-leaf-groups value eslag leaf groups (comma separated list of number of ESLAG switches in each group, should be 2-4 per group, e.g. 2,4,2 for 3 groups with 2, 4 and 2 switches)
--eslag-servers value number of ESLAG servers to generate for ESLAG switches (default: 2)
--fabric-links-count value number of fabric links if fabric mode is spine-leaf (default: 0)
--help, -h show help
--mclag-leafs-count value number of mclag leafs (should be even) (default: 0)
--mclag-peer-links value number of mclag peer links for each mclag leaf (default: 0)
--mclag-servers value number of MCLAG servers to generate for MCLAG switches (default: 2)
--mclag-session-links value number of mclag session links for each mclag leaf (default: 0)
--no-switches do not generate any switches (default: false)
--orphan-leafs-count value number of orphan leafs (default: 0)
--spines-count value number of spines if fabric mode is spine-leaf (default: 0)
--unbundled-servers value number of unbundled servers to generate for switches (only for one of the first switch in the redundancy group or orphan switch) (default: 1)
--vpc-loopbacks value number of vpc loopbacks for each switch (default: 0)
```
A wiring diagram is a YAML file that is a digital representation of your
network. You can find more YAML-level details in the User Guide section [switch
features and port naming](../user-guide/profiles.md) and the
[api](../reference/api.md). It's mandatory for all switches to reference a
`SwitchProfile` in the `spec.profile` of the `Switch` object. Only port names
defined by switch profiles can be used in the wiring diagram; NOS (or any
other) port names are not supported. A complete example wiring diagram is
[below](build-wiring.md#sample-wiring-diagram).

A good place to start building a wiring diagram is with the switch profiles.
Start with the switches, then move on to the fabric links, and finally the
server connections.

### Sample Switch Configuration
``` { .yaml .annotate linenums="1" }
241 changes: 102 additions & 139 deletions docs/install-upgrade/config.md
@@ -1,149 +1,33 @@
# Fabric Configuration
## Overview
The `fab.yaml` file is the configuration file for the fabric. It supplies
the configuration of the users, their credentials, logging, telemetry, and
other non-wiring-related settings. The `fab.yaml` file is composed of multiple
YAML documents inside a single file. Per the YAML spec, three hyphens (`---`)
on a single line separate the end of one document from the beginning of the
next. There are two YAML documents in the `fab.yaml` file. For more information
about how to use `hhfab init`, run `hhfab init --help`.
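As an illustrative sketch of that two-document layout (the object kinds and the `fab` namespace are taken from the complete example later on this page; the empty `spec` stubs are placeholders, not a working configuration):

```yaml
apiVersion: fabricator.githedgehog.com/v1beta1
kind: Fabricator
metadata:
  name: default
  namespace: fab
spec: {} # fabric-wide settings (users, telemetry, ...) go here
---
apiVersion: fabricator.githedgehog.com/v1beta1
kind: ControlNode
metadata:
  name: control-1
  namespace: fab
spec: {} # control node settings (interfaces, boot disk, ...) go here
```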

## Typical HHFAB workflows

After `hhfab` has been [downloaded](../getting-started/download.md):

### HHFAB for VLAB

For a VLAB user, the typical workflow with hhfab is:

1. `hhfab init --dev`
1. `hhfab vlab gen`
1. `hhfab vlab up`

The above workflow will get a user up and running with a spine-leaf VLAB.

### HHFAB for Physical Machines

It's possible to start from scratch:

1. `hhfab init` (see different flags to customize initial configuration)
1. Adjust the `fab.yaml` file to your needs
1. `hhfab validate`
1. `hhfab build`

Or import existing `fab.yaml` and wiring files:

1. `hhfab init -c fab.yaml -w wiring-file.yaml -w extra-wiring-file.yaml`
1. `hhfab validate`
1. `hhfab build`

After the above workflow, a user will have a `.img` file suitable for installing the control node and then bringing up the switches which comprise the fabric.

## Fab.yaml

### Configure control node and switch users

Configuring control node and switch users is done either by passing `--default-password-hash` to `hhfab init` or by editing the resulting `fab.yaml` file emitted by `hhfab init`. You can specify users to be configured on the control node(s) and switches in the following format:

``` {.yaml .annotate linenums="1"}
spec:
  config:
    control:
      defaultUser: # user 'core' on all control nodes
        password: "hashhashhashhashhash" # password hash
        authorizedKeys:
          - "ssh-ed25519 SecREKeyJumblE"

    fabric:
      mode: spine-leaf # "spine-leaf" or "collapsed-core"

      defaultSwitchUsers:
        admin: # at least one user with name 'admin' and role 'admin'
          role: admin
          #password: "$5$8nAYPGcl4..." # password hash
          #authorizedKeys: # optional SSH authorized keys
          #  - "ssh-ed25519 AAAAC3Nza..."
        op: # optional read-only user
          role: operator
          #password: "$5$8nAYPGcl4..." # password hash
          #authorizedKeys: # optional SSH authorized keys
          #  - "ssh-ed25519 AAAAC3Nza..."
```

The control node user is always named `core`.

The `operator` role grants read-only access to the `sonic-cli` command on the switches. To avoid conflicts, do not use the following usernames: `operator`, `hhagent`, `netops`.
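The password hashes and SSH keys referenced above can be generated with standard tools. A sketch (the passphrase and key file name here are examples, not required values):

```shell
# Generate a SHA-256-crypt password hash (the "$5$..." form used in fab.yaml)
openssl passwd -5 'example-passphrase'

# Generate an ed25519 key pair; the contents of the .pub file go into
# the authorizedKeys list
ssh-keygen -t ed25519 -f ./fab-user-key -N '' -C 'fab-user'
cat ./fab-user-key.pub
```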


## Control Node
The control node is the host that manages all the switches, runs k3s, and serves images. This is the YAML document that configures the control node:
``` {.yaml .annotate linenums="1"}
apiVersion: fabricator.githedgehog.com/v1beta1
kind: ControlNode
metadata:
  name: control-1
  namespace: fab
spec:
  bootstrap:
    disk: "/dev/sda" # disk to install OS on, e.g. "sda" or "nvme0n1"
  external:
    interface: enp2s0 # interface for external
    ip: dhcp # IP address for external interface
  management:
    interface: enp2s1 # interface for management

# Currently only one ControlNode is supported
```
The **management** interface is for the control node to manage the fabric switches, *not* end-user management of the control node. For end-user management of the control node, specify the **external** interface name.

### Forward switch metrics and logs

There is an option to enable Grafana Alloy on all switches to forward metrics and logs to the configured targets using the Prometheus Remote-Write API and the Loki API. If those APIs are available from the Control Node(s), but not from the switches, it's possible to enable an HTTP proxy on the Control Node(s) that Grafana Alloy running on the switches will use to access the configured targets. This can be done by passing `--control-proxy=true` to `hhfab init`.

Metrics include port speeds, counters, errors, operational status, transceivers, fans, power supplies, temperature sensors, BGP neighbors, LLDP neighbors, and more. Logs include agent logs.

Configuring the exporters and targets is currently only possible by editing the `fab.yaml` configuration file. An example configuration is provided below:

``` {.yaml .annotate linenums="1"}
spec:
  config:
    ...
    defaultAlloyConfig:
      agentScrapeIntervalSeconds: 120
      unixScrapeIntervalSeconds: 120
      unixExporterEnabled: true
      lokiTargets:
        grafana_cloud: # target name, multiple targets can be configured
          basicAuth: # optional
            password: "<password>"
            username: "<username>"
          labels: # labels to be added to all logs
            env: env-1
          url: https://logs-prod-021.grafana.net/loki/api/v1/push
          useControlProxy: true # if the Loki API is not available from the switches directly, use the Control Node as a proxy
      prometheusTargets:
        grafana_cloud: # target name, multiple targets can be configured
          basicAuth: # optional
            password: "<password>"
            username: "<username>"
          labels: # labels to be added to all metrics
            env: env-1
          sendIntervalSeconds: 120
          url: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/push
          useControlProxy: true # if the Prometheus API is not available from the switches directly, use the Control Node as a proxy
      unixExporterCollectors: # list of node-exporter collectors to enable, https://grafana.com/docs/alloy/latest/reference/components/prometheus.exporter.unix/#collectors-list
        - cpu
        - filesystem
        - loadavg
        - meminfo
      collectSyslogEnabled: true # collect /var/log/syslog on switches and forward to the lokiTargets
```

For additional options, see the `AlloyConfig` [struct in Fabric repo](https://github.com/githedgehog/fabric/blob/master/api/meta/alloy.go).

## Complete Example File

``` { .yaml .annotate title="fab.yaml" linenums="1" }
apiVersion: fabricator.githedgehog.com/v1beta1
kind: Fabricator
metadata:
# ... (elided in diff view: @@ -159,25 +43,25 @@ spec:)
        - time.cloudflare.com
        - time1.google.com

      defaultUser: # username 'core' on all control nodes
        password: "hash..." # generate hash with openssl passwd -5
        authorizedKeys:
          - "ssh-ed25519 key..." # generate ssh key with ssh-keygen

    fabric:
      mode: spine-leaf # "spine-leaf" or "collapsed-core"
      includeONIE: true
      defaultSwitchUsers:
        admin: # at least one user with name 'admin' and role 'admin'
          role: admin
          password: "hash..." # generate hash with openssl passwd -5
          authorizedKeys:
            - "ssh-ed25519 key..."
        op: # optional read-only user
          role: operator
          password: "hash..." # generate hash with openssl passwd -5
          authorizedKeys:
            - "ssh-ed25519 key..." # generate ssh key with ssh-keygen

      defaultAlloyConfig:
        agentScrapeIntervalSeconds: 120
# ... (elided in diff view: @@ -187,13 +71,11 @@ spec:)
        lokiTargets:
          lab:
            url: http://url.io:3100/loki/api/v1/push
            labels:
              descriptive: name
        prometheusTargets:
          lab:
            url: http://url.io:9100/api/v1/push
            labels:
              descriptive: name
            sendIntervalSeconds: 120
# ... (elided in diff view: @@ -208,10 +90,91 @@ spec:)
  bootstrap:
    disk: "/dev/sda" # disk to install OS on, e.g. "sda" or "nvme0n1"
  external:
    interface: eno2 # customer interface to manage control node
    ip: dhcp # IP address for external interface
  management: # interface that manages switches in private management network
    interface: eno1

# Currently only one ControlNode is supported
```

### Configure Control Node and Switch Users

#### Control Node Users
Configuring control node and switch users is done either by passing
`--default-password-hash` to `hhfab init` or by editing the resulting `fab.yaml`
file emitted by `hhfab init`. The default username on the control node is
`core`.

#### Switch Users
There are two users on the switches, `admin` and `operator`. The `operator` user has
read-only access to the `sonic-cli` command on the switches. The `admin` user has
broad administrative power on the switch.
To avoid conflicts, do not use the following usernames: `operator`, `hhagent`, `netops`.

### NTP and DHCP
The control node uses public NTP servers from Cloudflare and Google by default.
The control node runs a DHCP server on the management network. See the [example
file](#complete-example-file).

### Control Node
The control node is the host that manages all the switches, runs k3s, and serves images.
The **management** interface is for the control node to manage the fabric
switches, *not* end-user management of the control node. For end-user
management of the control node, specify the **external** interface name.

### Telemetry

There is an option to enable [Grafana
Alloy](https://grafana.com/docs/alloy/latest/) on all switches to forward metrics and logs to the configured targets using the
[Prometheus Remote-Write
API](https://prometheus.io/docs/specs/prw/remote_write_spec/) and the Loki API. Metrics include port speeds, counters,
errors, operational status, transceivers, fans, power supplies, temperature
sensors, BGP neighbors, LLDP neighbors, and more. Logs include Hedgehog agent logs.

Telemetry can be enabled after installation of the fabric. Open the following
YAML file in an editor on the control node and modify the fields as needed.
Logs can be pushed to a Grafana instance in the customer environment, or to
Grafana Cloud.

```{ .yaml title="telemetry.yaml" linenums="1" }
spec:
  config:
    fabric:
      defaultAlloyConfig:
        agentScrapeIntervalSeconds: 120
        unixScrapeIntervalSeconds: 120
        unixExporterEnabled: true
        lokiTargets:
          grafana_cloud: # target name, multiple targets can be configured
            basicAuth: # optional
              password: "<password>"
              username: "<username>"
            labels: # labels to be added to all logs
              env: env-1
            url: https://logs-prod-021.grafana.net/loki/api/v1/push
        prometheusTargets:
          grafana_cloud: # target name, multiple targets can be configured
            basicAuth: # optional
              password: "<password>"
              username: "<username>"
            labels: # labels to be added to all metrics
              env: env-1
            sendIntervalSeconds: 120
            url: https://prometheus-prod-36-prod-us-west-0.grafana.net/api/prom/push
        unixExporterCollectors: # list of node-exporter collectors to enable, https://grafana.com/docs/alloy/latest/reference/components/prometheus.exporter.unix/#collectors-list
          - cpu
          - filesystem
          - loadavg
          - meminfo
        collectSyslogEnabled: true # collect /var/log/syslog on switches and forward to the lokiTargets
```

To enable telemetry after installation, run:

``` shell
kubectl patch -n fab --type merge fabricator/default --patch-file telemetry.yaml
```

For additional options, see the `AlloyConfig` [struct in Fabric repo](https://github.com/githedgehog/fabric/blob/master/api/meta/alloy.go).
