|
| 1 | +--- |
| 2 | +title: At-Scale Infrastructure Testing with Sentinels |
| 3 | +description: Automate testing over any set of Kubernetes Clusters with the Sentinel Resource |
| 4 | +--- |
| 5 | + |
| 6 | +Validating an infrastructure change is a complex task, and it has no equivalent of the local unit test that works so well at the application layer. Slight differences between environments almost always require some degree of live integration testing. Multiply this by n Kubernetes clusters, for any large n, and automation becomes a necessity. |
| 7 | + |
| 8 | +Sentinels provide a flexible abstraction for this problem. In particular, they let you bundle a sequence of checks that can: |
| 9 | + |
| 10 | +* Run terratest-based integration tests across any subset of your clusters and aggregate the results |
| 11 | +* Tail logs across any set of clusters using search filters, and analyze them with AI and git-sourced rule files |
| 12 | +* Deep-query a Kubernetes resource on a cluster and analyze its health with AI and git-sourced rule files |
| 13 | + |
| 14 | +Once a sentinel is defined, it can be run on demand via API at any time. A run can be triggered: |
| 15 | + |
| 16 | +* in our UI |
| 17 | +* in GitHub Actions or other CI systems |
| 18 | +* in Plural pipelines |
| 19 | + |
| 20 | +Some common use cases for which we find them particularly well suited are: |
| 21 | + |
| 22 | +1. Validating that Kubernetes upgrades do not introduce regressions |
| 23 | +2. Rolling out cross-cutting Kubernetes operator changes (e.g. Istio upgrades) |
| 24 | +3. Validating that network reconfigurations are safe |
| 25 | + |
| 26 | +But there are likely many more. |
| 27 | + |
| 28 | +The motivation behind all of these, and behind the use of AI, is that confirming infrastructure health often requires aggregating multiple textual data sources and interpreting them with a degree of discretion, which consumes meaningful engineering hours. That interpretation cannot be done deterministically, so a governed AI-based approach is needed. For deterministic correctness, a full terratest run can exercise common paths, such as validating that pods start, storage volumes can be mounted, and networking is enabled. |
| 29 | + |
| 30 | +## Set Up Your First Sentinel |
| 31 | + |
| 32 | +Defining a new sentinel is best done via CRD. If you set up Plural with `plural up`, you can define it in a file like `bootstrap/sentinels/example.yaml`: |
| 33 | + |
| 34 | +```yaml |
| 35 | +apiVersion: deployments.plural.sh/v1alpha1 |
| 36 | +kind: Sentinel |
| 37 | +metadata: |
| 38 | +  name: example |
| 39 | +spec: |
| 40 | +  description: Test baseline kubernetes health |
| 41 | +  repositoryRef: |
| 42 | +    name: infra |
| 43 | +    namespace: infra |
| 44 | +  git: |
| 45 | +    ref: main |
| 46 | +    folder: rules |
| 47 | +  checks: |
| 48 | +    - name: console-logs |
| 49 | +      type: LOG |
| 50 | +      ruleFile: logrule.md |
| 51 | +      configuration: |
| 52 | +        log: |
| 53 | +          query: error |
| 54 | +          duration: 5m |
| 55 | +          namespaces: |
| 56 | +            - cert-manager |
| 57 | +            - external-dns |
| 58 | +            - kube-system |
| 59 | +    - name: integration-tests |
| 60 | +      type: INTEGRATION_TEST |
| 61 | +      configuration: |
| 62 | +        integrationTest: |
| 63 | +          format: JUNIT |
| 64 | +          tags: |
| 65 | +            tier: dev |
| 66 | + |
| 67 | +          # Notice that no job image is specified: we ship with a working integration test out of the box that can be used |
| 68 | +          # without upfront development. |
| 69 | +          jobSpec: |
| 70 | +            namespace: plrl-deploy-operator |
| 71 | +            serviceAccount: deployment-operator |
| 72 | +``` |
| 73 | + |
| 74 | +{% callout severity="info" %} |
| 75 | +To see the full API spec, go to our [Management API Docs](https://docs.plural.sh/overview/management-api-reference#sentinel). |
| 76 | +{% /callout %} |
| 77 | + |
| 78 | +When run, this particular sentinel does the following in parallel: |
| 79 | + |
| 80 | +1. Query the logs of the configured namespaces (cert-manager, external-dns, and kube-system, which are common low-level operator namespaces) for errors over a 5m duration, then analyze any results found according to a rule file stored in git. As the engineer, you can tune how the AI operates through that rule file. |
| 81 | +2. Launch our default terratest job across all `tier: dev` clusters, which runs a basic sequence of health checks. |
| 82 | + |
| 83 | +You can run a sentinel at any time in your Plural Console instance by navigating to `AI -> Sentinels -> {sentinel-name}`. Once it has run, you'll see something like this: |
| 84 | + |
| 85 | + |
| 86 | + |
| 87 | + |
| 88 | + |
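
The example sentinel in this diff bundles two checks, but the same fields can be reused to scope a sentinel to a single concern. The following is a minimal sketch of a log-only variant, not a tested manifest. It uses only fields that already appear in the example; the sentinel name `ingress-logs`, the namespace `ingress-nginx`, and the rule file `ingress-rule.md` are hypothetical, and the nesting is an assumption inferred from the field names.

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: Sentinel
metadata:
  name: ingress-logs              # hypothetical sentinel name
spec:
  description: Scan ingress controller logs for errors
  repositoryRef:                  # same git repository reference as the example
    name: infra
    namespace: infra
  git:
    ref: main
    folder: rules                 # folder holding the rule files
  checks:
    - name: ingress-logs
      type: LOG
      ruleFile: ingress-rule.md   # hypothetical rule file inside the rules folder
      configuration:
        log:
          query: error
          duration: 5m
          namespaces:
            - ingress-nginx       # hypothetical namespace to search
```

Keeping a sentinel this narrow means each run maps to one rule file, which may make it easier to tune how the AI interprets the results; check the Management API Docs linked in the diff for the authoritative field layout.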