Commit 3e776b7

Document Service Catalogs + AI
Documents new functionality under our service catalog + AI.
1 parent 1862cf4 commit 3e776b7

File tree

12 files changed (+383, -235 lines)


pages/ai/architecture.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
---
title: Plural AI Architecture
description: How Plural AI Works
---

## Overview

At its core, Plural AI has three main components:

* A causal graph of the high-level objects that define your infrastructure. For example: a Plural Service owns a Kubernetes Deployment, which owns a ReplicaSet, which owns a set of Pods.
* A permission engine that ensures a given user of Plural's AI can only interact with the objects in the graph they're authorized to access. This hardens the governance process around access to our AI's completions. The presence of Plural's agent in your Kubernetes fleet also makes querying end clusters much more secure from a networking perspective.
* Our PR automation framework, which allows us to hook into SCM providers and automate code fixes in a reviewable, safe way.

In the parlance of the AI industry, you can think of it as a highly advanced RAG (retrieval-augmented generation) system with agent-like behavior, since it's always on and triggered by any emergent issue in your infrastructure.

## In Detail

Here's a detailed walkthrough of how the AI engine works in the case of a Plural Service with a failing Kubernetes deployment:

1. The engine is told the service is failing via our internal event bus.
2. The failing components of that service are collected, with the failing deployment selected first.
3. The metadata of the service is added to the prompt (what cluster it's on, how it sources its configuration, etc.).
4. The events, ReplicaSets, and spec of the Kubernetes deployment are queried and added to the prompt.
5. The failing pods for the deployment are selected from the ReplicaSets, and a random subset is queried individually.
6. Each failing pod's events and spec are added to the growing prompt.

This will then craft an insight for the Deployment node, which can be combined with insights from any other components to roll up into a service-level insight.

If this investigation were run again, we'd be able to cache any non-stale insights and avoid rerunning the inference where it would be unnecessary.
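The flow above can be sketched in a few lines of Python. This is a hypothetical illustration under assumed data shapes, not Plural's actual implementation; `fetch` and `llm` stand in for the real graph queries and inference calls:

```python
import random

# Hypothetical sketch of the investigation walkthrough above. All helper
# names and dict shapes are illustrative, not Plural's real API.

def investigate_failing_service(service, insight_cache, fetch, llm):
    """Walk a failing service's component graph, building one growing prompt."""
    # Step 3: service metadata seeds the prompt.
    prompt = [f"service={service['name']} cluster={service['cluster']}"]
    insights = []
    # Step 2: iterate failing components, failing deployment first.
    for component in service["failing_components"]:
        cached = insight_cache.get(component["id"])
        if cached and not cached["stale"]:
            # Reuse a non-stale insight instead of re-running inference.
            insights.append(cached["text"])
            continue
        # Step 4: events and spec of the deployment join the prompt.
        prompt += [str(fetch("events", component)), str(fetch("spec", component))]
        replica_sets = fetch("replicasets", component)
        # Step 5: sample a random subset of failing pods.
        pods = [p for rs in replica_sets for p in rs["failing_pods"]]
        for pod in random.sample(pods, min(3, len(pods))):
            # Step 6: each failing pod's events and spec grow the prompt.
            prompt += [str(fetch("events", pod)), str(fetch("spec", pod))]
        insight = llm("\n".join(prompt))  # craft the component-level insight
        insight_cache[component["id"]] = {"text": insight, "stale": False}
        insights.append(insight)
    # Component insights roll up into a service-level insight.
    return llm("\n".join(insights))
```

The key design point is the cache check per component: a non-stale insight short-circuits both the Kubernetes queries and the inference call on repeat investigations.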

pages/ai/cost.md

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
---
title: Plural AI Cost Analysis
description: How much will Plural AI cost me?
---

Plural AI is built to be extremely cost efficient. We've found it will often heavily undercut the spend on advanced APM tools, and likely even DIY Prometheus setups. That said, AI inference is not cheap in general, and we do a number of things to work around that:

* Our causal knowledge graph caches heavily at each layer of the tree. This ensures repeated attempts to generate the same insight are deduplicated, reducing inference API calls dramatically.
* You can split the model used by use case. Insight generation can leverage cheap, fast models, whereas the tool calls that ultimately generate PRs use smarter, more advanced models, but are executed less frequently, so the cost is felt less.
* We use AI sparingly. Inference is only done when we know something is wrong.
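The second bullet, splitting models by use case, can be illustrated with a tiny routing table. The model names and the `call_llm` helper here are assumptions for the sketch, not Plural's actual configuration:

```python
# Hypothetical per-use-case model routing. Model names are illustrative.
MODEL_BY_USECASE = {
    "insight": "gpt-4o-mini",  # cheap, fast: runs frequently
    "fix_pr": "gpt-4o",        # smarter: runs rarely, so cost is tolerable
}

def complete(usecase, prompt, call_llm):
    """Route a completion to the model configured for this use case."""
    model = MODEL_BY_USECASE.get(usecase, MODEL_BY_USECASE["insight"])
    return call_llm(model=model, prompt=prompt)
```

Because insight generation dominates call volume, pointing it at the cheap model captures most of the savings while PR generation keeps the stronger model.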
That said, what does that actually mean in practice?

## Basic Cost Analysis

We at Plural dogfood our own AI functionality in our own infrastructure. This includes a sandbox test fleet of over 10 clusters, and a production fleet of around 5 clusters for both our main services and Plural Cloud. Plural's AI engine has run on the management clusters for each of these domains since launch, and while we might do only a decent-ish job of caretaking those environments, our current OpenAI bill is roughly $2.64 per day, or roughly $81 per month.
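As a quick sanity check on that arithmetic (assuming an average month of about 30.4 days):

```python
# Sanity check of the figures above: ~$2.64/day over an average month.
daily = 2.64
monthly = daily * 30.4  # ~30.4 days per month on average
print(f"${monthly:.2f}")  # → $80.26
```

which lands close to the ~$81/month quoted above; the exact figure depends on the number of days in the month.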
This is staggeringly cost effective when you consider that a Datadog bill for equivalent infrastructure is at minimum $10k, and even a Prometheus setup runs well over $100/mo for the necessary compute, including the datastore, Grafana, Grafana's database, load balancers, and agents. Granted, some of these services will ultimately be necessary for Plural AI to reach its full potential, but we could see a world where:

```
OpenTelemetry + Plural AI >> Datadog/New Relic
```

as a general debugging platform, while being a minuscule fraction of the current cost.

pages/ai/overview.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
---
title: Plural AI
description: Plural's AI Engine Removes the Gruntwork from Infrastructure
---

{% callout severity="info" %}
If you just want to skip the text and see it in action, jump to the [demo video](/ai/overview#demo-video).
{% /callout %}

Managing infrastructure is full of mind-numbing tasks, from troubleshooting the same misconfiguration for the hundredth time, to whack-a-moling Datadog alerts, to playing internal IT support for application developers who can't be bothered to learn the basics of foundational technology like Kubernetes. Plural AI allows you to outsource all those time-sucks to LLMs so you can focus on building value-added platforms for your enterprise.

In particular, Plural AI has a few differentiators in its approach:

* A bring-your-own-LLM model - use the LLM already approved by your enterprise, without worrying about us as a man-in-the-middle.
* An always-on troubleshooting engine - takes signals from failing Kubernetes services, failed Terraform runs, and other misfires in your infrastructure to run a consistent investigative process and summarize the results. Eliminate manual digging and just fix the issue instead.
* Automated fixes - take any insight from our troubleshooting engine and generate a fix PR automatically, powered by our ability to introspect the GitOps code defining that piece of infrastructure.
* AI explanation - complex or domain-specific pages can be explained with one click, eliminating internal support burdens for engineers.
* AI chat - any workflow above can be further refined or expanded in a full ChatGPT-like experience. Paste additional context into chats automatically, or generate PRs once you and the AI have found the fix.

# Demo Video

To see this all in action, feel free to browse our live demo video on YouTube of our GenAI integration:

{% embed url="https://youtu.be/LxurfPikps8" aspectRatio="16 / 9" /%}

pages/ai/setup.md

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
---
title: Set Up Plural AI
description: How to configure Plural AI
---

Plural AI can easily be configured via the `DeploymentSettings` CRD or at `/settings/global/ai-provider` in your Plural Console instance. An example `DeploymentSettings` config is below:

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: DeploymentSettings
metadata:
  name: global
  namespace: plrl-deploy-operator
spec:
  managementRepo: pluralsh/plrl-boot-aws

  ai:
    enabled: true
    provider: OPENAI
    anthropic: # example anthropic config
      model: claude-3-5-sonnet-latest
      tokenSecretRef:
        name: ai-config
        key: anthropic

    openAI: # example openai config
      tokenSecretRef:
        name: ai-config
        key: openai

    vertex: # example VertexAI config
      project: pluralsh-test-384515
      location: us-east1
      model: gemini-1.5-pro-002
      serviceAccountJsonSecretRef:
        name: ai-config
        key: vertex
```

You can see the full schema at our [Operator API Reference](/deployments/operator/api#deploymentsettings).

In all these cases, you need to create an additional secret in `plrl-deploy-operator` that holds the API keys and auth secrets. It would look something like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ai-config
  namespace: plrl-deploy-operator
stringData:
  vertex: <service account json string>
  openai: <access-token>
  anthropic: <access-token>
```

pages/catalog/contributing.md

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
---
title: Contribution Program
description: Contributing to Plural's Service Catalog
---

We run a continuous Contributor Program so the community can help maintain our catalog. The bounties for the various tasks are as follows:

* $100 for an application update (note that many applications should auto-update)
* $250 for an application onboarding

To qualify for a bounty, submit a PR to https://github.com/pluralsh/scaffolds.git and, once it's been approved and merged, DM a member of the Plural staff on Discord to receive your payout.

pages/catalog/creation.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
title: Creating Your Own Catalog
description: Defining your own service catalogs with Plural
---

## Overview

{% callout severity="info" %}
TLDR: skip to [Examples](/catalog/creation#examples) for a link to our GitHub repository containing our full default catalog as working examples.
{% /callout %}

Plural Service Catalogs are ultimately driven by two Kubernetes custom resources: `Catalog` and `PrAutomation`. Here are examples of both:

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: Catalog
metadata:
  name: data-engineering
spec:
  name: data-engineering
  category: data
  icon: https://docs.plural.sh/favicon-128.png
  author: Plural
  description: |
    Sets up OSS data infrastructure using Plural
  bindings:
    create:
    - groupName: developers # controls who can spawn prs from this catalog
```

```yaml
apiVersion: deployments.plural.sh/v1alpha1
kind: PrAutomation
metadata:
  name: airbyte
spec:
  name: airbyte
  icon: https://plural-assets.s3.us-east-2.amazonaws.com/uploads/repos/d79a69b7-dfcd-480a-a51d-518865fd6e7c/airbyte.png
  identifier: mgmt
  documentation: |
    Sets up an airbyte instance for a given cloud
  creates:
    git:
      ref: sebastian/prod-2981-set-up-catalog-pipeline # TODO set to main
      folder: catalogs/data/airbyte
    templates:
    - source: helm
      destination: helm/airbyte/{{ context.cluster }}
      external: true
    - source: services/oauth-proxy-ingress.yaml.liquid
      destination: services/apps/airbyte/oauth-proxy-ingress.yaml.liquid
      external: true
    - source: "terraform/{{ context.cloud }}"
      destination: "terraform/apps/airbyte/{{ context.cluster }}"
      external: true
    - source: airbyte-raw-servicedeployment.yaml
      destination: "bootstrap/apps/airbyte/{{ context.cluster }}/airbyte-raw-servicedeployment.yaml"
      external: true
    - source: airbyte-servicedeployment.yaml
      destination: "bootstrap/apps/airbyte/{{ context.cluster }}/airbyte-servicedeployment.yaml"
      external: true
    - source: airbyte-stack.yaml
      destination: "bootstrap/apps/airbyte/{{ context.cluster }}/airbyte-stack.yaml"
      external: true
    - source: oauth-proxy-config-servicedeployment.yaml
      destination: "bootstrap/apps/airbyte/{{ context.cluster }}/oauth-proxy-config-servicedeployment.yaml"
      external: true
    - source: README.md
      destination: documentation/airbyte/README.md
      external: true
  repositoryRef:
    name: scaffolds
  catalogRef: # <-- NOTE this references the Catalog CRD
    name: data-engineering
  scmConnectionRef:
    name: plural
  title: "Setting up airbyte on cluster {{ context.cluster }} for {{ context.cloud }}"
  message: |
    Set up airbyte on {{ context.cluster }} ({{ context.cloud }})

    Will set up an airbyte deployment, including object storage and postgres setup
  configuration:
  - name: cluster
    type: STRING
    documentation: Handle of the cluster you want to deploy airbyte to.
  - name: stackCluster
    type: STRING
    documentation: Handle of the cluster used to run Infrastructure Stacks for provisioning the infrastructure. Defaults to the management cluster.
    default: mgmt
  - name: cloud
    type: ENUM
    documentation: Cloud provider you want to deploy airbyte to.
    values:
    - aws
  - name: bucket
    type: STRING
    documentation: The name of the S3/GCS/Azure Blob bucket you'll use for airbyte logs. This must be globally unique.
  - name: hostname
    type: STRING
    documentation: The DNS name you'll host airbyte under.
  - name: region
    type: STRING
    documentation: The cloud provider region you're going to use to deploy cloud resources.
```

A catalog is a container for many `PrAutomation`s, which themselves control the code generation that accomplishes the provisioning task being implemented. In this case, we're provisioning [Airbyte](https://airbyte.com/). The real work is done in the referenced templates.
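To make the templating concrete: destinations like `terraform/apps/airbyte/{{ context.cluster }}` are rendered against the configuration values a user supplies when spawning the PR. Here's a minimal sketch of that substitution; the real engine uses full Liquid templating, so this regex version is only illustrative:

```python
import re

# Minimal illustration of how templated paths resolve against user-supplied
# configuration. A real PR automation uses a full Liquid templating engine.
def render(template, context):
    """Replace every {{ context.<key> }} with the matching context value."""
    return re.sub(
        r"\{\{\s*context\.(\w+)\s*\}\}",
        lambda m: str(context[m.group(1)]),
        template,
    )

context = {"cluster": "prod-1", "cloud": "aws"}
print(render("terraform/apps/airbyte/{{ context.cluster }}", context))
# → terraform/apps/airbyte/prod-1
```

Each `configuration` entry in the `PrAutomation` becomes a key in that context, which is how one template set fans out across clusters and clouds.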
## Examples

The best way to get inspiration for writing your own templates is to look through some examples, which is why we've made our default service catalog open source. You can browse it here:

https://github.com/pluralsh/scaffolds/tree/main/setup/catalogs

pages/catalog/overview.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
---
title: Service Catalog
description: Enterprise Grade Self-Service with Plural
---

{% callout severity="info" %}
If you just want to skip the text and see it in action, jump to the [demo video](/catalog/overview#demo-video).
{% /callout %}

Plural provides a full-stack GitOps platform for provisioning resources with both IaC frameworks like Terraform and Kubernetes manifests like Helm and Kustomize. This alone is very powerful, but most enterprises want to go a step beyond and implement full self-service. This provides two main benefits:

* Reduction of manual toil and error in repeatable infrastructure provisioning paths
* Ensuring compliance with enterprise cybersecurity and reliability standards in the creation of new infrastructure, e.g. the creation of "Golden Paths"

Plural accomplishes this via our Catalog feature, which allows [PR Automations](/deployments/pr-automation) to be bundled according to common infrastructure provisioning use cases. We like the code-generation approach for a number of reasons:

* Clear tie-in with established review-and-approval mechanisms in the PR process
* Great customizability throughout the lifecycle
* Generality - in theory, any infrastructure provisioning task can be represented as some combination of Terraform and GitOps code

# Demo Video

To see this all in action provisioning a relatively complex application, [Dagster](https://dagster.io/), feel free to browse our live demo video on YouTube:

{% embed url="https://youtu.be/5D6myZ7sm2k" aspectRatio="16 / 9" /%}

pages/faq/certifications.md

Lines changed: 3 additions & 2 deletions
@@ -3,6 +3,7 @@ title: Certifications
 description: What certifications does Plural have?
 ---
 
-Plural is currently a part of the **Cloud Native Computing Foundation** and **Cloud Native Landscape**, and is certified to be **GDPR-compliant**.
+Plural is currently a part of the **Cloud Native Computing Foundation** and **Cloud Native Landscape**. In addition, we maintain the following certifications:
 
-We are currently working toward **SOC 2 compliance**.
+* **GDPR**
+* **SOC 2 Type 2**

pages/introduction.md

Lines changed: 21 additions & 2 deletions
@@ -15,7 +15,26 @@ Plural is a unified cloud orchestrator for the management of Kubernetes at scale
 In addition, we support a robust, enterprise-ready [Architecture](/deployments/architecture). This uses a separation of management cluster and an agent w/in each workload cluster to achieve scalability and enhanced security to compensate for the risks caused by introducing a Single-Pane-of-Glass to Kubernetes. The agent can only communicate to the management cluster via egress networking, and executes all write operations with local credentials, removing the need for the management cluster to be a repository of global credentials. If you want to learn more about the nuts-and-bolts feel free to visit our [Architecture Page](/deployments/architecture).
 
-## Plural Open Source Marketplace
+## Plural AI
+
+Plural integrates heavily with LLMs to enable complex automation, within the realm of GitOps, that ordinary deterministic methods struggle to get right. This includes:
+
+* running root-cause analysis on failing Kubernetes services, using a hand-tailored evidence graph Plural extracts from its own fleet control plane
+* using AI troubleshooting insights to autogenerate fix PRs by introspecting Plural's own GitOps and IaC engines
+* using AI code generation to generate PRs for scaling recommendations from our Kubecost integration
+* an "Explain with AI" and chat feature to explain any complex Kubernetes object in the system, reducing the platform engineering support burden from app developers still new to Kubernetes
+
+The goal of Plural's AI implementation is not to shoehorn LLMs into every infrastructure workflow, which would be not just misguided but actually dangerous. Instead, we're trying to automate all the mindless gruntwork that comes with infrastructure, like troubleshooting well-known bugs, fixing YAML typos, and explaining the details of well-known, established technology like Kubernetes. This is the sort of thing that wastes precious engineering time and bogs down enterprises trying to build serious developer platforms.
+
+You can read more about it under [Plural AI](/ai/overview).
+
+## Plural Service Catalog
+
+We also maintain a catalog of open source applications like Airbyte, Airflow, etc. that can be deployed to Kubernetes on most major clouds. The entire infrastructure is extensible and recreatable by users and software vendors as well.
+
+You can also define your own internal catalogs, and vendors can share catalogs of their own software. It is meant to be a standard interface supporting developer self-service for virtually any infrastructure provisioning workflow.
+
+The full docs are available under [Service Catalog](/catalog/overview).
+
 
-We also maintain a catalog of open source applications like Airbyte, Airflow, etc. that can be deployed to kubernetes on most major clouds. We're in progress to merging that experience with our modernized Fleet Management platform, but if you're interested in any of them, we're happy to support them in the context of a commercial plan.
2140
