Add CSI driver support for internal service authentication (GCP) #256
base: main
Conversation
Implements Secrets Store CSI Driver integration for GCP to enable secure, dynamic secret mounting from Google Secret Manager into Kubernetes.

Changes:

- Install Secrets Store CSI Driver (v1.4.7) and GCP provider (v1.9.0) via Helm/kubectl manifest in kube-system namespace
- Create weave_worker GCP service account with Workload Identity
- Configure IAM policies for read-only access to secrets
- Generate random weave worker auth token and store in Google Secret Manager
- Add Helm chart configuration for weave-trace-worker with CSI driver support
- Add kubectl provider requirement for applying GCP provider manifest

The CSI driver automatically syncs secrets from Google Secret Manager to Kubernetes, enabling secret rotation and eliminating the need to manage secrets in both places. SecretProviderClass resources are created by the operator-wandb Helm chart.

This provides the infrastructure for internal service authentication where services authenticate via Kubernetes secrets that are dynamically mounted from cloud secrets managers.
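For context, a SecretProviderClass rendered by the operator-wandb chart plausibly looks like the following sketch; the resource, secret, and key names here are hypothetical and not taken from the chart:

```hcl
# Hypothetical sketch of the kind of SecretProviderClass the operator-wandb
# chart renders; names and paths are illustrative only.
resource "kubectl_manifest" "example_secret_provider_class" {
  yaml_body = <<-YAML
    apiVersion: secrets-store.csi.x-k8s.io/v1
    kind: SecretProviderClass
    metadata:
      name: weave-worker-auth        # hypothetical name
      namespace: default
    spec:
      provider: gcp
      parameters:
        secrets: |
          - resourceName: "projects/my-project/secrets/weave-worker-auth/versions/latest"
            path: "auth-token"
      secretObjects:                 # sync the mounted file into a K8s Secret
        - secretName: weave-worker-auth
          type: Opaque
          data:
            - objectName: auth-token
              key: token
  YAML
}
```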
Walkthrough

Introduces Secrets Store CSI Driver integration and GCP Secret Manager-backed weave worker authentication: adds a kubectl provider requirement, a secrets_store submodule that installs the CSI driver and GCP provider, a GCP secret + service account + IAM bindings, Workload Identity wiring, Helm values for weave-trace-worker, and related variables/outputs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Terraform
    participant GCP as GCP
    participant Helm as Helm
    participant K8s as Kubernetes
    participant SecretMgr as Secret Manager
    User->>Terraform: apply
    rect rgb(230,245,255)
        Note over Terraform,GCP: Provision GCP auth & secrets
        Terraform->>GCP: create google_service_account.weave_worker
        Terraform->>SecretMgr: create secret + secret_version (weave_worker_auth)
        Terraform->>GCP: add IAM member (secret access)
        Terraform->>GCP: add workload identity binding
    end
    rect rgb(235,255,230)
        Note over Terraform,K8s: Install Secrets Store CSI Driver & provider
        Terraform->>Helm: helm_release(secrets-store-csi-driver)
        Terraform->>K8s: kubectl_manifest(provider-gcp-plugin)
    end
    rect rgb(255,245,230)
        Note over K8s,SecretMgr: Runtime secret flow
        K8s->>SecretMgr: CSI Driver requests secret
        SecretMgr-->>K8s: secret data mounted as k8s secret
        K8s->>K8s: wandb weave-trace-worker uses mounted secret
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
main.tf (1)
129-148: Add k8s_namespace as a configurable root module variable.

The k8s_namespace is hardcoded as "default" in main.tf (line 132), preventing users from deploying weave-trace-worker in different namespaces. While the app_gke module supports this as a configurable parameter (with default "default"), the root module must expose it:

- Add a k8s_namespace variable to the root module's variables.tf with an appropriate default value
- Update main.tf line 132 to use k8s_namespace = var.k8s_namespace instead of the hardcoded string (see the sketch below)
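A minimal sketch of that change, assuming the module invocation shape implied by the review (only k8s_namespace is known from the diff; the rest is illustrative):

```hcl
# variables.tf (root module)
variable "k8s_namespace" {
  description = "Kubernetes namespace for W&B workloads"
  type        = string
  default     = "default"
}

# main.tf (root module) - pass the variable through instead of hardcoding
module "app_gke" {
  source = "./modules/app_gke"
  # ...existing arguments...
  k8s_namespace = var.k8s_namespace
}
```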
🧹 Nitpick comments (2)
modules/app_gke/versions.tf (1)
1-15: Provider version constraints use different styles than root-level declarations.

This module uses >= X.Y constraints (e.g., >= 5.30 for google), while the root versions.tf uses pessimistic constraints ~> X.Y (e.g., ~> 5.30). The >= constraint is more permissive and could introduce version drift. For consistency and to avoid potential incompatibilities, align these with the root-level constraint style (~> 5.30 for google, ~> 2.10 for helm, ~> 1.19 for kubectl). Consider aligning version constraints to match the root module's style:
```diff
 terraform {
   required_providers {
     google = {
       source  = "hashicorp/google"
-      version = ">= 5.30"
+      version = "~> 5.30"
     }
     kubectl = {
       source  = "gavinbunney/kubectl"
-      version = ">= 1.19.0"
+      version = "~> 1.19"
     }
     helm = {
       source  = "hashicorp/helm"
-      version = ">= 2.10"
+      version = "~> 2.10"
     }
   }
 }
```

modules/app_gke/secrets_store/secrets_store.tf (1)
31-50: Manifest parsing approach could silently drop invalid manifests; consider more explicit error handling.

The local value uses can(yamldecode(manifest)) to filter valid YAML, which will silently exclude any manifest that fails to decode. This could hide corruption, truncation, or other issues with the fetched manifest. While it prevents Terraform errors, it might leave the CSI driver partially or incorrectly configured without clear diagnostics.

Additionally, the for_each loop uses array indices as keys (for idx, manifest in ... : idx => manifest). If manifests are reordered in a future version of the GCP provider, Terraform will see this as a destroy/recreate cycle, potentially causing brief disruptions to the CSI driver and dependent workloads.

Consider these improvements:
1. Add explicit manifest validation to catch and report errors:

```diff
 locals {
   gcp_provider_manifests = [
     for manifest in split("---", data.http.gcp_provider_manifest.response_body) :
     manifest
-    if trimspace(manifest) != "" && can(yamldecode(manifest))
+    if trimspace(manifest) != ""
   ]
+
+  # Validate all manifests can be decoded
+  _manifests_validated = [
+    for idx, manifest in local.gcp_provider_manifests :
+    yamldecode(manifest) # Will fail explicitly if invalid YAML
+  ]
 }
```
2. Use stable identifiers for for_each to avoid unnecessary resource churn:

```diff
 resource "kubectl_manifest" "gcp_provider" {
-  for_each = { for idx, manifest in local.gcp_provider_manifests : idx => manifest }
+  for_each = { for idx, manifest in local.gcp_provider_manifests :
+    "${yamldecode(manifest).kind}-${yamldecode(manifest).metadata.name}" => manifest
+  }
```

This ensures that individual manifests are tracked by their Kubernetes resource kind and name, making the configuration more resilient to manifest ordering changes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- README.md (1 hunks)
- main.tf (2 hunks)
- modules/app_gke/main.tf (1 hunks)
- modules/app_gke/outputs.tf (1 hunks)
- modules/app_gke/secrets_store/secrets_store.tf (1 hunks)
- modules/app_gke/secrets_store/variables.tf (1 hunks)
- modules/app_gke/secrets_store/versions.tf (1 hunks)
- modules/app_gke/variables.tf (1 hunks)
- modules/app_gke/versions.tf (1 hunks)
- versions.tf (1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
modules/app_gke/secrets_store/variables.tf
[medium] 1-21: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 1-10: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall
(CKV2_GCP_18)
modules/app_gke/secrets_store/secrets_store.tf
[medium] 7-17: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 27-47: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 30-50: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 1-21: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
[medium] 11-28: Ensure Big Table Instances have deletion protection enabled
(CKV_GCP_122)
[medium] 1-10: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall
(CKV2_GCP_18)
modules/app_gke/outputs.tf
[medium] 30-50: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 41-67: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
modules/app_gke/versions.tf
[medium] 1-21: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 1-10: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall
(CKV2_GCP_18)
modules/app_gke/secrets_store/versions.tf
[medium] 1-21: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 1-10: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall
(CKV2_GCP_18)
modules/app_gke/main.tf
[medium] 145-184: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
🔇 Additional comments (13)
modules/app_gke/secrets_store/versions.tf (1)
1-15: Provider declarations are appropriate for module scope.

The required_providers block correctly specifies the providers needed by the secrets_store module: kubectl (for manifest application), helm (for chart deployment), and http (for fetching the GCP provider manifest). The version constraints are reasonable for their respective libraries.
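The described block presumably looks something like this sketch; the http version constraint is an assumption, not taken from the file:

```hcl
terraform {
  required_providers {
    kubectl = {
      source  = "gavinbunney/kubectl"
      version = ">= 1.19.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.10"
    }
    http = {
      source  = "hashicorp/http"
      version = ">= 3.0" # assumed constraint
    }
  }
}
```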
modules/app_gke/secrets_store/variables.tf (1)
1-9: Inconsistency between summary and annotated code regarding variable defaults.

The AI summary indicates these variables have defaults ("1.4.7" and "1.9.0"), but the annotated code on lines 1-9 shows no default values. However, looking at modules/app_gke/variables.tf lines 104-114, the same variables are declared there WITH defaults. This suggests the submodule delegates versioning to its parent module, which is a valid pattern. Please verify that the actual file includes appropriate defaults or that the parent module correctly passes these values.
Can you confirm whether this submodule's variables have defaults defined, or if the parent module (app_gke) is responsible for providing them?
versions.tf (1)
24-27: Root-level kubectl provider requirement is appropriate.

The pessimistic constraint (~> 1.19) is consistent with other provider version constraints in this file and provides reasonable flexibility for patch version updates.

modules/app_gke/variables.tf (1)
98-114: New variables are properly structured with sensible defaults.

The three new variables (k8s_namespace, secrets_store_csi_driver_version, secrets_store_csi_driver_provider_gcp_version) are correctly typed and documented. The defaults align with the PR objectives. Note that k8s_namespace defaults to "default", which means W&B resources will be deployed in the Kubernetes default namespace unless overridden; verify this aligns with your deployment strategy.

Confirm that these variables are correctly passed to the secrets_store module invocation in main.tf.

README.md (1)
84-84: README correctly documents the new kubectl provider requirement.

The requirements table has been properly updated with the new kubectl provider entry, following the existing format and matching the root-level version constraint.
modules/app_gke/secrets_store/secrets_store.tf (2)
1-23: Helm release configuration for CSI driver is well-structured.

The secrets-store-csi-driver Helm release is properly configured with:

- Correct namespace (kube-system) for a cluster-wide CSI driver
- Appropriate feature flags enabled: secret synchronization, rotation, and a reasonable rotation interval (120s)
- Proper referencing of the version variable

This configuration enables the CSI driver to sync secrets and support rotation as intended.
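For orientation, a helm_release with those settings presumably resembles this sketch; the repository URL and value names are assumed from the upstream secrets-store-csi-driver chart, not copied from the diff:

```hcl
resource "helm_release" "secrets_store_csi_driver" {
  name       = "secrets-store-csi-driver"
  repository = "https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts"
  chart      = "secrets-store-csi-driver"
  version    = var.secrets_store_csi_driver_version
  namespace  = "kube-system"

  # Sync mounted secrets into Kubernetes Secret objects
  set {
    name  = "syncSecret.enabled"
    value = "true"
  }

  # Re-fetch secrets periodically so rotations propagate
  set {
    name  = "enableSecretRotation"
    value = "true"
  }
  set {
    name  = "rotationPollInterval"
    value = "120s"
  }
}
```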
39-50: Kubectl manifest application has appropriate ordering and configuration.

The depends_on relationship ensures the GCP provider manifests are applied only after the CSI driver Helm release is deployed. The server_side_apply = true and wait = true settings are appropriate for Kubernetes resources and CRDs.

modules/app_gke/outputs.tf (1)
42-50: All referenced resources are properly defined.

Verification confirms that google_secret_manager_secret.weave_worker_auth (line 160 in main.tf) and google_service_account.weave_worker (line 176 in main.tf) are both defined in the module. The outputs are correctly referencing these existing resources, and Terraform will successfully resolve them at plan/apply time.

modules/app_gke/main.tf (4)
153-173: Secret and password generation looks good.

The random password (32 characters, no special chars) and Secret Manager secret creation are well-configured with auto-replication for high availability. Ensure the namespace used in secret_id (line 161) meets GCP Secret Manager naming requirements (alphanumeric, underscore, hyphen only).
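A sketch of what that configuration plausibly looks like; resource and secret names are assumed, not copied from the diff:

```hcl
resource "random_password" "weave_worker_auth" {
  length  = 32
  special = false # avoid characters that break secret IDs or env vars
}

resource "google_secret_manager_secret" "weave_worker_auth" {
  secret_id = "${var.k8s_namespace}-weave-worker-auth" # assumed naming

  replication {
    auto {} # automatic multi-region replication
  }
}

resource "google_secret_manager_secret_version" "weave_worker_auth" {
  secret      = google_secret_manager_secret.weave_worker_auth.id
  secret_data = random_password.weave_worker_auth.result
}
```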
175-182: Good defensive naming with GCP limit awareness.

The service account ID correctly truncates the namespace to 20 characters to accommodate the "-weave-wkr" suffix within GCP's 30-character service account ID limit. The descriptive display_name and description are helpful for operational clarity.
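That truncation pattern presumably looks like this sketch; the display_name and description values are assumptions:

```hcl
resource "google_service_account" "weave_worker" {
  # GCP account_id is capped at 30 characters; truncating the namespace to
  # 20 leaves room for the 10-character "-weave-wkr" suffix.
  account_id   = "${substr(var.k8s_namespace, 0, 20)}-weave-wkr"
  display_name = "Weave trace worker (${var.k8s_namespace})"        # assumed
  description  = "Workload Identity SA for weave-trace-worker pods" # assumed
}
```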
198-199: Clear documentation of CSI Driver integration.

The NOTE provides good context for future maintainers about how secrets are managed through the Secrets Store CSI Driver and operator-wandb Helm chart.
141-151: Secrets Store module configuration is correct.

Both version variables are properly defined in variables.tf with appropriate defaults (secrets_store_csi_driver_version: "1.4.7", secrets_store_csi_driver_provider_gcp_version: "1.9.0"). The module invocation correctly passes these variables and maintains proper dependency ordering on cluster and node pool creation.

main.tf (1)
530-552: Outputs verified and correctly structured.

The weave-trace-worker and secretsStore configuration properly references both outputs from the app_gke module:

- weave_worker_service_account_email returns the service account email for the Workload Identity binding
- weave_worker_auth_secret_name returns the full secret path including /versions/latest

Both underlying resources exist and the outputs are correctly computed.
```hcl
# Workload Identity binding - allows Kubernetes service account to impersonate GCP service account
resource "google_service_account_iam_member" "weave_worker_workload_identity" {
  service_account_id = google_service_account.weave_worker.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${local.project_id}.svc.id.goog[${var.k8s_namespace}/wandb-weave-trace-worker]"
}
```
🧩 Analysis chain
Hardcoded Kubernetes service account name in Workload Identity binding.
Line 195 hardcodes wandb-weave-trace-worker as the Kubernetes service account name in the Workload Identity binding. If the actual deployed service account uses a different name, this binding will fail silently, preventing the pod from accessing GCP resources.
Verify that the hardcoded service account name matches what's deployed. Check:
- What service account name is actually created/deployed for weave-trace-worker
- Whether this should be parameterized as a variable
- Cross-reference with the main.tf weave-trace-worker config to ensure consistency
🏁 Script executed:

```shell
#!/bin/bash
# Search for weave-trace-worker service account references
rg -n 'weave-trace-worker|wandb-weave-trace-worker' --type tf
```

Length of output: 266
I need to examine the actual service account name being deployed. Let me search for the K8s manifests or deployment configuration:
---
🏁 Script executed:
```shell
#!/bin/bash
# Get full context around main.tf:530 and modules/app_gke/main.tf:195
echo "=== Context around main.tf:530 ==="
sed -n '525,535p' main.tf
echo ""
echo "=== Full IAM binding section in modules/app_gke/main.tf (around lines 184-196) ==="
sed -n '184,196p' modules/app_gke/main.tf
echo ""
echo "=== Search for kubernetes service account definitions ==="
rg -n 'kind.*ServiceAccount|metadata.*name.*weave' --type yaml --type yml -A 2 -B 2
```

Length of output: 1233
Let me search for where the K8s service account is actually created and verify the exact name being used:
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for where weave-trace-worker K8s service account is created or referenced
echo "=== Search for weave-trace-worker deployment/service account creation ==="
rg -n 'weave-trace-worker' --type yaml -C 3 2>/dev/null || echo "No YAML files found with rg"
echo ""
echo "=== Search for K8s manifest files ==="
fd -e yaml -e yml . | head -20
echo ""
echo "=== Check if var.k8s_namespace is used and what k8s_namespace variable is ==="
rg -n 'k8s_namespace' modules/app_gke/main.tf -B 2 -A 2 | head -40
echo ""
echo "=== Check helm chart values or deployment config ==="
rg -n 'weave-trace-worker|serviceAccount:' -C 3 | head -50
```

Length of output: 3372
Let me search for Helm chart information and how the weave-trace-worker is deployed:
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for Helm chart references and deployment
echo "=== Search for Helm chart references ==="
rg -n 'helm|chart|release.*weave|weave.*release' --type tf -i -C 2 | head -60
echo ""
echo "=== Search for how weave-trace-worker deployment is configured ==="
rg -n 'helm_release|kubernetes_manifest|module.*weave|weave.*module' --type tf -C 3 | head -60
echo ""
echo "=== Look for the full context where weave-trace-worker is in main.tf ==="
sed -n '510,550p' main.tf
```

Length of output: 5336
Let me search for the local.k8s_sa_map definition to understand the naming convention:
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for local.k8s_sa_map definition
echo "=== Search for k8s_sa_map local variable ==="
rg -n 'k8s_sa_map' --type tf -B 5 -A 5
echo ""
echo "=== Check if there's a default service account name pattern ==="
rg -n 'weave_trace|local\.k8s_sa_map' --type tf | grep -E 'weave_trace|locals'
```

Length of output: 7248
Hardcoded Kubernetes service account name mismatches the established naming pattern.

The IAM binding at line 195 hardcodes wandb-weave-trace-worker, but reviewing the configuration shows:

- Other services (weave-trace, app, console, etc.) use entries from local.k8s_sa_map to set their K8s service account names
- The weave-trace deployment uses local.k8s_sa_map.weave_trace = wandb-weave-trace
- The weave-trace-worker Helm values block doesn't specify a service account name and has no corresponding k8s_sa_map entry

The hardcoded wandb-weave-trace-worker in the binding appears inconsistent with this pattern. Either:

- The K8s service account name should be parameterized via local.k8s_sa_map (like other services), or
- The binding should reference the actual deployed service account name

Without an explicit name in the Helm values, the actual deployed K8s service account name is unclear, risking a silent Workload Identity binding failure.
🤖 Prompt for AI Agents
In modules/app_gke/main.tf around lines 191–196 the Workload Identity member
hardcodes "wandb-weave-trace-worker" which breaks the established pattern of
using local.k8s_sa_map entries and can mismatch the actual deployed SA; fix by
making the binding reference the correct k8s SA entry instead of a hardcoded
name: either add a new key (e.g. weave_trace_worker) to local.k8s_sa_map and
replace the literal with local.k8s_sa_map.weave_trace_worker, or if the worker
should use the existing weave_trace SA, replace the hardcoded name with
local.k8s_sa_map.weave_trace; alternatively ensure the Helm values explicitly
set the serviceAccount.name and reference that mapped value in the IAM member.
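A sketch of the first option the prompt describes; the weave_trace_worker map entry is hypothetical, since only weave_trace is known to exist in local.k8s_sa_map:

```hcl
locals {
  k8s_sa_map = {
    weave_trace        = "wandb-weave-trace"
    weave_trace_worker = "wandb-weave-trace-worker" # hypothetical new entry
  }
}

resource "google_service_account_iam_member" "weave_worker_workload_identity" {
  service_account_id = google_service_account.weave_worker.name
  role               = "roles/iam.workloadIdentityUser"
  # Reference the mapped name instead of a hardcoded literal
  member = "serviceAccount:${local.project_id}.svc.id.goog[${var.k8s_namespace}/${local.k8s_sa_map.weave_trace_worker}]"
}
```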
Store the GCP Secrets Manager Provider manifest locally instead of fetching it via HTTP at runtime. This ensures:

- Deterministic deployments (no dependency on external URLs)
- Faster Terraform plan/apply (no network fetch)
- Works in air-gapped environments
- Matches pattern used in core repository

The manifest file is versioned based on the secrets_store_csi_driver_provider_gcp_version variable, allowing easy updates when upgrading the provider version.
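Reading a versioned local manifest might look like this sketch; the path layout is assumed from the file name that appears later in the review:

```hcl
locals {
  # e.g. manifests/provider-gcp-plugin-v1.9.0.yaml
  gcp_provider_manifest_path = "${path.module}/manifests/provider-gcp-plugin-v${var.secrets_store_csi_driver_provider_gcp_version}.yaml"

  # Split the multi-document YAML into individual manifests
  gcp_provider_manifests = [
    for doc in split("---", file(local.gcp_provider_manifest_path)) : doc
    if trimspace(doc) != ""
  ]
}
```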
Force-pushed from 6347118 to 1ecb975
Actionable comments posted: 0
🧹 Nitpick comments (1)
modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml (1)
72-77: Pin the busybox init container image to a specific version.

The init container references image: busybox without a version tag, which can result in non-deterministic deployments if the tag is re-released or updated. Pin it to a specific digest or version.

```diff
 - name: chown-provider-mount
-  image: busybox
+  image: busybox:1.35
```

Consider using a digest for stronger reproducibility: busybox@sha256:<digest>.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- README.md (1 hunks)
- modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml (1 hunks)
- modules/app_gke/secrets_store/secrets_store.tf (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- README.md
🧰 Additional context used
🪛 Checkov (3.2.334)
modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml
[medium] 7-17: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 27-47: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 30-50: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 1-21: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 7-63: Ensure Kubernetes Cluster is created with Private cluster enabled
(CKV_GCP_25)
[medium] 7-63: Ensure Kubernetes Cluster is created with Private cluster enabled
(CKV_GCP_25)
[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
[medium] 41-67: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
[medium] 11-28: Ensure Big Table Instances have deletion protection enabled
(CKV_GCP_122)
[high] 49-96: Ensure all Cloud SQL database instance requires all incoming connections to use SSL
(CKV_GCP_6)
[medium] 1-10: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall
(CKV2_GCP_18)
[medium] 52-131: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
modules/app_gke/secrets_store/secrets_store.tf
[medium] 7-17: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 27-47: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 30-50: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 1-21: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell
(CKV_GCP_73)
[medium] 11-28: Ensure Big Table Instances have deletion protection enabled
(CKV_GCP_122)
[medium] 1-10: Ensure Terraform module sources use a commit hash
(CKV_TF_1)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket
(CKV_GCP_114)
[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall
(CKV2_GCP_18)
🔇 Additional comments (6)
modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml (2)
15-50: RBAC and manifest structure are well-designed.

The ServiceAccount, ClusterRole, and ClusterRoleBinding correctly implement least-privilege access for token creation and serviceaccount read operations, which are the minimal permissions needed for the GCP provider plugin to authenticate via Workload Identity.

52-131: SecurityContext and resource limits are appropriately configured.

The DaemonSet enforces a strong security posture with non-root user, dropped capabilities, read-only filesystem, and runtime seccomp. Resource requests and limits prevent resource contention on nodes. The liveness probe ensures unhealthy pods are restarted.
modules/app_gke/secrets_store/secrets_store.tf (4)
1-23: Helm release configuration is well-structured.

The secrets_store_csi_driver Helm release enables secret synchronization, rotation, and polling with a 120-second interval, which supports dynamic secret updates from Google Secret Manager. Version and namespace are properly parameterized.

27-36: Manifest parsing logic is robust and deterministic.

The locals block correctly handles multi-document YAML by splitting on delimiters, filtering whitespace, and validating YAML syntax before applying. This approach ensures only valid manifests are deployed and addresses the past concern about reproducibility by using a versioned local file instead of a GitHub URL fetch.

38-49: kubectl_manifest orchestration is correct with proper dependencies.

Server-side apply and wait flags ensure manifests are fully reconciled before Terraform considers the resource complete. The dependency on the Helm release ensures the CSI driver is deployed before the GCP provider plugin manifests are applied, preventing timing issues.

51-53: Documentation of SecretProviderClass management is clear.

The comment correctly documents that SecretProviderClass resources are managed by the application Helm chart (operator-wandb), not Terraform. This architectural choice avoids CRD timing issues and maintains separation of concerns between infrastructure setup and application configuration.
JIRA Issue(s)
https://wandb.atlassian.net/browse/WB-28628
Description
Implements Secrets Store CSI Driver integration for GCP to enable secure, dynamic secret mounting from Google Secret Manager into Kubernetes.
Changes:

- Install Secrets Store CSI Driver (v1.4.7) and GCP provider (v1.9.0) via Helm/kubectl manifest in kube-system namespace
- Create weave_worker GCP service account with Workload Identity
- Configure IAM policies for read-only access to secrets
- Generate random weave worker auth token and store in Google Secret Manager
- Add Helm chart configuration for weave-trace-worker with CSI driver support
- Add kubectl provider requirement for applying GCP provider manifest
The CSI driver automatically syncs secrets from Google Secret Manager to Kubernetes, enabling secret rotation and eliminating the need to manage secrets in both places. SecretProviderClass resources are created by the operator-wandb Helm chart.
This provides the infrastructure for internal service authentication where services authenticate via Kubernetes secrets that are dynamically mounted from cloud secrets managers.
Testing
How did you test your PR? What is the blast radius of your change? Provide a test plan.
Summary by CodeRabbit

- Documentation
- New Features