
Conversation

@amwarrier (Contributor) commented Oct 23, 2025

JIRA Issue(s)

https://wandb.atlassian.net/browse/WB-28628

Description

Implements Secrets Store CSI Driver integration for GCP to enable secure, dynamic secret mounting from Google Secret Manager into Kubernetes.

Changes:

  • Install Secrets Store CSI Driver (v1.4.7) and GCP provider (v1.9.0) via Helm/kubectl manifest in kube-system namespace
  • Create weave_worker GCP service account with Workload Identity
  • Configure IAM policies for read-only access to secrets
  • Generate random weave worker auth token and store in Google Secret Manager
  • Add Helm chart configuration for weave-trace-worker with CSI driver support
  • Add kubectl provider requirement for applying GCP provider manifest

The CSI driver automatically syncs secrets from Google Secret Manager to Kubernetes, enabling secret rotation and eliminating the need to manage secrets in both places. SecretProviderClass resources are created by the operator-wandb Helm chart.

This provides the infrastructure for internal service authentication where services authenticate via Kubernetes secrets that are dynamically mounted from cloud secrets managers.
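
For illustration, a SecretProviderClass of the kind the operator-wandb chart generates might look like the sketch below, expressed here as a Terraform kubectl_manifest. The resource name, project path, and key names are assumptions for illustration, not the chart's actual output.

```hcl
# Hypothetical sketch: map a Secret Manager secret to a synced Kubernetes
# Secret. In practice the operator-wandb chart owns this resource.
resource "kubectl_manifest" "example_secret_provider_class" {
  yaml_body = <<-YAML
    apiVersion: secrets-store.csi.x-k8s.io/v1
    kind: SecretProviderClass
    metadata:
      name: weave-worker-auth        # assumed name
      namespace: default
    spec:
      provider: gcp
      parameters:
        secrets: |
          - resourceName: "projects/my-project/secrets/weave-worker-auth/versions/latest"
            path: "auth-token"
      secretObjects:                 # sync the mounted file into a k8s Secret
        - secretName: weave-worker-auth
          type: Opaque
          data:
            - objectName: auth-token # must match the mounted path above
              key: token
  YAML
}
```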

Testing

How did you test your PR? What is the blast radius of your change? Provide a test plan.

  • Tested on a managed GCP deployment

Summary by CodeRabbit

  • Documentation

    • Updated kubectl requirement to ~> 1.19 and added README requirements entry.
  • New Features

    • Configurable Kubernetes namespace for deployments (default: "default").
    • Installed and configured Secrets Store CSI Driver with GCP provider for secret syncing.
    • Automated creation and provisioning of worker credentials and a dedicated service account.
    • Enabled workload identity binding so cluster workloads can access GCP secrets securely.

@amwarrier amwarrier requested a review from a team as a code owner October 23, 2025 17:38
@coderabbitai coderabbitai bot commented Oct 23, 2025

Walkthrough

Introduces Secrets Store CSI Driver integration and GCP Secret Manager-backed weave worker authentication: adds kubectl provider requirement, a secrets_store submodule that installs CSI driver and GCP provider, GCP secret + service account + IAM bindings, Workload Identity wiring, Helm-values for weave-trace-worker, and related variables/outputs.

Changes

  • Documentation & Root Providers (README.md, versions.tf): Added kubectl provider requirement (~> 1.19) to README and root Terraform required_providers.
  • Root Module Configuration (main.tf): Passed k8s_namespace to module.app_gke; added a weave-trace-worker serviceAccount annotation and secretsStore Helm values for GCP-backed secret mapping.
  • App GKE Module, Core Resources (modules/app_gke/main.tf): Added module "secrets_store"; created random_password.weave_worker_auth, google_secret_manager_secret* resources, google_service_account.weave_worker, a secret IAM member, and a Workload Identity binding.
  • App GKE Module, Outputs & Inputs (modules/app_gke/outputs.tf, modules/app_gke/variables.tf): Added outputs weave_worker_auth_secret_name and weave_worker_service_account_email; added variables k8s_namespace, secrets_store_csi_driver_version, and secrets_store_csi_driver_provider_gcp_version.
  • App GKE Module, Provider Versions (modules/app_gke/versions.tf, modules/app_gke/secrets_store/versions.tf): Declared required_providers including kubectl, helm, and google for the module and the secrets_store submodule.
  • Secrets Store Submodule (modules/app_gke/secrets_store/secrets_store.tf, modules/app_gke/secrets_store/variables.tf, modules/app_gke/secrets_store/...): New submodule that installs the Secrets Store CSI Driver via Helm, parses and applies the GCP provider manifest (provider-gcp-plugin-v1.9.0.yaml) using kubectl_manifest, and exposes variables for chart/provider versions.
  • K8s Manifest (modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml): Adds the ServiceAccount, ClusterRole, ClusterRoleBinding, and DaemonSet for the GCP Secrets Store CSI provider plugin, with RBAC and deployment details.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Terraform
    participant GCP as GCP
    participant Helm as Helm
    participant K8s as Kubernetes
    participant SecretMgr as Secret Manager

    User->>Terraform: apply

    rect rgb(230,245,255)
    Note over Terraform,GCP: Provision GCP auth & secrets
    Terraform->>GCP: create google_service_account.weave_worker
    Terraform->>SecretMgr: create secret + secret_version (weave_worker_auth)
    Terraform->>GCP: add IAM member (secret access)
    Terraform->>GCP: add workload identity binding
    end

    rect rgb(235,255,230)
    Note over Terraform,K8s: Install Secrets Store CSI Driver & provider
    Terraform->>Helm: helm_release(secrets-store-csi-driver)
    Terraform->>K8s: kubectl_manifest(provider-gcp-plugin)
    end

    rect rgb(255,245,230)
    Note over K8s,SecretMgr: Runtime secret flow
    K8s->>SecretMgr: CSI Driver requests secret
    SecretMgr-->>K8s: secret data mounted as k8s secret
    K8s->>K8s: wandb weave-trace-worker uses mounted secret
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay extra attention to: Workload Identity IAM binding resource (google_service_account_iam_member), Secret Manager IAM member binding, and correctness of the kubectl_manifest parsing/application logic in the secrets_store submodule.

Suggested reviewers

  • wandb-kc

Poem

🐰 I hid a secret in the cloud so deep,
A tiny key for workers to keep,
CSI hums, Workload IDs sing,
Kubernetes wakes — secrets take wing,
A rabbit hops — secure and neat! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title "Add CSI driver support for internal service authentication (GCP)" accurately reflects the main objective of the changeset: the PR implements Secrets Store CSI Driver integration for Google Cloud Platform to enable dynamic secret mounting from Google Secret Manager. The title is specific and descriptive, naming both the feature (CSI driver support) and the platform (GCP), without being verbose or generic. A developer scanning the repository history would understand that this PR introduces CSI driver capabilities for authentication in a GCP environment.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch aravind/csi-driver-support-gcp

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main.tf (1)

129-148: Add k8s_namespace as a configurable root module variable.

The k8s_namespace is hardcoded as "default" in main.tf (line 132), preventing users from deploying weave-trace-worker in different namespaces. While the app_gke module supports this as a configurable parameter (with default "default"), the root module must expose it:

  1. Add a k8s_namespace variable to the root module's variables.tf with an appropriate default value
  2. Update main.tf line 132 to use k8s_namespace = var.k8s_namespace instead of the hardcoded string
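
A minimal sketch of that wiring, assuming the module block and variable names used elsewhere in this repo:

```hcl
# Hypothetical: expose the namespace at the root and thread it through.
variable "k8s_namespace" {
  description = "Kubernetes namespace for W&B workloads"
  type        = string
  default     = "default"
}

module "app_gke" {
  source = "./modules/app_gke"
  # ...existing arguments...
  k8s_namespace = var.k8s_namespace
}
```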
🧹 Nitpick comments (2)
modules/app_gke/versions.tf (1)

1-15: Provider version constraints use different styles than root-level declarations.

This module uses >= X.Y constraints (e.g., >= 5.30 for google), while the root versions.tf uses pessimistic constraints ~> X.Y (e.g., ~> 5.30). The >= constraint is more permissive and could introduce version drift. For consistency and to avoid potential incompatibilities, align these with the root-level constraint style (~> 5.30 for google, ~> 2.10 for helm, ~> 1.19 for kubectl).

Consider aligning version constraints to match the root module's style:

 terraform {
   required_providers {
     google = {
       source  = "hashicorp/google"
-      version = ">= 5.30"
+      version = "~> 5.30"
     }
     kubectl = {
       source  = "gavinbunney/kubectl"
-      version = ">= 1.19.0"
+      version = "~> 1.19"
     }
     helm = {
       source  = "hashicorp/helm"
-      version = ">= 2.10"
+      version = "~> 2.10"
     }
   }
 }
modules/app_gke/secrets_store/secrets_store.tf (1)

31-50: Manifest parsing approach could silently drop invalid manifests; consider more explicit error handling.

The local value uses can(yamldecode(manifest)) to filter valid YAML, which will silently exclude any manifest that fails to decode. This could hide corruption, truncation, or other issues with the fetched manifest. While it prevents Terraform errors, it might leave the CSI driver partially or incorrectly configured without clear diagnostics.

Additionally, the for_each loop uses array indices as keys (for idx, manifest in ... : idx => manifest). If manifests are reordered in a future version of the GCP provider, Terraform will see this as a destroy/recreate cycle, potentially causing brief disruptions to the CSI driver and dependent workloads.

Consider these improvements:

  1. Add explicit manifest validation to catch and report errors:
 locals {
   gcp_provider_manifests = [
     for manifest in split("---", data.http.gcp_provider_manifest.response_body) :
     manifest
-    if trimspace(manifest) != "" && can(yamldecode(manifest))
+    if trimspace(manifest) != ""
   ]
+  
+  # Validate all manifests can be decoded
+  _manifests_validated = [
+    for idx, manifest in local.gcp_provider_manifests :
+    yamldecode(manifest) # Will fail explicitly if invalid YAML
+  ]
 }
  2. Use stable identifiers for for_each to avoid unnecessary resource churn:
 resource "kubectl_manifest" "gcp_provider" {
-  for_each = { for idx, manifest in local.gcp_provider_manifests : idx => manifest }
+  for_each = { for idx, manifest in local.gcp_provider_manifests : 
+    "${yamldecode(manifest).kind}-${yamldecode(manifest).metadata.name}" => manifest
+  }

This ensures that individual manifests are tracked by their Kubernetes resource kind and name, making the configuration more resilient to manifest ordering changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed99ff7 and 6347118.

📒 Files selected for processing (10)
  • README.md (1 hunks)
  • main.tf (2 hunks)
  • modules/app_gke/main.tf (1 hunks)
  • modules/app_gke/outputs.tf (1 hunks)
  • modules/app_gke/secrets_store/secrets_store.tf (1 hunks)
  • modules/app_gke/secrets_store/variables.tf (1 hunks)
  • modules/app_gke/secrets_store/versions.tf (1 hunks)
  • modules/app_gke/variables.tf (1 hunks)
  • modules/app_gke/versions.tf (1 hunks)
  • versions.tf (1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
modules/app_gke/secrets_store/variables.tf

[medium] 1-21: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 1-10: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall

(CKV2_GCP_18)

modules/app_gke/secrets_store/secrets_store.tf

[medium] 7-17: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 27-47: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 30-50: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 1-21: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)


[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)


[medium] 11-28: Ensure Big Table Instances have deletion protection enabled

(CKV_GCP_122)


[medium] 1-10: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall

(CKV2_GCP_18)

modules/app_gke/outputs.tf

[medium] 30-50: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 41-67: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)

modules/app_gke/versions.tf

[medium] 1-21: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 1-10: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall

(CKV2_GCP_18)

modules/app_gke/secrets_store/versions.tf

[medium] 1-21: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 1-10: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall

(CKV2_GCP_18)

modules/app_gke/main.tf

[medium] 145-184: Ensure Terraform module sources use a commit hash

(CKV_TF_1)

🔇 Additional comments (13)
modules/app_gke/secrets_store/versions.tf (1)

1-15: Provider declarations are appropriate for module scope.

The required_providers block correctly specifies the providers needed by the secrets_store module: kubectl (for manifest application), helm (for chart deployment), and http (for fetching the GCP provider manifest). The version constraints are reasonable for their respective libraries.

modules/app_gke/secrets_store/variables.tf (1)

1-9: Inconsistency between summary and annotated code regarding variable defaults.

The AI summary indicates these variables have defaults ("1.4.7" and "1.9.0"), but the annotated code on lines 1-9 shows no default values. However, looking at modules/app_gke/variables.tf lines 104-114, the same variables are declared there WITH defaults. This suggests the submodule delegates versioning to its parent module, which is a valid pattern. Please verify that the actual file includes appropriate defaults or that the parent module correctly passes these values.

Can you confirm whether this submodule's variables have defaults defined, or if the parent module (app_gke) is responsible for providing them?

versions.tf (1)

24-27: Root-level kubectl provider requirement is appropriate.

The pessimistic constraint (~> 1.19) is consistent with other provider version constraints in this file and provides reasonable flexibility for patch version updates.

modules/app_gke/variables.tf (1)

98-114: New variables are properly structured with sensible defaults.

The three new variables (k8s_namespace, secrets_store_csi_driver_version, secrets_store_csi_driver_provider_gcp_version) are correctly typed and documented. The defaults align with the PR objectives. Note that k8s_namespace defaults to "default", which means W&B resources will be deployed in the Kubernetes default namespace unless overridden—verify this aligns with your deployment strategy.

Confirm that these variables are correctly passed to the secrets_store module invocation in main.tf.
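
For reference, the pass-through the comment asks to confirm might look like this minimal sketch (argument names taken from the review; the actual invocation in modules/app_gke/main.tf may differ):

```hcl
# Sketch: parent module forwarding version variables to the submodule.
module "secrets_store" {
  source = "./secrets_store"

  secrets_store_csi_driver_version              = var.secrets_store_csi_driver_version
  secrets_store_csi_driver_provider_gcp_version = var.secrets_store_csi_driver_provider_gcp_version
}
```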

README.md (1)

84-84: README correctly documents new kubectl provider requirement.

The requirements table has been properly updated with the new kubectl provider entry, following the existing format and matching the root-level version constraint.

modules/app_gke/secrets_store/secrets_store.tf (2)

1-23: Helm release configuration for CSI driver is well-structured.

The secrets-store-csi-driver Helm release is properly configured with:

  • Correct namespace (kube-system) for a cluster-wide CSI driver
  • Appropriate feature flags enabled: secret synchronization, rotation, and reasonable rotation interval (120s)
  • Proper referencing of version variable

This configuration enables the CSI driver to sync secrets and support rotation as intended.


39-50: Kubectl manifest application has appropriate ordering and configuration.

The depends_on relationship ensures the GCP provider manifests are applied only after the CSI driver Helm release is deployed. The server_side_apply = true and wait = true settings are appropriate for Kubernetes resources and CRDs.

modules/app_gke/outputs.tf (1)

42-50: All referenced resources are properly defined.

Verification confirms that google_secret_manager_secret.weave_worker_auth (line 160 in main.tf) and google_service_account.weave_worker (line 176 in main.tf) are both defined in the module. The outputs are correctly referencing these existing resources, and Terraform will successfully resolve them at plan/apply time.

modules/app_gke/main.tf (4)

153-173: Secret and password generation looks good.

The random password (32 characters, no special chars) and Secret Manager secret creation are well-configured with auto-replication for high availability. Ensure the namespace used in secret_id (line 161) meets GCP Secret Manager naming requirements (alphanumeric, underscore, hyphen only).
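
A minimal sketch of the pattern described here, with resource names taken from the review and everything else (the secret_id format in particular) assumed:

```hcl
resource "random_password" "weave_worker_auth" {
  length  = 32
  special = false # avoids characters that complicate env/shell usage
}

resource "google_secret_manager_secret" "weave_worker_auth" {
  secret_id = "${var.k8s_namespace}-weave-worker-auth" # must satisfy GCP naming rules

  replication {
    auto {} # automatic replication for high availability
  }
}

resource "google_secret_manager_secret_version" "weave_worker_auth" {
  secret      = google_secret_manager_secret.weave_worker_auth.id
  secret_data = random_password.weave_worker_auth.result
}
```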


175-182: Good defensive naming with GCP limit awareness.

The service account ID correctly truncates the namespace to 20 characters to accommodate the "-weave-wkr" suffix within GCP's 30-character service account ID limit. The descriptive display_name and description are helpful for operational clarity.
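
The truncation described above could be expressed like this (a sketch; the exact prefix handling and descriptions are assumed):

```hcl
resource "google_service_account" "weave_worker" {
  # 20 chars of namespace + "-weave-wkr" (10 chars) = GCP's 30-char limit
  account_id   = "${substr(var.k8s_namespace, 0, 20)}-weave-wkr"
  display_name = "Weave Worker"
  description  = "Accesses Secret Manager secrets for weave-trace-worker pods"
}
```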


198-199: Clear documentation of CSI Driver integration.

The NOTE provides good context for future maintainers about how secrets are managed through the Secrets Store CSI Driver and operator-wandb Helm chart.


141-151: Secrets Store module configuration is correct.

Both version variables are properly defined in variables.tf with appropriate defaults (secrets_store_csi_driver_version: "1.4.7", secrets_store_csi_driver_provider_gcp_version: "1.9.0"). The module invocation correctly passes these variables and maintains proper dependency ordering on cluster and node pool creation.

main.tf (1)

530-552: Outputs verified and correctly structured.

The weave-trace-worker and secretsStore configuration properly references both outputs from the app_gke module:

  • weave_worker_service_account_email returns the service account email for Workload Identity binding
  • weave_worker_auth_secret_name returns the full secret path including /versions/latest

Both underlying resources exist and outputs are correctly computed.
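
A sketch of outputs matching that description, assuming the resource names referenced earlier in the review:

```hcl
output "weave_worker_auth_secret_name" {
  # Full path including /versions/latest, so the CSI provider always
  # resolves the newest secret version.
  value = "${google_secret_manager_secret.weave_worker_auth.name}/versions/latest"
}

output "weave_worker_service_account_email" {
  value = google_service_account.weave_worker.email
}
```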

Comment on lines +191 to +196

# Workload Identity binding - allows Kubernetes service account to impersonate GCP service account
resource "google_service_account_iam_member" "weave_worker_workload_identity" {
  service_account_id = google_service_account.weave_worker.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${local.project_id}.svc.id.goog[${var.k8s_namespace}/wandb-weave-trace-worker]"
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Hardcoded Kubernetes service account name in Workload Identity binding.

Line 195 hardcodes wandb-weave-trace-worker as the Kubernetes service account name in the Workload Identity binding. If the actual deployed service account uses a different name, this binding will fail silently, preventing the pod from accessing GCP resources.

Verify that the hardcoded service account name matches what's deployed. Check:

  1. What service account name is actually created/deployed for weave-trace-worker
  2. Whether this should be parameterized as a variable
  3. Cross-reference with the main.tf weave-trace-worker config to ensure consistency

🏁 Script executed:

```shell
#!/bin/bash
# Search for weave-trace-worker service account references
rg -n 'weave-trace-worker|wandb-weave-trace-worker' --type tf
```

Length of output: 266


I need to examine the actual service account definitions to confirm the exact name being deployed. Let me check the main.tf:530 context and search the Kubernetes manifests:


---

🏁 Script executed:

```shell
#!/bin/bash
# Get full context around main.tf:530 and modules/app_gke/main.tf:195
echo "=== Context around main.tf:530 ==="
sed -n '525,535p' main.tf

echo ""
echo "=== Full IAM binding section in modules/app_gke/main.tf (around lines 184-196) ==="
sed -n '184,196p' modules/app_gke/main.tf

echo ""
echo "=== Search for kubernetes service account definitions ==="
rg -n 'kind.*ServiceAccount|metadata.*name.*weave' --type yaml --type yml -A 2 -B 2
```

Length of output: 1233


Let me search for where the K8s service account is actually created and verify the exact name being used:


---

🏁 Script executed:

```shell
#!/bin/bash
# Search for where weave-trace-worker K8s service account is created or referenced
echo "=== Search for weave-trace-worker deployment/service account creation ==="
rg -n 'weave-trace-worker' --type yaml -C 3 2>/dev/null || echo "No YAML files found with rg"

echo ""
echo "=== Search for K8s manifest files ==="
fd -e yaml -e yml . | head -20

echo ""
echo "=== Check if var.k8s_namespace is used and what k8s_namespace variable is ==="
rg -n 'k8s_namespace' modules/app_gke/main.tf -B 2 -A 2 | head -40

echo ""
echo "=== Check helm chart values or deployment config ==="
rg -n 'weave-trace-worker|serviceAccount:' -C 3 | head -50
```

Length of output: 3372


Let me search for Helm chart information and how the weave-trace-worker is deployed:


---

🏁 Script executed:

```shell
#!/bin/bash
# Search for Helm chart references and deployment
echo "=== Search for Helm chart references ==="
rg -n 'helm|chart|release.*weave|weave.*release' --type tf -i -C 2 | head -60

echo ""
echo "=== Search for how weave-trace-worker deployment is configured ==="
rg -n 'helm_release|kubernetes_manifest|module.*weave|weave.*module' --type tf -C 3 | head -60

echo ""
echo "=== Look for the full context where weave-trace-worker is in main.tf ==="
sed -n '510,550p' main.tf
```

Length of output: 5336


Let me search for the local.k8s_sa_map definition to understand the naming convention:


---

🏁 Script executed:

```shell
#!/bin/bash
# Search for local.k8s_sa_map definition
echo "=== Search for k8s_sa_map local variable ==="
rg -n 'k8s_sa_map' --type tf -B 5 -A 5

echo ""
echo "=== Check if there's a default service account name pattern ==="
rg -n 'weave_trace|local\.k8s_sa_map' --type tf | grep -E 'weave_trace|locals'
```

Length of output: 7248


Hardcoded Kubernetes service account name mismatches the established naming pattern.

The IAM binding at line 195 hardcodes wandb-weave-trace-worker, but reviewing the configuration shows:

  • Other services (weave-trace, app, console, etc.) use entries from local.k8s_sa_map to set their K8s service account names
  • The weave-trace deployment uses local.k8s_sa_map.weave_trace = wandb-weave-trace
  • The weave-trace-worker Helm values block doesn't specify a service account name and has no corresponding k8s_sa_map entry

The hardcoded wandb-weave-trace-worker in the binding appears inconsistent with this pattern. Either:

  1. The K8s service account name should be parameterized via local.k8s_sa_map (like other services), or
  2. The binding should reference the actual deployed service account name

Without the explicit name in Helm values, the actual deployed K8s service account name is unclear, risking a silent Workload Identity binding failure.

🤖 Prompt for AI Agents
In modules/app_gke/main.tf around lines 191–196 the Workload Identity member
hardcodes "wandb-weave-trace-worker" which breaks the established pattern of
using local.k8s_sa_map entries and can mismatch the actual deployed SA; fix by
making the binding reference the correct k8s SA entry instead of a hardcoded
name: either add a new key (e.g. weave_trace_worker) to local.k8s_sa_map and
replace the literal with local.k8s_sa_map.weave_trace_worker, or if the worker
should use the existing weave_trace SA, replace the hardcoded name with
local.k8s_sa_map.weave_trace; alternatively ensure the Helm values explicitly
set the serviceAccount.name and reference that mapped value in the IAM member.
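
A minimal sketch of the first suggested fix, assuming the existing local.k8s_sa_map structure (the weave_trace_worker key and its value are hypothetical):

```hcl
locals {
  k8s_sa_map = {
    # ...existing entries...
    weave_trace_worker = "wandb-weave-trace-worker" # hypothetical entry
  }
}

resource "google_service_account_iam_member" "weave_worker_workload_identity" {
  service_account_id = google_service_account.weave_worker.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${local.project_id}.svc.id.goog[${var.k8s_namespace}/${local.k8s_sa_map.weave_trace_worker}]"
}
```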

Store the GCP Secrets Manager Provider manifest locally instead of
fetching it via HTTP at runtime. This ensures:
- Deterministic deployments (no dependency on external URLs)
- Faster Terraform plan/apply (no network fetch)
- Works in air-gapped environments
- Matches pattern used in core repository

The manifest file is versioned based on secrets_store_csi_driver_provider_gcp_version
variable, allowing easy updates when upgrading the provider version.
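
The local-file pattern the commit describes might look like this sketch (the locals name and manifests/ layout are inferred from the review summaries):

```hcl
locals {
  gcp_provider_manifest_path = "${path.module}/manifests/provider-gcp-plugin-v${var.secrets_store_csi_driver_provider_gcp_version}.yaml"

  # Split the multi-document YAML and keep only non-empty, decodable docs.
  gcp_provider_manifests = [
    for manifest in split("---", file(local.gcp_provider_manifest_path)) :
    manifest
    if trimspace(manifest) != "" && can(yamldecode(manifest))
  ]
}
```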
@amwarrier amwarrier force-pushed the aravind/csi-driver-support-gcp branch from 6347118 to 1ecb975 on October 30, 2025 17:17
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml (1)

72-77: Pin the busybox init container image to a specific version.

The init container references image: busybox without a version tag, which can result in non-deterministic deployments if the tag is re-released or updated. Pin it to a specific digest or version.

       - name: chown-provider-mount
-        image: busybox
+        image: busybox:1.35

Consider using a digest for stronger reproducibility: busybox@sha256:<digest>.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6347118 and f59f383.

📒 Files selected for processing (3)
  • README.md (1 hunks)
  • modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml (1 hunks)
  • modules/app_gke/secrets_store/secrets_store.tf (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • README.md
🧰 Additional context used
🪛 Checkov (3.2.334)
modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml

[medium] 7-17: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 27-47: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 30-50: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 1-21: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 7-63: Ensure Kubernetes Cluster is created with Private cluster enabled

(CKV_GCP_25)


[medium] 7-63: Ensure Kubernetes Cluster is created with Private cluster enabled

(CKV_GCP_25)


[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)


[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)


[medium] 41-67: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)


[medium] 11-28: Ensure Big Table Instances have deletion protection enabled

(CKV_GCP_122)


[high] 49-96: Ensure all Cloud SQL database instance requires all incoming connections to use SSL

(CKV_GCP_6)


[medium] 1-10: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall

(CKV2_GCP_18)


[medium] 52-131: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)

modules/app_gke/secrets_store/secrets_store.tf

[medium] 7-17: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 27-47: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 30-50: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 1-21: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)


[medium] 18-22: Ensure Cloud Armor prevents message lookup in Log4j2. See CVE-2021-44228 aka log4jshell

(CKV_GCP_73)


[medium] 11-28: Ensure Big Table Instances have deletion protection enabled

(CKV_GCP_122)


[medium] 1-10: Ensure Terraform module sources use a commit hash

(CKV_TF_1)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[high] 11-34: Ensure public access prevention is enforced on Cloud Storage bucket

(CKV_GCP_114)


[medium] 3-7: Ensure GCP network defines a firewall and does not use the default firewall

(CKV2_GCP_18)

🔇 Additional comments (6)
modules/app_gke/secrets_store/manifests/provider-gcp-plugin-v1.9.0.yaml (2)

15-50: RBAC and manifest structure are well-designed.

The ServiceAccount, ClusterRole, and ClusterRoleBinding correctly implement least-privilege access for token creation and serviceaccount read operations, which are the minimal permissions needed for the GCP provider plugin to authenticate via Workload Identity.


52-131: SecurityContext and resource limits are appropriately configured.

The DaemonSet enforces a strong security posture with non-root user, dropped capabilities, read-only filesystem, and runtime seccomp. Resource requests and limits prevent resource contention on nodes. Liveness probe ensures unhealthy pods are restarted.

modules/app_gke/secrets_store/secrets_store.tf (4)

1-23: Helm release configuration is well-structured.

The secrets_store_csi_driver Helm release enables secret synchronization, rotation, and polling with a 120-second interval, which supports dynamic secret updates from Google Secret Manager. Version and namespace are properly parameterized.
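
For context, a helm_release matching the flags described above might look like this sketch; the value keys (syncSecret.enabled, enableSecretRotation, rotationPollInterval) come from the upstream secrets-store-csi-driver chart, and everything else is assumed:

```hcl
resource "helm_release" "secrets_store_csi_driver" {
  name       = "secrets-store-csi-driver"
  repository = "https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts"
  chart      = "secrets-store-csi-driver"
  version    = var.secrets_store_csi_driver_version
  namespace  = "kube-system"

  set {
    name  = "syncSecret.enabled" # sync mounted secrets into k8s Secrets
    value = "true"
  }
  set {
    name  = "enableSecretRotation"
    value = "true"
  }
  set {
    name  = "rotationPollInterval"
    value = "120s"
  }
}
```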


27-36: Manifest parsing logic is robust and deterministic.

The locals block correctly handles multi-document YAML by splitting on delimiters, filtering whitespace, and validating YAML syntax before applying. This approach ensures only valid manifests are deployed and addresses the past concern about reproducibility by using a versioned local file instead of a GitHub URL fetch.


38-49: kubectl_manifest orchestration is correct with proper dependencies.

Server-side apply and wait flags ensure manifests are fully reconciled before Terraform considers the resource complete. Dependency on the Helm release ensures the CSI driver is deployed before the GCP provider plugin manifests are applied, preventing timing issues.
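
A sketch of that orchestration, matching the parsing locals discussed earlier (resource and local names taken from the review):

```hcl
resource "kubectl_manifest" "gcp_provider" {
  for_each = { for idx, manifest in local.gcp_provider_manifests : idx => manifest }

  yaml_body         = each.value
  server_side_apply = true # let the API server reconcile field ownership
  wait              = true # block until the resource is applied

  # Ensure the CSI driver exists before the GCP provider plugin is applied.
  depends_on = [helm_release.secrets_store_csi_driver]
}
```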


51-53: Documentation of SecretProviderClass management is clear.

The comment correctly documents that SecretProviderClass resources are managed by the application Helm chart (operator-wandb), not Terraform. This architectural choice avoids CRD timing issues and maintains separation of concerns between infrastructure setup and application configuration.
