[GKE] ClusterRole Creation Permission Error - Investigation and Reproduction Required #1601

@hanizang77

Reported Problem

A user reported that they cannot create ClusterRoles when deploying applications on a GKE cluster.
The same operations work correctly on AWS EKS, Azure AKS, NHN, and Alibaba clusters.

Reporter's Environment

Reported Error Message

Error from server (Forbidden): error when creating "...": 
clusterroles.rbac.authorization.k8s.io is forbidden: 
User "115992006892805718504" cannot create resource "clusterroles" 
in API group "rbac.authorization.k8s.io" at the cluster scope: 
requires one of ["container.clusterRoles.create"] permission(s).
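
The trailing "requires one of [...]" clause names an IAM permission rather than an RBAC rule, which is consistent with the GKE behavior quoted under References (RBAC is checked first, then IAM). When triaging similar reports, it can help to pull the user ID and the named permission out of the message. The following is a minimal sketch that assumes the message format shown above; parse_gke_forbidden is a hypothetical helper name, not part of kubectl or gcloud.

```shell
# Read a GKE "Forbidden" message on stdin and print the numeric user ID
# and the IAM permission named in the "requires one of" clause.
# Assumption: the message follows the format shown in this issue.
parse_gke_forbidden() {
  local msg user perm
  # kubectl may wrap the message over several lines; join them first.
  msg=$(tr -d '\n')
  user=$(printf '%s' "$msg" | sed -n 's/.*User "\([0-9][0-9]*\)".*/\1/p')
  perm=$(printf '%s' "$msg" | sed -n 's/.*requires one of \["\([^"]*\)"].*/\1/p')
  printf 'user=%s permission=%s\n' "$user" "$perm"
}
```

The extracted user ID can then be compared with the output of kubectl auth whoami on the affected cluster to confirm which identity was denied.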

Reproduction Attempt Results (2025-10-31)

Test Environment

Test Procedure

# 1. Create GKE cluster via Tumblebug (k8sClusterDynamic API)
curl -X 'POST' \
  'http://localhost:1323/tumblebug/ns/default/k8sClusterDynamic' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "imageId": "default",
  "specId": "gcp+asia-east1+e2-standard-4",
  "name": "k8scluster01",
  "nodeGroupName": "k8sng01"
}'

# 2. Download kubeconfig and configure
curl -s "http://localhost:1024/spider/cluster/d42aa6pptkns739vh7r0?ConnectionName=gcp-asia-east1" \
  | jq -r '.AccessInfo.Kubeconfig' > /tmp/gke-spider-kubeconfig.yaml
sed -i 's/CLUSTER_NAME_PLACEHOLDER/d42aa6pptkns739vh7r0/g; s/CONNECTION_NAME_PLACEHOLDER/gcp-asia-east1/g' \
  /tmp/gke-spider-kubeconfig.yaml
export KUBECONFIG=/tmp/gke-spider-kubeconfig.yaml

# 3. Verify cluster connection
kubectl get nodes
# Result: Success ✅

# 4. Check permissions (CRITICAL: This immediately returned "yes")
kubectl auth can-i create clusterroles
# Result: yes ✅ ← Already have permission from the start!

kubectl auth can-i --list | head -30
# Result: Shows extensive permissions ✅

kubectl auth whoami
# Result: Username: 110168210200745144964 ✅

# 5. Test manual ClusterRole creation
kubectl create clusterrole test-role --verb=get --resource=pods
# Result: Success ✅
kubectl delete clusterrole test-role
# Result: Success ✅

# 6. Test real-world application deployment (Nginx Ingress Controller)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.8.1/deploy/static/provider/cloud/deploy.yaml
# Result: Success ✅ (ClusterRole, ClusterRoleBinding, etc. all created without issues)

kubectl get all -n ingress-nginx
# Result: All resources created successfully ✅

Failed to Reproduce - No Permission Error Occurred

Unable to reproduce the error. From the very first permission check (kubectl auth can-i), we already had full ClusterRole creation permissions. All subsequent tests succeeded without any RBAC issues.

Analysis of Reproduction Failure

Discovered Cluster Configuration Difference

The KeyValueList of the test cluster contains the following entry:

{
  "key": "RbacBindingConfig",
  "value": "{enableInsecureBindingSystemAuthenticated:true,enableInsecureBindingSystemUnauthenticated:true}"
}

enableInsecureBindingSystemAuthenticated is set to true, which automatically grants broad permissions to the system:authenticated group (all authenticated users).

This is why no permission error occurred!
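
Note that the RbacBindingConfig value is not valid JSON (keys and values are unquoted), so jq cannot parse it directly. The following is a small sketch for checking a single flag. It assumes the value always has the flat {key:bool,key:bool} shape shown above, and rbac_flag_enabled is a hypothetical helper name.

```shell
# Return 0 if FLAG is set to true in a value such as "{a:true,b:false}".
# Assumption: the value is a flat, comma-separated list of key:bool pairs.
rbac_flag_enabled() {
  local value=$1 flag=$2
  # Strip braces and spaces, then wrap in commas so that every entry
  # appears as ",key:value," and partial key names cannot match.
  value=",$(printf '%s' "$value" | tr -d '{} '),"
  case "$value" in
    *",${flag}:true,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Running this check against both the reporter's cluster and the test cluster would show directly whether the two differ in this setting.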

Root Cause Analysis

GKE's Dual Authentication Structure: IAM vs RBAC

  1. OAuth2 Token (IAM) Authentication: Implemented in CB-Spider PR #1583 ("[K8S] Support kubeconfig with dynamic token types")

    • Scope: cloud-platform
    • Role: GCP API access and Kubernetes API server authentication
    • Permission: Only GCP resource management
  2. Missing Kubernetes RBAC Permissions:

    • GKE requires explicit Kubernetes RBAC ClusterRoleBinding separate from IAM authentication
    • CB-Spider does not automatically create a cluster-admin ClusterRoleBinding during cluster creation

Differences from Other CSPs

| CSP         | Auto-grants Admin Privileges on Cluster Creation | Authentication Method                |
|-------------|--------------------------------------------------|--------------------------------------|
| AWS EKS     | ✅ Auto-granted                                  | IAM → aws-auth ConfigMap mapping     |
| Azure AKS   | ✅ Auto-granted                                  | Azure AD → K8s RBAC integration      |
| NHN         | ✅ Auto-granted                                  | Static token with admin privileges   |
| Alibaba ACK | ✅ Auto-granted                                  | Static token with admin privileges   |
| GCP GKE     | Manual setup required                            | OAuth2 → IAM → K8s RBAC (separate)   |

Permission Differences Based on GKE RbacBindingConfig

Permissions vary significantly depending on GKE cluster creation options:

Case 1: enableInsecureBindingSystemAuthenticated: false (default, recommended)

  • No default permissions for system:authenticated group
  • Manual ClusterRoleBinding creation required
  • ❌ No ClusterRole creation permission → Reported error occurs

Case 2: enableInsecureBindingSystemAuthenticated: true (not recommended)

  • Automatically grants broad permissions to all authenticated users
  • ✅ ClusterRole creation allowed → No error
  • ⚠️ Security risk: Excessive permissions

Proposed Solutions

Solution 1: Modify CB-Spider Code (Recommended)

File: cloud-control-manager/cloud-driver/drivers/gcp/resources/ClusterHandler.go

Modifications:

  • After CreateCluster() completion, automatically create cluster-admin ClusterRoleBinding
  • Connect Kubernetes client using generated kubeconfig
  • Grant cluster-admin permission to Service Account (or numeric user ID)
  • Similar to AWS EKS's aws-auth ConfigMap approach, automatically grant admin permission to cluster creator

Notes:

  • Requires k8s.io/client-go library
  • On failure, only log a warning and still treat cluster creation as successful (non-critical)
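
The flow described above can be sketched in shell before it is ported to Go. This is only an illustration of the intended behavior, not the CB-Spider implementation: render_admin_binding and grant_admin_nonfatal are hypothetical names, and the sketch assumes that KUBECONFIG already points at the new cluster and that the credentials in use are allowed to create ClusterRoleBindings.

```shell
# Render a ClusterRoleBinding manifest that grants cluster-admin to one user.
# The user name is quoted so that a numeric ID is kept as a YAML string.
render_admin_binding() {
  local binding_name=$1 user_id=$2
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${binding_name}
subjects:
- kind: User
  name: "${user_id}"
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
EOF
}

# Apply the binding. A failure only produces a warning, so the caller can
# still report cluster creation as successful (the non-critical behavior
# listed in the notes above).
grant_admin_nonfatal() {
  local binding_name=$1 user_id=$2
  if render_admin_binding "$binding_name" "$user_id" | kubectl apply -f -; then
    echo "cluster-admin binding applied for user ${user_id}"
  else
    echo "WARNING: could not create cluster-admin binding for user ${user_id}" >&2
  fi
  return 0
}
```

For example, grant_admin_nonfatal cb-spider-admin 110168210200745144964 would bind the user ID seen in the reproduction attempt. A Go version would build the same object and submit it through k8s.io/client-go instead of kubectl.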

Solution 2: Manual ClusterRoleBinding Creation (Temporary Workaround)

After the cluster is created, the user runs the following manually:

# Check the GCP Service Account unique ID (the uniqueId field)
gcloud iam service-accounts describe {SERVICE_ACCOUNT_EMAIL} --format='value(uniqueId)'

# Or check current user with kubectl
kubectl auth whoami

# Create cluster-admin binding
kubectl create clusterrolebinding cb-spider-admin \
  --clusterrole=cluster-admin \
  --user={NUMERIC_USER_ID}
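
After creating the binding, it is worth confirming that the workaround took effect before retrying the deployment. The following is a minimal sketch that reuses the kubectl auth can-i check from the reproduction steps; check_rbac_create is a hypothetical helper name.

```shell
# Report whether the current identity can create the cluster-scoped RBAC
# resources that typical application manifests need.
# Returns 0 only if every check answers "yes".
check_rbac_create() {
  local resource answer rc=0
  for resource in clusterroles clusterrolebindings; do
    # "kubectl auth can-i" prints yes/no and exits non-zero on "no".
    answer=$(kubectl auth can-i create "$resource" 2>/dev/null) || true
    echo "create ${resource}: ${answer:-unknown}"
    [ "$answer" = "yes" ] || rc=1
  done
  return "$rc"
}
```

If either line reports anything other than yes, the binding was likely created for a different identity than the one shown by kubectl auth whoami.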

References

Official GKE RBAC Documentation

Key Points

"To authorize an action, GKE checks for an RBAC policy first. If there isn't an RBAC policy, GKE checks for IAM permissions."

"GKE users require at minimum, the container.clusters.get IAM permission in the project that contains the cluster. This permission is required for users to authenticate to the clusters in the project, but does not authorize them to perform any actions inside those clusters."

Related to PR #1583

Recommended Actions

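  1. Immediate Action (User Guidance):

    • Guide users through manual ClusterRoleBinding creation (Solution 2)
    • Alternatively, create the cluster with enableInsecureBindingSystemAuthenticated: true (not recommended because of the security risk)
  2. Long-term Solution (Code Modification):

    • Apply the Solution 1 code modification
    • Account for GKE-specific behavior in the implementation, in particular that IAM and Kubernetes RBAC are authorized separately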
