SparkConnect spec update not applied until pods are manually deleted #2665

@RSBhoomika

Description

What happened?

When updating any field under the spec of a SparkConnect resource (e.g., executor.instances, resource requests, etc.), the CR’s manifest is updated correctly, but the changes are not reflected in the running pods. The operator status and pod count remain stale, and the operator does not reconcile or restart the pods automatically. Changes only take effect after manually deleting the pods, which forces the operator to recreate them.

Reproduction Code

### Steps to Reproduce:

1. Deploy a SparkConnect resource, for example:

```yaml
apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
  name: spark-connect
  namespace: spark
spec:
  sparkVersion: 4.0.0
  server:
    template:
      metadata:
        labels:
          key1: value1
          key2: value2
        annotations:
          key3: value3
          key4: value4
      spec:
        containers:
        - name: spark-kubernetes-driver
          image: spark:4.0.0
          imagePullPolicy: Always
          resources:
            requests:
              cpu: 1
              memory: 1Gi
            limits:
              cpu: 1
              memory: 1Gi
        serviceAccount: spark-operator-spark
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsGroup: 185
          runAsUser: 185
          runAsNonRoot: true
          allowPrivilegeEscalation: false
          seccompProfile:
            type: RuntimeDefault
  executor:
    instances: 1
    cores: 1
    memory: 512m
    template:
      metadata:
        labels:
          key1: value1
          key2: value2
        annotations:
          key3: value3
          key4: value4
      spec:
        containers:
        - name: spark-kubernetes-executor
          image: spark:4.0.0
          imagePullPolicy: Always
        securityContext:
          capabilities:
            drop:
            - ALL
          runAsGroup: 185
          runAsUser: 185
          runAsNonRoot: true
          allowPrivilegeEscalation: false
          seccompProfile:
            type: RuntimeDefault
```

2. Update a field under spec, for example (see the sketch just below):
   • Increase executor.instances from 1 to 3
   • Change server.template.spec.containers[0].resources.requests.cpu
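
For reference, this is one way such an update could be applied; the field path follows the manifest above, and kubectl edit on the SparkConnect CR works just as well:

```bash
# Sketch of step 2: bump executor.instances from 1 to 3 with a merge patch.
# (kubectl edit sparkconnect spark-connect -n spark achieves the same thing.)
kubectl patch sparkconnect spark-connect -n spark \
  --type merge \
  -p '{"spec": {"executor": {"instances": 3}}}'
```
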
3. Observe the resource:

   kubectl get sparkconnect spark-connect -n spark -o yaml

   The updated spec is visible, but the status still reflects the old state (e.g., the number of running executors does not change).

4. Observe that the pods do not scale or update accordingly (a quick check is sketched below).
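
For illustration, one way to see the mismatch is to compare the desired count in the spec with the executor pods actually running; the spark-role=executor label below is an assumption about how the executor pods are labeled and may need adjusting:

```bash
# Desired executor count, read from the SparkConnect spec shown above
kubectl get sparkconnect spark-connect -n spark \
  -o jsonpath='{.spec.executor.instances}{"\n"}'

# Executor pods actually running (spark-role=executor is an assumed label)
kubectl get pods -n spark -l spark-role=executor --no-headers | wc -l
```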

5. Only by manually deleting the pods (e.g., kubectl delete pod ...) do the changes get applied when the pods are recreated.
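
As a concrete sketch of the workaround, deleting the pods by label forces the operator to recreate them with the new spec; the spark-role label values are assumptions and may differ in your cluster:

```bash
# Workaround sketch: delete the SparkConnect pods so the operator recreates them
# with the updated spec (label values are assumed, adjust to your setup).
kubectl delete pod -n spark -l 'spark-role in (driver, executor)'
```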

Expected behavior

Any spec update in the SparkConnect resource should trigger the operator to:

1. Reconcile the change automatically, scaling or updating pods as necessary, without manual pod deletion.

2. Update the status to reflect the current state matching the desired spec.

Actual behavior

1. Spec updates are accepted and reflected in the CR manifest.

2. Operator does not reconcile or restart pods based on spec changes.

3. Pod state and operator status remain stale.

4. Manual pod deletion is required to trigger reconciliation.

Environment & Versions

  • Kubernetes Version: v1.29.0
  • Spark Operator Version: 2.3.0
  • Apache Spark Version: 4.0.0

Additional context

No response

Impacted by this bug?

Give it a 👍. We prioritize the issues with the most 👍.


    Labels

kind/bug (Something isn't working)
