
Improving the default PDB implementation. #8780


Open
wants to merge 65 commits into main
Conversation

naemono
Contributor

@naemono naemono commented Jul 31, 2025

Resolves #2936

What is this change?

This adds an Enterprise feature that allows the ECK operator to create multiple PDBs mapping to grouped roles within nodeSets (ideally one PDB per nodeSet of role 'X'). For non-Enterprise users the behavior remains the same: a single default PDB for the whole cluster, allowing 1 disruption when the cluster is green.

Examples

Simple example (pdb per nodeset with single roles)

# Manifest
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cluster-name
spec:
  version: 8.12.1
  nodeSets:
    - name: masters
      count: 3
      config:
        node.roles: ["master"]
    - name: data
      count: 3
      config:
        node.roles: ["data", "ingest"]

# PDBs created
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: cluster-name
  name: cluster-name-es-default-master
spec:
  maxUnavailable: 1
  selector:
    matchExpressions:
      - key: elasticsearch.k8s.elastic.co/cluster-name
        operator: "In"
        values: ["cluster-name"]
      - key: elasticsearch.k8s.elastic.co/statefulset-name
        operator: "In"
        values: ["cluster-name-es-masters"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: cluster-name
  name: cluster-name-es-default-data
spec:
  maxUnavailable: 1
  selector:
    matchExpressions:
      - key: elasticsearch.k8s.elastic.co/cluster-name
        operator: "In"
        values: ["cluster-name"]
      - key: elasticsearch.k8s.elastic.co/statefulset-name
        operator: "In"
        values: ["cluster-name-es-data"]

Complex example (mixed roles across nodeSets)

# Manifest
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: cluster-name
spec:
  version: 8.12.1
  nodeSets:
    - name: masters
      count: 3
      config:
        node.roles: ["master", "data", "ingest"]
    - name: data-warm
      count: 3
      config:
        node.roles: ["data_warm"]
    - name: data-cold
      count: 3
      config:
        node.roles: ["data_cold"]
    - name: data-frozen
      count: 3
      config:
        node.roles: ["data_frozen"]
    - name: coordinating
      count: 1
      config:
        node.roles: []

# PDBs created
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: cluster-name
  name: cluster-name-es-default-master
spec:
  maxUnavailable: 1
  selector:
    matchExpressions:
      - key: elasticsearch.k8s.elastic.co/cluster-name
        operator: "In"
        values: ["cluster-name"]
      - key: elasticsearch.k8s.elastic.co/statefulset-name
        operator: "In"
        # All of these StatefulSets share data* roles, so they must be grouped together.
        values: ["cluster-name-es-masters", "cluster-name-es-data-warm", "cluster-name-es-data-cold"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: cluster-name
  name: cluster-name-es-default-data_frozen
spec:
  maxUnavailable: 1
  selector:
    matchExpressions:
      - key: elasticsearch.k8s.elastic.co/cluster-name
        operator: "In"
        values: ["cluster-name"]
      - key: elasticsearch.k8s.elastic.co/statefulset-name
        operator: "In"
        values: ["cluster-name-es-data-frozen"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: cluster-name
  name: cluster-name-es-default-coord
spec:
  maxUnavailable: 1
  selector:
    matchExpressions:
      - key: elasticsearch.k8s.elastic.co/cluster-name
        operator: "In"
        values: ["cluster-name"]
      - key: elasticsearch.k8s.elastic.co/statefulset-name
        operator: "In"
        values: ["cluster-name-es-coordinating"]

Implementation notes for review

  • I admittedly went back and forth on the implementation, specifically on how to group StatefulSets by their associated roles. My first instinct was a recursive-style algorithm (traverse a StatefulSet's roles, find all other StatefulSets sharing one of those roles, then traverse those StatefulSets' roles in turn), which I didn't want to bring into this code base, so I investigated other options. This is essentially finding connected components in a graph, which led me to the current implementation: build an adjacency list, then use depth-first search to collect the connected "groups" of StatefulSets. I keep feeling there's a simpler way to do this grouping, so suggestions are strongly welcome in this area. resolved in review process
  • See the note about PDB name overlaps. I'd love comments on whether it's an actual concern, and on potential paths to resolve it. resolved in the review process.

TODO

  • full testing in a cluster
  • unit tests
  • Documentation (I will open a separate PR in our docs repository for the public facing changes)
  • Test older versions of Elasticsearch with the old "roles" style.

Testing

With a yellow cluster, the data PDB works as expected (0 disruptions allowed), while the coordinating and master PDBs each allow a single disruption.

❯ kc get es -n elastic
NAME       HEALTH   NODES   VERSION   PHASE   AGE
pdb-test   yellow   6       9.0.4     Ready   6d22h

❯ kc get pdb -n elastic
NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
pdb-test-es-default-coordinating   N/A             1                 1                     6d22h
pdb-test-es-default-data           N/A             0                 0                     6d22h
pdb-test-es-default-master         N/A             1                 1                     6d22h

Forcing the cluster red, things are also acting as expected and allowing 0 disruptions:

❯ kc get es -n elastic
NAME       HEALTH   NODES   VERSION   PHASE             AGE
pdb-test   red      4       9.0.4     ApplyingChanges   6d22h

❯ kc get pdb -n elastic
NAME                               MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
pdb-test-es-default-coordinating   N/A             0                 0                     6d22h
pdb-test-es-default-data           N/A             0                 0                     6d22h
pdb-test-es-default-master         N/A             0                 0                     6d22h

naemono added 18 commits July 23, 2025 13:39
Adjusting tests
Optimize disruption func.
Fixing tests
Adding additional documentation.
@naemono naemono added the >feature Adds or discusses adding a feature to the product label Jul 31, 2025
@prodsecmachine
Collaborator

prodsecmachine commented Jul 31, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found.

license/snyk check is complete. No issues have been found.

@naemono naemono requested a review from Copilot July 31, 2025 15:29

@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds Enterprise-licensed role-specific PodDisruptionBudgets (PDBs) to improve cluster operations for ECK operator users. For non-enterprise users, the default single PDB behavior remains unchanged.

  • Introduces role-based PDB grouping where StatefulSets sharing Elasticsearch node roles are grouped into the same PDB
  • Uses graph algorithms (adjacency list + DFS) to identify connected role groups
  • Maintains health-aware disruption limits based on specific role requirements

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:
  • pkg/controller/elasticsearch/pdb/roles.go: Core implementation for role-specific PDB creation and reconciliation
  • pkg/controller/elasticsearch/pdb/roles_test.go: Comprehensive test coverage for role-specific PDB functionality
  • pkg/controller/elasticsearch/pdb/dfs.go: Graph algorithms for grouping StatefulSets by shared roles
  • pkg/controller/elasticsearch/pdb/dfs_test.go: Test coverage for grouping algorithms
  • pkg/controller/elasticsearch/pdb/reconcile.go: Updated main reconcile function with enterprise license check
  • pkg/controller/elasticsearch/pdb/reconcile_test.go: Updated test helper function and legacy test fix
  • pkg/controller/common/statefulset/fixtures.go: Extended test fixtures to support additional Elasticsearch node roles
Comments suppressed due to low confidence (1)

pkg/controller/elasticsearch/pdb/roles.go:81

  • There is an extra tab character at the end of the test name string, which should be removed for consistency.
		// Determine the roles for this group

@naemono
Contributor Author

naemono commented Jul 31, 2025

buildkite test this

naemono added 2 commits July 31, 2025 11:52
naemono added 2 commits August 8, 2025 08:43
// allowedDisruptionsForRole returns the maximum number of pods that can be disrupted for a given role.
func allowedDisruptionsForRole(
	es esv1.Elasticsearch,
	role esv1.NodeRole,
Collaborator


I think this logic needs to take all roles in the group into account.

Contributor Author


Yes, I think this is the case in certain scenarios, which I recently made changes to cover. I'd like your input on whether I've covered the needed cases though @pebrc , thanks.


// Determine the most conservative role to use when determining the maxUnavailable setting.
// If group has no roles, it's a coordinating ES role.
primaryRole := getPrimaryRoleForPDB(groupRoles)
Collaborator


Why don't we flip the first two priorities on L37 so that the role most sensitive to disruption comes first? I know I argued that masters should be first, but if we want to drive the logic that determines the allowed disruptions from that priority list as well, I think I would change my mind.

Contributor Author


I've adjusted this list of priority roles, but are you wanting specific changes to how we are determining the allowed disruptions for each type of role?
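A priority-driven primary-role pick, as discussed in this thread, might look roughly like the following. The exact role list and the map-based signature are illustrative assumptions, not the code under review:

```go
package main

import "fmt"

// rolePriority orders roles from most to least sensitive to disruption, per
// the review discussion: data first, since data nodes require green health
// before a disruption is allowed, while most other roles tolerate yellow.
var rolePriority = []string{"data", "master", "ingest", "ml", "transform"}

// getPrimaryRoleForPDB picks the most conservative role in a group to drive
// the PDB's name and its maxUnavailable rules. An empty role set means a
// coordinating-only node.
func getPrimaryRoleForPDB(roles map[string]bool) string {
	for _, r := range rolePriority {
		if roles[r] {
			return r
		}
	}
	return "coordinating"
}

func main() {
	fmt.Println(getPrimaryRoleForPDB(map[string]bool{"master": true, "data": true})) // data
	fmt.Println(getPrimaryRoleForPDB(map[string]bool{}))                             // coordinating
}
```

With this ordering, a nodeSet carrying both master and data roles is governed by the stricter data rules, which is the direction the reviewer argues for above.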

review updates.
Add unit tests.
Fix tests issue found by linter.
Collaborator

@pebrc pebrc left a comment


I started looking again but was not able to finish in the time I had. Will come back to this.

// getPrimaryRoleForPDB returns the primary role from a set of roles for PDB naming and grouping.
// Data roles are most restrictive (require green health), so they take priority.
// All other roles have similar disruption rules (require yellow+ health).
func getPrimaryRoleForPDB(roles sets.Set[esv1.NodeRole]) esv1.NodeRole {
Collaborator


I don't think we need this function? groupBySharedRoles gives you the statefulsets already grouped by their primary roles?

Comment on lines +403 to +407
for _, sts := range statefulSets {
	if isSensitiveToDisruptions(sts) && commonsts.GetReplicas(sts) == 1 {
		return 0
	}
}
Collaborator

@pebrc pebrc Aug 18, 2025


This logic cannot be per StatefulSet, imo. It needs to take the desired number of pods of a certain Elasticsearch role into account across all relevant StatefulSets. Relating to the out-of-band discussion: there needs to be a reasonable expectation of availability that we can protect with a PDB to warrant 0 allowed disruptions. E.g. at least two data nodes of each (non-frozen) data role type (so that there is a place for primary and replica shards to go), at least three master nodes, etc. (so that the two remaining can take over while the third is disrupted).

Protecting single-node tiers with a PDB with 0 allowed disruptions would mean preventing any k8s upgrade or eviction from proceeding forever, making the k8s cluster unmanageable, imo.
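A minimal sketch of this comment's point, summing desired replicas per role across the whole group rather than checking each StatefulSet in isolation. The role names, thresholds, and map-based signature are illustrative assumptions, not the merged implementation:

```go
package main

import "fmt"

// allowedDisruptions returns how many pods in a PDB group may be disrupted,
// based on the total desired replicas per role across all StatefulSets in
// the group. Thresholds follow the review comment: at least three masters
// (two remaining can keep a quorum while one is disrupted) and at least two
// nodes per non-frozen data tier (somewhere for primary and replica shards
// to go).
func allowedDisruptions(replicasByRole map[string]int32) int32 {
	minForOneDisruption := map[string]int32{
		"master":    3,
		"data":      2,
		"data_hot":  2,
		"data_warm": 2,
		"data_cold": 2,
	}
	for role, total := range replicasByRole {
		if required, sensitive := minForOneDisruption[role]; sensitive && total < required {
			// Below the availability floor there is nothing a PDB can
			// usefully protect; the review argues a permanent 0 here would
			// block node drains indefinitely, so this is a point of debate.
			return 0
		}
	}
	return 1
}

func main() {
	fmt.Println(allowedDisruptions(map[string]int32{"master": 3, "data": 3})) // 1
	fmt.Println(allowedDisruptions(map[string]int32{"master": 3, "data": 1})) // 0
}
```

The key design choice is that `replicasByRole` is computed across every StatefulSet in the group, so a role spread over several nodeSets is judged by its aggregate count.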

Labels
>feature Adds or discusses adding a feature to the product
Development

Successfully merging this pull request may close these issues.

PDB per Node type (Master, Data, Ingest)
4 participants