
3 racks with more than one node per rack cannot be deployed from scratch #180

@pchmieli

Description

Bug Report

What did you do?
Created a CassandraCluster with one DC, 3 racks, and nodesPerRacks: 2:

  topology:
    dc:
    - name: dc1
      nodesPerRacks: 2
      rack:
      - labels:
          dynatrace.tier: cassandra-v1
          topology.kubernetes.io/zone: us-east4-b
        name: rack1
      - labels:
          dynatrace.tier: cassandra-v1
          topology.kubernetes.io/zone: us-east4-a
        name: rack2
      - labels:
          dynatrace.tier: cassandra-v1
          topology.kubernetes.io/zone: us-east4-c
        name: rack3
      resources: {}

What did you expect to see?
3 StatefulSets with 2 pods each, all ready and running.

What did you see instead? Under which circumstances?
The roll-out got stuck on the 4th node (rack1 2/2, rack2 1/2, rack3 not created yet). Cassandra fails with:

Token allocation failed: the number of racks 2 in datacenter dc1 is lower than its replication factor 3.
Fatal configuration error; unable to start server.  See log for stacktrace.
ERROR [main] 2025-09-22 16:32:13,409 CassandraDaemon.java:900 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: Token allocation failed: the number of racks 2 in datacenter dc1 is lower than its replication factor 3.

This is connected with the config field introduced, as far as I know, in Cassandra 4.1:

allocate_tokens_for_local_replication_factor: 3

Workarounds

  1. Removing the allocate_tokens_for_local_replication_factor field from the config makes it possible to create a >3-node cluster from scratch, but that is a no-go in my case.
  2. Creating a 3-node cluster first and, only once it is ready, scaling it out to 6 or more nodes.
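For context, workaround 1 boils down to dropping (or commenting out) this setting in cassandra.yaml; the fragment below is only illustrative, and how it gets injected depends on how the cluster's config is managed:

```yaml
# cassandra.yaml fragment (illustrative): with this setting removed or
# commented out, Cassandra falls back to random token allocation, so a
# bootstrapping node no longer requires 3 racks to already exist.
# allocate_tokens_for_local_replication_factor: 3
```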

Environment

  • casskop version:
    2.3.0

  • Kubernetes version information:
    Azure: v1.32.6
    GCP: v1.32.9-gke.1072000

  • Kubernetes cluster kind:
    Azure / GCP

  • Cassandra version:
    4.1.10

Possible Solution
Whenever deploying a cluster with nodesPerRacks > 1, deploy the "first layer" first: ensure that each StatefulSet has exactly 1 pod, and only then proceed to the next step (scaling each StatefulSet to the desired size).
