feat: enable replicas to use replication slots #47

Open: piotrkpc wants to merge 20 commits into release-v1.16.x from manage-replication-slots

@piotrkpc piotrkpc commented Jul 17, 2025

Part of: https://github.com/tetrateio/tetrate/issues/26058

This PR introduces the following options to enable the use of replication slots, shown here with their default values:

spec:
  replicationSlots:
    enabled: false
    # the options below are not implemented yet
    disableCleanup: false # disables the cleanup of inactive replication slots
    maxWalKeepSize: "1Gi" # controls max_slot_wal_keep_size, see https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE
    inactiveSlotGracePeriod: 2m # time period after which an inactive slot is deleted
    healthCheckInterval: 30s # interval at which inactive replication slots are checked for

Replication slots are created and deleted by the operator, and the replication slot name is passed down to the scripts as an env var, in the same way as $PRIMARY_HOST_NAME.
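
As an illustration of the naming side of this, here is a minimal sketch that derives a slot name from a pod name and renders it as an env var. PostgreSQL only accepts lowercase letters, digits and underscores in slot names, so the hyphens in pod names have to be replaced. The helper names and the `REPLICATION_SLOT_NAME` variable are assumptions; the PR may use different ones.

```go
package main

import (
	"fmt"
	"strings"
)

// maxSlotNameLen is PostgreSQL's default identifier limit (NAMEDATALEN - 1).
const maxSlotNameLen = 63

// slotNameForPod is a hypothetical helper: it maps a pod name such as
// "mypostgres-2" to a valid replication slot name such as "mypostgres_2".
func slotNameForPod(podName string) (string, error) {
	name := strings.ReplaceAll(strings.ToLower(podName), "-", "_")
	if name == "" || len(name) > maxSlotNameLen {
		return "", fmt.Errorf("slot name %q must be 1..%d characters", name, maxSlotNameLen)
	}
	for _, r := range name {
		valid := r == '_' || (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9')
		if !valid {
			return "", fmt.Errorf("invalid character %q in slot name %q", r, name)
		}
	}
	return name, nil
}

// slotEnvVar renders the assignment handed to the container scripts.
// The variable name is an assumption made for this sketch.
func slotEnvVar(slotName string) string {
	return "REPLICATION_SLOT_NAME=" + slotName
}

func main() {
	name, err := slotNameForPod("mypostgres-2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(slotEnvVar(name))
}
```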


type Repository interface {
CreateSlot(ctx context.Context, name string) (replicationslot.ReplicationSlot, error)
FindSlotByName(ctx context.Context, name string) (replicationslot.ReplicationSlot, error)
Collaborator

just a nit:

Suggested change
FindSlotByName(ctx context.Context, name string) (replicationslot.ReplicationSlot, error)
GetSlot(ctx context.Context, name string) (replicationslot.ReplicationSlot, error)

I think the name is the ID of the slot, so this would make the API easier to understand.
If we later need to find slots by other filters, we can add Find* methods then.

Signed-off-by: Piotr Kopec <[email protected]>
@piotrkpc piotrkpc force-pushed the manage-replication-slots branch from 79232df to e2265f2 Compare July 20, 2025 00:32
Author

@piotrkpc piotrkpc left a comment

@sergicastro I annotated the code; let me know if anything is unclear.

replicationSlotsCreateDeleter: noopReplicationSlotsCreateDeleter{},
}

// TODO(piotrkpc): should this be here ? not testable code really
Author

This also does not look great, as the direct sql/postgres dependencies are here because of it. Maybe we can figure out a better way to set up db connections.

}

// TODO(piotrkpc): @Sergi, this is likely to be changed to something more sophisticated that would enable ssl enabled connections
func newConnectorConfig(
Author

We need to figure out how to create and manage connections with SSL configuration.

@@ -159,7 +369,13 @@ func (r *ReplicaDbCountSpecEnforcer) getDeployedReplicas() []statefulset.Statefu
}

func (r *ReplicaDbCountSpecEnforcer) getNbreDeployedReplicas() int32 {
return r.resourcesStates.StatefulSets.Replicas.NbreDeployed
deployedReplicas := r.resourcesStates.StatefulSets.Replicas.NbreDeployed
Author

This is required when we enable or disable replication slots, so that we can deploy replicas with the proper configuration before cleaning up replicas that have the old config (replication slots enabled/disabled).
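
The rest of this function is not visible in the excerpt, so the following is only one way to read the idea: count just the replicas whose slot configuration matches the desired state, so that the enforcer sees too few replicas and deploys correctly configured ones before the outdated ones are removed. The `replica` type and `countReplicasMatching` are hypothetical.

```go
package main

import "fmt"

// replica is a hypothetical, reduced view of a deployed replica StatefulSet.
type replica struct {
	Name                string
	UsesReplicationSlot bool
}

// countReplicasMatching counts only the replicas whose replication slot
// setting equals the desired one. Replicas with the old config are ignored,
// which makes the caller deploy replacements first.
func countReplicasMatching(replicas []replica, slotsEnabled bool) int32 {
	var n int32
	for _, r := range replicas {
		if r.UsesReplicationSlot == slotsEnabled {
			n++
		}
	}
	return n
}

func main() {
	deployed := []replica{
		{Name: "mypostgres-2", UsesReplicationSlot: false},
		{Name: "mypostgres-3", UsesReplicationSlot: false},
		{Name: "mypostgres-4", UsesReplicationSlot: true},
	}
	// Slots were just enabled: only one replica already has the new config.
	fmt.Println(countReplicasMatching(deployed, true))
}
```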

"reactive-tech.io/kubegres/controllers/states"
)

func TestCreateReplicaDbCountSpecEnforcer(t *testing.T) {
Author

This is empty for now, as I'm not sure we want to test it here. It would execute a lot faster, but the setup requires creating a lot of fake objects to make it work, and I'm also not sure whether there are hidden dependencies on other controllers, such as the statefulset controllers.

@@ -211,8 +211,16 @@ data:
then
chown -R postgres:postgres $PGDATA;
fi


grep "primary_slot_name" $PGDATA/postgresql.auto.conf > /dev/null && sed -i '/primary_slot_name/d' $PGDATA/postgresql.auto.conf
Author

This change will also be required in TSB, since we use a custom config.

github.com/onsi/ginkgo/v2 v2.21.0
github.com/onsi/gomega v1.34.2
github.com/stretchr/testify v1.10.0
github.com/testcontainers/testcontainers-go v0.38.0
Author

Let me know what you think about using testcontainers; it is used to test the repo object.

})

// TODO(piotrkpc): missing tests:
Author

this test is missing, it might work though :)

Author

Added the test; it works as expected.

@piotrkpc piotrkpc changed the title from "wip: replication slots" to "feat: enable replicas to use replication slots" Jul 28, 2025
@piotrkpc piotrkpc marked this pull request as ready for review July 28, 2025 19:30
}
} else {
log.Println("No PVC found for kubegres resource '" + resourceToDelete.Name + "'")
continue
}
}

log.Println("Deleted all resources created during tests. Waiting for 30 seconds...")
time.Sleep(30 * time.Second)
Author

Watching for delete events rather than doing this sleep shaves ~4 min from each test group and hopefully helps with flakiness.

sergicastro and others added 3 commits August 7, 2025 14:55
…nce (#53)

I need to finish testing of the TLS PR #51 before making this connection
management compatible with TLS-secured connections.

The [`func (r *ServicesCountSpecEnforcer) canConnectToPrimaryDb()
bool`](https://github.com/tetrateio/kubegres/pull/53/files#diff-2069405fe99f8bae54d23273d6e4708cb8b4e1ec715bbe3f276c63b3157c5ee2R136)
method is the current one checking all of this works.
It will need to be removed once the target PR starts using these
changes.

---------

Signed-off-by: Sergi Castro <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Piotr Kopec <[email protected]>
@@ -261,6 +262,28 @@ func (r *TestResourceCreator) DeleteResource(resourceToDelete client.Object, res
return true
}

func (r *TestResourceCreator) DeleteResourceWithWatch(resourceToDelete client.Object, resourceName string, listObject client.ObjectList) (<-chan watch.Event, func()) {
Author

It's not used here, but I'd keep it so we can speed up tests in a follow-up PR.

@piotrkpc piotrkpc requested a review from sergicastro August 13, 2025 08:40