
Can't recover from disk full error #758

@abh

Description

Report

The disk (PVC) filled up. I expanded the PVCs and let MySQL restart.

Group replication never came back. I set instance 0 to "bootstrap", which got replication working again.

The other two instances never finished "recovering", though, and are now stuck in a crash loop. Logs from one of them are attached.

More about the problem

mysql-1.txt

The controller logs don't show anything useful (to me); the controller seems to think everything is more or less fine. I'm not sure what role the controller plays here (I'm migrating from the bitpoke operator, which worked a little differently, with the orchestrator exposed).

2024-10-23T16:21:26.676Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-10"}
2024-10-23T16:22:27.762Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1060a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:23:40.357Z	INFO	Crash recovery	Cluster was successfully rebooted	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65"}
2024-10-23T16:23:47.288Z	INFO	groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb	Member is not ONLINE	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "state": "RECOVERING"}
2024-10-23T16:30:19.004Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-13"}
2024-10-23T16:31:20.054Z	INFO	Crash recovery	Pod is waiting for recovery	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1360a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:31:55.660Z	INFO	Crash recovery	Cluster was successfully rebooted	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940"}
2024-10-23T16:32:02.594Z	INFO	groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb	Member is not ONLINE	{"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "state": "OFFLINE"}
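The `gtidExecuted` fields above are MySQL GTID sets (`source_uuid:start-end`, comma-separated). For comparing members by hand, a minimal Python sketch is below. It assumes untagged GTIDs. The mysql-1 values appear to contain two sets run together (the first one matching mysql-0's value from the same reconcile); the sketch assumes the trailing part is mysql-1's own set, which is an interpretation of the log, not something the operator states. The helper names are hypothetical and this is not the operator's own comparison logic.

```python
from collections import defaultdict


def parse_gtid_set(gtid_set: str) -> dict[str, list[tuple[int, int]]]:
    """Parse 'uuid:1-4:7,uuid2:1-10' into {uuid: [(start, end), ...]} (untagged GTIDs only)."""
    result: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for member in gtid_set.replace("\n", "").split(","):
        member = member.strip()
        if not member:
            continue
        uuid, *intervals = member.split(":")
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single transaction such as "7" has no "-end" part.
            result[uuid.lower()].append((int(start), int(end or start)))
    return dict(result)


def _merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent intervals into disjoint ones."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def transaction_count(gtid_set: str) -> int:
    """Total number of transactions in the set (intervals merged first)."""
    return sum(
        end - start + 1
        for intervals in parse_gtid_set(gtid_set).values()
        for start, end in _merge(intervals)
    )


def contains(superset: str, subset: str) -> bool:
    """True if every transaction in `subset` is also in `superset`."""
    sup = {uuid: _merge(ivs) for uuid, ivs in parse_gtid_set(superset).items()}
    for uuid, intervals in parse_gtid_set(subset).items():
        for start, end in intervals:
            # Merged intervals are disjoint, so coverage means one interval holds it all.
            if not any(s <= start and end <= e for s, e in sup.get(uuid, [])):
                return False
    return True


# mysql-0 value from the 16:21:26 log line.
MYSQL_0_GTIDS = (
    "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,"
    "6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,"
    "6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-10"
)
# Assumption: trailing part of the mysql-1 value from the 16:22:27 log line.
MYSQL_1_GTIDS = (
    "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,"
    "6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,"
    "6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"
)

if __name__ == "__main__":
    print(contains(MYSQL_0_GTIDS, MYSQL_1_GTIDS))  # True
    print(transaction_count(MYSQL_0_GTIDS) - transaction_count(MYSQL_1_GTIDS))  # 1503836
```

If that reading of the log is right, mysql-0's set fully contains mysql-1's, and mysql-1 would be behind by roughly 1.5 million transactions that it has to fetch while in the RECOVERING state.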

Steps to reproduce

  1. Let the disk fill up; for example, use the default configuration, which doesn't limit how many binlog files are kept.
  2. Watch the cluster go down.
  3. Watch the cluster fail to recover after more disk space has been added.
  4. Force mysql-0 to start group replication.
  5. Watch the replicas never finish recovering.

Versions

  1. Kubernetes - v1.28.12
  2. Operator - 0.8.0
  3. Database - the default 8.x version shipped with operator 0.8.0

Anything else?

No response
