Backups no longer work after upgrading to 1.17.0 #2066

Open
wonko opened this issue May 4, 2025 · 10 comments

@wonko

wonko commented May 4, 2025

Report

After an operator and cluster upgrade to 1.17.0 (CRDs + helm upgrade), backup jobs no longer complete; they worked fine in 1.16.1. The log for a backup is below, with target bucket and db names redacted.

It seems to try to delete a non-existent key and then fails. Any pointers on what might be wrong?

The image which is used by the pod is percona/percona-xtradb-cluster-operator:1.17.0-pxc8.0-backup-pxb8.0.35. I can't seem to find the source of /usr/bin/backup.sh in that image to allow me to debug this further.

More about the problem

backup-init ++ id -u
backup-init ++ id -g
backup-init + install -o 2 -g 2 -m 0755 -D /peer-list /opt/percona/peer-list
xtrabackup + LIB_PATH=/usr/lib/pxc
xtrabackup + . /usr/lib/pxc/backup.sh
xtrabackup ++ set -o errexit
xtrabackup ++ LIB_PATH=/usr/lib/pxc
xtrabackup ++ . /usr/lib/pxc/aws.sh
xtrabackup +++ set -o errexit
xtrabackup +++ export AWS_SHARED_CREDENTIALS_FILE=/tmp/aws-credfile
xtrabackup +++ AWS_SHARED_CREDENTIALS_FILE=/tmp/aws-credfile
xtrabackup +++ export AWS_ENDPOINT_URL=https://[REDACTED].s3.amazonaws.com
xtrabackup +++ AWS_ENDPOINT_URL=https://[REDACTED].s3.amazonaws.com
xtrabackup +++ export AWS_REGION=eu-west-1
xtrabackup +++ AWS_REGION=eu-west-1
xtrabackup +++ '[' -n true ']'
xtrabackup +++ [[ true == \f\a\l\s\e ]]
xtrabackup ++ SST_INFO_NAME=sst_info
xtrabackup ++ XBCLOUD_ARGS='--curl-retriable-errors=7 '
xtrabackup ++ INSECURE_ARG=
xtrabackup ++ '[' -n true ']'
xtrabackup ++ [[ true == \f\a\l\s\e ]]
xtrabackup ++ S3_BUCKET_PATH=backups/[REDACTED]-pxc-db-2025-05-04-08:43:19-full
xtrabackup +++ date +%F-%H-%M
xtrabackup ++ BACKUP_PATH=[REDACTED]-pxc-db-pxc-2025-05-04-08-43-xtrabackup.stream
xtrabackup + GARBD_OPTS=
xtrabackup + check_ssl
xtrabackup + CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
xtrabackup + '[' -f /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt ']'
xtrabackup + SSL_DIR=/etc/mysql/ssl
xtrabackup + '[' -f /etc/mysql/ssl/ca.crt ']'
xtrabackup + CA=/etc/mysql/ssl/ca.crt
xtrabackup + SSL_INTERNAL_DIR=/etc/mysql/ssl-internal
xtrabackup + '[' -f /etc/mysql/ssl-internal/ca.crt ']'
xtrabackup + CA=/etc/mysql/ssl-internal/ca.crt
xtrabackup + KEY=/etc/mysql/ssl/tls.key
xtrabackup + CERT=/etc/mysql/ssl/tls.crt
xtrabackup + '[' -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
xtrabackup + KEY=/etc/mysql/ssl-internal/tls.key
xtrabackup + CERT=/etc/mysql/ssl-internal/tls.crt
xtrabackup + '[' -f /etc/mysql/ssl-internal/ca.crt -a -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
xtrabackup + GARBD_OPTS='socket.ssl_ca=/etc/mysql/ssl-internal/ca.crt;socket.ssl_cert=/etc/mysql/ssl-internal/tls.crt;socket.ssl_key=/etc/mysql/ssl-internal/tls.key;socket.ssl_cipher=;pc.weight=0;'
xtrabackup + '[' -n [REDACTED] ']'
xtrabackup + clean_backup_s3
xtrabackup + s3_add_bucket_dest
xtrabackup + local time=15
xtrabackup + local is_deleted_full=0
xtrabackup + local is_deleted_info=0
xtrabackup + local exit_code=0
xtrabackup + for i in {1..5}
xtrabackup + ((  i > 1  ))
xtrabackup + is_object_exist [REDACTED] backups/[REDACTED]-pxc-db-2025-05-04-08:43:19-full/
xtrabackup + local bucket=[REDACTED]
xtrabackup + local path=backups/[REDACTED]-pxc-db-2025-05-04-08:43:19-full/
xtrabackup ++ aws s3 ls s3://[REDACTED]/backups/[REDACTED]-pxc-db-2025-05-04-08:43:19-full/ --summarize --recursive
xtrabackup 
xtrabackup An error occurred (NoSuchKey) when calling the ListObjectsV2 operation: The specified key does not exist.
xtrabackup + res=
xtrabackup + echo ''
xtrabackup + grep -q 'Total Objects: 0'
xtrabackup + return 0
xtrabackup + log INFO 'Delete (attempt 1)...'
xtrabackup 2025-05-04 08:43:39 [INFO] Delete (attempt 1)...
xtrabackup + xbcloud delete --curl-retriable-errors=7 --storage=s3 --s3-bucket=[REDACTED] backups/[REDACTED]-pxc-db-2025-05-04-08:43:19-full
xtrabackup error: http request failed: SSL peer certificate or SSH remote key was not OK
xtrabackup error: http request failed: SSL peer certificate or SSH remote key was not OK
xtrabackup 250504 08:43:39 xbcloud: Successfully connected.
xtrabackup 250504 08:43:39 xbcloud: Failed to list objects. Error message: The specified key does not exist.
xtrabackup 250504 08:43:39 xbcloud: Delete failed. Cannot list backups/[REDACTED]-pxc-db-2025-05-04-08:43:19-full.

Steps to reproduce

It's hard to say how to reproduce this. We had a 1.16.1 setup with backups to S3 configured and working fine for many weeks; after the upgrade to 1.17.0, backups started to fail.

Versions

  1. Kubernetes -> 1.31.7 (EKS)
  2. Operator -> 1.17.0
  3. Database -> 1.17.0

Anything else?

No response

@wonko wonko added the bug label May 4, 2025
@egegunes
Contributor

egegunes commented May 5, 2025

Hi @wonko,

We tried to reproduce this quickly but we couldn't. Could you please share your cr.yaml?

@wonko
Author

wonko commented May 5, 2025

I assume you only need the backup part (I've left out the pxc and haproxy sections). We use the helm chart for DB deployment (from https://github.com/percona/percona-helm-charts/tree/main/charts/pxc-db). This is the Terraform template snippet; the variables are filled in at deploy time. The template wasn't touched going from 1.16.1 to 1.17.0.

tls:
  enabled: true
unsafeFlags:
  pxcSize: false
allowUnsafeConfigurations: false

backup:
  pitr:
    enabled: true
    timeBetweenUploads: 120
    timeoutSeconds: 600
    storageName: aws-bucket-pitr
  schedule:
    - keep: 90
      name: daily
      schedule: 4 15 * * *
      storageName: aws-bucket-backups
  storages:
    aws-bucket-backups:
      s3:
        bucket: ${backup_bucket}/backups
        region: ${backup_region}
        credentialsSecret: ${backup_credentials_secret_name}
        endpointUrl: https://${backup_bucket}.s3.amazonaws.com
      type: s3
    aws-bucket-pitr:
      s3:
        bucket: ${backup_bucket}/pitr
        region: ${backup_region}
        credentialsSecret: ${backup_credentials_secret_name}
      type: s3

@hors
Collaborator

hors commented May 5, 2025

@wonko as you can see, you have the backup_bucket name in endpointUrl. Please try to remove it:

  storages:
    aws-bucket-backups:
      s3:
        bucket: ${backup_bucket}/backups
        region: ${backup_region}
        credentialsSecret: ${backup_credentials_secret_name}
        endpointUrl: https://${backup_bucket}.s3.amazonaws.com

Starting with 1.17.0, we use the AWS CLI instead of the MinIO CLI in our backup images. I googled this error
xtrabackup An error occurred (NoSuchKey) when calling the ListObjectsV2 operation: The specified key does not exist.
and a bucket name in endpointUrl can cause this problem.

@wonko
Author

wonko commented May 6, 2025

This results in a different error, indicating that I should be using the correct endpoint. I double checked, the bucket is in the eu-west-1 region.

xb-endpoint-test-backup-xvqc9 xtrabackup + LIB_PATH=/usr/lib/pxc
xb-endpoint-test-backup-xvqc9 xtrabackup + . /usr/lib/pxc/backup.sh
xb-endpoint-test-backup-xvqc9 xtrabackup ++ set -o errexit
xb-endpoint-test-backup-xvqc9 xtrabackup ++ LIB_PATH=/usr/lib/pxc
xb-endpoint-test-backup-xvqc9 xtrabackup ++ . /usr/lib/pxc/aws.sh
xb-endpoint-test-backup-xvqc9 backup-init ++ id -u
xb-endpoint-test-backup-xvqc9 xtrabackup +++ set -o errexit
xb-endpoint-test-backup-xvqc9 xtrabackup +++ export AWS_SHARED_CREDENTIALS_FILE=/tmp/aws-credfile
xb-endpoint-test-backup-xvqc9 xtrabackup +++ AWS_SHARED_CREDENTIALS_FILE=/tmp/aws-credfile
xb-endpoint-test-backup-xvqc9 xtrabackup +++ export AWS_ENDPOINT_URL=https://s3.amazonaws.com
xb-endpoint-test-backup-xvqc9 xtrabackup +++ AWS_ENDPOINT_URL=https://s3.amazonaws.com
xb-endpoint-test-backup-xvqc9 backup-init ++ id -g
xb-endpoint-test-backup-xvqc9 xtrabackup +++ export AWS_REGION=eu-west-1
xb-endpoint-test-backup-xvqc9 backup-init + install -o 2 -g 2 -m 0755 -D /peer-list /opt/percona/peer-list
xb-endpoint-test-backup-xvqc9 xtrabackup +++ AWS_REGION=eu-west-1
xb-endpoint-test-backup-xvqc9 xtrabackup +++ '[' -n true ']'
xb-endpoint-test-backup-xvqc9 xtrabackup +++ [[ true == \f\a\l\s\e ]]
xb-endpoint-test-backup-xvqc9 xtrabackup ++ SST_INFO_NAME=sst_info
xb-endpoint-test-backup-xvqc9 xtrabackup ++ XBCLOUD_ARGS='--curl-retriable-errors=7 '
xb-endpoint-test-backup-xvqc9 xtrabackup ++ INSECURE_ARG=
xb-endpoint-test-backup-xvqc9 xtrabackup ++ '[' -n true ']'
xb-endpoint-test-backup-xvqc9 xtrabackup ++ [[ true == \f\a\l\s\e ]]
xb-endpoint-test-backup-xvqc9 xtrabackup ++ S3_BUCKET_PATH=backups/REDACTED-pxc-db-2025-05-06-06:41:19-full
xb-endpoint-test-backup-xvqc9 xtrabackup +++ date +%F-%H-%M
xb-endpoint-test-backup-xvqc9 xtrabackup ++ BACKUP_PATH=REDACTED-pxc-db-pxc-2025-05-06-06-41-xtrabackup.stream
xb-endpoint-test-backup-xvqc9 xtrabackup + GARBD_OPTS=
xb-endpoint-test-backup-xvqc9 xtrabackup + check_ssl
xb-endpoint-test-backup-xvqc9 xtrabackup + CA=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
xb-endpoint-test-backup-xvqc9 xtrabackup + '[' -f /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt ']'
xb-endpoint-test-backup-xvqc9 xtrabackup + SSL_DIR=/etc/mysql/ssl
xb-endpoint-test-backup-xvqc9 xtrabackup + '[' -f /etc/mysql/ssl/ca.crt ']'
xb-endpoint-test-backup-xvqc9 xtrabackup + CA=/etc/mysql/ssl/ca.crt
xb-endpoint-test-backup-xvqc9 xtrabackup + SSL_INTERNAL_DIR=/etc/mysql/ssl-internal
xb-endpoint-test-backup-xvqc9 xtrabackup + '[' -f /etc/mysql/ssl-internal/ca.crt ']'
xb-endpoint-test-backup-xvqc9 xtrabackup + CA=/etc/mysql/ssl-internal/ca.crt
xb-endpoint-test-backup-xvqc9 xtrabackup + KEY=/etc/mysql/ssl/tls.key
xb-endpoint-test-backup-xvqc9 xtrabackup + CERT=/etc/mysql/ssl/tls.crt
xb-endpoint-test-backup-xvqc9 xtrabackup + '[' -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
xb-endpoint-test-backup-xvqc9 xtrabackup + KEY=/etc/mysql/ssl-internal/tls.key
xb-endpoint-test-backup-xvqc9 xtrabackup + CERT=/etc/mysql/ssl-internal/tls.crt
xb-endpoint-test-backup-xvqc9 xtrabackup + '[' -f /etc/mysql/ssl-internal/ca.crt -a -f /etc/mysql/ssl-internal/tls.key -a -f /etc/mysql/ssl-internal/tls.crt ']'
xb-endpoint-test-backup-xvqc9 xtrabackup + GARBD_OPTS='socket.ssl_ca=/etc/mysql/ssl-internal/ca.crt;socket.ssl_cert=/etc/mysql/ssl-internal/tls.crt;socket.ssl_key=/etc/mysql/ssl-internal/tls.key;socket.ssl_cipher=;pc.weight=0;'
xb-endpoint-test-backup-xvqc9 xtrabackup + '[' -n REDACTED ']'
xb-endpoint-test-backup-xvqc9 xtrabackup + clean_backup_s3
xb-endpoint-test-backup-xvqc9 xtrabackup + s3_add_bucket_dest
xb-endpoint-test-backup-xvqc9 xtrabackup + local time=15
xb-endpoint-test-backup-xvqc9 xtrabackup + local is_deleted_full=0
xb-endpoint-test-backup-xvqc9 xtrabackup + local is_deleted_info=0
xb-endpoint-test-backup-xvqc9 xtrabackup + local exit_code=0
xb-endpoint-test-backup-xvqc9 xtrabackup + for i in {1..5}
xb-endpoint-test-backup-xvqc9 xtrabackup + ((  i > 1  ))
xb-endpoint-test-backup-xvqc9 xtrabackup + is_object_exist REDACTED backups/REDACTED-pxc-db-2025-05-06-06:41:19-full/
xb-endpoint-test-backup-xvqc9 xtrabackup + local bucket=REDACTED
xb-endpoint-test-backup-xvqc9 xtrabackup + local path=backups/REDACTED-pxc-db-2025-05-06-06:41:19-full/
xb-endpoint-test-backup-xvqc9 xtrabackup ++ aws s3 ls s3://REDACTED/backups/REDACTED-pxc-db-2025-05-06-06:41:19-full/ --summarize --recursive
xb-endpoint-test-backup-xvqc9 xtrabackup 
xb-endpoint-test-backup-xvqc9 xtrabackup An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint: REDACTED.s3.amazonaws.com
xb-endpoint-test-backup-xvqc9 xtrabackup You can fix this issue by explicitly providing the correct region location using the --region argument, the AWS_DEFAULT_REGION environment variable, or the region variable in the AWS CLI configuration file.  You can get the bucket's location by running "aws s3api get-bucket-location --bucket BUCKET".
xb-endpoint-test-backup-xvqc9 xtrabackup + res=
xb-endpoint-test-backup-xvqc9 xtrabackup + echo ''
xb-endpoint-test-backup-xvqc9 xtrabackup + grep -q 'Total Objects: 0'
xb-endpoint-test-backup-xvqc9 xtrabackup + return 0
xb-endpoint-test-backup-xvqc9 xtrabackup + log INFO 'Delete (attempt 1)...'
xb-endpoint-test-backup-xvqc9 xtrabackup 2025-05-06 06:41:30 [INFO] Delete (attempt 1)...
xb-endpoint-test-backup-xvqc9 xtrabackup + xbcloud delete --curl-retriable-errors=7 --storage=s3 --s3-bucket=REDACTED backups/REDACTED-pxc-db-2025-05-06-06:41:19-full
xb-endpoint-test-backup-xvqc9 xtrabackup 250506 06:41:30 xbcloud: Successfully connected.
xb-endpoint-test-backup-xvqc9 xtrabackup 250506 06:41:30 xbcloud: error: backup named backups/REDACTED-pxc-db-2025-05-06-06:41:19-full doesn't exists!

@wonko
Author

wonko commented May 6, 2025

After some debugging, I must conclude that I've hit two bugs:

  • The error the AWS CLI throws when I don't set the endpoint includes the text "Please send all future requests to this endpoint: REDACTED.s3.amazonaws.com", but this is wrong: the error should name the correct, region-aware endpoint. The given address did resolve correctly to a set of AWS IP addresses, so I will investigate how this works in the AWS CLI and whether there is already a bug report for it in the aws-cli repo.
  • For this issue: if both the region and the endpoint are set through variables (which the backup code does; I can't find backup.sh, aws.sh, and PXC_LIB/backup.sh on GitHub, so I can't link to them), the AWS CLI uses the endpoint anyway and skips the code that looks up the region-specific endpoint. The fix would be to not set the endpoint unless the user provides it (for private endpoints, Outposts, or non-AWS setups).

So, I guess this will be fixed when aws.sh is changed:

#!/bin/bash

set -o errexit

export AWS_SHARED_CREDENTIALS_FILE='/tmp/aws-credfile'
export AWS_ENDPOINT_URL="${ENDPOINT:-https://s3.amazonaws.com}"
export AWS_REGION="${DEFAULT_REGION:-us-west-2}"

...

The line setting the AWS_ENDPOINT_URL should be conditional, and only be executed when it is provided. If it's not provided, don't set it, don't export any value there, let the aws-cli figure out what the endpoint is.
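
A minimal sketch of that conditional, assuming `ENDPOINT` and `DEFAULT_REGION` are the inputs the script receives (as in the snippet above). The function name `configure_s3_endpoint` and the example endpoint are hypothetical, and this has not been tested against the real backup image:

```shell
#!/bin/bash

set -o errexit

export AWS_SHARED_CREDENTIALS_FILE='/tmp/aws-credfile'
export AWS_REGION="${DEFAULT_REGION:-us-west-2}"

# Hypothetical helper: only pin the endpoint when the user supplied one
# (private endpoints, Outposts, non-AWS S3). Otherwise leave it unset so
# the AWS CLI can work out the endpoint from AWS_REGION on its own.
configure_s3_endpoint() {
    if [ -n "${ENDPOINT:-}" ]; then
        export AWS_ENDPOINT_URL="$ENDPOINT"
    else
        unset AWS_ENDPOINT_URL
    fi
}

configure_s3_endpoint
```

With no `ENDPOINT` in the environment, `AWS_ENDPOINT_URL` ends up unset, which matches the `unset AWS_ENDPOINT_URL` step in the proof below.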

Proof:

bash-5.1$ set | grep AWS
AWS_ENDPOINT_URL=https://s3.amazonaws.com
AWS_REGION=eu-west-1
AWS_SHARED_CREDENTIALS_FILE=/tmp/aws-credfile
_=AWS_ENDPOINT_URL=https://s3.amazonaws.com
bash-5.1$ aws s3 ls s3://[REDACTED]/backups/[REDACTED]-pxc-db-2025-05-06-06:41:19-full/ --summarize --recursive

An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint:[REDACTED].s3.amazonaws.com
You can fix this issue by explicitly providing the correct region location using the --region argument, the AWS_DEFAULT_REGION environment variable, or the region variable in the AWS CLI configuration file.  You can get the bucket's location by running "aws s3api get-bucket-location --bucket BUCKET".
bash-5.1$ unset AWS_ENDPOINT_URL
bash-5.1$ aws s3 ls s3://[REDACTED]/backups/[REDACTED]-pxc-db-2025-05-06-06:41:19-full/ --summarize --recursive

Total Objects: 0
   Total Size: 0

And I guess you tested in the us-east-1 region, which is a bit special in AWS.

@wonko
Author

wonko commented May 6, 2025

(fyi, aws/aws-cli#9479 has the report for aws-cli.)

@hors
Collaborator

hors commented May 6, 2025

@wonko thanks for helping with the debug. The code of aws.sh is located in a different repo https://github.com/percona/percona-docker/blob/main/percona-xtradb-cluster-8.4-backup/lib/pxc/aws.sh#L6-L7, but we will move it under the 'percona-xtradb-cluster-operator' repo in the next release.

As far as I can see, the default ENDPOINT was the same with the MinIO CLI, but the behavior was different.

We can fix the problem in two different ways:

  1. Don't set ENDPOINT at all (if it was not provided by the user)
  2. Set AWS_ENDPOINT_URL using AWS_REGION (if it was not provided by user) like:
export AWS_REGION="${DEFAULT_REGION:-us-west-2}"
export AWS_ENDPOINT_URL="${ENDPOINT:-https://s3.${AWS_REGION}.amazonaws.com}"

I think it's better to use the second way to follow AWS CLI logic. As you can see, AWS CLI uses AWS_REGION to set the default AWS_ENDPOINT_URL.

❯ AWS_ENDPOINT_URL='' AWS_REGION='' aws s3 ls s3://TEST/ --summarize --recursive
Invalid endpoint: https://s3..amazonaws.com
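
For comparison, the second option could be sketched as below. It assumes the same `ENDPOINT` and `DEFAULT_REGION` inputs as aws.sh; `default_s3_endpoint` is a hypothetical helper name, and the `https://s3.<region>.amazonaws.com` pattern is assumed to apply only to the standard AWS partition. The helper refuses an empty region instead of producing `https://s3..amazonaws.com`.

```shell
#!/bin/bash

# Hypothetical helper: print the regional S3 endpoint for a region name.
# Fails on an empty region rather than emitting "https://s3..amazonaws.com".
default_s3_endpoint() {
    local region="$1"
    if [ -z "$region" ]; then
        echo "region must not be empty" >&2
        return 1
    fi
    printf 'https://s3.%s.amazonaws.com\n' "$region"
}

export AWS_REGION="${DEFAULT_REGION:-us-west-2}"

# A user-provided ENDPOINT always wins; otherwise derive it from the region.
if [ -n "${ENDPOINT:-}" ]; then
    AWS_ENDPOINT_URL="$ENDPOINT"
else
    AWS_ENDPOINT_URL="$(default_s3_endpoint "$AWS_REGION")"
fi
export AWS_ENDPOINT_URL
```

The trade-off is that the script then carries its own copy of the endpoint naming rule, which has to be kept in step with whatever the AWS CLI does.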

@wonko
Author

wonko commented May 6, 2025

I'd suggest leaving the endpoint construction up to the AWS CLI code, as they own that logic. Replicating it might lead to other things not working. No endpoint set by the user, no endpoint set in the script: that feels most logical to me.

But that's only my opinion, feel free to ignore it ;-)

@kevinrudde

We are also encountering this issue, is there some workaround for the time being, or is it safe to downgrade the operator? 🤔

@wonko
Author

wonko commented May 8, 2025

> We are also encountering this issue, is there some workaround for the time being, or is it safe to downgrade the operator? 🤔

I've worked around it for now by setting the endpointUrl to the correct one for my region:

  storages:
    aws-bucket-backups:
      s3:
        bucket: ${backup_bucket}/backups
        region: ${backup_region}
        credentialsSecret: ${backup_credentials_secret_name}
        endpointUrl: https://s3.${backup_region}.amazonaws.com
      type: s3

I'll remove that line again when this is fixed in the code.

4 participants