Pxc 4469 sst clone #1961

Draft
wants to merge 28 commits into
base: 8.0


Tusamarco

Adding a preview POC of the Clone method for SST.
The script is based on the one from Codership, but it had to be refactored almost in full to work with PXC.
Bird's-eye view of the flow:

#              ┌──────────────────┐                      
#              │  Joiner starts   │                      
#              └────────┬─────────┘                      
#              ┌────────▼─────────┐                      
#              │open NetCat listnr│                      
#              └────────┬─────────┘                      
#              ┌────────▼─────────┐                      
#              │Send message to Dn│                      
#              └────────┬─────────┘                      
#              ┌────────▼─────────┐                      
#              │Donor decides IST │                      
#              │or SST. Send msg  │                      
#              │through NC and    │                      
#              │waits for Joiner  │                      
#              │Clone instance    │                      
#              └────────┬─────────┘                      
#              ┌────────▼─────────┐                      
#              │Joiner gets IST   │                      
#              │or SST. If IST    │                      
#              │Bypass all and    │                      
#              │Wait for IST.     │                      
#              │SST Start clone   │                      
#              │Instance and wait │                      
#              │for donor to clone│                      
#              └─────────┬────────┘                      
#              ┌─────────▼─────────────┐                 
#              │Clone process is       │                 
#              │reported.              │                 
#              │When done Dn waits     │                 
#              │Joiner close instance  │                 
#              │And performs 3 restarts│                 
#              └──────────┬────────────┘                 
#              ┌──────────▼──────────────┐               
#              │1) To fix dictionary     │               
#              │2) To recover position   │               
#              │3) Final cleanup         │               
#              └──────────┬──────────────┘               
#              ┌──────────▼──────────────┐               
#              │Send final ready signal  │               
#              │Waits for IST            │               
#              └─────────────────────────┘               
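
The joiner side of the flow above can be sketched as a small dispatch on the donor's reply. This is only an illustrative sketch: the message values `IST` and `SST`, the function name `joiner_plan`, and the step names it prints are assumptions made for the example and are not taken from wsrep_sst_clone.sh.

```shell
#!/bin/sh
# Hypothetical sketch of the joiner's decision after the donor replies
# over the netcat channel. Message values and step names are assumptions.

joiner_plan() {
    donor_msg="$1"
    case "$donor_msg" in
        IST)
            # Donor can serve the missing write-sets: bypass the clone.
            echo "wait_for_ist"
            ;;
        SST)
            # Full transfer: run the clone, then the three restarts
            # listed in the diagram, then signal readiness.
            echo "start_clone_instance"
            echo "wait_for_donor_clone"
            echo "stop_clone_instance"
            echo "restart_fix_dictionary"
            echo "restart_recover_position"
            echo "restart_final_cleanup"
            echo "signal_ready"
            echo "wait_for_ist"
            ;;
        *)
            echo "unexpected donor message: $donor_msg" >&2
            return 1
            ;;
    esac
}
```

Calling `joiner_plan IST` prints a single step, while `joiner_plan SST` prints the full sequence in the order shown in the diagram; any other message is rejected with a non-zero status.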

@it-percona-cla

it-percona-cla commented Oct 7, 2024

CLA assistant check
All committers have signed the CLA.

@Tusamarco
Author

I have a question:
the two files wsrep_clone and wsrep_common_clone are not copied over when I run make install. Where is that defined?
@kamil-holubicki @venkatesh-prasad-v

@@ -1,7 +1,7 @@
--echo Performing State Transfer on a server that has been killed and restarted
--echo while a DDL was in progress on it

--source include/have_debug.inc
#source include/have_debug.inc
Contributor

I think this will cause other tests to fail.

[mysqld.2]
wsrep_provider_options='base_port=@mysqld.2.#galera_port;gcache.size=1;pc.ignore_sb=true'

[sst]
Contributor

The MTR framework can run tests in parallel, and it dynamically allocates groups of ports to MTR workers.
We can't hardcode ports here: doing so would prevent MTR from running tests in parallel, for example
./mtr galera_sst_clone{,,,,,} --parallel=2
Please check how 'mysqld.1.#sst_port' is handled, for example.

Author

Where, @kamil-holubicki? Do you have a link to a file or something similar?

[sst]
netcat_port=4442
wsrep-debug=true
clone_instance_port=4444
Contributor

I'm not sure we need the clone_instance_port parameter at all. Please see my following comments in the SST script for the reasoning.

Author

We do.
In real deployments this traffic may go through a firewall, and the use of ports and port ranges can be regulated, so it is better to keep the flexibility to define ports from the beginning.

Contributor

We already have a way to configure this port.
See my other comment about wsrep_sst_receive_address
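
As a rough illustration of that suggestion, a script could derive the port from a `host:port` value such as the one carried by wsrep_sst_receive_address, instead of reading a separate option. The helper name `port_from_address` and the fallback argument are hypothetical, and the sketch does not handle bracketed IPv6 addresses.

```shell
#!/bin/sh
# Hypothetical helper: extract the port from a "host:port" string.
# Prints the fallback port when the address carries no port.
# Assumption: IPv4 or hostname form only (no bracketed IPv6).

port_from_address() {
    addr="$1"
    default_port="$2"
    case "$addr" in
        *:*) echo "${addr##*:}" ;;   # keep everything after the last colon
        *)   echo "$default_port" ;;
    esac
}
```

For example, `port_from_address 10.0.0.2:5555 4444` prints `5555`, and `port_from_address 10.0.0.2 4444` prints `4444`.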

WSREP_SST_OPT_PSWD="$2"
shift
;;
'--port')
Contributor

This parameter (and some others) is passed to the SST script only when the mysqldump SST script is used. The documentation says: "These parameters are passed only to the wsrep_sst_mysqldump state transfer script by both the sending and receiving nodes." I think we could remove them from the clone-specific file.

Author

True, agreed; this can be cleaned up.

scripts/wsrep_sst_clone.sh (resolved)



GRANT ALTER, CREATE, SELECT, INSERT ON PERCONA_SCHEMA.xtrabackup_history TO 'mysql.pxc.sst.role'@localhost;
Contributor

Why do we need this?

Author

No idea; it was already there and is not related to the changes I added.

@@ -1230,6 +1230,8 @@ int wsrep_remove_sst_user(bool initialize_thread) {
nullptr,
"SET SESSION lock_wait_timeout = 1;",
nullptr,
"REVOKE mysql.pxc.sst.role FROM 'mysql.pxc.sst.user'@'localhost';",
Contributor

Why is this needed?

Author

We discussed this.
The correct way to remove a user that has a role is to revoke the role and then drop the user.
Just dropping the user works, but the role cleanup is then delegated to MySQL, and as we identified with MTR tests on very fast machines, that cleanup appears to be asynchronous.
So, to avoid problems and stay on the safe side, we should follow the correct order: revoke the role first, then drop the user.
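
The ordering described here can be sketched as a shell function that emits the statements in sequence. The `SET SESSION` and `REVOKE` statements are copied from the diff above; the `DROP USER IF EXISTS` statement and the function name `sst_user_cleanup_sql` are assumptions added for illustration and may differ from what sql/wsrep_sst.cc actually runs.

```shell
#!/bin/sh
# Sketch of the cleanup order: revoke the role first, then drop the user.
# The DROP USER line is an assumption; the other two come from the diff.

sst_user_cleanup_sql() {
    cat <<'EOF'
SET SESSION lock_wait_timeout = 1;
REVOKE mysql.pxc.sst.role FROM 'mysql.pxc.sst.user'@'localhost';
DROP USER IF EXISTS 'mysql.pxc.sst.user'@'localhost';
EOF
}
```

The output could be fed to a mysql client session; the point of the sketch is only that the `REVOKE` is issued before the `DROP USER`.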

scripts/mysql_system_users.sql (resolved)
sql/wsrep_sst.cc (resolved)
@kamil-holubicki
Contributor

As we discussed today, we need to test this with encrypted tables and with the keyring component/plugin enabled.

3 participants