RM-1442 PXC-8.4.0 #1974

Open
wants to merge 1,175 commits into base: trunk
Conversation

adivinho (Contributor) commented Nov 7, 2024

No description provided.

Marcin Babij and others added 30 commits March 6, 2024 17:54
…ib::fatal

The `os_innodb_umask` is a global variable; it can't be modified by a thread that wants to create a file with a different UNIX access mode, as doing so would modify it for all threads (not to mention the undefined behavior).
The `os_file_set_umask` is supposed to be called only once, at InnoDB initialization.
However, after `Bug #29472125 NEED OS_FILE_GET_UMASK()` it became possible to modify it, and `WL#12009 - Redo-log Archiving` made use of that.
Fix:
- `os_file_get_umask()` is removed
- `os_file_set_umask()` is modified so it can be called only once (see
  the sketch after this list)
- the Unix-only `os_file_create_simple_no_error_handling_with_umask()`
  is added to allow specifying a umask parameter.
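
A minimal sketch of the once-only setter, with simplified, hypothetical
guard-variable names:

    #include <sys/types.h>
    #include <cassert>

    static mode_t os_innodb_umask = 0;
    static bool os_umask_was_set = false; /* guards against a second call */

    /* May be called only once, during InnoDB initialization. */
    void os_file_set_umask(mode_t umask_value) {
      assert(!os_umask_was_set);
      os_innodb_umask = umask_value;
      os_umask_was_set = true;
    }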

Change-Id: I78f169cfb99704e031ea1ff758b970cd8c73240d
Post-push fix:
The RW-splitting WL introduced two additional routing configuration options:
"wait_for_my_writes" and "wait_for_my_writes_timeout".
Both should be exposed in the metadata.

This patch adds those options to the exposed JSON.

Change-Id: Ie4fc0a899f00cd5f728f8585f6b0cb769162df8d
Post-push fix:
The router_require_enforce config option has different default values
for the classic protocol endpoint (true) and the X protocol endpoint
(false).

This patch changes the exposed defaults to match and also removes that
option from the "common" section.

Change-Id: I2a4ce33988f96eedde1b34d61e98e7e72c4296fd
…Y scenario

Symptom:
Some queries with `SELECT ... GROUP BY` can be a few times slower when executed on TempTable than when executed on the Memory temporary table engine.

Root cause:
The `AllocatorState::current_block` gets allocated and quickly deallocated when a single `Row` instance is repeatedly allocated on it and released in a loop.

Fix:
The `AllocatorState::current_block` is no longer released when it becomes empty. We release it only when the `AllocatorState` is deleted, that is, when the `Table` gets deleted.

Additional fixes:
`AllocationScheme` gains a new `block_freed()` method so that it can be fully responsible for managing memory-usage reporting to `MemoryMonitor`.
`AllocatorState` knows its `AllocationScheme` and can report memory usage to it on its own.
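
A minimal sketch of the block-caching idea, with simplified,
hypothetical types in place of the real TempTable ones:

    #include <cstddef>

    struct Block {
      std::size_t rows = 0;
      bool is_empty() const { return rows == 0; }
    };

    struct AllocatorState {
      Block *current_block = nullptr;

      void on_row_released(Block *block) {
        --block->rows;
        /* Keep current_block cached even when it becomes empty, so the
           next Row allocation reuses it instead of re-allocating. */
        if (block->is_empty() && block != current_block) delete block;
      }

      /* The cached block is released only together with the Table. */
      ~AllocatorState() { delete current_block; }
    };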

Change-Id: I64ae2387dc23b3f8d027d4050972bf126aa5d004
After an ignored TLS error, reset the error condition in the
management handle to NO_ERROR.

Change-Id: Ia88b5a5ddeadac6683a446772a79c4caf8a3f806
This is the umbrella worklog for developing the new features for
bulk load component v2.  The features included are listed below:

WL#15607 Innodb: Monitor progress for Bulk Load
WL#15608 Innodb: Support Bulk Load from compressed file
WL#15609 Innodb: Enhance Bulk Load Syntax to support loading shell dump file

Developed By:
   Niksa Skeledzija <[email protected]>
   Annamalai Gurusami <[email protected]>

Change-Id: I156a9ceac9ae7598723b3a67d2552033fdf2650b
RoutingSplittingTest.insert_is_sticky was failing on machines which:

- resolve "localhost" to ::1 and then 127.0.0.1 and
- fail to connect to IPv6

Change
======

- updated the expected trace-pattern.

Change-Id: Id5b9da65275c8618dd5d81f773dafa515cc501b3
Change-Id: Ie7bcaec33fd59e3350e79a0f3f6012d62df5bc92
Related to the Super privilege, the comment message printed by the
server is:

"To use KILL thread, SET GLOBAL, CHANGE MASTER, etc."

This patch replaces the old command with CHANGE REPLICATION SOURCE.

Change-Id: Id5486bb82bf776eb1e605bea330798e5b7c8a709
MySQL Cluster no longer uses a customized MySQL server, so there is no
need to re-run the non-NDB server tests. Instead, the NDB-specific
testing is increased.

Change-Id: Ia9f9622116ecd057afcb5d8177f44cf51ef7eae5
…nit for all plugins

This patch unifies the plugin deinit function call to pass a valid
plugin pointer instead of nullptr for all plugin types.

Change-Id: I482497bbaff28d5cd31d74d694056a4df6693152
assertion failure

- Executing EXPLAIN/DESCRIBE statements from MLE SP using prepared
  statement execution causes a crash.
- To address this issue temporarily, the solution is to disable the
  cursor for such statements.
- The proper fix should be implemented in Bug#36332426.
- Additionally, include extra test cases to cover scenarios involving
  these statements.

Change-Id: Ibda9995a60ff2a74548342f0fda8f1c768726726
PS-9117: Make innodb_interpreter_output sysvar readonly
              is not valid for CHARACTER SET

Condition pushdown to a view fails with a collation mismatch
if the view was created with a different charset than the
charset used when querying the view.
The problem is seen when the underlying fields in the view are
literals with COLLATE clauses. The string literal with the
COLLATE clause is cloned when the expressions in the condition
being pushed down are replaced with the expressions from the
view. The string literal is currently parsed with the
connection charset, which in this case differs from the one
that was used when the view was created. Therefore the COLLATE
clause fails.
The creation context of a view holds the connection charset and
collation that were in effect when the view was created; it is
already used for parsing the view when it is later queried. We
now use the same context when cloning expressions from a view
during condition pushdown.
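
A minimal sketch of the idea, with hypothetical, simplified types
standing in for the server internals:

    struct CharsetInfo;

    struct ViewCreationCtx {
      const CharsetInfo *connection_charset; /* as of CREATE VIEW time */
    };

    struct Session {
      const CharsetInfo *parse_charset; /* used to re-parse literals */
    };

    /* Clone a pushed-down expression using the charset captured in the
       view's creation context, not the current connection charset. */
    template <typename Item>
    Item *clone_for_pushdown(Session *s, const ViewCreationCtx &ctx,
                             Item *item) {
      const CharsetInfo *saved = s->parse_charset;
      s->parse_charset = ctx.connection_charset;
      Item *clone = item->clone(s);
      s->parse_charset = saved;
      return clone;
    }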

Change-Id: Ib040b9a67ddedd5fb9bf5de6fafafb358226e9d9
Description
===========
The MySQL Server source code contains "net_ts", which may be used as an
alternative to "libevent". "net_ts" is maintained in-house by Oracle.

Fix
===
Refactored the X Plugin code to use "net::io_context" instead of "libevent".
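
A minimal sketch of the Networking-TS style that "net_ts" models (the
header path and event-loop shape are assumptions, not the actual X
Plugin code):

    #include "mysql/harness/net_ts/io_context.h"

    void run_event_loop() {
      net::io_context io_ctx;

      /* The X Plugin queues its handlers (accepts, reads, timers) on
         io_ctx; run() dispatches them until the context is stopped. */
      io_ctx.run();
    }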

Change-Id: I8f3945110feebf041c22b7abafa1b0a872661f4f
Add test for adding index on part of primary key using inplace alter
table.

Change-Id: Id2849c1cf429ca94317dbad89067b9d2c6e850fb
[ 79%] ndb.server_lifecycle                     w7  [ fail ]

mysqltest: At line 8: Query 'CREATE TABLE t1 (
a INT PRIMARY KEY,
b VARCHAR(32)
) engine=ndb' failed.
ERROR 1296 (HY000): Got error 4009 'Cluster Failure' from NDBCLUSTER

Skip the test on valgrind since it intentionally uses a low connect wait time.

Change-Id: I4ef55b379f2518121843d4ccc78ac39f7f80def0
In preparation for Bug#36313793 "ndb-log-apply-status fails to work
with no replicated changes in BI".

Handling of ndb_apply_status updates starts with unpacking the whole
record just to access two values needed for assertions. The record is
then unpacked again when handling the insert/update/delete.

This patch removes the double unpack of the ndb_apply_status table
event by extracting these two values directly, and adds some
DBUG_PRINTs to track the handling of ndb_apply_status updates.

Change-Id: I3f280232808bac27cfec13a6df8c710a48599b21
Part of Bug#36313793 "ndb-log-apply-status fails to work with no
replicated changes in BI".

Instead of using separate variables to track the handling of events
within the epoch, use an object that aggregates them all. Then, using
this epoch context, a function is added to assess whether the epoch is
empty in light of the variables that reflect what happened in the
epoch.

This also makes the context easy to extend later.
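
A minimal sketch of such an epoch context, with hypothetical member
names:

    struct EpochContext {
      unsigned data_event_count = 0;          /* user-table changes */
      unsigned apply_status_update_count = 0; /* ndb_apply_status rows */

      /* An epoch is empty when no user-table data was written, even if
         ndb_apply_status itself was updated. */
      bool is_empty() const { return data_event_count == 0; }
    };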

Change-Id: I85e8b25cc3890786f97ed0f615287b3ab7e63765
…nges [3/3]

Problem:

The NDB table, ndb_apply_status, tracks the replica applier state. If
replicated changes were applied but are not visible in the Binlog
Injector (log-replica-updates=OFF) - because NDB data nodes filtered
those - then the epoch, from the Binlog Injector point-of-view, is
deemed to be empty.  But, ndb_apply_status did have an update because
those changes were replicated in the cluster. This breaks the flow of
apply status information in a chain/circular replication deployment.

Analysis:

Running some chain/circular replication scenarios, it is possible to
observe, in the binlog injector handling of data events, that the
ndb_apply_status table updates are written into the binlog transaction
and thus are not filtered (system tables do not have the
skip-log-replica-updates data node filter).

If only ndb_apply_status updates are written, then it is considered an
empty epoch, as there is no actual table data to write to the
binlog. Therefore, the binlog transaction will be rolled back if
logging of empty epochs is off. But, due to the nature of the
`skip-log-replica-updates` filter in the NDB data nodes, some data
might have been replicated even if it was not received by the binlog
thread.

Solution:

The usage of ndb-log-apply-status=ON with circular replication is a
complex case. It requires the understanding that even for a server
that has skip-log-replica-updates, ndb_apply_status updates are still
logged (and thus continue to propagate through the chain).

With the above, this patch does not enable the replica-updates filter
in the NDB data nodes when ndb-log-apply-status or ndb-log-orig is
ON. If log-replica-updates is OFF (the filter might be enabled and in
operation), then ndb-log-apply-status and ndb-log-orig cannot be
toggled.

Tests adapted to this new behavior.

Change-Id: I48944c536142b87d361e73eacf0f7a699616fa66
Symptom: In some circumstances an index cannot be created;
this is configuration-dependent

Root cause: The merge sort file buffer size can in some cases be
calculated to be the size of IO_BLOCK_SIZE. The logic for the
buffer is that subsequent records of data are added to the
buffer, but when adding a row would overflow the buffer, the
contents are written to disk in multiples of IO_BLOCK_SIZE and
space is freed up. If the buffer is only IO_BLOCK_SIZE, it is
likely that at that point the existing contents' length is
less than IO_BLOCK_SIZE, in which case nothing gets written,
no space can be freed up, and the new record cannot be added.

Fix: Since the maximum allowed key length is 3072 bytes and
IO_BLOCK_SIZE is 4096 bytes, and given other factors
contributing to the buffer length, like page size (which also
affects the allowed key size), 2 * IO_BLOCK_SIZE is a
sufficient minimum length to ensure there are always at least
IO_BLOCK_SIZE bytes in the buffer when the write happens. If
this ever changes, the merge will fail gracefully.
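
A minimal sketch of the minimum-size clamp described above (names are
illustrative):

    #include <algorithm>
    #include <cstddef>

    constexpr std::size_t IO_BLOCK_SIZE = 4096;

    std::size_t merge_sort_buffer_size(std::size_t computed_size) {
      /* Guarantee that a full IO_BLOCK_SIZE chunk can always be flushed
         before a new record has to be appended. */
      return std::max(computed_size, 2 * IO_BLOCK_SIZE);
    }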

Change-Id: I7aec373cfed9e364751372cce3746eb7ad75b3b9
This worklog introduces dynamic offload of queries to RAPID in the
following ways:

When the system variable rapid_use_dynamic_offload is 0/false, we fall
back to the normal cost threshold classifier, which also implies that
when use_secondary_engine is set to FORCED, eligible queries go to the
secondary engine regardless of the cost threshold or this classifier.

When rapid_use_dynamic_offload is 1/true, we look for the optimal
execution engine for the query; if the secondary engine is found to be
more optimal, the query is offloaded, otherwise it is sent back to
MySQL. This is handled in the following scenarios:

1. Static scenario: When there is no change propagation or queue on the
RAPID side, this introduces a decision tree which in training has > 85%
precision at predicting whether a query will be faster on MySQL or on
RAPID, and accepts or rejects queries accordingly. The decision tree
takes around 20-100 microseconds for fast queries, hence minimal
overhead; for bigger queries it introduces an overhead of up to a
maximum observed 700 microseconds, but these end up with long execution
times anyway, hence not a problem. For very fast queries, defined here
as having cost < 10 and being point selects, dynamic offload is not
applied, since 100% of these queries (out of 16667 samples) are faster
on MySQL. Additionally, routing these "very fast queries" through
dynamic offload leads to performance regressions due to the 3-phase
optimisation.

2. Dynamic scenario: When there is change propagation or queuing on
RAPID, this worklog introduces dynamic feature normalization to take
into account the extra catch-up time RAPID needs and, factoring that
in, attempts to verify whether RAPID is still the best engine for
execution. If the queue is too long or change propagation lag is too
long, this mechanism progressively starts shifting queries to MySQL,
moving gradually towards the heavier queries.

The steps in this worklog with respect to query lifecycle in server with
secondary_engine = ON, are described below:

query
   |
Primary Tentatively optimisation -> mysql optimises for Innodb
   |
secondary_engine_pre_prepare_hook -> the following RAPID function is called:
   |  RapidCachePrimaryInfoAtPrimaryTentativelyStep
   |  If dynamic offload is enabled and query is not "very fast":
   |   This caches features from mysql plan in rapid_statement_context
   |   to be used for dynamic offload.
   |  If dynamic offload is disabled or the query is "very fast":
   |   This function invokes the standard MySQL cost threshold classifier,
   |   which decides if query needs further RAPID optimisation.
   |
   |
   |-> if returns False, then query proceeds to Innodb for execution
   |-> if returns true, step below is called
   |
 Secondary optimisation -> mysql optimises for RAPID
   |
prepare_secondary_engine -> the following RAPID function is called:
   |   RapidPrepareEstimateQueryCosts
   |     In this function, dynamic offload combines the mysql plan
   |     features retrieved from rapid_statement_context with RAPID
   |     info such as rapid base table cardinality, dict encoding
   |     projection, varlen projection size, and rapid queue size to
   |     decide if the query should be offloaded to RAPID.
   |
   |->if returns True, then query proceeds to Innodb for execution
   |->if returns False, step below is called
   |
optimize_secondary_engine -> the following RAPID function is called:
   |    RapidOptimize
   |     In this function, dynamic offload retrieves info from
   |     rapid_statement_context and additionally looks at change
   |     propagation lag to decide if the query should be offloaded
   |     to RAPID.
   |
   |->if returns True, then query proceeds to Innodb for execution
   |->if returns False, then query goes to Rapid Execution.
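
A condensed sketch of the "very fast query" gate described above
(hypothetical names; the decision-tree evaluation itself is omitted):

    struct QueryFeatures {
      double cost = 0.0;
      bool is_point_select = false;
    };

    bool rapid_use_dynamic_offload = true; /* system-variable stand-in */

    bool use_dynamic_offload(const QueryFeatures &f) {
      if (!rapid_use_dynamic_offload) return false; /* cost threshold path */
      /* "Very fast" queries (cost < 10, point select) stay on MySQL:
         all sampled queries of this shape ran faster there. */
      if (f.cost < 10.0 && f.is_point_select) return false;
      return true; /* otherwise consult the decision tree / queue state */
    }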

The following new MySQL error log messages are printed with this WL
when dynamic offload is enabled and the query is not a "very fast
query".

1. SelOffload allow decision 1 : as secondary not forced 1 and enable
 var value 1 and transactional enabled 1 and( big shape detected 0
  or small shape detected 1 ) inno: 10737418240 , rpd: 4294967296 ,
   no lh table: 1

   A message such as this shows whether dynamic offload was used to
   classify the query and, if not, why not, using each of the
   conditions (1 = pass, 0 = fail).

2. myqid=65 Selective offload classifier #1#1#1
    f_mysql_total_ts_nrows <= 2105.5 : 0.173916, f_MySQLCost <=
    68.3899040222168 : 0.028218, f_count_all_base_tables = 0 ,
    f_count_ref_index_ts = 0 ,f_BaseTableSumNrows <= 278177.5 :
    0.173916 are_all_ts_index_ref = true outcome=0

   A line such as this serialises which leg of the decision tree
   decided the outcome of the query: 0 -> back to MySQL, 1 -> keep on
   RAPID. Each leg is uniquely searchable via an identifier such as
   #1#1#1 here.

This worklog additionally introduces Python scripts to run queries
through the mysql client, with multiple queries and multiple DMLs at
once, in various modes such as simulator mode and standard benchmark
modes.

By default this WL is enabled, but it will be disabled before release.
This is tracked via BUG#36343189 #no-close.

Perf mode unittests will be enabled on Jenkins after this WL.
Further cleanup will be done via BUG#36368437 #no-close.

Bugs tackled via this WL: BUG#35738194, Enh#34132523, Bug#36343208

Unrelated bugs fixed: BUG#35987975

Old gerrit review : 25567 (abandoned due to 1000 update limit reached)

Change-Id: Ie5f9fdcd8b55a669d04b389d3aec5f6b33f0fe2e
…iles

Added the attributes

  CompanyName
  ProductName
  LegalCopyright
  LegalTrademarks

Change-Id: I79cb92b90aabc0ca1961559b0b62c36aa5c525ca
If one puts the NDB data node filesystem on some distributed
filesystems, one can experience the data node failing when it tries to
remove a file because the file is reported to not exist. For a local
filesystem that should be impossible, but some distributed filesystems
can, as part of internal failover handling, retry a removal which may
have already succeeded before the failover, in which case the second
removal fails since the file no longer exists.

The data node was changed to allow file and directory removal to fail
with a 'file does not exist' error and to treat that as a successful
removal.

Note, this applies only to files under the data node filesystem and
backup files. Removing other files keeps the old semantics, as does
any file removal by other NDB programs.

For testing purposes, an extra file deletion call is issued roughly 1%
of the times a file deletion is requested; this is only done for debug
builds.
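
A minimal sketch of the tolerant removal on POSIX (names are
illustrative, not the actual NDB file-system layer):

    #include <cerrno>
    #include <unistd.h>

    /* Returns true if the file was removed, or was already gone. */
    bool remove_file_tolerant(const char *path) {
      if (::unlink(path) == 0) return true;
      /* A removal retried by the filesystem after failover may find the
         file already deleted; treat that as success. */
      return errno == ENOENT;
    }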

Change-Id: Ie8f5f587e9e675c2a0705d7e450be0e139b045a8
Symptom:
When creating an index on a table containing data, valgrind
occasionally reports reads of uninitialized memory from
ddl::Builder::bulk_add_row

Root cause:
When calculating alignment for the final write of
ddl::Key_sort_buffer::serialize, the IO write may be aligned such that
it reads from a region partly past the end of the IO buffer.

Fix:
When such a condition is detected, a portion of the IO buffer
is written first to free up space in the buffer.

Change-Id: I607ab549712a077cafdc5e067dfd667db40ade4f
…to 8.0

In MySQL 8.3 and 8.4, the following C-APIs were removed:

- MYSQL_OPT_RECONNECT
- mysql_stmt_bind_param()
- mysql_shutdown()
- mysql_ssl_set()
- mysql_refresh()
- mysql_reload()
- mysql_list_fields()
- mysql_list_processes()
- mysql_kill()

This WL restores them.

Also, a few of the above-mentioned APIs relied on deprecated COM_XXX
commands which were removed in MySQL 8.3.

In this implementation of the C-APIs, the deprecated commands are not
used; instead, mysql_real_query is used.

Only for mysql_list_fields, COM_FIELD_LIST has been restored.
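
As an example, a minimal use of the restored mysql_list_fields() (a
sketch; the table name is arbitrary and error handling is elided):

    #include <mysql.h>
    #include <stdio.h>

    /* Print the field names of table "t1" on an open connection. */
    void show_fields(MYSQL *conn) {
      MYSQL_RES *res = mysql_list_fields(conn, "t1", NULL);
      if (res == NULL) return;
      MYSQL_FIELD *field;
      while ((field = mysql_fetch_field(res)) != NULL)
        printf("%s\n", field->name);
      mysql_free_result(res);
    }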

main.kill has a few tests disabled within it. They will be addressed
via a separate bug.

Change-Id: I00bf6eb4c9cf29f3189f18fceb5b67b80719b979
harness_assert() is defined in harness_assert.h, but defined again in
router/src/harness/src/logging/registry.cc.

Change
======

- use the definition from harness_assert.h

Change-Id: If8e0b1facbfd178d3e44befef1a5901cb95a7292
EXPECT_NO_ERROR is defined in stdx_expected_no_error.h, but also
defined in various tests.

Change
======

- use the definition from stdx_expected_no_error.h

Change-Id: I6801440c1e92bb3daae170932131537fc0f93c6d
…er WL#16221

WL#16221 restored COM_FIELD_LIST in mysql-server-8.4, which now
returns a different, unexpected error code for a broken packet.

With COM_FIELD_LIST removed: "unknown command"
With COM_FIELD_LIST restored: "malformed packet"

Change
======

- expect either of these error-codes in the test for COM_FIELD_LIST.

Change-Id: Ic8295525cdd0509c6cc8f99bc02263580b22fe87
In this commit, we add diagnostics to track the time spent by ALTER
TABLE secondary engine DDL statements on activities like acquiring or
upgrading MDL locks, writing to the binary log, or committing
transactions.

The aim of these diagnostics is to help us debug issues where the
secondary engine DDLs take unusually long to execute.
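
A minimal sketch of the RAII-style phase timing such diagnostics
typically rely on (names are hypothetical, not the actual patch code):

    #include <chrono>
    #include <cstdio>

    class Phase_timer {
     public:
      explicit Phase_timer(const char *phase)
          : m_phase(phase), m_start(std::chrono::steady_clock::now()) {}
      ~Phase_timer() {
        const auto us =
            std::chrono::duration_cast<std::chrono::microseconds>(
                std::chrono::steady_clock::now() - m_start)
                .count();
        std::fprintf(stderr, "DDL phase '%s' took %lld us\n", m_phase,
                     static_cast<long long>(us));
      }

     private:
      const char *m_phase;
      std::chrono::steady_clock::time_point m_start;
    };

    /* Usage: { Phase_timer t("upgrade MDL"); upgrade_mdl_lock(); } */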

Change-Id: I55e0afb16f8e80625e8d1dc6129b1849f6984f08
(cherry picked from commit beebebd27bb09a2c62a4d74fe03f9c5ba146ca15)
kamil-holubicki and others added 28 commits September 4, 2024 16:15
https://perconadev.atlassian.net/browse/PXC-4435

The log_encryp_xplugin test was fixed:
1. the keyring component is already installed
2. use node_2 for tests as it is easier to restart

In fact, the original testcase used only one node.

Missing files added.
https://perconadev.atlassian.net/browse/PXC-4435

g++ 13 compilation error fixed

error: 'nonnull' argument 'this' compared to NULL
[-Werror=nonnull-compare]
https://perconadev.atlassian.net/browse/PXC-4435

MTR:
1. master/slave related fixes
2. moved inc files related fixes
3. disabled keyring_plugin (only enc_off for now)
https://perconadev.atlassian.net/browse/PXC-4436

Implemented PXC-specific adjustments to percona_telemetry component.

1. Default target path
2. gcache/writeset cache encryption info
3. PXC cluster replication info

(cherry picked from commit 8259b8e)
https://perconadev.atlassian.net/browse/PXC-4435

1. Addressed review comments
2. Galera repo pointer updated
https://perconadev.atlassian.net/browse/PXC-4435

Disable parallel execution of garbd backup tests (9999 port used)
(Fixed wrong merge)
…dict_stats_thread and cost model initialization

https://perconadev.atlassian.net/browse/PS-9384

Problem:
--------
Both debug and release versions of the server crash sporadically while
running different tests in Jenkins, with stacktraces showing
Cost_model_server::init() being called from InnoDB's
dict_stats_thread().

Analysis:
---------
Investigation has shown that there is a race condition between the code
handling auto-updating of histograms from an InnoDB background thread
and the main thread performing server start-up. The code responsible
for updating histograms, which was introduced by Upstream in 8.4.0,
initializes a LEX structure to perform its duties and tries to use the
global Optimizer cost model object as part of this. OTOH, the main
thread performing server start-up concurrently initializes and destroys
this global object several times after the background thread has been
started, and sets it to its final working state much later in the
start-up process, before we start accepting user queries. Not
surprisingly, concurrent usage of this global object and its
init/deinit causes crashes.

In theory, the problem exists in Upstream but is probably normally
invisible there: to trigger it, some updates to tables are needed so
that persistent stats recalculation and a histogram update are
requested, and in Upstream this can normally happen only after user
requests start being processed (by which time the global cost model
object is in its proper stable state).

In Percona Server, however, the telemetry component is enabled by
default, and code which on the first server start-up updates the
mysql.component table triggers a stats/histogram update request. As a
result this race becomes visible. OTOH this specific scenario should
only affect the first start of the server after installation, not
later restarts. But if there are other components which update tables
during initialization/start-up, the issue might become more prominent.

Solution:
---------
Delay processing of requests to update stats/histograms in the
background thread until the server is fully operational (and thus the
global optimizer cost model is fully initialized and stable).
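
A minimal sketch of the gating idea (hypothetical names standing in
for the server's state check and the background queue):

    /* Stand-ins for the real server-state check and request queue. */
    bool server_fully_operational() { return true; }
    void process_pending_stats_and_histogram_requests() {}

    /* Called from the InnoDB background thread's loop. */
    void dict_stats_background_step() {
      /* Leave requests queued until start-up completes, so the global
         cost model object is no longer being re-initialized. */
      if (!server_fully_operational()) return;
      process_pending_stats_and_histogram_requests();
    }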
PS-9384: Sporadic crashes in Jenkins on start up due to race between …
https://perconadev.atlassian.net/browse/PXC-4499

Problem:
PXC 8.4 will be released for Ubuntu Noble. SST script allows PXC SST to
be done for the following combinations:
1. Previous LTS -> this version (8.0.x -> 8.4)
2. Previous version -> this version (8.3 -> 8.4)
3. This version -> this version (8.4 -> 8.4)

Support of path 2 would need PXB 8.3 to be released for Ubuntu Noble,
but it was not released.

On the other hand, all innovation releases are internal, so the public
release needs to support only paths 1 and 3.

Solution:
Remove the check for the previous PXB version from the SST script. This
way only the previous LTS and 'this version' will be hard dependencies.
PXC-4499: Remove checks for prev pxb from SST script
PXC-4551 Dependency issues while installing PXC in RHEL-8 ARM
@it-percona-cla

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ adivinho
✅ surbhat1595
✅ kamil-holubicki
❌ dlenev
You have signed the CLA already but the status is still pending? Let us recheck it.
