MDEV-19574 innodb_stats_method is not honored when innodb_stats_persistent=ON #3886

Open
wants to merge 1 commit into
base: main

Thirunarayanan

  • The Jira issue number for this PR is: MDEV-19574

Description

Problem:

InnoDB persistent statistics do not take the innodb_stats_method variable into account when calculating n_diff_pfx for the n-prefix index columns. They also do not calculate the number of non-NULL key values for n-prefix index columns.

Solution:

To address these issues, InnoDB treats every NULL as a distinct value when innodb_stats_method is set to NULLS_UNEQUAL or NULLS_IGNORED. It also adds the statistics n_nonnull_pfx01, n_nonnull_pfx02, etc., which indicate how many non-NULL values exist for each n-prefix of the index.
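To make the effect on n_diff_pfx concrete, here is a minimal sketch of the counting rule described above. It is not the InnoDB code: `StatsMethod`, `Key`, `PrefixStats` and `scan_prefix` are hypothetical names, keys are plain integers, and the records are assumed to arrive already sorted in index order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the innodb_stats_method values; not InnoDB's enum.
enum class StatsMethod { NullsEqual, NullsUnequal, NullsIgnored };

// One index record; std::nullopt plays the role of SQL NULL.
using Key = std::vector<std::optional<int>>;

struct PrefixStats {
    std::size_t n_diff = 0;     // distinct values of the prefix
    std::size_t n_nonnull = 0;  // records whose prefix contains no NULL
};

// Scan records in index order and count statistics for the first
// n_prefix columns. Assumes every key has at least n_prefix columns.
PrefixStats scan_prefix(const std::vector<Key>& sorted_keys,
                        std::size_t n_prefix, StatsMethod method)
{
    PrefixStats stats;
    for (std::size_t i = 0; i < sorted_keys.size(); ++i) {
        const Key& cur = sorted_keys[i];
        assert(n_prefix <= cur.size());
        const auto prefix_end =
            cur.begin() + static_cast<std::ptrdiff_t>(n_prefix);

        const bool has_null = std::any_of(
            cur.begin(), prefix_end,
            [](const std::optional<int>& v) { return !v.has_value(); });
        if (!has_null) {
            ++stats.n_nonnull;
        }

        // std::optional compares two empty values as equal (NULL = NULL).
        bool same_as_prev =
            i > 0 &&
            std::equal(cur.begin(), prefix_end, sorted_keys[i - 1].begin());
        // NULLS_UNEQUAL and NULLS_IGNORED: a prefix with a NULL never
        // matches its predecessor, so it always starts a new value.
        if (has_null && method != StatsMethod::NullsEqual) {
            same_as_prev = false;
        }
        if (!same_as_prev) {
            ++stats.n_diff;
        }
    }
    return stats;
}
```

For the single-column keys NULL, NULL, 1, 1, 2, this sketch counts 3 distinct values under NULLS_EQUAL and 4 under the other two settings, with 3 non-NULL records in every case.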

Release Notes

innodb_stats_method is honoured when innodb_stats_persistent=1

How can this PR be tested?

./mtr innodb.stats_method

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@Thirunarayanan Thirunarayanan requested a review from dr-m March 10, 2025 10:23

…stent=ON

Problem:
=======
 InnoDB persistent statistics do not take the innodb_stats_method
variable into account when calculating n_diff_pfx for the n-prefix
index columns. They also do not calculate the number of non-NULL
key values for n-prefix index columns.

Solution:
=========
To address these issues, InnoDB treats every NULL as a distinct
value when innodb_stats_method is set to NULLS_UNEQUAL or
NULLS_IGNORED. It also adds the statistics n_nonnull_pfx01,
n_nonnull_pfx02, etc., which indicate how many non-NULL values
exist for each n-prefix of the index.

@dr-m dr-m left a comment

I see that this is implementing my rough ideas from MDEV-19574 that I expressed back in 2022. Since you are targeting a non-GA release with this, I think that we should consider some larger format changes.

Specifically, I’d like to understand what it would take to make innodb_stats_method control only how the statistics are used, while collecting statistics that can serve all variants. What would be the minimum amount of statistics to store to accommodate this, and how would these aggregate statistics be combined for each value of innodb_stats_method?
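One way to picture the question is the sketch below. It is only an illustration under stated assumptions, not a proposal taken from this PR: the struct `PrefixAggregates`, its four fields and the function `rec_per_key` are all hypothetical, and the set of aggregates is not claimed to be minimal. The idea is that the collector stores method-independent counts per index prefix, and innodb_stats_method only selects how they are combined when an estimate is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the innodb_stats_method values; not InnoDB's enum.
enum class StatsMethod { NullsEqual, NullsUnequal, NullsIgnored };

// Hypothetical method-independent aggregates for one index prefix.
struct PrefixAggregates {
    std::uint64_t n_rows;          // all records in the index
    std::uint64_t n_diff_nonnull;  // distinct prefixes without any NULL
    std::uint64_t n_diff_null;     // distinct prefixes with a NULL, NULL = NULL
    std::uint64_t n_null_rows;     // records with at least one NULL in the prefix
};

// Combine the stored aggregates into an average records-per-key estimate.
double rec_per_key(const PrefixAggregates& a, StatsMethod method)
{
    assert(a.n_null_rows <= a.n_rows);
    std::uint64_t rows = a.n_rows;
    std::uint64_t keys = 0;
    switch (method) {
    case StatsMethod::NullsEqual:
        // All NULL-containing prefixes that compare equal form one group.
        keys = a.n_diff_nonnull + a.n_diff_null;
        break;
    case StatsMethod::NullsUnequal:
        // Every record with a NULL in the prefix is its own group.
        keys = a.n_diff_nonnull + a.n_null_rows;
        break;
    case StatsMethod::NullsIgnored:
        // Records with a NULL in the prefix are left out entirely.
        rows = a.n_rows - a.n_null_rows;
        keys = a.n_diff_nonnull;
        break;
    }
    // Fall back to 1 when there is nothing to divide by (an assumption).
    return keys == 0 ? 1.0
                     : static_cast<double>(rows) / static_cast<double>(keys);
}
```

With the aggregates {n_rows = 5, n_diff_nonnull = 2, n_diff_null = 1, n_null_rows = 2}, which correspond to the keys NULL, NULL, 1, 1, 2, the estimate is 5/3 for NULLS_EQUAL, 1.25 for NULLS_UNEQUAL and 1.5 for NULLS_IGNORED, all from the same stored numbers.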

Comment on lines 106 to 117
vidxcd n_diff_pfx01 d
vidxcd n_diff_pfx02 d,DB_ROW_ID
vidxcd n_leaf_pages Number of leaf pages in the index
vidxcd n_nonnull_pfx01 d
vidxcd n_nonnull_pfx02 d,DB_ROW_ID
vidxcd size Number of pages in the index
vidxe n_diff_pfx01 e
vidxe n_diff_pfx02 e,DB_ROW_ID
vidxe n_leaf_pages Number of leaf pages in the index
vidxe n_nonnull_pfx01 e
vidxe n_nonnull_pfx02 e,DB_ROW_ID
vidxe size Number of pages in the index

The name nonnull is rather misleading here. In MariaDB, virtual columns cannot ever be NOT NULL; in MySQL they can.

Can we have a more descriptive name, such as replacing the nonnull with diff_null if these statistics are covering different prefixes for NULLS_UNEQUAL? In the commit message it is unclear which values were stored by the n_diff_pfx statistics up until now, and how that would be changing here.

Comment on lines -963 to +964
-	if (n_not_null != NULL) {
-		btr_record_not_null_field_in_rec(
-			n_cols, offsets_rec, n_not_null);
-	}
+	btr_record_not_null_field_in_rec(
+		n_cols, offsets_rec, n_not_null);

We seem to be unnecessarily calling this function when all the columns in the index are declared NOT NULL. In that case, n_not_null[] should be identical to n_diff[], right?

Comment on lines +1 to +2
--- stats_method.result 2025-03-10 15:30:38.087625820 +0530
+++ stats_method.reject 2025-03-10 15:34:26.697129924 +0530

Please do not add any timestamps; 88d9348 just recently tried to get rid of them.

Comment on lines +7 to +8
-n_diff_pfx01 16384 f3
+n_diff_pfx01 1 f3

Wouldn’t it be better to make the statistics collection independent of the parameter, and have innodb_stats_method affect only how the previously collected statistics are used?

Comment on lines +10 to +20
n_diff_pfx01 16341 DB_ROW_ID
n_leaf_pages 37 Number of leaf pages in the index
n_nonnull_pfx01 0 DB_ROW_ID
size 97 Number of pages in the index
n_diff_pfx01 16384 f1
n_diff_pfx02 16384 f1,f3
n_diff_pfx03 16384 f1,f3,DB_ROW_ID
n_leaf_pages 1 Number of leaf pages in the index
n_nonnull_pfx01 0 f1
n_nonnull_pfx02 0 f1,f3
n_nonnull_pfx03 0 f1,f3,DB_ROW_ID

Because DB_ROW_ID as well as f1 are declared NOT NULL, it feels very strange that n_nonnull_pfx01 for those fields is different from n_diff_pfx01.

As far as I understand, it is necessary to store the n_nonnull_pfx02 and n_nonnull_pfx03, because the column f3 allows NULL values. I think that we should try not to store any redundant statistics, such as n_nonnull_pfx01 here.
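The rule suggested here can be sketched as a small predicate. This is a hedged illustration only: `prefix_needs_nonnull_stat` and its `nullable` argument are hypothetical, not part of InnoDB. A separate non-NULL statistic carries information for a prefix only if at least one column in that prefix allows NULL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// nullable[i] is true when the i-th index column allows NULL.
// Returns true when a separate non-NULL statistic for the first
// n_prefix columns could differ from what NOT NULL columns already imply.
bool prefix_needs_nonnull_stat(const std::vector<bool>& nullable,
                               std::size_t n_prefix)
{
    assert(n_prefix >= 1 && n_prefix <= nullable.size());
    const auto prefix_end =
        nullable.begin() + static_cast<std::ptrdiff_t>(n_prefix);
    // Any nullable column inside the prefix makes the statistic meaningful.
    return std::any_of(nullable.begin(), prefix_end,
                       [](bool is_nullable) { return is_nullable; });
}
```

For the index on f1,f3,DB_ROW_ID above, where only f3 allows NULL, the predicate is false for the one-column prefix and true for the two- and three-column prefixes, which matches the observation that only n_nonnull_pfx02 and n_nonnull_pfx03 need to be stored.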
