-
Notifications
You must be signed in to change notification settings - Fork 6k
statistics: refresh stats concurrently #64034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
7dff835 to
7aaa6da
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #64034 +/- ##
================================================
+ Coverage 72.7425% 74.7345% +1.9920%
================================================
Files 1854 1885 +31
Lines 501719 524448 +22729
================================================
+ Hits 364963 391944 +26981
+ Misses 114581 108648 -5933
- Partials 22175 23856 +1681
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
2 similar comments
|
/retest |
|
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR addresses concurrency safety issues in statistics initialization by removing non-thread-safe session management and implementing session pool-based initialization. The changes enable safe concurrent execution of statistics refresh operations.
- Removed global
maxTidRecordstate and dedicatedinitStatsCtxsession field - Introduced session pool wrapper methods to ensure GC blocking during stats initialization
- Refactored stats loading methods to accept session context parameters instead of using a shared session
Reviewed Changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/statistics/handle/handle.go | Removed initStatsCtx field from Handle struct |
| pkg/statistics/handle/bootstrap.go | Refactored to pass session context as parameter; removed global maxTidRecord tracking |
| pkg/session/syssession/pool.go | Added WithForceBlockGCSession method with GC blocking logic |
| pkg/domain/domain.go | Updated CreateStatsHandle and LoadAndUpdateStatsLoop signatures to remove initStatsCtx parameter |
| pkg/session/session.go | Removed dedicated initStatsCtx creation in bootstrapSessionImpl |
| pkg/statistics/handle/handletest/initstats/init_stats_test.go | Added maxPhysicalTableID helper to replace GetMaxTidRecordForTest |
| pkg/executor/simple_test.go | Added concurrent REFRESH STATS test |
| pkg/server/stat_test.go | Added ForceBlockGCInTest failpoint enablement |
| pkg/domain/infosync/info.go | Renamed ContainsInternalSessionForTest to ContainsInternalSession |
| Multiple test files | Updated NewHandle/CreateStatsHandle calls to remove initStatsCtx argument |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
55f9aa3 to
4f29b18
Compare
2ac55ae to
67c75ea
Compare
67c75ea to
1b92e21
Compare
|
/retest |
0xPoe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| }() | ||
|
|
||
| // Make sure the internal session is registered to the session manager to block GC. | ||
| const retryInterval = 100 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/cc @lcwangchao
Not sure if you’re interested in taking a look at this change. If you have any thoughts on it, please feel free to let me know.
I think the fact that StoreInternalSession is not guaranteed to succeed is a bad design. I’ll try to create an issue for it to explore if we can change it in the future.
|
/retest |
|
Is the following test failure issue in mysql-test related to this PR? |
|
/retest |
|
Did we write in the documentation that we do not recommend users use the concurrent refresh stats function? |
|
pkg/session/syssession/pool.go
Outdated
| // In most cases, the session manager is not set, so this step will be skipped. | ||
| // It is only enabled explicitly in tests through a failpoint. | ||
| forceBlockGCInTest := !intest.InTest | ||
| failpoint.Inject("ForceBlockGCInTest", func(val failpoint.Value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or this is better
if intest.Intest {
var forceBlockGCInTest bool
failpoint.Inject("ForceBlockGCInTest", func(val failpoint.Value) {
forceBlockGCInTest = val.(bool)
})
if !forceBlockGCInTest {
break
}
}
elsa0520
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
mjonss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, only some minor questions and curiosity :)
| func (h *Handle) initStatsMeta(ctx context.Context, tableIDs ...int64) (statstypes.StatsCache, int64, error) { | ||
| func (h *Handle) initStatsMeta(ctx context.Context, sctx sessionctx.Context, tableIDs ...int64) (statstypes.StatsCache, int64, error) { | ||
| ctx = kv.WithInternalSourceType(ctx, kv.InternalTxnStats) | ||
| sql := genInitStatsMetaSQL(tableIDs...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to PR, but I am curious, what is the max number of tableIDs that can come in this call? Could it be an issue that the SQL string becomes too long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be too many. Because usually normal users would never use this command. And BR will only use REFRESH *.*.
| func (h *Handle) initStatsLiteWithSession(ctx context.Context, sctx sessionctx.Context, tableIDs ...int64) (err error) { | ||
| defer func() { | ||
| _, err1 := util.Exec(h.initStatsCtx, "commit") | ||
| _, err1 := util.Exec(sctx, "commit") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly related to this PR, but should it not do "rollback" in case of error?
And if only doing SELECTs, why not default to "rollback"? Or even use auto_commit mode, unless we really need snapshot consistency of the statistics, which I assume could be the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we always do it here https://github.com/0xPoe/tidb/blob/poe-patch-refresh-stats-concurrently/pkg/session/syssession/pool.go#L220. But you made a good point. We should consider rolling back here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll create an issue for it. I’d like to separate it from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: elsa0520, mjonss The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@0xPoe: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/assign gmhdbjd |
|
/assign Leavrth |
What problem does this PR solve?
Issue Number: close #61273
Problem Summary:
What changed and how does it work?
Although we shouldn’t encourage users to execute this command multiple times concurrently, we need to ensure that it’s safe to do so.
In this PR, I removed the non–thread-safe initialization and replaced it with a session pool–based initialization for statistics.
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.