scheduler: fix the recovery time of slow store #9388

Merged: 4 commits into tikv:master, Jun 12, 2025

Conversation

Member

@rleungx rleungx commented Jun 9, 2025

What problem does this PR solve?

Issue Number: Close #9384

What is changed and how does it work?

Check List

Tests

  • Unit test
[Screenshots: 2025-06-10 16:20:34 and 2025-06-10 16:14:28]

Previously, the scheduler rebalanced leaders immediately once the store's slow score recovered. Now it waits for the recovery time to elapse before rebalancing.

Release note

None.

Contributor

ti-chi-bot bot commented Jun 9, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress, release-note-none, dco-signoff: yes, and size/L labels Jun 9, 2025

codecov bot commented Jun 9, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.15%. Comparing base (29ead01) to head (3bde058).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9388      +/-   ##
==========================================
+ Coverage   76.10%   76.15%   +0.05%     
==========================================
  Files         478      478              
  Lines       74707    74682      -25     
==========================================
+ Hits        56853    56875      +22     
+ Misses      14316    14280      -36     
+ Partials     3538     3527      -11     

Member Author

rleungx commented Jun 9, 2025

/cc @LykxSassinator

Contributor

ti-chi-bot bot commented Jun 9, 2025

@rleungx: GitHub didn't allow me to request PR reviews from the following users: LykxSassinator.

Note that only tikv members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @LykxSassinator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rleungx rleungx marked this pull request as ready for review June 10, 2025 08:35
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress label Jun 10, 2025
@rleungx rleungx requested review from JmPotato and okJiang June 10, 2025 08:42
Contributor

ti-chi-bot bot commented Jun 10, 2025

@LykxSassinator: adding LGTM is restricted to approvers and reviewers in OWNERS files.

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved and removed do-not-merge/needs-triage-completed labels Jun 11, 2025
Member

@okJiang okJiang left a comment

rest lgtm

@@ -299,10 +315,23 @@ func (s *evictSlowStoreScheduler) Schedule(cluster sche.SchedulerCluster, _ bool
 			// slow node next time.
 			log.Info("slow store has been removed",
 				zap.Uint64("store-id", store.GetID()))
-		} else if store.GetSlowScore() <= slowStoreRecoverThreshold && s.conf.readyForRecovery() {
+		} else if store.GetSlowScore() <= slowStoreRecoverThreshold {

Is this better?

Suggested change
-		} else if store.GetSlowScore() <= slowStoreRecoverThreshold {
+		} else {
+			s.conf.tryUpdateRecoverStatus(true)
+			if store.GetSlowScore() <= slowStoreRecoverThreshold {
+				...
+			} else {
+				...
+			}
+		}

rleungx added 2 commits June 11, 2025 17:07
Signed-off-by: Ryan Leung <[email protected]>
Signed-off-by: Ryan Leung <[email protected]>
@rleungx rleungx requested a review from okJiang June 11, 2025 09:59
@ti-chi-bot ti-chi-bot bot added the lgtm label Jun 12, 2025
Contributor

ti-chi-bot bot commented Jun 12, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LykxSassinator, okJiang, overvenus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm label Jun 12, 2025
Contributor

ti-chi-bot bot commented Jun 12, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-06-11 05:08:26.625136137 +0000 UTC m=+418084.853451398: ☑️ agreed by overvenus.
  • 2025-06-12 08:41:38.236947147 +0000 UTC m=+517276.465262396: ☑️ agreed by okJiang.

@ti-chi-bot ti-chi-bot bot merged commit 7bdf489 into tikv:master Jun 12, 2025
30 of 34 checks passed
@rleungx rleungx deleted the fix-recovery branch June 12, 2025 09:32
Labels
approved, dco-signoff: yes, lgtm, release-note-none, size/L
Development

Successfully merging this pull request may close these issues.

Recovery time is not expected in slow store scheduler
4 participants