Fix reprotect time out bug #412
Merged
- remove verbosity from unit tests. not needed. failures will be printed.
- remove clean from unit-test target. user can execute `make clean unit-test` if needed.
- add csm-common.mk to gitignore.

- reprotect will now verify the first sync job is running instead of waiting for the job to finish.
- link state status derivation function updated to give SYNC_IN_PROGRESS higher priority to more accurately represent the state.
falfaroc
previously approved these changes
Jul 2, 2025
falfaroc
previously approved these changes
Jul 2, 2025
santhoshatdell
previously approved these changes
Jul 2, 2025
bharathsreekanth
previously approved these changes
Jul 2, 2025
- cover scenarios when LastJobState is either Running or Finished
c69899c
santhoshatdell
approved these changes
Jul 2, 2025
Merging this branch will not change overall coverage
bharathsreekanth
approved these changes
Jul 3, 2025
falfaroc
approved these changes
Jul 3, 2025
Dependent on #415 and dell/gopowerscale#101
Description
Problem
During a reprotect, if the initial sync of a newly created SyncIQ policy took longer than the default timeout of five minutes, the context of the reprotect action timed out. The Replication Group status was set to Error and the reprotect action was re-queued.
Every retry of the reprotect then failed indefinitely: the first step of the reprotect flow confirms that the "Local Target" policy is write-enabled on the storage system that is about to become the new source, and that policy no longer existed because the first reprotect attempt had already deleted it.
Solution
This PR updates the reprotect flow to confirm only that the initial sync job has started, instead of waiting for the full sync to complete.
The function that derives the Link State now gives higher priority to the `SYNC_IN_PROGRESS` state, because that state is based on the existence of active sync jobs for the named SyncIQ policy, while the remaining states are derived from the presence or absence of a SyncIQ policy and its associated state.
Now, after the sync job is confirmed to be running, the Link State is updated to reflect `SYNC_IN_PROGRESS` for the initial sync job.
GitHub Issues
List the GitHub issues impacted by this PR:
Checklist:
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Please also list any relevant details for your test configuration
Replication e2e tests:
Kubernetes e2e - External Storage
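For reviewers, the Link State priority described under Solution can be pictured with a minimal sketch. This is an assumption-laden illustration, not the function changed in this PR: only `SYNC_IN_PROGRESS` comes from the description, while `PolicyInfo`, `deriveLinkState`, and the other state names are hypothetical placeholders.

```go
package main

import "fmt"

// LinkState is a hypothetical representation of the replication link state.
type LinkState string

const (
	// StateSyncInProgress is the only state name taken from the PR description.
	StateSyncInProgress LinkState = "SYNC_IN_PROGRESS"
	// The remaining names are placeholders for policy-derived states.
	StatePolicyMissing  LinkState = "PLACEHOLDER_POLICY_MISSING"
	StatePolicyEnabled  LinkState = "PLACEHOLDER_POLICY_ENABLED"
	StatePolicyDisabled LinkState = "PLACEHOLDER_POLICY_DISABLED"
)

// PolicyInfo is a hypothetical summary of a SyncIQ policy lookup.
type PolicyInfo struct {
	Exists  bool
	Enabled bool
}

// deriveLinkState checks for an active sync job first, so that
// SYNC_IN_PROGRESS wins over any state derived from the policy itself.
func deriveLinkState(hasActiveSyncJob bool, p PolicyInfo) LinkState {
	if hasActiveSyncJob {
		return StateSyncInProgress
	}
	switch {
	case !p.Exists:
		return StatePolicyMissing
	case p.Enabled:
		return StatePolicyEnabled
	default:
		return StatePolicyDisabled
	}
}

func main() {
	enabled := PolicyInfo{Exists: true, Enabled: true}
	// With an active job, the policy-derived state is not consulted.
	fmt.Println(deriveLinkState(true, enabled))
	// Without an active job, the state falls back to the policy lookup.
	fmt.Println(deriveLinkState(false, enabled))
}
```

Checking for an active job before anything else means the reported state follows the running initial sync regardless of what the policy lookup would otherwise yield.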