-
Notifications
You must be signed in to change notification settings - Fork 25.4k
[ILM]: Fix TSDS unfollow timing with WaitUntilTimeSeriesEndTimePassesStep #128361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
💚 CLA has been signed |
Hi, @gmarouli |
Hi @happysubin , thank you for your contribution, @samxbr and I will review it as soon as possible. |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@happysubin Thank you for contributing to Elasticsearch! We value external contributions and would love to work with you to get this PR merged. I have left some comments on the PR, please feel free to take your time to address them.
WaitForFollowShardTasksStep step2 = new WaitForFollowShardTasksStep( | ||
waitForFollowShardTasks, | ||
waitUntilTimeSeriesEndTimePassesStep, | ||
client | ||
); | ||
WaitUntilTimeSeriesEndTimePassesStep step3 = new WaitUntilTimeSeriesEndTimePassesStep( | ||
waitUntilTimeSeriesEndTimePassesStep, | ||
pauseFollowerIndex, | ||
Instant::now | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WaitUntilTimeSeriesEndTimePassesStep
should be prior to WaitForFollowShardTasksStep
, because the follower index can sync with the leader index one last time after the end_time
has passed, to make sure there's no new docs coming in to the leader index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add new integration test in the CCRIndexLifecycleIT test suite to cover this scenario. You can refer to this test as an example for verifying the WaitUntilTimeSeriesEndTimePassesStep
. Essentially we would want to verify that after leader index rollovers, the follower index goes into the WaitUntilTimeSeriesEndTimePassesStep
, and new documents to the leader index are synced to the follower index until end_time
has passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the code review.
I'll review the existing test cases and work on adding new ones.I'm going to add test code!
buildkite test this |
@samxbr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted some minor comments, but the change generally looks good to me. Nice!
@@ -533,6 +571,94 @@ public void testILMUnfollowFailsToRemoveRetentionLeases() throws Exception { | |||
} | |||
} | |||
|
|||
@SuppressWarnings({ "checkstyle:LineLength", "unchecked" }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the line length warning supress, you can run ./gradlew spotlessApply
to auto-format.
Also please take a look at the contribution guide for more details and other tips: https://github.com/elastic/elasticsearch/blob/main/CONTRIBUTING.md#java-language-formatting-guidelines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your code review. I've reflected your suggestions!
Request countRequest = new Request("GET", "/" + indexName + "/_count"); | ||
Response response = client.performRequest(countRequest); | ||
Map<String, Object> result = entityAsMap(response); | ||
System.out.println("result = " + result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the println
buildkite test this |
@happysubin that is a very good finding, good job! I did not know we have moved such configuration there. But this means that this is not a sufficient fix, we need to think why is the timing off on CI. |
@gmarouli It might be similar to just increasing the timeout, but how about adding the following code instead?
Manually setting the index like below doesn’t seem to align with the purpose of this integration test.
I sincerely appreciate your kind and detailed feedback on this issue ! 👍 |
Hey @happysubin , you are right we could set it ourselves but this is taking it a step too far when it comes to manipulating state for a test. But good news, I think I found the culprit. There are a few issues in this test and they create some flakiness:
My theory is that the rollover on the follower cluster was catching the problem locally. Shall we try fixing these 2 issues and put back the timeout to what it was? Again great work being thorough! |
@gmarouli |
buildkite test this |
It's been my pleasure, let's see if we got it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! @happysubin great work, thank you! @samxbr any other comments before we merge?
This is awesome, thanks @gmarouli for the thorough review! |
// rollover | ||
Request rolloverRequest = new Request("POST", "/" + dataStream + "/_rollover"); | ||
rolloverRequest.setJsonEntity(""" | ||
{ | ||
"conditions": { | ||
"max_docs": "1" | ||
} | ||
}"""); | ||
leaderClient.performRequest(rolloverRequest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment to explain why we need to wait for ILM to rollover instead of manual rollover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course! I’ll add comments explaining why we did it this way!
putILMPolicy(policyName, null, 1, null); | ||
putUnfollowOnlyPolicy(client(), policyName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here, could you add a comment to explain why we only want Unfollow action?
@samxbr |
...ugin/ilm/qa/multi-cluster/src/test/java/org/elasticsearch/xpack/ilm/CCRIndexLifecycleIT.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think we are good to merge after incorporating Mary's comment change and CI has passed. This is a great change, thank you very much for your contribution and being so promptly addressing the comments!
Co-authored-by: Mary Gouseti <[email protected]>
buildkite test this |
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…Step (elastic#128361) The backing indices of a time series data streams (TSDS) have time ranges (start_time & end_time) and they include documents that belong to these time ranges. To ensure that we will not unfollow a leader TSDS index before the indexing is complete, we need to add a WaitUntilTimeSeriesEndTimePassesStep to the unfollow action. This will ensure that we will only unfollow after the end_time has passed. This creates some weird semantics with the combination of the rollover and the unfollow. Because we need the rollover of the leader index to finalise the end_time but the unfollow action is injected before the rollover. However, this should be fine, because the leader index will skip the unfollow action so it will rollover and finalise the end_time and the follower index will wait the end_time to pass before it unfollows. Rolling over the follower index will have no effect since it’s already rolled over. (cherry picked from commit ed7f2ca) # Conflicts: # x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/UnfollowAction.java
…ePassesStep (#128361) (#129518) * [ILM]: Fix TSDS unfollow timing with WaitUntilTimeSeriesEndTimePassesStep (#128361) The backing indices of a time series data streams (TSDS) have time ranges (start_time & end_time) and they include documents that belong to these time ranges. To ensure that we will not unfollow a leader TSDS index before the indexing is complete, we need to add a WaitUntilTimeSeriesEndTimePassesStep to the unfollow action. This will ensure that we will only unfollow after the end_time has passed. This creates some weird semantics with the combination of the rollover and the unfollow. Because we need the rollover of the leader index to finalise the end_time but the unfollow action is injected before the rollover. However, this should be fine, because the leader index will skip the unfollow action so it will rollover and finalise the end_time and the follower index will wait the end_time to pass before it unfollows. Rolling over the follower index will have no effect since it’s already rolled over. (cherry picked from commit ed7f2ca) # Conflicts: # x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/UnfollowAction.java * Fix missing method * [CI] Auto commit changes from spotless --------- Co-authored-by: 안수빈 <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]>
fix: #128129
I added the WaitUntilTimeSeriesEndTimePassesStep between the WaitForFollowShardTasksStep and the PauseFollowerIndexStep in the step list of UnFollowAction, and updated the tests accordingly.
Please review my code!