AddressRefresh: Fixes behavior of AddressRefresh calls so they perform cross-regional retries #5017
Pull Request Template
Description
Address Refresh Cross Regional Retries
Following the 12/26 outage of SCUS, analysis of the impact on several customers revealed a gap in the SDK's retry logic, particularly in the cross-regional retry logic for address refresh calls. This document outlines the impact of the gap and the proposed solution.
Impact
In the case of a regional outage, if the SDK attempts an address refresh call and the primary region is down, the SDK currently does not retry the call in the secondary region. This is because when the address refresh times out, the SDK throws a task cancelled exception. Currently the SDK has no logic to catch this exception and handle it as a timeout. With other timeouts, the SDK wraps the exception in a 503, and upon reaching the RetryLayer, the ClientRetryPolicy attempts to retry the call in the secondary region if available.
Proposed Solution
The proposed solution is to catch the OperationCanceledException, as well as any 410s (timeouts), and wrap them in a 503. This allows the ClientRetryPolicy to attempt to retry the call in the secondary region if available.
Impact of the Solution
This change might have an impact when used with the compute gateway. Further investigation is needed to determine that impact. One possible mitigation is a flag that enables this feature. The flag would be internal and not accessible to external customers.
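The wrapping described above, gated behind an internal flag, can be modeled with a small language-neutral sketch. This is a toy model in Python, not the SDK's code: OperationCanceled, GoneError, ServiceUnavailableError, guarded_refresh, retry_across_regions, and the wrap_enabled parameter are all hypothetical names chosen for illustration, and the retry loop only approximates what a policy such as ClientRetryPolicy might do.

```python
class OperationCanceled(Exception):
    """Hypothetical stand-in for the cancellation raised when address refresh times out."""


class GoneError(Exception):
    """Hypothetical stand-in for a 410 response."""
    status_code = 410


class ServiceUnavailableError(Exception):
    """Hypothetical stand-in for a 503, which the toy retry policy treats as retriable."""
    status_code = 503


def refresh_addresses(region, down_regions):
    """Toy address refresh: a call to a region that is down times out and is cancelled."""
    if region in down_regions:
        raise OperationCanceled(f"address refresh timed out in {region}")
    return f"addresses from {region}"


def guarded_refresh(region, down_regions, wrap_enabled):
    """Convert cancellations and 410s into a 503, but only when the flag is on."""
    try:
        return refresh_addresses(region, down_regions)
    except (OperationCanceled, GoneError) as exc:
        if not wrap_enabled:
            raise  # behavior without the fix: the cancellation escapes unwrapped
        raise ServiceUnavailableError(str(exc)) from exc


def retry_across_regions(regions, down_regions, wrap_enabled):
    """Toy retry policy: moves on to the next region only when it sees a 503."""
    last_error = None
    for region in regions:
        try:
            return guarded_refresh(region, down_regions, wrap_enabled)
        except ServiceUnavailableError as exc:
            last_error = exc  # retriable, so try the next region in the list
    if last_error is None:
        raise ValueError("no regions configured")
    raise last_error


print(retry_across_regions(["primary", "secondary"], {"primary"}, wrap_enabled=True))
# prints "addresses from secondary"
```

With wrap_enabled=False the same call raises OperationCanceled on the first region and never reaches the secondary, which mirrors the gap described under Impact.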
Testing
Testing will be done with the FaultInjection library, which needs metadata request support added before this fix can be tested. See #4795 for more information.
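As a rough illustration of the kind of test this would enable, the sketch below uses a hand-rolled fault injector rather than the real FaultInjection library, whose metadata request support is not yet available. MetadataTimeoutRule, FakeGateway, and refresh_with_failover are hypothetical names, and the sketch assumes the fix is in place, so an injected timeout surfaces as a 503.

```python
from dataclasses import dataclass, field


class ServiceUnavailableError(Exception):
    """Hypothetical stand-in for the 503 that the retry layer treats as retriable."""


@dataclass
class MetadataTimeoutRule:
    """Hypothetical fault rule: time out address refresh requests in one region."""
    region: str
    hit_count: int = 0

    def applies_to(self, operation, region):
        return operation == "address_refresh" and region == self.region


@dataclass
class FakeGateway:
    """Toy gateway that records calls and applies any matching fault rule."""
    rules: list = field(default_factory=list)
    calls: list = field(default_factory=list)

    def address_refresh(self, region):
        self.calls.append(region)
        for rule in self.rules:
            if rule.applies_to("address_refresh", region):
                rule.hit_count += 1
                # Assumes the fix: the injected timeout is surfaced as a 503.
                raise ServiceUnavailableError(f"injected timeout in {region}")
        return f"addresses from {region}"


def refresh_with_failover(gateway, regions):
    """Try each region in order, moving on only when a 503 is raised."""
    for region in regions:
        try:
            return gateway.address_refresh(region)
        except ServiceUnavailableError:
            continue
    raise ServiceUnavailableError("address refresh failed in all regions")


def test_address_refresh_fails_over():
    rule = MetadataTimeoutRule(region="primary")
    gateway = FakeGateway(rules=[rule])
    result = refresh_with_failover(gateway, ["primary", "secondary"])
    assert result == "addresses from secondary"
    assert gateway.calls == ["primary", "secondary"]
    assert rule.hit_count == 1


test_address_refresh_fails_over()
```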
Type of change
Closing issues
To automatically close an issue: closes #4979