AddressRefresh: Fixes behavior of AddressRefresh calls so they perform cross-regional retries #5017

Open
wants to merge 3 commits into
base: master

Conversation

NaluTripician
Contributor

@NaluTripician NaluTripician commented Feb 13, 2025

Pull Request Template

Description

Address Refresh Cross Regional Retries

Following the 12/26 outage of SCUS, analysis of the impact on several customers revealed a gap in the SDK's retry logic, particularly in the cross-regional retry logic for address refresh calls. This document outlines the impact of the gap and the proposed solution.

Impact

During a regional outage, if the SDK attempts Address Refresh calls while the primary region is down, it currently does not retry the call in the secondary region. When the address refresh times out, the SDK throws a task cancelled exception, and it currently has no logic to catch this exception and treat it as a timeout. With other timeouts, the SDK wraps the exception in a 503 and, upon reaching the RetryLayer, the ClientRetryPolicy attempts to retry the call in the secondary region if one is available.

Proposed Solution

The proposed solution is to catch the OperationCanceledException, as well as any 410s (Timeouts), and wrap them in a 503. This allows the ClientRetryPolicy to retry the call in the secondary region if one is available.
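
The wrapping idea can be illustrated with a minimal, language-neutral sketch in Python. This is not the SDK's C# code: every name below (`OperationCanceledError`, `GoneError`, `ServiceUnavailableError`, `refresh_addresses`, `retry_across_regions`, `fake_transport`) is a hypothetical stand-in, and the retry loop is a heavily simplified model of what a retry policy does with a 503.

```python
class OperationCanceledError(Exception):
    """Hypothetical stand-in for .NET's OperationCanceledException."""


class GoneError(Exception):
    """Hypothetical stand-in for a 410 response."""
    status_code = 410


class ServiceUnavailableError(Exception):
    """Hypothetical stand-in for the 503 that the retry layer understands."""
    status_code = 503


def refresh_addresses(region, transport):
    """Run one address refresh and translate cancellations/410s into a 503."""
    try:
        return transport(region)
    except (OperationCanceledError, GoneError) as exc:
        # Wrap the failure so the retry layer sees the same signal
        # it already gets for other timeouts.
        raise ServiceUnavailableError(f"address refresh failed in {region}") from exc


def retry_across_regions(regions, transport):
    """Simplified retry layer: on a 503, move on to the next region."""
    if not regions:
        raise ValueError("at least one region is required")
    last_error = None
    for region in regions:
        try:
            return refresh_addresses(region, transport)
        except ServiceUnavailableError as exc:
            last_error = exc  # remember the failure and try the next region
    raise last_error


def fake_transport(region):
    # Simulated outage: the primary region times out, the secondary answers.
    if region == "primary":
        raise OperationCanceledError("timed out")
    return f"addresses from {region}"


result = retry_across_regions(["primary", "secondary"], fake_transport)
```

In this sketch `result` ends up as `"addresses from secondary"`. Without the `except` clause in `refresh_addresses`, the cancellation would propagate past the retry loop and the secondary region would never be tried, which is the gap described above.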

Impact of the Solution

This change might have an impact when used with the compute gateway; further investigation is needed to determine what that impact is. One possible mitigation is to put the behavior behind a flag that enables this feature. This flag would be internal and not accessible to external customers.
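
As a rough illustration of the flag-based mitigation, the sketch below gates the wrapping behind an opt-in switch read from the environment. It is written in Python for brevity and is not SDK code: the flag name `ADDRESS_REFRESH_CROSS_REGION_RETRY_ENABLED`, the helper names, and the use of an environment variable are all assumptions made for the example.

```python
import os

# Hypothetical internal switch name; not an actual SDK setting.
FLAG_NAME = "ADDRESS_REFRESH_CROSS_REGION_RETRY_ENABLED"


class ServiceUnavailableError(Exception):
    """Hypothetical stand-in for the 503 that triggers cross-regional retries."""
    status_code = 503


def cross_region_retry_enabled(environ=None):
    """Return True only when the flag is explicitly switched on (opt-in)."""
    env = os.environ if environ is None else environ
    return env.get(FLAG_NAME, "false").strip().lower() in ("1", "true")


def translate_refresh_failure(exc, enabled):
    """Pick the exception to surface for a failed address refresh.

    With the flag on, the failure is wrapped in a 503 so the retry layer
    can fail over. With the flag off, the original exception is returned
    unchanged, which preserves the existing behavior.
    """
    if not enabled:
        return exc
    wrapped = ServiceUnavailableError("address refresh failed")
    wrapped.__cause__ = exc  # keep the original failure for diagnostics
    return wrapped
```

Defaulting to off keeps the existing behavior for the compute gateway until the impact there is understood.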

Testing

Testing will be done with the FaultInjection library, which needs metadata request support added before this fix can be tested. See #4795 for more information.
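
To make the intended test shape concrete, here is a small self-contained Python sketch of a rule-based fault injector that times out address refresh (metadata) requests in one region. It does not use or mirror the real FaultInjection library's API: `FaultRule`, `FaultInjector`, `InjectedTimeout`, and `refresh_with_failover` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


class InjectedTimeout(Exception):
    """Hypothetical exception raised when a fault rule matches a request."""


@dataclass(frozen=True)
class FaultRule:
    """Hypothetical rule: fail the given operation in the given region."""
    region: str
    operation: str


class FaultInjector:
    """Hypothetical transport wrapper that applies fault rules to requests."""

    def __init__(self, rules):
        self._rules = set(rules)
        self.hits = []  # records every (region, operation) that was faulted

    def send(self, region, operation):
        if FaultRule(region, operation) in self._rules:
            self.hits.append((region, operation))
            raise InjectedTimeout(f"{operation} timed out in {region}")
        return f"{operation} ok in {region}"


def refresh_with_failover(injector, regions):
    """Simplified model of the fixed behavior: try each region in order."""
    for region in regions:
        try:
            return injector.send(region, "address_refresh")
        except InjectedTimeout:
            continue  # fail over to the next region
    raise RuntimeError("address refresh failed in all regions")


# Simulate an outage that only affects address refresh in the primary region.
injector = FaultInjector([FaultRule("primary", "address_refresh")])
outcome = refresh_with_failover(injector, ["primary", "secondary"])
```

A test built this way can assert both that the call eventually succeeded in the secondary region and, through `injector.hits`, that the primary region was actually attempted and faulted first.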

Type of change

Please delete options that are not relevant.

  • [] Bug fix (non-breaking change which fixes an issue)

Closing issues

To automatically close an issue: closes #4979

Labels
auto-merge Enables automation to merge PRs
Development

Successfully merging this pull request may close these issues.

Address Refresh Cross Regional Retries
1 participant