Following the 12/26 outage of SCUS, analysis of the impact on several customers revealed a gap in the SDK's retry logic, particularly in the cross-regional retry logic for address refresh calls. This document outlines the impact of the gap and the proposed solution.
Impact
During a regional outage, if the SDK attempts an Address Refresh call and the primary region is down, the SDK currently does not retry the call in the secondary region. This is because when the address refresh times out, the SDK throws a task cancelled exception. Currently the SDK does not have logic to catch this exception and treat it as a timeout. With other timeouts, the SDK wraps the exception in a 503 and, upon reaching the RetryLayer, the ClientRetryPolicy attempts to retry the call in the secondary region if one is available.
Proposed Solution
The proposed solution is to catch the OperationCanceledException, as well as any 410s (Timeouts), and wrap them in a 503. This allows the ClientRetryPolicy to retry the call in the secondary region if one is available.
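The wrapping described above can be modelled with a small, language-neutral sketch. It is not the SDK's code: every name in it (`refresh_addresses`, `refresh_with_wrapping`, `retry_across_regions`, and the exception classes) is a hypothetical stand-in, and the retry loop assumes only that the real ClientRetryPolicy moves to the next region when it sees a 503.

```python
class ServiceUnavailable(Exception):
    """Hypothetical stand-in for a 503, which the retry policy treats as retriable."""
    status_code = 503


class Gone(Exception):
    """Hypothetical stand-in for a 410 surfaced by the address refresh path."""
    status_code = 410


class OperationCanceled(Exception):
    """Hypothetical stand-in for .NET's OperationCanceledException."""


def refresh_addresses(region, down_regions):
    # Simulated address refresh: a region that is down times out,
    # which surfaces as a cancellation rather than as a 503.
    if region in down_regions:
        raise OperationCanceled(f"address refresh timed out in {region}")
    return f"addresses from {region}"


def refresh_with_wrapping(region, down_regions):
    # Proposed behaviour: convert cancellations and 410s into a 503
    # so that the retry layer recognises them as retriable.
    try:
        return refresh_addresses(region, down_regions)
    except (OperationCanceled, Gone) as exc:
        raise ServiceUnavailable(f"wrapped: {exc}") from exc


def retry_across_regions(call, regions, down_regions):
    # Assumed model of the retry policy: only a 503 triggers a retry
    # in the next region; any other exception propagates to the caller.
    last_error = None
    for region in regions:
        try:
            return call(region, down_regions)
        except ServiceUnavailable as exc:
            last_error = exc
    if last_error is None:
        raise ValueError("no regions configured")
    raise last_error


regions = ["primary", "secondary"]
down = {"primary"}
print(retry_across_regions(refresh_with_wrapping, regions, down))
```

Passing `refresh_addresses` directly to `retry_across_regions` reproduces the gap: the cancellation escapes the retry loop and the secondary region is never tried.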
Impact of the Solution
This change might affect requests that go through the compute gateway, and further investigation is needed to determine the impact. One possible mitigation is to put the new behavior behind a flag that must be set to enable it. This flag would be internal and not accessible to external customers.
Testing
Testing will be done with the FaultInjection library, which needs metadata request support added before this fix can be tested. See #4795 for more information.
Address Refresh Cross Regional Retries