-
Notifications
You must be signed in to change notification settings - Fork 41
Add Retry Interval to REQUEST_ERROR #1339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: afrind <[email protected]>
afrind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this proposal.
This also fixes #1256.
Co-authored-by: afrind <[email protected]>
Co-authored-by: afrind <[email protected]>
ianswett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks sensible, but I'd prefer to not split error codes and make the retry-after always optional
draft-ietf-moq-transport.md
Outdated
| Length (16), | ||
| Request ID (i), | ||
| Error Code (i), | ||
| [Retry Interval (i)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of framing, I think I'd prefer to not split error codes and instead make the Retry Interval 'optional' by reserving 0 as a special value that's equivalent to it not being present.
This also avoids us having a discussion about which of the error codes should be retryable.
draft-ietf-moq-transport.md
Outdated
| If a Retry Interval exceeds the lifetime of a necessary authentication token | ||
| used in the request, so that a retry at that time would fail, the sender SHOULD | ||
| use an error code indicating a new authentication token is needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really mean this? Like if I get a DOES_NOT_EXIST with a valid auth token, and Retry-After 30 (since it will exist then), but the token expires in 15, I change the error to UNAUTHORIZED?
That seems super confusing. I think I'd rather get the real error code, and burn the round trip when my token actually expires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am inclined to remove this entirely.
ianswett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small comments, but this looks close.
Co-authored-by: ianswett <[email protected]>
ianswett
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM
fluffy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should split out the stuff on code point allocation and put in separate PR that forms the IANA consideration section.
| | INVALID_JOINING_REQUEST_ID | 0x32 | {{message-request-error}} | | ||
| | UNKNOWN_STATUS_IN_RANGE | 0x33 | {{message-request-error}} | | ||
|
|
||
| The range of error codes including 0x100 to 0xffff is reserved for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not keen on implementation specific error code. I think that contributes to non interoperable implementations. I'd like to have some discussion on this.
| The range of error codes including 0x100 to 0xffff is reserved for | ||
| implementation-speceific codes. | ||
|
|
||
| The range of error codes starting with 0x10000 is reserved for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we right this up in the normal way we would IANA. This should be there is a way to get IANA to pre allocate a code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any reason we need to invent new IANA processes beyond what is in rfc8126
| sending more than one such message within a second or so across one or more | ||
| sessions, it SHOULD apply randomization to each retry interval so that retries | ||
| are spread out over time, minimizing the risk of synchronized retry storms. A | ||
| Retry Interval value of 1 indicates the request can be retried immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it needs to do randomization even if it is not doing multiple requests. A common problem is Start of Service attacks. With IP phones we regularly see cities of over 10 million people loose power, then all restart at the same time often all with a set top box from of some sort from the same vendor. This forms a Start Of Service attack where the normal mitigations is for the server to be able to respond with retry after and it is more robust to spreading out the load if that can be spread.
My suggestion would be just make this very specific that the min time to respond was a number between the values returned and 125% of value returned with uniform random distribution in that time. Sure some clients won't bother but the ones that cause SoS attacks will.
| * Error Code: Identifies an integer error code for request failure. | ||
|
|
||
| * Retry Interval: The minimum time (in seconds) before the request SHOULD be | ||
| sent again, plus one. If the value is 0, the request SHOULD NOT be retried. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as something to think about ... ff you use seconds instead of ms, it makes it hard for the server to pace multiple clients but I can see how either would work.
|
I added this to the agenda for tomorrow's meeting (11/17) and in I would like to get a show of hands on whether we will or will not support implementation defined codes. |
Fixes the request error part of #1175.
Fixes: #1231