Skip to content

Commit 6f4c178

Browse files
committed
wip dedupe reload on poll error; and on non-transient request errors, reload rather than retry
Hmm or maybe for those non-transient request errors we should stick with retrying? Maybe up to a bounded number of times? Those are by definition a bug either in the server or in our formulating the request: it's either a 4xx we didn't expect, or a malformed response (so in either case, a mismatch of our expectations to the server's behavior, and therefore a bug in one or the other), or something other than ApiRequestException entirely (and so a bug in the API request code). It's therefore hard to predict what's most likely to work. Maybe the most likely cause is that the server is having operational trouble that's not showing up as 5xx responses. ... Hmm, do we handle rate-limit errors? No, we don't. So *that's* one possible cause, anyway. Certainly we should back off if we get one of those! ... Though I'm not sure that getEvents is actually currently subject to rate-limiting, being in Tornado rather than Django. Also, when we abandon retrying getEvents and switch to re-register, we do end up backing off if that fails. It's just that that's a heavier request in the first place.
1 parent 026d032 commit 6f4c178

File tree

1 file changed

+47
-29
lines changed

1 file changed

+47
-29
lines changed

lib/model/store.dart

Lines changed: 47 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -968,13 +968,6 @@ class UpdateMachine {
968968

969969
store.isLoading = true;
970970
switch (e) {
971-
case ZulipApiException(code: 'BAD_EVENT_QUEUE_ID'):
972-
assert(debugLog('Lost event queue for $store. Replacing…'));
973-
// This disposes the store, which disposes this update machine.
974-
await store._globalStore._reloadPerAccount(store.accountId);
975-
debugLog('… Event queue replaced.');
976-
return;
977-
978971
case Server5xxException() || NetworkException():
979972
assert(debugLog('Transient error polling event queue for $store: $e\n'
980973
'Backing off, then will retry…'));
@@ -990,14 +983,7 @@ class UpdateMachine {
990983
continue;
991984

992985
default:
993-
assert(debugLog('Error polling event queue for $store: $e\n'
994-
'Backing off and retrying even though may be hopeless…'));
995-
// TODO(#186): Handle unrecoverable failures
996-
_reportToUserErrorConnectingToServer(e);
997-
await (backoffMachine ??= BackoffMachine()).wait();
998-
if (_disposed) return;
999-
assert(debugLog('… Backoff wait complete, retrying poll.'));
1000-
continue;
986+
rethrow;
1001987
}
1002988
}
1003989

@@ -1027,27 +1013,52 @@ class UpdateMachine {
10271013
try {
10281014
await store.handleEvent(event);
10291015
if (_disposed) return;
1030-
} catch (e) {
1031-
assert(debugLog('BUG: Error handling an event: $e\n' // TODO(log)
1032-
' event: $event\n'
1033-
'Replacing event queue…'));
1034-
// TODO maybe tell user (in beta) that event handling threw error
1035-
// TODO dedupe this with the other _reloadPerAccount sites?
1036-
// This disposes the store, which disposes this update machine.
1037-
await store._globalStore._reloadPerAccount(store.accountId);
1038-
debugLog('… Event queue replaced.');
1039-
return;
1016+
} catch (e, stackTrace) {
1017+
Error.throwWithStackTrace(
1018+
_EventHandlingException(cause: e, event: event),
1019+
stackTrace);
10401020
}
10411021
}
10421022
if (events.isNotEmpty) {
10431023
lastEventId = events.last.id;
10441024
}
10451025
}
10461026
} catch (e) {
1047-
assert(debugLog('BUG: Unexpected error in event polling: $e\n' // TODO(log)
1048-
'Replacing event queue…'));
1049-
// TODO maybe tell user (in beta) that event poll loop threw error
1050-
// TODO dedupe this with the other _reloadPerAccount sites above?
1027+
// An error occurred, other than the transient request errors we retry on.
1028+
// This means either a lost/expired event queue on the server (which is
1029+
// normal after the app is offline for a period like 10 minutes),
1030+
// or an unexpected exception representing a bug in our code or the server.
1031+
// Either way, the show must go on. So reload server data from scratch.
1032+
1033+
// First, log what happened.
1034+
switch (e) {
1035+
case ZulipApiException(code: 'BAD_EVENT_QUEUE_ID'):
1036+
assert(debugLog('Lost event queue for $store. Replacing…'));
1037+
// The old event queue is gone, so we need a new one. This is normal.
1038+
1039+
case _EventHandlingException(:final cause, :final event):
1040+
assert(debugLog('BUG: Error handling an event: $cause\n' // TODO(log)
1041+
' event: $event\n'
1042+
'Replacing event queue…'));
1043+
// TODO maybe tell user (in beta) that event handling threw error
1044+
// We can't just continue with the next event, because our state
1045+
// may be garbled due to failing to apply this one (and worse,
1046+
// any invariants that were left in a broken state from where
1047+
// the exception was thrown). So reload from scratch.
1048+
// Hopefully (probably?) the bug only affects our implementation of
1049+
// the *change* in state represented by the event, and when we get the
1050+
// new state in a fresh InitialSnapshot we'll handle that just fine.
1051+
1052+
default:
1053+
assert(debugLog('BUG: Unexpected error in event polling: $e\n' // TODO(log)
1054+
'Replacing event queue…'));
1055+
_reportToUserErrorConnectingToServer(e);
1056+
// Similar story to the _EventHandlingException case;
1057+
// separate only so that one can print more context.
1058+
// The bug here could be in the server if it's an ApiRequestException,
1059+
// but our remedy is the same either way.
1060+
}
1061+
10511062
// This disposes the store, which disposes this update machine.
10521063
await store._globalStore._reloadPerAccount(store.accountId);
10531064
debugLog('… Event queue replaced.');
@@ -1143,3 +1154,10 @@ class UpdateMachine {
11431154
@override
11441155
String toString() => '${objectRuntimeType(this, 'UpdateMachine')}#${shortHash(this)}';
11451156
}
1157+
1158+
class _EventHandlingException implements Exception {
1159+
final Object cause;
1160+
final Event event;
1161+
1162+
_EventHandlingException({required this.cause, required this.event});
1163+
}

0 commit comments

Comments
 (0)