When hammering a full queue with many peers to pull messages off as fast as possible, SQS throws an exception that kills the job.
I would like to know how to recover from this. My stopgap is to use fewer peers:
[{:type clojure.lang.ExceptionInfo
:message "Too many messages have been received without being deleted.\nPlease delete your received messages or let them timeout before receiving more. (Service: AmazonSQS; Status Code: 403; Error Code: OverLimit; Request ID: f53c9b7c-76d4-57a4-9e55-cf1e77e7e885)"
:data {:original-exception :com.amazonaws.services.sqs.model.OverLimitException}
:at [com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1639]}]
As far as I can figure out:
`sqs/delete-message-async-batch` gets called in `checkpointed!`. With many peers, more than 100k messages have already been read off the queue by the time this happens.
This means that `poll!` fails once more than 100k messages are in flight.
Can I get some guidance on how to handle it?
Is it as simple as only calling `sqs/receive-messages` when `(< (count @processing) 100000)`?
Or would a separate counter be more useful/efficient?
What else should I be aware of before I touch the code?
Yes, I think we need to add a backoff mechanism to only allow X messages at a time. I thought we had already added one but I reviewed the code and it looks like we didn't.
Calling `sqs/receive-messages` only when `(< (count @processing) 100000)` would probably be the best way of doing this, rather than keeping a separate counter, since `@processing` gives the most accurate picture of how many messages are outstanding at any given time.
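A minimal sketch of that gate, in plain Clojure so it runs without the plugin. It assumes `processing` is an atom holding a map of message id to message (a stand-in for the plugin's `@processing`), and `receive-fn` is a hypothetical stand-in for `sqs/receive-messages` that takes a maximum count. The 100000 limit is the figure from this thread, not a value read from AWS. Clamping the request size to the remaining headroom keeps the in-flight count from overshooting the limit by up to one batch.

```clojure
(ns example.sqs-gate)

;; Limit taken from this thread (failures above ~100k in flight);
;; hypothetical, adjust for your queue.
(def max-in-flight 100000)

;; SQS returns at most 10 messages per ReceiveMessage call.
(def max-batch 10)

(defn receive-count
  "How many messages we may request right now, given how many have been
  received but not yet deleted. Returns 0 once the limit is reached,
  meaning poll! should skip receiving this round."
  [in-flight]
  (max 0 (min max-batch (- max-in-flight in-flight))))

(defn poll-gated!
  "Sketch of a gated poll. `processing` is an atom of message-id -> message;
  `receive-fn` is a hypothetical stand-in for sqs/receive-messages that
  takes a max count and returns maps with a :message-id key.
  Returns nil without calling receive-fn when at the limit."
  [processing receive-fn]
  (let [n (receive-count (count @processing))]
    (when (pos? n)
      (let [msgs (receive-fn n)]
        ;; Track the new messages as in flight until they are deleted.
        (swap! processing into (map (juxt :message-id identity)) msgs)
        msgs))))
```

Entries would be removed from `processing` when `checkpointed!` deletes them, which frees headroom for the next poll.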
You can also add a lifecycle handler to handle the exception so that the job won't be killed, though this obviously won't help with the root cause.
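A rough sketch of such a handler, assuming Onyx's `:lifecycle/handle-exception` call, which receives `[event lifecycle lifecycle-phase e]` and returns `:kill`, `:restart`, or `:defer`. It matches the exception by class name so it does not need the AWS SDK on the classpath; the task name `:read-sqs` and all other names here are hypothetical. Restarting lets the job survive, but as noted it does not stop the queue from hitting the limit again.

```clojure
(ns example.sqs-lifecycle)

(def over-limit-class-name
  "com.amazonaws.services.sqs.model.OverLimitException")

(defn over-limit?
  "True if `e`, or any exception in its cause chain, is an SQS
  OverLimitException (matched by class name)."
  [^Throwable e]
  (->> (iterate #(.getCause ^Throwable %) e)
       (take-while some?)
       (some #(= over-limit-class-name (.getName (class %))))
       boolean))

(defn handle-sqs-exception
  "Restart the task on OverLimit; kill the job for anything else."
  [event lifecycle lifecycle-phase e]
  (if (over-limit? e) :restart :kill))

(def sqs-exception-calls
  {:lifecycle/handle-exception handle-sqs-exception})

;; Attach the calls to the SQS input task (task name is hypothetical).
(def lifecycles
  [{:lifecycle/task :read-sqs
    :lifecycle/calls ::sqs-exception-calls}])
```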