Recovering from "Too many messages have been received without being deleted" #12

Open
danieroux opened this issue Nov 16, 2018 · 1 comment


@danieroux
Contributor

When hammering a filled-up queue with many peers to pull messages off as fast as possible, Amazon throws an exception which kills the job.

I would like to know how to recover from this. My stopgap is to use fewer peers:

 [{:type clojure.lang.ExceptionInfo
   :message "Too many messages have been received without being deleted.\nPlease delete your received messages or let them timeout before receiving more. (Service: AmazonSQS; Status Code: 403; Error Code: OverLimit; Request ID: f53c9b7c-76d4-57a4-9e55-cf1e77e7e885)"
   :data {:original-exception :com.amazonaws.services.sqs.model.OverLimitException}
   :at [com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1639]}]

As far as I can figure out:

  • sqs/delete-message-async-batch gets called in checkpointed!. With many peers, this only happens after 100k messages have already been read off the queue.
  • Which means that poll! fails with >100k messages in flight.

Can I get some guidance on how to handle it?

  • Is it as simple as only doing a sqs/receive-messages when (< (count @processing) 100000)?
  • Or would a separate counter be more useful/efficient?
  • What else should I be aware of before I touch the code?
@lbradstreet
Member

Yes, I think we need to add a backoff mechanism to only allow X messages at a time. I thought we had already added one but I reviewed the code and it looks like we didn't.

Gating sqs/receive-messages on (< (count @processing) 100000) would probably be the best way of doing this, rather than using a separate counter, since @processing gives the best picture of how many messages we have outstanding at any given time. A rough sketch is below.
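
Something along these lines, as a minimal sketch — here `receive-fn` stands in for however sqs/receive-messages is invoked inside poll!, and the 100k ceiling and the shape of the processing atom are assumptions taken from this thread rather than the plugin's actual internals:

```clojure
(def max-in-flight 100000) ;; assumed SQS OverLimit ceiling, per the error above

(defn receive-when-under-limit
  "Only poll SQS for more messages while the count of received-but-not-yet-deleted
   messages is under max-in-flight; otherwise return no messages so checkpointed!
   can delete/ack the backlog before we poll again.
   `receive-fn` stands in for the actual sqs/receive-messages call."
  [receive-fn processing]
  (if (< (count @processing) max-in-flight)
    (receive-fn)
    []))
```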

You can also add a lifecycle handler to handle the exception so that the job won't be killed, though this obviously won't help with the root cause.
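
For example, something like this handle-exception lifecycle — a minimal sketch, where the task name is hypothetical and the ex-data check mirrors the :original-exception key shown in the stack trace above:

```clojure
(ns my.job.lifecycles)

(defn handle-sqs-over-limit
  [event lifecycle lifecycle-name throwable]
  ;; Restart the task when SQS reports OverLimit instead of killing the
  ;; whole job; any other exception keeps the default behaviour.
  (if (= :com.amazonaws.services.sqs.model.OverLimitException
         (:original-exception (ex-data throwable)))
    :restart
    :kill))

(def sqs-exception-calls
  {:lifecycle/handle-exception handle-sqs-over-limit})

(def lifecycles
  [{:lifecycle/task :read-sqs-messages ;; hypothetical reader task name
    :lifecycle/calls :my.job.lifecycles/sqs-exception-calls}])
```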

I'd be happy to accept a PR to implement this. Make sure to implement the schemas defined in https://github.com/onyx-platform/onyx-amazon-sqs/blob/0.14.x/src/onyx/tasks/sqs.clj

Thanks!
