Clarification on Expected Behaviour #188
Comments
Hi there, We just experienced a partial outage of our service due to the behavior you mention. A single user generated a huge number of jobs and was throttled. The jobs for other users were not processed, despite available thread capacity. I would indeed expect throttled jobs to wait while other jobs are processed 🤔 A clarification would indeed be very welcome haha 😅 Thanks! ❤️
Are you seeing this behavior for jobs on the same queue or are jobs in lower priority queues also waiting? If the latter, could you share your queue configuration?
It's on the same queue with default priority. It's the only queue the worker deals with.
Currently the behavior is to add the throttled job back to the queue without any changes, so its enqueued_at value stays the same. Sidekiq then prioritizes jobs in the same queue by their enqueued_at value, which could lead to throttled jobs continuously being chosen over unthrottled ones. There's been discussion about having different behaviors for jobs that are being throttled (see #150), but I'm not sure where that work stands.
See #52 (comment): when a job is throttled it's put back at the end of the queue. The issue with other jobs not running is due to the cooldown:
Sidekiq::Throttled.configure do |config|
  # Period in seconds to exclude queue from polling in case it returned
  # {config.cooldown_threshold} amount of throttled jobs in a row. Set
  # this value to `nil` to disable cooldown manager completely.
  # Default: 2.0
  config.cooldown_period = 2.0
  # Exclude queue from polling after it returned given amount of throttled
  # jobs in a row.
  # Default: 1 (cooldown after first throttled job)
  config.cooldown_threshold = 1
end
If a queue contains a hundred jobs in a row that will be throttled, the cooldown will kick in a hundred times in a row. With the default 2.0-second period, that means it will take 200 seconds before all those jobs are put back at the end of the queue and you actually start processing other jobs... It's tempting to set
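The 200-second figure above can be reproduced with a back-of-the-envelope calculation. The sketch below is a simplified model, not the gem's code: the method name `cooldown_delay_seconds` is hypothetical, and it assumes a single queue, one cooldown each time `cooldown_threshold` throttled jobs come back in a row, and negligible time spent requeueing.

```ruby
# Rough estimate of how long a queue sits in cooldown before the fetcher
# gets past a run of throttled jobs. Hypothetical helper for illustration.
def cooldown_delay_seconds(throttled_in_a_row:, cooldown_threshold: 1, cooldown_period: 2.0)
  # A nil period disables the cooldown manager, so no delay is added.
  return 0.0 if cooldown_period.nil?

  # One cooldown is triggered each time `cooldown_threshold` throttled
  # jobs are returned back to back (integer division rounds down).
  cooldowns = throttled_in_a_row / cooldown_threshold
  cooldowns * cooldown_period
end

# Defaults quoted above: threshold 1, period 2.0 seconds.
puts cooldown_delay_seconds(throttled_in_a_row: 100) # prints 200.0
```

Under the same model, a threshold of 100 with a period of 1.0 second brings the estimate for those hundred jobs down to 1.0 second, which is why these two settings matter so much when one producer floods a shared queue.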
Hi @bdegomme, thanks for clarifying this. So if I understand it correctly, setting the threshold to Just to make sure, here is my personal situation: I run a PDF generation service. Each PDF generation is a job in Sidekiq. Sometimes we have one client that will flood us with thousands of documents to generate in a very short period of time. Without throttling, this means that other customers will wait a very long time before their documents get generated. I thought of sidekiq-throttled to solve this, thinking I would throttle generations using the customer id as a key prefix, thus preventing a single customer from taking the entire queue for too long. We can handle 30 generations concurrently, so I throttled at 15 concurrent jobs per user so that this specific customer can only take half of the capacity. That's when shit hit the fan, as it did throttle his generations but still stopped processing other jobs. From what you're saying, this was due to the cooldown kicking in and preventing the queue from being polled because too many jobs got throttled, thus preventing other customers' generations from being handled. Is that correct? (Side note: we ended up creating a dedicated queue for this customer so they could not take over the entire queue, but I'm still interested in understanding the issue.)
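To make the expectation described above concrete, here is a toy model of a per-customer concurrency cap. It is not how sidekiq-throttled is implemented. The method name `may_start?` and its parameters are hypothetical, and the numbers (30 total slots, 15 per customer) are taken from the comment.

```ruby
# Toy admission check: may a job for `customer_id` start, given the
# customer ids of the jobs that are currently running?
def may_start?(running_customer_ids, customer_id, per_customer_limit: 15, total_capacity: 30)
  # No free worker slot at all.
  return false if running_customer_ids.size >= total_capacity

  # The customer may only hold `per_customer_limit` slots at once.
  running_customer_ids.count(customer_id) < per_customer_limit
end

running = Array.new(15, "big-customer") # one customer already holds 15 slots

may_start?(running, "big-customer")   # => false (throttled)
may_start?(running, "small-customer") # => true  (15 slots are still free)
```

The outage described in this thread is the gap between this model and what happened: the second call is what the poster expected, but with the default cooldown settings the queue was excluded from polling after each throttled job, so other customers' jobs were not fetched either.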
Yes, that's right. Maybe for you it makes sense to increase
Alright! Thanks a lot for taking the time to explain it 🙏 😊
Hi there, I had the same problem. We apply throttling to some specific jobs, but sometimes the number of them enqueued is very large (150k jobs enqueued while the throttle allows only 25k per hour), which was causing the unthrottled jobs to take forever to run. I tried to change the
Sidekiq::Throttled.configure do |config|
  # Period in seconds to exclude queue from polling in case it returned
  # {config.cooldown_threshold} amount of throttled jobs in a row. Set
  # this value to `nil` to disable cooldown manager completely.
  # Default: 2.0
  config.cooldown_period = ENV.fetch("SIDEKIQ_COOLDOWN_PERIOD", 2.0).to_f
  # Exclude queue from polling after it returned given amount of throttled
  # jobs in a row.
  # Default: 1 (cooldown after first throttled job)
  config.cooldown_threshold = ENV.fetch("SIDEKIQ_COOLDOWN_THRESHOLD", 1).to_i
end
For now I can predict, before enqueueing a job, whether it will be throttled, so I just send those jobs to another queue so they don't cause problems for the others.
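One caveat with reading the period via `ENV.fetch(...).to_f`: it can never yield `nil`, because `"".to_f` and `"nil".to_f` both return `0.0`, so the "set this value to `nil` to disable" option from the comment is unreachable through the environment. A minimal sketch of one way around that follows. The helper name `cooldown_period_from` and the convention that an empty string, `nil`, or `off` means "disabled" are assumptions of this example, not something the gem defines.

```ruby
# Hypothetical helper: read the cooldown period from an ENV-like object.
# Returns a Float, or nil when the value asks for the cooldown to be disabled.
def cooldown_period_from(env, default: 2.0)
  raw = env.fetch("SIDEKIQ_COOLDOWN_PERIOD", default.to_s).strip

  # Assumed convention: empty, "nil" or "off" disables the cooldown manager.
  return nil if raw.empty? || %w[nil off].include?(raw.downcase)

  # Float() raises ArgumentError on typos instead of silently returning 0.0.
  Float(raw)
end

cooldown_period_from({})                                     # => 2.0
cooldown_period_from({ "SIDEKIQ_COOLDOWN_PERIOD" => "1" })   # => 1.0
cooldown_period_from({ "SIDEKIQ_COOLDOWN_PERIOD" => "nil" }) # => nil
```

In an initializer this would be called as `cooldown_period_from(ENV)` and the result assigned to `config.cooldown_period`.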
Yes, you should put this in the Sidekiq initializer. What did you set SIDEKIQ_COOLDOWN_PERIOD and SIDEKIQ_COOLDOWN_THRESHOLD to? Have you tried 1 and 5000 respectively? Then it should take about 30s for non-throttled jobs to start being processed. You may also consider setting cooldown_period to nil to disable the cooldown manager and see if things work.
I tried 2 seconds for the period and 10000 for the threshold, but even then I couldn't see a difference.
Just chiming in that this was very helpful, and it would probably go a long way to expand on it a bit in the docs. For a long time I thought the gem was basically breaking the queues, and I wasn't totally sure whether it was working or whether I had broken it in an upgrade.
See #188. I think the default cooldown parameters are poorly chosen, and cause a lot of issues for people starting out with this gem (me included). The most important part is to update the README to insist on how critical those parameters are. And I also think a cooldown_period of 1 and cooldown_threshold of 100 are more reasonable defaults. Co-authored-by: Alexey Zapparov
Hopefully a quick question to clarify expected behaviour. We are seeing jobs waiting in the Enqueued state due to configured rate limit throttling (expected). However, there may be other jobs which are not throttled that are stuck behind the throttled job(s) that are not getting picked up by the fetcher. Is this the expected behaviour (i.e. jobs that could theoretically be run are stuck waiting until the throttled jobs ahead of them are processed)? Or is it expected that the fetcher would go through the queue and pull out the first job that is allowed to be run? If it is the former, is there a recommendation for how best to work around this (i.e. so Sidekiq is better able to fully utilize its resources while throttled jobs are paused)?
Thanks!