Clarification on Expected Behaviour #188

Open
gregfletch opened this issue Apr 11, 2024 · 14 comments

@gregfletch

gregfletch commented Apr 11, 2024

Hopefully a quick question to clarify expected behaviour. We are seeing jobs waiting in the Enqueued state due to configured rate limit throttling (expected). However, there may be other jobs which are not throttled that are stuck behind the throttled job(s) that are not getting picked up by the fetcher. Is this the expected behaviour (i.e. jobs that could theoretically be run are stuck waiting until the throttled jobs ahead of them are processed)? Or is it expected that the fetcher would go through the queue and pull out the first job that is allowed to be run? If it is the former, is there a recommendation for how best to work around this (i.e. so Sidekiq is better able to fully utilize its resources while throttled jobs are paused)?

Thanks!

@simonc

simonc commented Apr 15, 2024

Hi there,

We just experienced a partial outage of our service due to the behavior you mention. A single user generated a huge number of jobs and was throttled. The jobs for other users were not processed, despite available thread capacity.

I would indeed expect throttled jobs to wait while other jobs are processed 🤔

A clarification would indeed be very welcome haha 😅

Thanks! ❤️

@mnovelo
Contributor

mnovelo commented Apr 15, 2024

Are you seeing this behavior for jobs on the same queue or are jobs in lower priority queues also waiting? If the latter, could you share your queue configuration?

@simonc

simonc commented Apr 15, 2024

It's on the same queue with default priority. It's the only queue the worker deals with.

@mnovelo
Contributor

mnovelo commented Apr 15, 2024

Currently the behavior is to add the throttled job back to the queue without any changes, so its enqueued_at value stays the same. Sidekiq then prioritizes jobs in the same queue by their enqueued_at value, which could lead to throttled jobs continuously being chosen over unthrottled ones. There's been discussion about having different behaviors for jobs that are being throttled, but I'm not sure where that work stands (see #150).

@bdegomme
Contributor

bdegomme commented May 10, 2024

See #52 (comment): when a job is throttled, it's put back at the end of the queue. The issue with other jobs not running is due to the cooldown:

Sidekiq::Throttled.configure do |config|
  # Period in seconds to exclude queue from polling in case it returned
  # {config.cooldown_threshold} amount of throttled jobs in a row. Set
  # this value to `nil` to disable cooldown manager completely.
  # Default: 2.0
  config.cooldown_period = 2.0

  # Exclude queue from polling after it returned given amount of throttled
  # jobs in a row.
  # Default: 1 (cooldown after first throttled job)
  config.cooldown_threshold = 1
end

If a queue contains a hundred jobs in a row that will be throttled, the cooldown will kick in a hundred times in a row, meaning it will take 200 seconds (100 × 2.0 s) before all those jobs are put back at the end of the queue and you actually start processing other jobs...

It's tempting to set config.cooldown_period = nil, but the cooldown is there to avoid overloading Redis when a queue contains only throttled jobs, so that's probably not a good idea. Personally I set config.cooldown_period = 1.0 and config.cooldown_threshold = 100, and it resolved all practical issues (because it's much faster at moving a batch of throttled jobs to the end of the queue, at the price of more load on Redis).
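
To make the arithmetic above easier to play with, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which the queue is excluded from polling for `cooldown_period` seconds every time it returns `cooldown_threshold` throttled jobs in a row, and it ignores the time spent talking to Redis. The method name `estimated_cooldown_delay` is made up for illustration and is not part of the gem.

```ruby
# Rough estimate (simplified model, not the gem's actual code) of how long
# the cooldown alone delays the first non-throttled job when it sits behind
# `throttled_in_a_row` throttled jobs in the same queue.
def estimated_cooldown_delay(throttled_in_a_row:, cooldown_period:, cooldown_threshold:)
  return 0.0 if cooldown_period.nil? # cooldown manager disabled

  # One cooldown is assumed per full streak of `cooldown_threshold` throttled jobs.
  cooldowns = throttled_in_a_row / cooldown_threshold # integer division
  cooldowns * cooldown_period.to_f
end

# Old defaults: 100 throttled jobs, cooldown after every single one.
estimated_cooldown_delay(throttled_in_a_row: 100, cooldown_period: 2.0, cooldown_threshold: 1)   # => 200.0

# Settings suggested above: one cooldown of 1 s after 100 throttled jobs.
estimated_cooldown_delay(throttled_in_a_row: 100, cooldown_period: 1.0, cooldown_threshold: 100) # => 1.0
```

The estimate only covers the waiting time introduced by the cooldown; a higher threshold trades that waiting time for more fetch and requeue traffic on Redis, which this sketch does not model.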

@simonc

simonc commented May 10, 2024

Hi @bdegomme, thanks for clarifying this. So if I understand it correctly, setting the threshold to 100 and the cooldown to 1.0 would mean that it would exclude the queue from polling for 1s after 100 jobs have been throttled in a row?


Just to make sure, here is my personal situation:

I run a PDF generation service. Each PDF generation is a job in Sidekiq. Sometimes we have one client that will flood us with thousands of documents to generate in a very short period of time. Without throttling, this means that other customers will wait a very long time before their documents get generated.

I thought of sidekiq-throttled to solve this, thinking I would throttle generations based on the customer id as prefix thus preventing a single customer from taking the entire queue for too long. We can handle 30 generations concurrently so I throttled at 15 concurrent jobs per user so that this specific customer can only take half of the capacity.

That's when shit hit the fan as it did throttle his generations but still stopped processing other jobs.

From what you're saying, this was due to the cooldown kicking in and preventing the queue from being polled because too many jobs got throttled, thus preventing other customers' generations from being handled.

Is that correct?

(side note: we ended up creating a dedicated queue for this customer so they could not take over the entire queue, but I'm still interested in understanding the issue)

@bdegomme
Contributor

bdegomme commented May 10, 2024

Yes, that's right. Maybe for you it makes sense to increase cooldown_threshold even more, but make sure to do it gradually so you know your Redis server can handle it.
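
For anyone trying to picture the mechanism confirmed here, the following is a minimal model of a per-queue cooldown, written purely from the description in this thread. The class `CooldownModel` and its method names are hypothetical, the gem's real internals may well differ (for example in how the streak is reset), and time is passed in explicitly so the behaviour is easy to follow.

```ruby
# Simplified, hypothetical model of the cooldown described in this thread:
# after `threshold` throttled jobs in a row, the queue is excluded from
# polling for `period` seconds. Not the gem's actual implementation.
class CooldownModel
  def initialize(period:, threshold:)
    @period = period
    @threshold = threshold
    @streaks = Hash.new(0)  # queue name => throttled jobs seen in a row
    @excluded_until = {}    # queue name => time at which polling may resume
  end

  # Call when a fetched job turned out to be throttled.
  def notify_throttled(queue, now)
    return if @period.nil? # cooldown disabled

    @streaks[queue] += 1
    return if @streaks[queue] < @threshold

    @excluded_until[queue] = now + @period
    @streaks[queue] = 0 # assumption: the streak starts over after a cooldown
  end

  # Call when a fetched job was allowed to run; this breaks the streak.
  def notify_admitted(queue)
    @streaks[queue] = 0
  end

  # True while the queue should be skipped by the fetcher.
  def excluded?(queue, now)
    deadline = @excluded_until[queue]
    !deadline.nil? && now < deadline
  end
end

model = CooldownModel.new(period: 1.0, threshold: 100)
100.times { model.notify_throttled("default", 0.0) }
model.excluded?("default", 0.5) # => true, still inside the 1 s cooldown
model.excluded?("default", 1.0) # => false, polling may resume
```

With `threshold: 1` the same model excludes the queue after every single throttled job, which is the situation that kept the other customers' jobs waiting.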

@simonc

simonc commented May 10, 2024

Alright! Thanks a lot for taking the time to explain it 🙏 😊

@jonatasrancan

Hi there,

I had the same problem. We throttle some specific jobs, but sometimes a very large number of them gets enqueued (150k jobs enqueued while the throttle allows only 25k per hour), and that caused the non-throttled jobs to take forever to run.

I tried changing cooldown_threshold to a bigger number, but it didn't make any difference. Do I need to put this code in a specific place? I just used my Sidekiq initializer file:

Sidekiq::Throttled.configure do |config|
  # Period in seconds to exclude queue from polling in case it returned
  # {config.cooldown_threshold} amount of throttled jobs in a row. Set
  # this value to `nil` to disable cooldown manager completely.
  # Default: 2.0
  config.cooldown_period = ENV.fetch("SIDEKIQ_COOLDOWN_PERIOD", 2.0).to_f

  # Exclude queue from polling after it returned given amount of throttled
  # jobs in a row.
  # Default: 1 (cooldown after first throttled job)
  config.cooldown_threshold = ENV.fetch("SIDEKIQ_COOLDOWN_THRESHOLD", 1).to_i
end

For now, I can predict before enqueueing a job whether it will be throttled or not, so I just send those jobs to another queue so they don't cause problems for the others.
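
The workaround described here (and the dedicated queue mentioned earlier in the thread) boils down to picking the queue at enqueue time. A minimal sketch of that decision, assuming you already have some way to tell which customers are likely to be throttled; the queue names and the `queue_for` helper are invented for illustration:

```ruby
# Hypothetical queue names; use whatever your workers are configured to process.
BULK_QUEUE = "bulk"
DEFAULT_QUEUE = "default"

# Send jobs for customers that are expected to hit the throttle to a separate
# queue, so they cannot sit in front of everyone else's jobs.
def queue_for(customer_id, heavy_customer_ids)
  heavy_customer_ids.include?(customer_id) ? BULK_QUEUE : DEFAULT_QUEUE
end

heavy = [42]
queue_for(42, heavy) # => "bulk"
queue_for(7, heavy)  # => "default"

# With Sidekiq, the result could then be passed when enqueueing, e.g.
#   SomeJob.set(queue: queue_for(customer_id, heavy)).perform_async(customer_id)
# where SomeJob is a placeholder for your own job class.
```

The Sidekiq process also has to be started with both queues for this to work, and the cooldown settings still apply to the bulk queue on its own.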

@bdegomme
Contributor

Yes, you should put this in the Sidekiq initializer. What did you set SIDEKIQ_COOLDOWN_PERIOD and SIDEKIQ_COOLDOWN_THRESHOLD to? Have you tried 1 and 5000 respectively? Then it should take around 30s for non-throttled jobs to start being processed (150k / 5000 = 30 cooldowns of 1s each). You may also consider setting cooldown_period to nil to disable the cooldown manager and see if things work.
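
One practical wrinkle with the environment-variable setup shown above: `ENV.fetch(...).to_f` can never produce `nil`, because an empty string or the text "nil" both convert to `0.0`. A small sketch of a helper that can express "disabled" as well; the name `cooldown_period_from_env` and the convention that an empty value or "nil" disables the cooldown are assumptions for illustration, not something the gem defines:

```ruby
# Hypothetical helper: read the cooldown period from the environment.
# - variable unset            => default
# - variable empty or "nil"   => nil (cooldown disabled)
# - anything else             => parsed strictly as a float (raises on garbage)
def cooldown_period_from_env(env: ENV, key: "SIDEKIQ_COOLDOWN_PERIOD", default: 2.0)
  raw = env.fetch(key, nil)
  return default if raw.nil?

  value = raw.strip
  return nil if value.empty? || value.casecmp?("nil")

  Float(value) # stricter than String#to_f, which silently returns 0.0
end
```

Inside the `Sidekiq::Throttled.configure` block this could be used as `config.cooldown_period = cooldown_period_from_env`. Logging the resolved values at boot is also a cheap way to confirm the initializer is actually being picked up by the Sidekiq process.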

@jonatasrancan

I tried 2 seconds for the period and 10000 for the threshold, but even then I couldn't see any difference.

@jeffbax

jeffbax commented Oct 25, 2024

> See #52 (comment): when a job is throttled, it's put back at the end of the queue. The issue with other jobs not running is due to the cooldown:
>
> Sidekiq::Throttled.configure do |config|
>   # Period in seconds to exclude queue from polling in case it returned
>   # {config.cooldown_threshold} amount of throttled jobs in a row. Set
>   # this value to `nil` to disable cooldown manager completely.
>   # Default: 2.0
>   config.cooldown_period = 2.0
>
>   # Exclude queue from polling after it returned given amount of throttled
>   # jobs in a row.
>   # Default: 1 (cooldown after first throttled job)
>   config.cooldown_threshold = 1
> end
>
> If a queue contains a hundred jobs in a row that will be throttled, the cooldown will kick in a hundred times in a row, meaning it will take 200 seconds before all those jobs are put back at the end of the queue and you actually start processing other jobs...
>
> It's tempting to set config.cooldown_period = nil, but the cooldown is there to avoid overloading Redis when a queue contains only throttled jobs, so probably not a good idea. Personally I set config.cooldown_period = 1.0 and config.cooldown_threshold = 100, and it resolved all practical issues (because it's much faster at moving a batch of throttled jobs to the end of the queue, at the price of more load on Redis).

Just chiming in that this was very helpful, and expanding on it a bit in the docs would probably go a long way. For a long time I thought the gem was basically breaking the queues, and I wasn't totally sure whether it was working or whether I had broken it in an upgrade.

ixti added a commit that referenced this issue Nov 6, 2024
See #188

I think the default cooldown parameters are poorly chosen, and cause a
lot of issues for people starting out with this gem (me included).
The most important part is to update the README to insist on how
critical those parameters are.
And I also think a cooldown_period of 1 and cooldown_threshold of 100
are more reasonable defaults.

---------

Co-authored-by: Alexey Zapparov
@ixti
Owner

ixti commented Nov 6, 2024

#195 was merged.

PS: Please let me know WDYT about #174 - if it's something you might find useful - we can merge it too.

@jeffbax

jeffbax commented Nov 7, 2024

> #195 was merged.
>
> PS: Please let me know WDYT about #174 - if it's something you might find useful - we can merge it too.

I think this description is great, thank you!

Dynamic values on queue or job names would also definitely be useful :)
