skip redundant celery queue tasks when appropriate #438

Open
btbonval opened this issue Jun 22, 2015 · 5 comments

@btbonval
Member

There are times when the worker, as a matter of course, does not run tasks for long periods. This is typical in a dev environment, and it is a constant feature of our staging system.

Certain tasks, like fix_note_counts, are set to run every 24 hours to update the cache. However, running it 32 times because it has been 32 days since the worker last ran is not beneficial. Whether it has been 32 days or just 1, running fix_note_counts once will bring the data up to date.

Certain other tasks, like tweets about a new note, are distinct and should be run.

Is there any way to create classes of tasks that queue in certain ways? If so, this should be implemented. Any update tasks only need to get queued one time; any more is wasteful.

@btbonval
Member Author

This sort of feature would need to be supported in celery beat somehow. Maybe there's a singleton schedule method or task quota or something.

There is no point in queuing multiple instances of these two tasks:

'check-mturk-results': {
  'task': 'get_extract_keywords_results',
  'schedule': timedelta(minutes=20),
},
'update-scoreboard': {
  'task': 'fix_note_counts',
  'schedule': timedelta(days=1),
},

@btbonval
Member Author

See the periodic task fields: the options field accepts anything supported by apply_async().

apply_async() supports an expiry time. We could set the expiration to 1 day (24 hours), which would allow only 1 or 2 instances of any particular daily update task to remain in the queue.

expires must be given either "as seconds after task publish" or as a timestamp. A timestamp is not feasible because the expires value in the schedule is evaluated only once, at server load.
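To illustrate the difference, here is a minimal sketch in plain Python, with no Celery involved. LOAD_TIME and the helper names are hypothetical and exist only for this illustration. It shows that a deadline computed once at load is shared by every later publish, while a relative value is counted from each publish.

```python
from datetime import datetime, timedelta

# Hypothetical moment at which the settings module is imported.
LOAD_TIME = datetime(2015, 6, 22, 12, 0, 0)

# Absolute expiry, evaluated a single time when the settings load.
ABSOLUTE_EXPIRES = LOAD_TIME + timedelta(days=1)

# Relative expiry, in seconds after each publish.
RELATIVE_EXPIRES = 86400


def expiry_absolute(published_at):
    """Every publish shares the same fixed deadline."""
    return ABSOLUTE_EXPIRES


def expiry_relative(published_at):
    """Each publish gets its own deadline, counted from its publish time."""
    return published_at + timedelta(seconds=RELATIVE_EXPIRES)


def is_expired(deadline, now):
    return now >= deadline


# A task published three days after the server loaded its settings.
published = LOAD_TIME + timedelta(days=3)
print(is_expired(expiry_absolute(published), published))  # True: dead on arrival
print(is_expired(expiry_relative(published), published))  # False: valid for a day
```

With the absolute form, every task published more than a day after server load would already be expired, which is why the seconds form is the one that fits here.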

Something like this for 24 hour expiration after the task is published:

'update-scoreboard': {
  'task': 'fix_note_counts',
  'schedule': timedelta(days=1),
  'options': {'expires': 86400},
},
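If the same treatment is wanted for every entry, the expiry could be derived from each task's interval instead of being typed by hand. This is only a sketch: with_expiry is a hypothetical helper, it assumes every schedule value is a timedelta (not a crontab), and CELERYBEAT_SCHEDULE is the Celery 3.x setting name, assumed here to be the one this project uses.

```python
from datetime import timedelta

# The two schedule entries quoted earlier in this issue.
SCHEDULE = {
    'check-mturk-results': {
        'task': 'get_extract_keywords_results',
        'schedule': timedelta(minutes=20),
    },
    'update-scoreboard': {
        'task': 'fix_note_counts',
        'schedule': timedelta(days=1),
    },
}


def with_expiry(schedule):
    """Return a copy in which each entry expires after one schedule interval.

    Hypothetical helper; assumes every 'schedule' value is a timedelta.
    """
    result = {}
    for name, entry in schedule.items():
        updated = dict(entry)
        options = dict(entry.get('options', {}))
        # Keep an explicitly configured expiry if one is already present.
        options.setdefault('expires', int(entry['schedule'].total_seconds()))
        updated['options'] = options
        result[name] = updated
    return result


CELERYBEAT_SCHEDULE = with_expiry(SCHEDULE)
print(CELERYBEAT_SCHEDULE['update-scoreboard']['options'])    # {'expires': 86400}
print(CELERYBEAT_SCHEDULE['check-mturk-results']['options'])  # {'expires': 1200}
```

Tying the expiry to the interval means a task instance stops being useful at about the moment its successor is published.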

@btbonval
Member Author

This ticket is an example of how much can be done while waiting for the staging system queue to complete.

still waiting...

@btbonval
Member Author

... and since this is a second Heroku worker running alongside the main web worker, we're being charged for all the time it runs. So we're wasting money recalculating these update statistics repeatedly and for no benefit, since the first run, the last run, and every run in between will yield essentially the same results for update tasks.

@btbonval
Member Author

Applied expiration to all three periodic tasks, since none of them has any reason to build up a backlog. The code was put in a branch and pushed to beta for testing at the time of this comment.

Check back in a few days or a week and see how much has accumulated in the queue backlog. It should just be 3 tasks: one of each.
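As a rough sanity check on that expectation, the number of still-valid instances of one periodic task can be estimated with a small simulation. This is a simplified model, not Celery's actual bookkeeping: runnable_backlog is a hypothetical function, it assumes beat publishes at a fixed interval starting at time 0 while the worker is down, and it assumes expired messages are discarded and not executed. The boundary handling (a task exactly at its expiry still counts) is also a simplification.

```python
def runnable_backlog(downtime_seconds, interval_seconds, expires_seconds=None):
    """Count queued instances of one periodic task that are still unexpired
    when the worker returns after `downtime_seconds`.

    Simplified model: one publish at t = 0, interval, 2 * interval, ...
    """
    count = 0
    published_at = 0
    while published_at <= downtime_seconds:
        age = downtime_seconds - published_at
        if expires_seconds is None or age <= expires_seconds:
            count += 1
        published_at += interval_seconds
    return count


DAY = 86400
downtime = 31 * DAY + DAY // 2  # worker idle for 31.5 days

print(runnable_backlog(downtime, DAY))                       # 32 without expiry
print(runnable_backlog(downtime, DAY, expires_seconds=DAY))  # 1 with a 24-hour expiry
print(runnable_backlog(32 * DAY, DAY, expires_seconds=DAY))  # 2 right on the boundary
```

Under these assumptions a daily task leaves 1 or 2 runnable instances regardless of how long the worker was idle, which matches the estimate given earlier in this thread.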
