Description
Checked for duplicates
Yes - I've already checked
Describe the bug
I have hit this issue twice on the ops-pop1 venue: after the ASG was increased, the AWS console appears stuck on "Updating the capacity" and the new workers are never added to the system.
For example, on 2025-11-05 the ops-pop1 cluster had 1095 slc_download jobs in the queue, and although the ASG for opera-ops-pop1-opera-job_worker-slc_data_download was increased to 100, only 12 nodes were in use.
Both times I hit this issue, I set the ASG for the affected queue to 0. Once all nodes were released, I scaled the ASG back up, and this time the new nodes were added to the worker pool.
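For reference, the manual workaround above could be scripted roughly as follows. This is only a sketch, not part of PCM: the function name `reset_asg`, the polling interval, and the timeout are hypothetical, and it assumes a boto3 `autoscaling` client and an ASG whose minimum size allows a desired capacity of 0.

```python
import time


def reset_asg(client, asg_name, target_capacity, poll_seconds=30, timeout_seconds=1800):
    """Scale an ASG to 0, wait until it has no instances, then scale it back up.

    `client` is assumed to be a boto3 Auto Scaling client, e.g.
    boto3.client("autoscaling"). The function name and defaults are
    illustrative only.
    """
    # Step 1: drop the desired capacity to 0 (fails if the group's MinSize > 0).
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name, DesiredCapacity=0, HonorCooldown=False
    )

    # Step 2: poll until the group reports no instances, or give up.
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
        groups = response["AutoScalingGroups"]
        if not groups:
            raise ValueError(f"Auto Scaling group not found: {asg_name}")
        if not groups[0]["Instances"]:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{asg_name} still has instances after {timeout_seconds}s")
        time.sleep(poll_seconds)

    # Step 3: scale back up to the capacity that was originally requested.
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name, DesiredCapacity=target_capacity, HonorCooldown=False
    )


# Example call (requires boto3 and AWS credentials), using the values from this report:
#   import boto3
#   reset_asg(boto3.client("autoscaling"),
#             "opera-ops-pop1-opera-job_worker-slc_data_download", 100)
```

This only automates the steps I ran by hand; it does not address why the first scale-up stalls.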
Possibly related: I also had unexplained Revoked jobs of the same type.
What did you expect?
I expected the nodes to be added to the system and to speed up the download process without any extra action.
Reproducible steps
Environment
cluster: ops-pop1
pcm: 3.2.1