[Bug]: ASG not scaling up properly #1294

@LalaP

Checked for duplicates

Yes - I've already checked

Describe the bug

I have hit this issue twice on the ops-pop1 venue: after the ASG was increased, the AWS console appeared stuck at "Updating the capacity" and the new workers were never added to the system.

For example, on 2025-11-05 on the ops-pop1 cluster we had 1095 slc_download jobs in the queue, and although the ASG for opera-ops-pop1-opera-job_worker-slc_data_download was increased to 100, only 12 nodes were in use.
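
To make the gap between the requested size and the nodes actually in service easier to report, something like the sketch below could help. It is not part of PCM; `capacity_gap` and `fetch_group` are hypothetical helper names, and the sketch assumes boto3 is installed with credentials for the venue. It only reads the response shape of boto3's `describe_auto_scaling_groups`.

```python
from collections import Counter


def capacity_gap(group: dict) -> dict:
    """Summarize desired vs. in-service capacity for one ASG description.

    `group` is one entry of the "AutoScalingGroups" list returned by
    boto3's describe_auto_scaling_groups.
    """
    # Count instances per lifecycle state (e.g. Pending, InService, Terminating).
    states = Counter(i["LifecycleState"] for i in group.get("Instances", []))
    in_service = states.get("InService", 0)
    desired = group["DesiredCapacity"]
    return {
        "name": group["AutoScalingGroupName"],
        "desired": desired,
        "in_service": in_service,
        "missing": max(desired - in_service, 0),
        "states": dict(states),
    }


def fetch_group(asg_name: str) -> dict:
    """Fetch one ASG description (hypothetical helper, needs AWS access)."""
    import boto3  # imported lazily so capacity_gap also works offline

    client = boto3.client("autoscaling")
    resp = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
    groups = resp["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"no ASG named {asg_name!r}")
    return groups[0]
```

Calling `capacity_gap(fetch_group(name))` while the console shows "Updating the capacity" would show how many instances are missing and in which lifecycle states the existing ones sit, which may help with triage.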

Both times I worked around the issue by setting the ASG for the affected queue to 0. Once all nodes were released, I scaled the ASG back up, and this time the new nodes were added to the worker pool.
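
For reference, the manual workaround could be scripted roughly as below. This is only a sketch under assumptions: `reset_and_scale` is a hypothetical name, `client` is expected to be a boto3 `autoscaling` client, and the group's minimum size is assumed to be 0 (otherwise setting the desired capacity to 0 is rejected). Scaling to 0 terminates the running workers, so in-flight jobs may be interrupted.

```python
import time


def reset_and_scale(client, asg_name, target, poll_seconds=30,
                    timeout_seconds=1800, sleep=time.sleep):
    """Scale an ASG to 0, wait until it has no instances, then scale to `target`."""
    # Step 1: release all nodes of the affected queue.
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name, DesiredCapacity=0, HonorCooldown=False
    )

    # Step 2: poll until the group reports no instances at all.
    waited = 0
    while True:
        resp = client.describe_auto_scaling_groups(AutoScalingGroupNames=[asg_name])
        remaining = len(resp["AutoScalingGroups"][0]["Instances"])
        if remaining == 0:
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"{remaining} instances still present in {asg_name}")
        sleep(poll_seconds)
        waited += poll_seconds

    # Step 3: scale back up to the size that was originally wanted.
    client.set_desired_capacity(
        AutoScalingGroupName=asg_name, DesiredCapacity=target, HonorCooldown=False
    )
```

With real AWS access this would be called as `reset_and_scale(boto3.client("autoscaling"), name, 100)`. It only automates the workaround and does not explain why the original scale-up got stuck.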

Possibly related: I also had unexplained Revoked jobs of the same type.

[3 images attached]

What did you expect?

I expected the new nodes to be added to the system and speed up the download process without extra actions.

Reproducible steps

Environment

cluster: ops-pop1
pcm: 3.2.1

Metadata

Labels: bug, needs triage
