
Investigation: check scheduler performance when there are many workloads #8081

@mimowo

Description


I'm not sure whether we have an issue, but I would like to verify whether the scheduler sends multiple Workload update requests when the cluster is busy. Consider the following scenario:

  1. the cluster has 100 GPUs, all of them busy
  2. we consider workload-X, but it is inadmissible, so we put it into inadmissibleWorkloads (we update the Workload with the reason here)
  3. some workload finishes, so we requeue all workloads
  4. workload-X is reconsidered, but it still cannot fit
    Question: do we send another request to update Workload-X, or do we skip the update? IIUC the code, even if we skip the update, we still send the event.
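To make the question in step 4 concrete, here is a minimal Go sketch of the "skip if unchanged" behavior being asked about. It is not Kueue's code: `Workload`, `Condition`, `needsStatusUpdate`, and `handleInadmissible` are hypothetical simplifications (the real object stores a `metav1.Condition`), and the `Pending` reason and the message text are assumptions for illustration. The sketch compares the freshly computed condition with the stored one, counts an API update only when something differs, and always emits the event, matching the IIUC remark above.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for the Workload's
// QuotaReserved condition (the real type is metav1.Condition).
type Condition struct {
	Status  string
	Reason  string
	Message string
}

// Workload is a hypothetical, minimal model of a Workload object.
type Workload struct {
	Name          string
	QuotaReserved *Condition
}

// needsStatusUpdate reports whether the computed condition differs from the
// one already stored on the Workload. If nothing changed, an API update
// request would be redundant.
func needsStatusUpdate(wl *Workload, next Condition) bool {
	cur := wl.QuotaReserved
	if cur == nil {
		return true
	}
	return cur.Status != next.Status || cur.Reason != next.Reason || cur.Message != next.Message
}

// handleInadmissible models an evaluation where the workload does not fit.
// It returns how many API update requests and events were emitted.
// The "Pending" reason is an assumption for illustration.
func handleInadmissible(wl *Workload, message string) (updates, events int) {
	next := Condition{Status: "False", Reason: "Pending", Message: message}
	if needsStatusUpdate(wl, next) {
		wl.QuotaReserved = &next // stands in for the API update request
		updates++
	}
	events++ // the event is recorded whether or not the update was skipped
	return updates, events
}

func main() {
	wl := &Workload{Name: "workload-X"}
	msg := "insufficient quota for gpu" // hypothetical message text
	u1, e1 := handleInadmissible(wl, msg) // step 2: first evaluation
	u2, e2 := handleInadmissible(wl, msg) // step 4: re-evaluation, same outcome
	fmt.Println(u1, e1, u2, e2)
}
```

Running it prints `1 1 0 1`: the first evaluation issues an update and an event, while the re-evaluation with an identical outcome skips the update but still emits the event. The skip only helps if the computed message is stable across evaluations; a message that embeds changing values would defeat the comparison.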

The goal is to better understand whether we could improve performance on large-scale deployments with 10k workloads and a constant inflow of new workloads, where "requeue" is called almost all the time and workloads are constantly re-evaluated.
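A rough way to see why this matters at that scale is to count requests. The Go sketch below assumes, hypothetically, that all 10k pending workloads are re-evaluated on every requeue, that none of them fits, and that 50 requeue cycles happen; the `countUpdates` helper and the cycle count are illustrative, not measured from Kueue.

```go
package main

import "fmt"

// countUpdates is a hypothetical back-of-envelope helper. It counts how many
// Workload status update requests are sent when `pending` workloads are
// re-evaluated `cycles` times and none of them fits. skipUnchanged models a
// scheduler that skips the update when the computed condition equals the
// stored one.
func countUpdates(pending, cycles int, skipUnchanged bool) int {
	lastMsg := make([]string, pending) // "" means no condition stored yet
	updates := 0
	for c := 0; c < cycles; c++ {
		for i := 0; i < pending; i++ {
			msg := "insufficient quota" // same outcome every cycle: the cluster stays full
			if skipUnchanged && lastMsg[i] == msg {
				continue // nothing changed, so no request is sent
			}
			lastMsg[i] = msg
			updates++
		}
	}
	return updates
}

func main() {
	const pending, cycles = 10000, 50 // assumed numbers for illustration
	fmt.Println(countUpdates(pending, cycles, false))
	fmt.Println(countUpdates(pending, cycles, true))
}
```

Under these assumptions the program prints `500000` and then `10000`: without the skip, every re-evaluation costs one API request, while with it only the first evaluation of each workload does.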

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests