scheduler: incorrect scheduling of batch job allocations on drain #26929

@chrisroberts

Description

Nomad's scheduling of batch job allocations during a node drain is currently inconsistent with the documented behavior. Per the documentation, batch allocations on a draining node should be stopped with a status of complete once the drain deadline is reached, and should not be rescheduled onto other nodes.

The current drain behavior does not match this.

To document the current behavior, a cluster with 3 agents will be used along with the simple jobspec below defining a batch job:

batch jobspec

```hcl
job "sleep-job" {
  type = "batch"

  group "sleeper" {
    count = 5

    ephemeral_disk {
      size = 10
    }

    task "do_sleep" {
      driver = "raw_exec"

      logs {
        disabled      = true
        max_files     = 1
        max_file_size = 1
      }

      config {
        command = "sleep"
        args    = ["1d"]
      }

      resources {
        memory = 10
        cpu    = 5
      }
    }

    task "extra_sleep" {
      driver = "raw_exec"

      logs {
        disabled      = true
        max_files     = 1
        max_file_size = 1
      }

      config {
        command = "sleep"
        args    = ["1d"]
      }

      resources {
        memory = 10
        cpu    = 5
      }
    }
  }
}
```
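For reference, the reproduction can be driven with the standard Nomad CLI (the jobspec filename is an assumption):

```shell
# Register the batch job (filename assumed to be sleep-job.nomad.hcl)
nomad job run sleep-job.nomad.hcl

# Show the job summary and allocation list used below
nomad job status sleep-job
```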

drain behavior

After running the job, the initial status is:

```
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
sleeper     0       0         5        0       0         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
25f432b7  490b97bb  sleeper     0        run      running  3s ago   2s ago
a8eb00d4  717d40fd  sleeper     0        run      running  3s ago   2s ago
d05c5866  52a010ff  sleeper     0        run      running  3s ago   2s ago
dffa4043  490b97bb  sleeper     0        run      running  3s ago   2s ago
ec349a28  52a010ff  sleeper     0        run      running  3s ago   2s ago
```
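The drain in the next step can be triggered with the Nomad CLI, using the node ID prefix from the allocation list above:

```shell
# Enable draining on node 490b97bb with a 2 second deadline;
# once the deadline passes, remaining allocations are stopped
nomad node drain -enable -deadline 2s 490b97bb
```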

Now, draining node 490b97bb with a deadline of 2s results in:

```
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
sleeper     0       0         5        2       0         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created   Modified
7ab9320e  52a010ff  sleeper     0        run      running  3s ago    2s ago
ccb7284f  717d40fd  sleeper     0        run      running  3s ago    2s ago
25f432b7  490b97bb  sleeper     0        stop     failed   2m6s ago  3s ago
a8eb00d4  717d40fd  sleeper     0        run      running  2m6s ago  2m5s ago
d05c5866  52a010ff  sleeper     0        run      running  2m6s ago  2m5s ago
dffa4043  490b97bb  sleeper     0        stop     failed   2m6s ago  2s ago
ec349a28  52a010ff  sleeper     0        run      running  2m6s ago  2m5s ago
```

The two allocations that were running on node 490b97bb now have a status of failed and were rescheduled. The expected behavior is for those two allocations to have a status of complete and not be rescheduled.
