Incorrect exit code

Nomad Podman driver v0.6.1

When a container takes a long time to stop, the allocation's exit code is sometimes reported as 0 and other times as 137.

```
[root@nomadtesting test-image]# nomad job status redis 
ID            = redis
Name          = redis
Submit Date   = 2024-11-29T12:56:38+02:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Node Pool     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
redis       0       0         1        4       0         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created     Modified
68808dc9  253a8f99  redis       0        run      running  17m13s ago  16m57s ago
b897b38f  253a8f99  redis       0        stop     failed   23m1s ago   17m13s ago
cf225248  253a8f99  redis       0        stop     failed   32m42s ago  23m1s ago
e20f42db  253a8f99  redis       0        stop     failed   1h15m ago   32m42s ago
```
```
[root@nomadtesting test-image]# nomad alloc status e2
ID                   = e20f42db-31cd-f245-c548-6f9f5409bea2
Eval ID              = 942ad545
Name                 = redis.redis[0]
Node ID              = 253a8f99
Node Name            = nomadtesting.novalocal
Job ID               = redis
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 1h16m ago
Modified             = 33m38s ago
Replacement Alloc ID = cf225248

Task "redis" is "dead"
Task Resources:
CPU        Memory           Disk     Addresses
0/500 MHz  692 KiB/256 MiB  300 MiB  

Task Events:
Started At     = 2024-11-29T10:58:53Z
Finished At    = 2024-11-29T11:40:57Z
Total Restarts = 1
Last Restart   = 2024-11-29T13:37:52+02:00

Recent Events:
Time                       Type              Description
2024-11-29T13:40:57+02:00  Not Restarting    Error was unrecoverable
2024-11-29T13:40:57+02:00  Driver Failure    rpc error: code = FailedPrecondition desc = failed to remove dead container: cannot delete container, status code: 200
2024-11-29T13:37:52+02:00  Restarting        Task restarting in 0s
2024-11-29T13:35:29+02:00  Terminated        Exit Code: 137
2024-11-29T13:33:45+02:00  Restart Signaled  Template with change_mode restart re-rendered
2024-11-29T12:58:53+02:00  Started           Task started by client
2024-11-29T12:58:52+02:00  Task Setup        Building Task Directory
2024-11-29T12:58:52+02:00  Received          Task received by client
```
```
[root@nomadtesting test-image]# nomad alloc status cf
ID                   = cf225248-9e5c-0219-2624-9e6b6cb5010b
Eval ID              = cdb4feb7
Name                 = redis.redis[0]
Node ID              = 253a8f99
Node Name            = nomadtesting.novalocal
Job ID               = redis
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 34m14s ago
Modified             = 24m33s ago
Replacement Alloc ID = b897b38f

Task "redis" is "dead"
Task Resources:
CPU        Memory           Disk     Addresses
0/500 MHz  688 KiB/256 MiB  300 MiB  

Task Events:
Started At     = 2024-11-29T11:42:11Z
Finished At    = 2024-11-29T11:49:38Z
Total Restarts = 1
Last Restart   = 2024-11-29T13:49:23+02:00

Recent Events:
Time                       Type              Description
2024-11-29T13:49:38+02:00  Not Restarting    Error was unrecoverable
2024-11-29T13:49:38+02:00  Driver Failure    rpc error: code = FailedPrecondition desc = failed to remove dead container: cannot delete container, status code: 200
2024-11-29T13:49:23+02:00  Restarting        Task restarting in 0s
2024-11-29T13:49:14+02:00  Terminated        Exit Code: 0
2024-11-29T13:48:56+02:00  Restart Signaled  Template with change_mode restart re-rendered
2024-11-29T13:42:11+02:00  Started           Task started by client
2024-11-29T13:41:57+02:00  Task Setup        Building Task Directory
2024-11-29T13:41:57+02:00  Received          Task received by client
```

This happens because, after a stop command is sent to a running container, the stats endpoint returns 200 with an empty body:

`curl -v -s --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/$container_id/stats?stream=false`

The empty response causes `runContainerMonitor` to call `ContainerInspect`. If the container has not been killed yet, the inspect result still reports `ExitCode` 0:
```
[root@nomadtesting test-image]# curl -v  -s --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/$container_id/json | jq  | grep -i exitcode
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v1.0.0/libpod/containers/387834dddaae1e141763740b37b6bec33d39e6bba998cc333370197ee2cf12be/json HTTP/1.1
> Host: d
> User-Agent: curl/7.76.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.2.2
< Server: Libpod/5.2.2 (linux)
< X-Reference-Id: 0xc00070a000
< Date: Fri, 29 Nov 2024 12:09:30 GMT
< Transfer-Encoding: chunked
< 
{ [6334 bytes data]
* Connection #0 to host d left intact
    "ExitCode": 0,
  "KubeExitCodePropagation": "invalid",
```

After the container is killed, the same inspect call reports the correct exit code:
```
[root@nomadtesting test-image]# curl -v  -s --unix-socket /run/podman/podman.sock http://d/v1.0.0/libpod/containers/$container_id/json | jq  | grep -i exitcode
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80 (#0)
> GET /v1.0.0/libpod/containers/387834dddaae1e141763740b37b6bec33d39e6bba998cc333370197ee2cf12be/json HTTP/1.1
> Host: d
> User-Agent: curl/7.76.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.2.2
< Server: Libpod/5.2.2 (linux)
< X-Reference-Id: 0xc00070a990
< Date: Fri, 29 Nov 2024 12:20:52 GMT
< Transfer-Encoding: chunked
< 
{ [6070 bytes data]
* Connection #0 to host d left intact
    "ExitCode": 137,
  "KubeExitCodePropagation": "invalid",
```
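To illustrate the race described above, here is a minimal sketch of one possible mitigation: only trust `ExitCode` once the inspect result says the container has actually finished. This is not the driver's code (the driver is written in Go). The function name `settled_exit_code`, the `inspect` callable, and the timeout values are hypothetical, and the `State.Status` / `State.ExitCode` field names and the `exited` / `stopped` status strings are assumptions about the libpod inspect payload.

```python
import time
from typing import Callable, Optional

# Assumption: the inspect payload looks like
# {"State": {"Status": "...", "ExitCode": N}} and a finished container
# reports one of these status strings.
TERMINAL_STATES = {"exited", "stopped"}


def settled_exit_code(
    inspect: Callable[[], dict],
    timeout: float = 10.0,
    interval: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[int]:
    """Poll `inspect` until the container is in a terminal state.

    Returns the exit code once the state is terminal, or None if the
    container has not finished before `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        state = inspect().get("State", {})
        if state.get("Status") in TERMINAL_STATES:
            # The container has really exited, so ExitCode is meaningful.
            return state.get("ExitCode")
        if clock() >= deadline:
            # Still stopping: an ExitCode of 0 here would be misleading.
            return None
        sleep(interval)
```

With a container that is still shutting down, the first inspect results carry `ExitCode` 0 and are skipped; the value is only returned once the status becomes terminal. A caller would pass a function that performs the `/json` request shown above and decodes the body.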

Incorrect exit code #391
