Client and core inappropriately use reserved exit statuses #357

@arisu3

The client and core misuse exit status 126 to indicate an unavailable GPU and status 127 to indicate a stalled WU:

/// v710, FAIL:
CBANG_ENUM_VALUE(GPU_UNAVAILABLE_ERROR, 126)
/// v745, FAIL:
CBANG_ENUM_VALUE(WU_STALLED, 127)

They also misuse status 255 to indicate a miscellaneous failure:

/// v623, FAIL:
CBANG_ENUM_VALUE(FAILED_3, 255)

This violates POSIX.1-2024 § 2.8.2, which reserves status 126 for a command that is found but not executable, status 127 for a command that cannot be found (or that requires libraries that cannot be found), and status 128 and above for fatal signals. Note that ld-linux.so uses statuses 126 and 127 even when the command is executed directly and no POSIX shell is involved.
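To make the collision concrete, here is a minimal sketch (not code from the client or core) that runs a command line through the shell and decodes the resulting exit status. The helper name `shellExitStatus` and the path `/nonexistent/example-core` are made up for illustration; the sketch assumes a POSIX system where `std::system` invokes `sh`.

```cpp
#include <cstdlib>     // std::system
#include <sys/wait.h>  // WIFEXITED, WEXITSTATUS (POSIX, not ISO C++)

// Hypothetical helper: run a command line through the shell and return its
// exit status, or -1 if the command could not be started or did not exit
// normally (for example, because it was killed by a signal).
int shellExitStatus(const char *commandLine) {
  int waitStatus = std::system(commandLine);
  if (waitStatus == -1 || !WIFEXITED(waitStatus)) return -1;
  return WEXITSTATUS(waitStatus);
}
```

Calling `shellExitStatus("/nonexistent/example-core 2>/dev/null")` should yield 127 on a POSIX shell, which a caller that maps 127 to `WU_STALLED` cannot tell apart from a genuinely stalled WU.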

As a result of this misuse, missing libraries and permission errors cause the client to dump the WU with a misleading message that confuses users. Because many of the cores have issues with missing libraries, this results in a large number of dumped WUs (tens of thousands in #266, where the problem was misidentified in #266 (comment) as a transient error that "just happened" to return status 127).

The only exit statuses that are permitted for application use are 0-125, inclusive.
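The ranges described in this issue could be encoded as a small compile-time check, roughly as sketched below. The names `ExitStatusClass` and `classifyExitStatus` are hypothetical and not part of the client, the core, or cbang; the ranges follow the wording of this issue (126, 127, and 128 and above are reserved).

```cpp
// Hypothetical classification of an exit status, following the ranges
// described in this issue rather than any existing client or core code.
enum class ExitStatusClass {
  Application,    // 0-125: free for application use
  NotExecutable,  // 126: command found but not executable
  NotFound,       // 127: command (or a required library) not found
  SignalRange,    // 128-255: reserved for termination by signal
  Invalid         // outside 0-255
};

// Written as a single return expression so it is valid C++11 constexpr.
constexpr ExitStatusClass classifyExitStatus(int status) {
  return (status < 0 || status > 255) ? ExitStatusClass::Invalid
         : status <= 125              ? ExitStatusClass::Application
         : status == 126              ? ExitStatusClass::NotExecutable
         : status == 127              ? ExitStatusClass::NotFound
                                      : ExitStatusClass::SignalRange;
}

// The three values quoted in this issue all land outside the application range.
static_assert(classifyExitStatus(126) != ExitStatusClass::Application,
              "126 (GPU_UNAVAILABLE_ERROR) is reserved");
static_assert(classifyExitStatus(127) != ExitStatusClass::Application,
              "127 (WU_STALLED) is reserved");
static_assert(classifyExitStatus(255) != ExitStatusClass::Application,
              "255 (FAILED_3) is reserved");
```

A check like this, applied to every value in the exit-code enumeration, would flag any future status that strays outside 0-125 at build time.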

I have not made a client PR because a PR for the core is also needed, and I have no access to the core's code. The change should be trivial, though, and could be easily made in time for core 0x28 with a patch of only a few lines.
