Description
The client and core misuse exit status 126 to indicate an unavailable GPU and status 127 to indicate a stalled WU:
fah-client-bastet/src/fah/client/ExitCode.h
Lines 164 to 168 in ca69e57
/// v710, FAIL:
CBANG_ENUM_VALUE(GPU_UNAVAILABLE_ERROR, 126)
/// v745, FAIL:
CBANG_ENUM_VALUE(WU_STALLED, 127)
They also misuse status 255 to indicate a miscellaneous failure:
fah-client-bastet/src/fah/client/ExitCode.h
Lines 170 to 171 in ca69e57
/// v623, FAIL:
CBANG_ENUM_VALUE(FAILED_3, 255)
This violates POSIX.1-2024 § 2.8.2, which reserves status 126 for a command that is found but not executable, status 127 for a command that could not be found (or which requires libraries that could not be found), and status 128 and above for fatal signals. Note that ld-linux.so uses status 126 and 127 even when the command is executed directly and no POSIX shell is involved.
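To make the reserved ranges concrete, here is a minimal sketch of how a parent process could classify a child's exit status along the lines described above. The names `StatusClass`, `classifyExitStatus`, and `describe` are hypothetical and are not part of fah-client-bastet; the sketch assumes the status has already been extracted (for example with `WEXITSTATUS`) and lies in the range 0 to 255.

```cpp
// Hypothetical helper, not taken from the client or core source.
// Classifies an exit status (assumed to be 0..255) using the ranges
// described in this issue.
enum class StatusClass {
  Success,        // 0
  Application,    // 1..125, free for application-defined meanings
  NotExecutable,  // 126, command found but could not be executed
  NotFound,       // 127, command (or a required library) not found
  Signal          // 128 and above, treated as reserved for fatal signals
};

constexpr StatusClass classifyExitStatus(int status) {
  if (status == 0)   return StatusClass::Success;
  if (status == 126) return StatusClass::NotExecutable;
  if (status == 127) return StatusClass::NotFound;
  if (status >= 128) return StatusClass::Signal;
  return StatusClass::Application;
}

// Human-readable label for log messages.
inline const char *describe(StatusClass c) {
  switch (c) {
  case StatusClass::Success:       return "success";
  case StatusClass::Application:   return "application-defined failure";
  case StatusClass::NotExecutable: return "command found but not executable";
  case StatusClass::NotFound:      return "command or required library not found";
  case StatusClass::Signal:        return "terminated by signal (reserved range)";
  }
  return "unknown";
}
```

With a classification like this, a status of 127 from the core would be reported as a missing command or library rather than as a stalled WU, and 126 as a permission or execution problem rather than an unavailable GPU.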
As a result of this misuse, errors involving missing libraries or permissions cause the client to dump the WU with a misleading message that confuses users. Because many of the cores have issues with missing libraries, this results in a large number of dumped WUs (tens of thousands in #266, where the problem was misidentified in #266 (comment) as a transient error that "just happened" to return status 127).
The only exit statuses that are permitted for application use are 0-125, inclusive.
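One way to keep future values inside that range is a compile-time check. The following is only a sketch: `kMaxApplicationStatus` and `allWithinApplicationRange` are hypothetical names, and the sketch does not propose specific replacement values, since those would need to be agreed between the client and the core. It requires C++14 or later.

```cpp
#include <initializer_list>

// Hypothetical guard, not part of the existing ExitCode.h.
// Highest exit status left free for application use, per this issue.
constexpr int kMaxApplicationStatus = 125;

// Returns true only if every listed status lies in 0..125.
constexpr bool allWithinApplicationRange(std::initializer_list<int> codes) {
  for (int code : codes)
    if (code < 0 || code > kMaxApplicationStatus) return false;
  return true;
}

// The values currently used for GPU_UNAVAILABLE_ERROR (126),
// WU_STALLED (127), and FAILED_3 (255) fall outside the range.
static_assert(!allWithinApplicationRange({126, 127, 255}),
              "current values collide with reserved statuses");

// Any replacement values would need to pass a check of this form.
static_assert(allWithinApplicationRange({0, 1, 125}),
              "statuses 0..125 are available for application use");
```

Placing a `static_assert` of this kind next to the enum definition would turn an accidental use of a reserved status into a build failure instead of a misdiagnosed WU dump.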
I have not made a client PR because a PR for the core is also needed, and I have no access to the core's code. The change should be trivial, though, and could be easily made in time for core 0x28 with a patch of only a few lines.