-
Notifications
You must be signed in to change notification settings - Fork 36
[WIP] Fix IBM CI #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gitside-gnucobol-3.x
Are you sure you want to change the base?
[WIP] Fix IBM CI #238
Conversation
0bd1054 to
7987b46
Compare
|
Most of the failures seem to be related to the use of big-endian. Let's explore the failure on test 800 (801 is similar). The relevant part is this COBOL "extern" procedure: The generated C code for this procedure is (relevant parts only): So we have a COBOL procedure that takes a 32-bit integer as second argument, and the corresponding C argument ( A possible solution would be to have C declarations match the COBOL argument sizes more closely, but that would probably break the current ABI. Or the pointers could be adjusted according the the actual size when performing the intermediate call (for instance in the case above, we could generate Test 928 is also an endian issue. We are calling the The generated code that calls this function is:
For test 1252, the problem comes from different generated field types for TALLY: on a little-endian platform, it is a COB_TYPE_NUMERIC_COMP5 (with flag COB_FLAG_BINARY_SWAP), while on big endian it is a COB_TYPE_NUMERIC_BINARY. Thie yields different code paths being executed in |
please add a first/last position code here using the precompiler and the WORDS_BIGENDIAN defintion
As the type is "BY VALUE", the 64bit integer should be used in all cases, the cast is wrong (that's an "unfinished" part so breaking ABI here is OK and if we don't believe this will have a big impact even does not need a NEWS entry). I guess rebasing will fix the non-ibm CIs, no? |
Once PR #237 is merged, yes. |
hm... that part is about "expectations" how non-pretty/binary is printed... |
|
BTW: Can we debug via ssh on the IBM provided CI environments? |
Not sure, I'll ask them. I set up an Ubuntu/S390x QEMU VM on my machine for my investigations, but it's (obvisouly) very slow. |
GitMensch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, everything "just works" :-)
I'm a bit confused why the big-endian issues "just vanished"?!?
It would be interesting to see a run that uses VBISAM to check how the result looks there (we may disable it later if there are too much errrors).
| run: | | ||
| sed -i '/AT_SETUP(\[runtime check: write to internal storage (1)\])/a AT_SKIP_IF(\[true\])' tests/testsuite.src/run_misc.at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you possibly want to add the reason as a comment, I think that's "hardened builds break on failure, so libcob cannot detect the error directly after"
Sorry for the false joy - the testsuite step in th S390x CI is set to "continue-on-error" ; the issues are still there 😅 |
|
Ah, I was already questioning myself... then please let it fail ... and possibly add those to expected failure via sed, get the CI in the repo, then do a follow-up PR (may just use this one if directly committed to the 3x workflows, otherwise a new one) which drops the "sed" and fixes the issues. |
1261d77 to
4a9d7eb
Compare
1c24029 to
d770655
Compare
|
@GitMensch We've started to tackle these big-endian issues (since they also occur on AIX, which we need to handle for our new customer). For now I'm attempting to fix this one: This one occus because we hit My quick fix is to wrap the destination under a reference in |
|
please add Changelogs and check if/how this is solved in 4.x (a merge of a change that includes even more is often useful, if possible); |
Good point indeed. The code itself for the |
|
You cannot pass a literal BY REFERENCE, can you? Can you please check what MF actually does? ... and how do GC3 + GC4 handle that? |
MF on AIX (Server Express 5.1) allows BY REFERENCE on a numeric literal, without warning/error. I belive this just behaves as BY CONTENT...
We're on 64-bit. I made some tests. Passing BY VALUE a numeric literal larger than 32-bit gives a compile-time error. If you pass it BY CONTENT, the literal may be larger than 32-bit, but truncated to 32-bit at run time. |
|
Please share your test programs (ideally as a commit in the testsuite of this PR ;-) even if those currently "fail" in GC) and I'll rerun them with VisualCOBOL 32+64 bit on RHEL. |
I had in fact two variations of the following program, one for GnuCOBOL and one for AIX (not sure yet what "expected" output we would want in the test suite). This is the GnuCOBOL variant. The AIX variant has a 32-bit constant instead in the 4th call, and has a different output on the 5th et 6th calls (the value is truncated to 32-bit, if we use print_p64 it is shown twice, but if we use print_p32 then we can't distinguish the GnuCOBOL and MF behaviors). I.e the AIX output (if we adjust the literal in the 4th call): |
Just experiments for now.
Works "just fine" on ppc64le & ppc64le-p10.
A couple of failures and an unexpected pass on s390x: