
Dialect option: normalize BCD "on-the-fly" when moving from ALPHANUMERIC to NUMERIC #200

Open: wants to merge 1 commit into gcos4gnucobol-3.x

ddeclerck
Contributor

@ddeclerck ddeclerck commented Dec 2, 2024

This PR adds a new dialect option to allow on-the-fly "normalization" of BCD when moving from ALPHANUMERIC to NUMERIC. This is needed to properly mimic the GCOS COBOL behavior: for instance, moving an ALPHANUMERIC containing "ABC456789}" to a NUMERIC yields "-1234567890" in this implementation (it interprets the source as a PIC S9 item with a trailing embedded sign). See the new test at the end of run_misc.at.
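To make that interpretation concrete, here is a minimal sketch (not the PR's code) that reads an EBCDIC buffer as described: every byte contributes its low half-byte as a digit, and the zone half-byte of the last byte gives the sign, with 0xD taken as negative. It assumes CP037-style EBCDIC, where "ABC456789}" is the byte sequence C1 C2 C3 F4 F5 F6 F7 F8 F9 D0; `zoned_to_ll` is a hypothetical name and no validity checking is done.

```c
#include <stddef.h>

/* Hypothetical helper: interpret n EBCDIC bytes as PIC S9(n) DISPLAY with a
   trailing embedded sign. Digit = low half-byte of each byte; sign = zone
   (high) half-byte of the last byte, 0xD meaning negative.
   No validity check: half-bytes above 9 are used as-is. */
static long long zoned_to_ll(const unsigned char *p, size_t n)
{
    long long v = 0;
    size_t i;

    for (i = 0; i < n; ++i)
        v = v * 10 + (p[i] & 0x0F);
    if (n > 0 && (p[n - 1] >> 4) == 0x0D)
        v = -v;
    return v;
}
```

For the EBCDIC bytes of "ABC456789}" this returns -1234567890, the value described above.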

@ddeclerck ddeclerck force-pushed the gcos_move branch 3 times, most recently from 2918afc to 675824c December 5, 2024 15:30
@ddeclerck
Contributor Author

@GitMensch

This seems to work. Should probably be improved.

MSYS2 CI failure is caused by a different output format for floats (e.g. where most environments print '-1.2345679E+9', MSYS2 prints '-1.2345679E+09'). Any idea where this could be adjusted?

@GitMensch
Collaborator

GitMensch commented Dec 5, 2024

We had a similar issue with COMP-2 printing before, the solution was to never "printf" but instead use snprintf and post-adjust, if necessary.
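A minimal sketch of the "snprintf and post-adjust" idea, assuming the goal is to strip leading zeros from the exponent. C99 requires at least two exponent digits and some Windows runtimes emit three, so normalizing after formatting makes the output independent of the C library. The function name `format_float_portable` and the "%.7E" precision are illustrative, not taken from libcob.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format like "%.7E", then strip leading zeros from the
   exponent so that "E+09" (two-digit minimum) and "E+009" (three-digit
   runtimes) both end up as "E+9". */
static void format_float_portable(double v, char *buf, size_t len)
{
    char *e;
    char *digits;
    char *p;

    snprintf(buf, len, "%.7E", v);
    e = strchr(buf, 'E');
    if (e == NULL)
        return;                     /* e.g. "INF" or "NAN": nothing to adjust */
    digits = e + 1;
    if (*digits == '+' || *digits == '-')
        digits++;
    p = digits;
    while (*p == '0' && p[1] != '\0')
        p++;                        /* keep at least one exponent digit */
    if (p != digits)
        memmove(digits, p, strlen(p) + 1);
}
```

With -1234567890.0 this produces "-1.2345679E+9" regardless of how many exponent digits the runtime's snprintf emitted.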

Just FYI: That's the result with MF, after dropping the extension COMP-N (and the invalid index use):

    54     MOVE SRC-ALNUM TO DST-INDEX.
*  49-S*******************************                                       **
**    Illegal use of Index-name or Index Data-item

... and after explicitly dropping the numeric checks:

MF VC

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '123456789=-'
 -> DST-NUMEDIT  : '-1 234 567 89='
 -> DST-PACK     : '+123456789='
 -> DST-BIN      : '+1234567801'
 -> DST-BINC     : '+121'
 -> DST-BINS     : '+00633'
 -> DST-BINL     : '+1234567801'
 -> DST-BIND     : '+00000000001234567801'
 -> DST-COMP5    : '+0001234567801'
 -> DST-COMP6    : '123456789='
 -> DST-COMPX    : '01234567801'
 -> DST-FLTSHORT : ' .12345678E 10'
 -> DST-FLTLONG  : ' .123456780100000000E 010'
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : '1234567890-'
 -> DST-NUMEDIT  : '-1 234 567 890'
 -> DST-PACK     : '+1234567890'
 -> DST-BIN      : '+1234567890'
 -> DST-BINC     : '-046'
 -> DST-BINS     : '+00722'
 -> DST-BINL     : '+1234567890'
 -> DST-BIND     : '+00000000001234567890'
 -> DST-COMP5    : '+0001234567890'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '01234567890'
 -> DST-FLTSHORT : ' .12345679E 10'
 -> DST-FLTLONG  : ' .123456789000000000E 010'

MF VC CHARSET"EBCDIC" (implies SIGN"EBCDIC")

Note: the results are also different if only SIGN"EBCDIC" is specified:

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '1234567890+'
 -> DST-NUMEDIT  : '-1 234 567 890'
 -> DST-PACK     : '+1234567890'
 -> DST-BIN      : '+1234567890'
 -> DST-BINC     : '-046'
 -> DST-BINS     : '+00722'
 -> DST-BINL     : '+1234567890'
 -> DST-BIND     : '+00000000001234567890'
 -> DST-COMP5    : '+0001234567890'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '01234567890'
 -> DST-FLTSHORT : ' .12345679E 10'
 -> DST-FLTLONG  : ' .123456789000000000E 010'
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : '1234567897+'
 -> DST-NUMEDIT  : '+1 234 567 897'
 -> DST-PACK     : '+1234567897'
 -> DST-BIN      : '+1234567897'
 -> DST-BINC     : '-039'
 -> DST-BINS     : '+00729'
 -> DST-BINL     : '+1234567897'
 -> DST-BIND     : '+00000000001234567897'
 -> DST-COMP5    : '+0001234567897'
 -> DST-COMP6    : '1234567897'
 -> DST-COMPX    : '01234567897'
 -> DST-FLTSHORT : ' .12345679E 10'
 -> DST-FLTLONG  : ' .123456789700000000E 010'

... which is another variant, but one that is nearer to the result of this PR than the 3.2 result is (that one contains mostly zeros).

Enterprise COBOL

... and additional FYI the output of Enterprise COBOL (after changing the FLOAT types to COMP-1/COMP-2 and dropping the BINARY types) [no abort]:

 SRC-ALNUM        : 'ABC456789}'
  -> DST-NUMDISP  : 'ABC456789{'
  -> DST-NUMEDIT  : '+1 234 567 890'
  -> DST-PACK     : '1234567890'
  -> DST-BIN      : '1234567890'
  -> DST-FLTSHORT : ' .12345679E 10'
  -> DST-FLTLONG  : ' .12345678900000000E 10'

ACU 3.2.1

And last (this took a while, as neither copy+paste nor wget is possible with that VM), on old ACU 3.2.1:

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : 'ABC456789}'
 -> DST-NUMEDIT  : '+A BC4 567 89}'
 -> DST-PACK     : ' 123456789='
 -> DST-BIN      : ' 8994567967'
 -> DST-COMP5    : ' 8994567967'
 -> DST-COMP6    : '123456789='
 -> DST-COMPX    : '  18994567967'
 -> DST-COMPN    : '  18994567967'
 -> DST-FLTSHORT : ' 1.8994567E10'
 -> DST-FLTLONG  : ' 1.8994567967000000E10'
 -> DST-INDEX    : ' 18146398783'
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : 'ABC456789p'
 -> DST-NUMEDIT  : '+A BC4 567 89p'
 -> DST-PACK     : ' 1234567890'
 -> DST-BIN      : ' 8994567954'
 -> DST-COMP5    : '  818453536'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '  18994567954'
 -> DST-COMPN    : ' 18994567954'
 -> DST-FLTSHORT : ' 1.8994567E10'
 -> DST-FLTLONG  : ' 1.8994567954000000E10'
 -> DST-INDEX    : ' 1814698770'

[And a note to myself: compile with ccbl -Df -o PRN.acu to use float items (alternatively -Cv for "IBM numeric format"; this does not change the displayed values) and actually produce output, and run with TERM=vt100 A_TERMCAP=/opt/acu/etc/a_termcap runcbl ./PRN.acu (compiling with -Ca would do a "direct" DISPLAY but outputs binary data instead of the "pretty-printing" we want).]

This PR's result

... and as a comparison the result of this PR so far:

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '-1234567890'
 -> DST-NUMEDIT  : '-1 234 567 890'
 -> DST-PACK     : '-1234567890'
 -> DST-BIN      : '-1234567890'
 -> DST-BINC     : '+046'
 -> DST-BINS     : '-00722'
 -> DST-BINL     : '-1234567890'
 -> DST-BIND     : '-00000000001234567890'
 -> DST-COMP5    : '-00000000001234567890'
 -> DST-COMP6    : '1234567890'
 -> DST-COMPX    : '1234567890'
 -> DST-COMPN    : '1234567890'
 -> DST-FLTSHORT : '-1.2345679E+9'
 -> DST-FLTLONG  : '-1234567890'
 -> DST-INDEX    : '-1234567890'

@ddeclerck
Contributor Author

I see. So it seems BCD is also "normalized" in MF and IBM, except that the sign is mostly ignored (except in MF when moving to NUMERIC-EDITED). Also, I don't understand why the binary results in MF have exactly the opposite sign compared to GCOS... And there are also differences in the way the fields are displayed, but I think this is not related to normalization.

I'm thinking maybe we should have two options: one to normalize the digits, and one for the sign (but the latter would have to be more than a YES/NO flag, to account for MF keeping the sign only when moving to NUMERIC-EDITED).
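A minimal sketch of what such a two-part setting could look like. The enum, struct and function names are hypothetical and do not correspond to existing cobc/config.def entries; the three sign modes mirror the behaviors seen in the results above (GCOS keeps the sign, IBM and MF mostly ignore it, MF keeps it for NUMERIC-EDITED targets).

```c
#include <stdbool.h>

/* Hypothetical settings; none of these names exist in cobc/config.def. */
enum sign_mode {
    SIGN_IGNORE,         /* sign mostly ignored (as seen with IBM / MF)  */
    SIGN_KEEP,           /* keep the embedded sign (GCOS)                */
    SIGN_KEEP_IF_EDITED  /* keep it only for NUMERIC-EDITED targets (MF) */
};

struct normalize_opts {
    bool normalize_digits;   /* first option: map each byte to a digit */
    enum sign_mode sign;     /* second option: more than YES/NO        */
};

/* Decide whether the sign found in the source is applied to the target. */
static bool apply_sign(const struct normalize_opts *o, bool target_is_edited)
{
    switch (o->sign) {
    case SIGN_KEEP:
        return true;
    case SIGN_KEEP_IF_EDITED:
        return target_is_edited;
    case SIGN_IGNORE:
    default:
        return false;
    }
}
```

Keeping the digit normalization as a separate boolean lets a dialect normalize digits without adopting a particular sign policy.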

@GitMensch
Collaborator

If I don't miss something, the most visible thing is that this is NOT about BCD (COMP-3/COMP-6) normalization, but about normalization of USAGE DISPLAY, because all environments but ACU show the same data -> normalized and only binary truncated for the binary fields.

I'll do the "review" after some meeting, to comment on the actual changes.

(just FYI - added the ACU results which are different and took too long to get in the first place; I'd ignore that output for now)

@ddeclerck
Contributor Author

ddeclerck commented Dec 6, 2024

> If I don't miss something, the most visible thing is that this is NOT about BCD (COMP-3/COMP-6) normalization, but about normalization of USAGE DISPLAY

Well, I'd still say it's BCD-related, since we're trying to interpret DISPLAY data as unpacked BCD.
But it's hard to come up with a proper description and self-explanatory flag names 😅.

BTW, it could be interesting to try with an ASCII sign ("p") instead of the EBCDIC sign ("}") in SRC-ALNUM.
Also, I'd be curious to know whether those environments would consider a sign that is leading/trailing separate (on GCOS, it seems the DISPLAY data that we're moving to numeric is always interpreted as having a non-separate trailing sign, which is the default anyway for NUMERIC-DISPLAY).

@GitMensch
Collaborator

Note: I've added the ASCII p variant above as well.

Concerning the PR: that's effectively a general issue. I thought I had broken something there, but this code goes back to at least OC 1.1; my changes only produce the same "intended" result with fewer instructions.
The general issues seen when inspecting the results from other COBOL vendors and the "old" existing implementation that uses cob_move_alphanum_to_display:

  • the sign is expected to be separate and in the first position; while with "redefined" numeric display data this is normally non-separate trailing
  • as soon as the first error is seen, the target is set to all zero
  • even with -fec=all this raises no exception (with MF it does by default, MF has an option to effectively make that fatal exception [abort] a non-fatal one [data is handled similar to this PR])

The question is how to go on. Using COB_D2I does work (is a bit more expensive, of course), so I tend to use it unconditionally in this function.
This mostly leaves the sign handling. My idea is to first use the current approach (skip all spaces, then check for a sign, if none found check at the last position of the field [effectively what this PR does, but not with an intermediate field]).
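The sign handling described above could look roughly like the following sketch. It assumes ASCII data and the IBM-style overpunch letters in the last position ('{' and 'A'..'I' for positive, '}' and 'J'..'R' for negative). `scan_sign` and `struct sign_scan` are hypothetical names, not libcob code, and the last byte is only inspected, never modified.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result of the sign scan (not a libcob type). */
struct sign_scan {
    int negative;       /* 1 if a minus sign was found                       */
    size_t first;       /* index of the first digit position                 */
    int trailing_sign;  /* 1 if the last byte is an embedded (overpunch) sign */
};

static struct sign_scan scan_sign(const char *s, size_t len)
{
    struct sign_scan r = { 0, 0, 0 };

    /* 1. skip all leading spaces */
    while (r.first < len && isspace((unsigned char)s[r.first]))
        r.first++;

    /* 2. leading separate sign */
    if (r.first < len && (s[r.first] == '+' || s[r.first] == '-')) {
        r.negative = (s[r.first] == '-');
        r.first++;
        return r;
    }

    /* 3. otherwise look at the last position (read only, nothing "unpunched") */
    if (len > 0) {
        const char c = s[len - 1];
        if (c == '}' || (c >= 'J' && c <= 'R')) {
            r.negative = 1;
            r.trailing_sign = 1;
        } else if (c == '{' || (c >= 'A' && c <= 'I')) {
            r.trailing_sign = 1;
        }
    }
    return r;
}
```

Because the trailing byte is only read, no save/restore of the sign position is needed with this approach.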

Another approach would be to try following the COBOL 202x MOVE statement, general rule 7 d 4, which (I'm not sure) should possibly only be applied if the target is neither numeric-display nor numeric-edited (or their national variants if the source is national):

  • define an intermediate unsigned field with no decimal places
  • size+digits = orig_size (or if this is > 31 then 31)
  • data=orig_data (or moved to the right to the last 31 positions, if there are more)
  • then do a numeric move from that intermediate field to the target one

That possibly yields nearly the same result as this PR (nearly because the data will likely always be handled as unsigned).
It would change the handling of leading separate sign of our current implementation (to an error in ascii in general and an error in ebcdic with "+").
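The intermediate field from that rule, as summarized in the list above (unsigned, no decimal places, at most 31 digits, keeping the rightmost positions), can be sketched as a view onto the source data. `struct intermediate` and `make_intermediate` are illustrative only, and the subsequent numeric move from that view to the target is not shown.

```c
#include <stddef.h>

#define MAX_INTERMEDIATE_DIGITS 31  /* limit taken from the rule as summarized */

/* Hypothetical view onto the source: unsigned, scale 0, size == digits. */
struct intermediate {
    const unsigned char *data;
    size_t digits;
};

static struct intermediate make_intermediate(const unsigned char *src, size_t size)
{
    struct intermediate im;

    if (size > MAX_INTERMEDIATE_DIGITS) {
        /* keep only the rightmost 31 positions */
        im.data = src + (size - MAX_INTERMEDIATE_DIGITS);
        im.digits = MAX_INTERMEDIATE_DIGITS;
    } else {
        im.data = src;
        im.digits = size;
    }
    return im;
}
```

Since the view is unsigned, any embedded sign in the last byte would be lost, which matches the "nearly the same result" caveat above.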

What do you think?

@ddeclerck
Contributor Author

> The question is how to go on. Using COB_D2I does work (is a bit more expensive, of course), so I tend to use it unconditionally in this function. This mostly leaves the sign handling. My idea is to first use the current approach (skip all spaces, then check for a sign, if none found check at the last position of the field [effectively what this PR does, but not with an intermediate field]).

Would that imply doing this unconditionally instead of using a flag?

> Another approach would be to try following the COBOL 202x MOVE statement, general rule 7 d 4, which (I'm not sure) should possibly only be applied if the target is neither numeric-display nor numeric-edited (or their national variants if the source is national):
>
>   • define an intermediate unsigned field with no decimal places
>   • size+digits = orig_size (or if this is > 31 then 31)
>   • data=orig_data (or moved to the right to the last 31 positions, if there are more)
>   • then do a numeric move from that intermediate field to the target one
>
> That possibly yields nearly the same result as this PR (nearly because the data will likely always be handled as unsigned). It would change the handling of leading separate sign of our current implementation (to an error in ascii in general and an error in ebcdic with "+").

Would this be configurable through a dialect flag? Our customer expects the sign to be preserved, as is the case on GCOS.

@GitMensch
Collaborator

Both approaches could be made to depend on a dialect (or optimization) flag.

If you like the first one and go down that route, I think no option is necessary, right?
Note: in any case we need a NEWS entry (especially for "no option" - I think that can go under "important bug fixes").

It may be good to have that test include a performance check option, as in

    01 FILLER USAGE BINARY-INT VALUE 0.
       88 DO-DISP VALUE 0.
       88 NO-DISP VALUE 1.
    REPLACE ==DISPLAY== BY ==IF DO-DISP DISPLAY==.
    *
    PROCEDURE DIVISION.
    MAIN.
    * Test with DISPLAY on error
        PERFORM DO-CHECK.
    >> IF CHECK-PERF IS DEFINED
        SET NO-DISP TO TRUE
    * some performance checks on the way...
        PERFORM DO-CHECK 20000 TIMES.
    >> END-IF
        GOBACK.
    DO-CHECK.

This makes it easy to verify both correctness and speed if we change that later on.

@ddeclerck
Contributor Author

> MF VC
>
> SRC-ALNUM        : 'ABC456789}'
>  -> DST-NUMDISP  : '123456789=-'
>  -> DST-NUMEDIT  : '-1 234 567 89='
>  -> DST-PACK     : '+123456789='
>  -> DST-BIN      : '+1234567801'
> ...
> SRC-ALNUM        : 'ABC456789p'
>  -> DST-NUMDISP  : '1234567890-'
>  -> DST-NUMEDIT  : '-1 234 567 890'
>  -> DST-PACK     : '+1234567890'
>  -> DST-BIN      : '+1234567890'
> ...

I do not have the same results with MF VC (ASCII):

SRC-ALNUM        : 'ABC456789}'
 -> DST-NUMDISP  : '1234567903-'
 -> DST-NUMEDIT  : '+A BC4 567 89}'
 -> DST-PACK     : '+1234567903'
 -> DST-BIN      : '+1234567903'
 ...
SRC-ALNUM        : 'ABC456789p'
 -> DST-NUMDISP  : '1234567890-'
 -> DST-NUMEDIT  : '+A BC4 567 89p'
 -> DST-PACK     : '+1234567890'
 -> DST-BIN      : '+1234567890'
 ...

The two differences I notice are:

  • no normalization occurs when moving to NUMERIC-EDITED
  • "789}" becomes "7903" instead of "789=" (which made more sense to me because '}' = 0x7D and '=' = 0x3D)
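One arithmetic that reproduces the "7903" observation (a hypothesis, not confirmed MF internals): if each byte's low half-byte is accumulated as a decimal digit without rejecting values above 9, the trailing '}' (0x7D, low half-byte 0xD = 13) adds 13 instead of 0, giving 1234567890 + 13 = 1234567903. Forcing the ASCII digit zone 0x3 onto each byte instead gives the "123456789=" form. Both helper names are made up for this illustration.

```c
#include <stdint.h>

/* Hypothesis A: accumulate every byte's low half-byte as a decimal digit,
   without rejecting values above 9 (so 0xD adds 13). */
static uint64_t accumulate_low_nibbles(const char *s)
{
    uint64_t v = 0;
    for (; *s != '\0'; ++s)
        v = v * 10 + ((unsigned char)*s & 0x0F);
    return v;
}

/* Hypothesis B: keep each low half-byte but force the ASCII digit zone 0x3. */
static void force_ascii_zone(const char *s, char *out)
{
    for (; *s != '\0'; ++s)
        *out++ = (char)(((unsigned char)*s & 0x0F) | 0x30);
    *out = '\0';
}
```

This only shows that the reported numbers are consistent with such arithmetic; it says nothing about why NUMERIC-EDITED is left unnormalized.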

Now, I barely know how MF works, so maybe I'm doing something wrong? Or maybe it behaves differently depending on the version...

@ddeclerck ddeclerck force-pushed the gcos_move branch 2 times, most recently from a58b6f1 to 9115c8c December 16, 2024 18:56
@GitMensch
Collaborator

@ddeclerck Given 44c96d2 - what is the current state of this PR (and is it still high-priority)? If it is high, please rebase after upstreaming #204.

@GitMensch
Collaborator

Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

@ddeclerck
Contributor Author

> @ddeclerck Given 44c96d2 - what is the current state of this PR

Well, that commit you mention (from PR #201) was about fixing stuff and adding a bit of normalization on moves to NUMERIC-EDITED. This PR is about normalizing when moving from ALPHANUMERIC to NUMERIC.

> (and is it still high-priority)? If it is high, please rebase after upstreaming #204.

It's still kind of important, although not as critical as the other issues reported in the past few days (and a new one that just arrived tonight 😭).

@ddeclerck
Contributor Author

> Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

You'd want this in a separate PR?

@GitMensch
Collaborator

>> Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

> You'd want this in a separate PR?

As you want, either a separate one or directly upstream.

@ddeclerck
Contributor Author

>>> Also note that this change, together with a minimal test case, should be upstreamed as well, potentially with a comment /* skip leading zeroes + */.

>> You'd want this in a separate PR?

> As you want, either a separate one or directly upstream.

I went through a PR just to ensure the CI passes on all platforms.
#207

Collaborator

@GitMensch GitMensch left a comment


Friendly ping @ddeclerck for the current state (a rebase can be done as well ;-)

cobc/config.def Outdated
Collaborator


Wasn't our conclusion that we did not need any config option for that as all environments do a normalization and we consider that a fix to a very old issue going back to OpenCOBOL?

Note: for both the dialect option and the general change we need a NEWS entry because of the difference to before.

Contributor Author


I'm not sure, since different dialects do not always normalize the same way; some consider the sign, some ignore it, some don't normalize for some targets (for instance when moving to NUMERIC-EDITED). It depends on how close we want to be to the original environment. It's true that a non-normalized result is not really useful in practice, and we also have warnings and/or exceptions to detect such invalid data.

Comment on lines +3889 to +3894
    int sign;
    cob_field_attr attr;
    cob_field field;
    COB_FIELD_INIT (COB_FIELD_SIZE (f), COB_FIELD_DATA (f), &attr);
    COB_ATTR_INIT (COB_TYPE_NUMERIC_DISPLAY, COB_FIELD_SIZE (f), 0, COB_FLAG_HAVE_SIGN, NULL);
    sign = cob_real_get_sign (&field, 0);
Collaborator


Wouldn't it be more reasonable to duplicate the code of the called functions here? This way we don't need an intermediate field definition; we'd just take const unsigned char *p = f->data + f->size - 1; and check p as in the function above.

libcob/move.c Outdated
    @@ -309,6 +309,7 @@ cob_move_alphanum_to_display (cob_field *f1, cob_field *f2)
        const unsigned char *e2 = s2 + COB_FIELD_SIZE (f2);
        const unsigned char dec_pt = COB_MODULE_PTR->decimal_point;
        const unsigned char num_sep = COB_MODULE_PTR->numeric_separator;
        unsigned char last;
Collaborator


What do we need that for? The called function doesn't change the data, does it?
(numeric.c (cob_decimal_set_display) may change that, so that's a different thing)

Contributor Author


Since this function (indirectly) calls cob_real_get_sign, which alters the byte holding the sign (always trailing when normalizing), we have to save/restore it (or maybe we could use cob_put_sign).

Collaborator


If we drop the "normalize bcd function" and get the sign "directly", then we don't need to "unpunch" anything and therefore don't need to store/reset that position, do we?

Contributor Author


Yeah, probably. I'll dive further into this.

Collaborator


We want an ascii test here for normalization as well.

@ddeclerck
Contributor Author

> Friendly ping @ddeclerck for the current state (a rebase can be done as well ;-)

Haven't looked at that in the past few days (focused on the GC3/GC4 merge ;) ), but that's really a PR I'd like to complete.

I added comments to some of yours.

There was also an unanswered question in an earlier comment: #200 (comment)

Collaborator

@GitMensch GitMensch left a comment


As noted before, I suggest dropping the dialect configuration for now and always:

  • check for leading separate sign (adjusting the data offset as we did before)
  • iterate over the digits using COB_D2I
  • check for trailing embedded sign (as done with GCOS)

This won't give us exact results for "real" invalid data, but should be good in most cases, no?

It does change the result for "valid" data (with an embedded sign) to match other compilers, preserves the current behavior for "valid" data (with a leading separate sign), and mostly restricts treating "invalid" (unexpected) data as zero to really bad data (a half-byte > 9 that is not a sign).
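A rough sketch of the digit loop from the list above, assuming ASCII data and the usual overpunch letters for the trailing embedded sign. The leading separate sign is assumed to have been skipped already, and the sign value itself is not returned. `overpunch_digit` and `extract_digits` are hypothetical names.

```c
#include <stddef.h>

/* Digit value of an ASCII overpunch sign character ('{', 'A'..'I' positive;
   '}', 'J'..'R' negative), or -1 if c is not one. Assumed convention. */
static int overpunch_digit(unsigned char c)
{
    if (c == '{' || c == '}')
        return 0;
    if (c >= 'A' && c <= 'I')
        return c - 'A' + 1;
    if (c >= 'J' && c <= 'R')
        return c - 'J' + 1;
    return -1;
}

/* Write n digit characters to out (plus '\0'), using the low half-byte of each
   byte as COB_D2I would. The last byte is first checked for an embedded sign.
   A half-byte > 9 that is not a sign is "really bad" data: stored as 0 and
   counted. Returns the number of bad positions. */
static size_t extract_digits(const unsigned char *src, size_t n, char *out)
{
    size_t i, bad = 0;

    for (i = 0; i < n; ++i) {
        int d = -1;
        if (i == n - 1)
            d = overpunch_digit(src[i]);
        if (d < 0)
            d = src[i] & 0x0F;
        if (d > 9) {
            d = 0;
            bad++;
        }
        out[i] = (char)('0' + d);
    }
    out[n] = '\0';
    return bad;
}
```

Note that a space (0x20) yields the half-byte 0 and is therefore silently taken as a zero here, which is exactly the "embedded spaces" question below.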

The part that seems the most tricky to me is how to handle "embedded" spaces; this part may or may not still need a dialect configuration. We'd possibly need to check what other compilers do here and whether anything but GnuCOBOL handles them as zero; depending on that, we could adjust the result (we'll need a NEWS entry in any case).

What do you think?

Comment on lines +402 to +406
    #ifndef COB_EBCDIC_MACHINE
        *s2++ = (d | 0x30);
    #else
        *s2++ = (d | 0xF0);
    #endif
Collaborator


Why not using COB_I2D here?

Contributor Author


Indeed, COB_I2D should be used.
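For reference, the point of the suggestion is that a digit-to-character macro based on the '0' literal works for both character sets. The macro below is a stand-in written for this sketch (`I2D` and `store_digits` are hypothetical); check libcob's actual COB_I2D definition before relying on it.

```c
#include <stddef.h>

/* Illustrative stand-in for COB_I2D: adding the character literal '0' gives
   the digit character of the execution character set the code is compiled
   for (0x30 + d with ASCII, 0xF0 + d with EBCDIC), so no #ifdef is needed.
   For d in 0..9 this equals d | 0x30 (ASCII) or d | 0xF0 (EBCDIC). */
#define I2D(d) ((unsigned char)((d) + '0'))

/* Store n digit values (0..9) as display characters. */
static void store_digits(const int *digits, size_t n, unsigned char *s2)
{
    size_t i;
    for (i = 0; i < n; ++i)
        *s2++ = I2D(digits[i]);
}
```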


        if (count++ > 0) {
            goto error;
        }
    } else if (!(isspace (*s1) || *s1 == num_sep)) {
Collaborator


We will never get into the isspace case here, as a space would already be treated as the integer 0, so the code should be adjusted to either check for a space first or drop the check completely.

Contributor Author


True indeed.

    } else {
        for (p = s1; p < e1 && *p != dec_pt; ++p) {
            const char d = COB_D2I (*p);
            if (d >= 0 && d <= 9) {
Collaborator


This code also counts (not skip) spaces that way - does this match the expected result?

Contributor Author


Actually what happens on GCOS is very different. I had not taken into account what happens when there is a decimal point... While the current GnuCOBOL only moves digits that are before the decimal point, GCOS tries to move and normalize all digits. If it encounters a decimal point, it raises an exception, because the decimal point, whether a comma (0x6B) or a dot (0x4B), has the half-byte 0xB and therefore does not normalize to a valid digit...

In fact, GnuCOBOL tries to be smart (it skips leading spaces and interprets the sign and the decimal point), while GCOS (and others) more or less boldly convert whatever is there. But how much of the original GnuCOBOL behavior do we want to keep? If we do want to keep it (to not break existing programs), we might as well have two different normalization functions.

@@ -325,21 +326,35 @@ cob_move_alphanum_to_display (cob_field *f1, cob_field *f2)

Collaborator


Directly above this code, leading spaces are skipped; if we always check only the half-byte, then this should be done via COB_D2I instead (otherwise it should be done that way depending on the dialect configuration). But care would have to be taken to explicitly match a leading +/- as a character. I don't mind whether this only happens after a real space or also after a zero; in the latter case we can handle the sign and leading spaces/zeros in one loop. I also wouldn't mind iterating over everything until we find a +/- or a non-zero COB_D2I value.

Contributor Author


The thing is, something like ' -123' (where the leading blank is a tab, code 0x05 in EBCDIC, and the minus has code 0x60) normalizes as 50123 on GCOS (checked). The same with a space instead of the tab normalizes as 00123. Depending on whether we use isspace or COB_D2I, and whether we consider the possibility of a leading sign, we'll get different results. I'm not sure what would be best. Maybe we could have an option (not a dialect one) to specify which kind of sign to expect (leading separate, trailing embedded, none...)?
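The effect of the two readings can be reproduced with a small sketch over the EBCDIC bytes of ' -123' (tab or space, then 0x60, F1 F2 F3). Strategy A mirrors the GCOS results reported above; strategy B is a hypothetical "skip blanks, then honor a leading sign" variant. Names and constants are illustrative (CP037-style code points assumed).

```c
#include <stddef.h>

/* EBCDIC code points (CP037-style) used in the example. */
#define EBC_TAB   0x05
#define EBC_SPACE 0x40
#define EBC_MINUS 0x60

/* Strategy A (GCOS-like, as observed): low half-byte of every byte. */
static long raw_half_bytes(const unsigned char *p, size_t n)
{
    long v = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        v = v * 10 + (p[i] & 0x0F);
    return v;
}

/* Strategy B (hypothetical): skip EBCDIC space/tab, honor a leading minus,
   then use the half-bytes of the remaining bytes. */
static long skip_then_sign(const unsigned char *p, size_t n)
{
    size_t i = 0;
    int negative = 0;
    long v;

    while (i < n && (p[i] == EBC_SPACE || p[i] == EBC_TAB))
        i++;
    if (i < n && p[i] == EBC_MINUS) {
        negative = 1;
        i++;
    }
    v = raw_half_bytes(p + i, n - i);
    return negative ? -v : v;
}
```

Strategy A gives 50123 for the tab variant and 123 (00123) for the space variant, while strategy B gives -123 for both.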


2 participants