Unnecessary error-checking branch in `lexer.c`. #126469

qqwqqw689 · 2024-11-06T05:44:28Z

Bug report

Bug description:

In file lexer.c we have

    Py_ssize_t invalid = _PyUnicode_ScanIdentifier(s);
    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

link

But for function _PyUnicode_ScanIdentifier in unicodeobject.c file

Py_ssize_t
_PyUnicode_ScanIdentifier(PyObject *self)
{
    Py_ssize_t i;
    Py_ssize_t len = PyUnicode_GET_LENGTH(self);
    if (len == 0) {
        /* an empty string is not a valid identifier */
        return 0;
    }

    int kind = PyUnicode_KIND(self);
    const void *data = PyUnicode_DATA(self);
    Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
    /* PEP 3131 says that the first character must be in
       XID_Start and subsequent characters in XID_Continue,
       and for the ASCII range, the 2.x rules apply (i.e
       start with letters and underscore, continue with
       letters, digits, underscore). However, given the current
       definition of XID_Start and XID_Continue, it is sufficient
       to check just for these, except that _ must be allowed
       as starting an identifier.  */
    if (!_PyUnicode_IsXidStart(ch) && ch != 0x5F /* LOW LINE */) {
        return 0;
    }

    for (i = 1; i < len; i++) {
        ch = PyUnicode_READ(kind, data, i);
        if (!_PyUnicode_IsXidContinue(ch)) {
            return i;
        }
    }
    return i;
}

link
This function will never return a invalid valiable <0, so

    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

will never be executed.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux, macOS, Windows, Other

Linked PRs

gh-126469: remove unnecessary error-checking branch in lexer.c #126473

The text was updated successfully, but these errors were encountered:

rruuaanng · 2024-11-06T07:00:02Z

There seems to be no problem. Are you willing to submit PR to fix it? I don't have time now (if you don't want to repair it, you can leave it to me, maybe it will be a little late.

qqwqqw689 · 2024-11-06T07:06:25Z

@rruuaanng yes , I can.

skirpichev · 2024-11-06T11:38:57Z

I'm not sure it's a bug. This code was introduced in 74ea6b5. Obviously, at that time the check was required. The PyUnicode_READY() test was removed in f9c9354. So, it seems now _PyUnicode_ScanIdentifier() - can't fail.

We can either (1) remove error check or (2) just keep that as-is, in case it might be required in future. I think we can go with (1).

CC @methane as author of f9c9354

ZeroIntensity · 2024-11-06T14:01:50Z

IMO, there's no need to remove that. We might want to make that raise an error in the future, and most compilers should (hopefully) optimize that away if it can't ever happen anyway.

qqwqqw689 · 2024-11-06T14:27:27Z

@ZeroIntensity So make it a comment or something else like __builtin_expect?

ZeroIntensity · 2024-11-06T15:09:48Z

__builtin_expect could work, yeah.

qqwqqw689 added the type-bug An unexpected behavior, bug, or error label Nov 6, 2024

bedevere-app bot mentioned this issue Nov 6, 2024

gh-126469: remove unnecessary error-checking branch in lexer.c #126473

Open

tomasr8 added the topic-parser label Nov 6, 2024

picnixz changed the title ~~Maybe a bug in lexer.c file' error checking.~~ Unnecessary error-checking branch in lexer.c. Nov 6, 2024

ZeroIntensity added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unnecessary error-checking branch in `lexer.c`. #126469

Unnecessary error-checking branch in `lexer.c`. #126469

qqwqqw689 commented Nov 6, 2024 •

edited by bedevere-app bot

Loading

rruuaanng commented Nov 6, 2024

qqwqqw689 commented Nov 6, 2024

skirpichev commented Nov 6, 2024 •

edited

Loading

ZeroIntensity commented Nov 6, 2024

qqwqqw689 commented Nov 6, 2024

ZeroIntensity commented Nov 6, 2024

Unnecessary error-checking branch in lexer.c. #126469

Unnecessary error-checking branch in lexer.c. #126469

Comments

qqwqqw689 commented Nov 6, 2024 • edited by bedevere-app bot Loading

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

Linked PRs

rruuaanng commented Nov 6, 2024

qqwqqw689 commented Nov 6, 2024

skirpichev commented Nov 6, 2024 • edited Loading

ZeroIntensity commented Nov 6, 2024

qqwqqw689 commented Nov 6, 2024

ZeroIntensity commented Nov 6, 2024

Unnecessary error-checking branch in `lexer.c`. #126469

Unnecessary error-checking branch in `lexer.c`. #126469

qqwqqw689 commented Nov 6, 2024 •

edited by bedevere-app bot

Loading

skirpichev commented Nov 6, 2024 •

edited

Loading