Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary error-checking branch in lexer.c. #126469

Open
qqwqqw689 opened this issue Nov 6, 2024 · 6 comments
Open

Unnecessary error-checking branch in lexer.c. #126469

qqwqqw689 opened this issue Nov 6, 2024 · 6 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-parser type-bug An unexpected behavior, bug, or error

Comments

@qqwqqw689
Copy link
Contributor

qqwqqw689 commented Nov 6, 2024

Bug report

Bug description:

In file lexer.c we have

    Py_ssize_t invalid = _PyUnicode_ScanIdentifier(s);
    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

link

But for function _PyUnicode_ScanIdentifier in unicodeobject.c file

Py_ssize_t
_PyUnicode_ScanIdentifier(PyObject *self)
{
    Py_ssize_t i;
    Py_ssize_t len = PyUnicode_GET_LENGTH(self);
    if (len == 0) {
        /* an empty string is not a valid identifier */
        return 0;
    }

    int kind = PyUnicode_KIND(self);
    const void *data = PyUnicode_DATA(self);
    Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
    /* PEP 3131 says that the first character must be in
       XID_Start and subsequent characters in XID_Continue,
       and for the ASCII range, the 2.x rules apply (i.e
       start with letters and underscore, continue with
       letters, digits, underscore). However, given the current
       definition of XID_Start and XID_Continue, it is sufficient
       to check just for these, except that _ must be allowed
       as starting an identifier.  */
    if (!_PyUnicode_IsXidStart(ch) && ch != 0x5F /* LOW LINE */) {
        return 0;
    }

    for (i = 1; i < len; i++) {
        ch = PyUnicode_READ(kind, data, i);
        if (!_PyUnicode_IsXidContinue(ch)) {
            return i;
        }
    }
    return i;
}

link
This function will never return a invalid valiable <0, so

    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

will never be executed.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux, macOS, Windows, Other

Linked PRs

@qqwqqw689 qqwqqw689 added the type-bug An unexpected behavior, bug, or error label Nov 6, 2024
@rruuaanng
Copy link
Contributor

There seems to be no problem. Are you willing to submit PR to fix it? I don't have time now (if you don't want to repair it, you can leave it to me, maybe it will be a little late.

@qqwqqw689
Copy link
Contributor Author

@rruuaanng yes , I can.

@skirpichev
Copy link
Member

skirpichev commented Nov 6, 2024

I'm not sure it's a bug. This code was introduced in 74ea6b5. Obviously, at that time the check was required. The PyUnicode_READY() test was removed in f9c9354. So, it seems now _PyUnicode_ScanIdentifier() - can't fail.

We can either (1) remove error check or (2) just keep that as-is, in case it might be required in future. I think we can go with (1).

CC @methane as author of f9c9354

@picnixz picnixz changed the title Maybe a bug in lexer.c file' error checking. Unnecessary error-checking branch in lexer.c. Nov 6, 2024
@ZeroIntensity ZeroIntensity added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 6, 2024
@ZeroIntensity
Copy link
Member

IMO, there's no need to remove that. We might want to make that raise an error in the future, and most compilers should (hopefully) optimize that away if it can't ever happen anyway.

@qqwqqw689
Copy link
Contributor Author

@ZeroIntensity So make it a comment or something else like __builtin_expect?

@ZeroIntensity
Copy link
Member

__builtin_expect could work, yeah.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-parser type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants