
Discussion about the parser's performance #2554

Open
@denis-migdal


Hi,

Currently the parser is Brython's biggest bottleneck.
Indeed, in my tests the py2ast step accounts for ~74% of the total execution time.

I took a quick look at it and noticed several things:

  1. $.Parser makes an inefficient copy of the tokens (1,671 tokens in my tests, 150k+ for bigger files).
    Do we really need to make such a copy? If so, maybe using .filter(), or:
    const tokens = new Array(_tokens.length); // preallocate (upper bound)
    let offset = 0;
    // for each token _tokens[i] to keep:
    tokens[offset++] = _tokens[i];
    // ...
    tokens.length = offset; // trim the unused slots
    would be more efficient?
  2. I noticed a call to tokens.splice(). If it is called several times, it can be quite costly, since each call shifts all the following tokens... Do we really need it?
    One way would be to mark the tokens, e.g. token.TO_REMOVE = true, and then remove them all at once when we are done.
  3. A token has a lot of fields (9), some of which seem redundant:
    • 4 position fields: could the start position be deduced from the previous token?
    • line: why do we need to store it when we already have the lineno property?
    • type/num_type: do we really need to store type? I have the feeling it can be deduced from num_type.
    • string/bytes: I didn't notice any case where they have different values.
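  4. Lots of:
    var EXTRA = {};
    EXTRA.lineno = token.lineno;
    EXTRA.? = token.?; // same for the other fields
    I think it would help the JS engine to build the object in a single literal, so that it gets its final shape at once:
    const EXTRA = {
        lineno: token.lineno,
        ?: token.?
    };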
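To make point 1 concrete, here is a minimal sketch of the preallocated copy next to the `.filter()` version. The `keepToken` predicate and the toy token objects are hypothetical; they only stand in for whatever filtering $.Parser actually does while copying.

```javascript
// Copy the tokens to keep into an array preallocated to the maximum size.
// `keepToken` is a hypothetical predicate, not an existing Brython function.
function copyTokens(_tokens, keepToken) {
    const tokens = new Array(_tokens.length); // upper bound: every token kept
    let offset = 0;
    for (let i = 0; i < _tokens.length; i++) {
        if (keepToken(_tokens[i])) {
            tokens[offset++] = _tokens[i]; // add a token
        }
    }
    tokens.length = offset; // truncate the unused tail
    return tokens;
}

// Equivalent version using the built-in filter.
function copyTokensFilter(_tokens, keepToken) {
    return _tokens.filter(keepToken);
}

// Toy tokens: the shape is illustrative only.
const toy = [
    { type: "NAME", string: "x" },
    { type: "COMMENT", string: "# hi" },
    { type: "OP", string: "=" },
    { type: "NUMBER", string: "1" },
];
const notComment = (t) => t.type !== "COMMENT";
console.log(copyTokens(toy, notComment).map((t) => t.string).join(" ")); // prints "x = 1"
```

One caveat: in V8, `new Array(n)` creates a "holey" array, which stays holey even once filled, so whether this beats `.filter()` or a plain `push()` loop should be measured on real token streams rather than assumed.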
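For point 2, a sketch of the "mark, then remove all at once" idea: each `tokens.splice(i, 1)` shifts every later element, so k removals can cost O(n·k), whereas a single compaction pass is O(n). `TO_REMOVE` is the marker proposed in the issue, not an existing Brython field, and `removeMarked` is a hypothetical helper.

```javascript
// Remove every token flagged with TO_REMOVE in one in-place pass.
function removeMarked(tokens) {
    let write = 0;
    for (let read = 0; read < tokens.length; read++) {
        const token = tokens[read];
        if (!token.TO_REMOVE) {
            tokens[write++] = token; // keep it, moving it over removed slots
        }
    }
    tokens.length = write; // drop the leftover tail
    return tokens;
}

// Usage: mark while scanning, compact once at the end.
const tokens = [{ s: "a" }, { s: "b" }, { s: "c" }, { s: "d" }];
tokens[1].TO_REMOVE = true;
tokens[3].TO_REMOVE = true;
removeMarked(tokens);
console.log(tokens.map((t) => t.s).join(",")); // prints "a,c"
```

Adding a property to existing token objects may change their hidden class; keeping the flags in a separate `Uint8Array` indexed like `tokens` would avoid that, with the same single-pass compaction.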
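Point 3 could look something like the sketch below: only fields that cannot be recomputed are stored, and `type`, `line` and `bytes` become getters. Everything here is an assumption for illustration: the `TOKEN_NAMES` table is made up, `line` is assumed to be just the text of line `lineno` (it may differ for tokens spanning several lines, such as triple-quoted strings), and `bytes` is assumed to always equal `string`.

```javascript
// Hypothetical table: index = num_type, value = type name.
const TOKEN_NAMES = ["ENDMARKER", "NAME", "NUMBER", "STRING", "OP"];

class CompactToken {
    constructor(source_lines, num_type, string, lineno, col_offset, end_lineno, end_col_offset) {
        this.source_lines = source_lines; // one array shared by all tokens of a script
        this.num_type = num_type;
        this.string = string;
        this.lineno = lineno;
        this.col_offset = col_offset;
        this.end_lineno = end_lineno;
        this.end_col_offset = end_col_offset;
    }

    // Derived from num_type instead of being stored in every token.
    get type() {
        return TOKEN_NAMES[this.num_type];
    }

    // Assumes `line` is the text of line `lineno` (1-based).
    get line() {
        return this.source_lines[this.lineno - 1];
    }

    // Only valid if string and bytes are really always equal.
    get bytes() {
        return this.string;
    }
}

const lines = "x = 1\ny = 2".split("\n");
const tok = new CompactToken(lines, 1, "y", 2, 0, 2, 1);
console.log(tok.type, tok.line); // prints "NAME y = 2"
```

This goes from 9 stored fields to 7 (one of them a pointer to a shared array); deducing the start position from the previous token would save more, but depends on how whitespace and comments are tokenized.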

I didn't take a deeper look, but of course, if the tokens could be stored in a pre-allocated Float64Array, that could also help a lot (it could be a single pre-allocated buffer, reused when parsing several scripts, with the results copied out when needed).
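A rough sketch of such a reusable numeric buffer, assuming a hypothetical layout of 7 slots per token (numeric type, start/end offsets into the source, and the 4 position fields). Token text is not stored; it is sliced from the source on demand. `TokenBuffer` and the field constants are illustrative names, not Brython code.

```javascript
// Hypothetical record layout: 7 numeric slots per token.
const FIELDS = 7;
const NUM_TYPE = 0, START = 1, END = 2, // START/END: offsets into the source string
      LINENO = 3, COL_OFFSET = 4, END_LINENO = 5, END_COL_OFFSET = 6;

class TokenBuffer {
    constructor(initialTokens = 1024) {
        this.data = new Float64Array(Math.max(1, initialTokens) * FIELDS);
        this.count = 0;
    }

    // Forget the previous script's tokens but keep the allocated memory.
    reset() {
        this.count = 0;
    }

    push(numType, start, end, lineno, col, endLineno, endCol) {
        const base = this.count * FIELDS;
        if (base + FIELDS > this.data.length) {
            // Grow geometrically so appends stay amortized O(1).
            const bigger = new Float64Array(this.data.length * 2);
            bigger.set(this.data);
            this.data = bigger;
        }
        const d = this.data;
        d[base + NUM_TYPE] = numType;
        d[base + START] = start;
        d[base + END] = end;
        d[base + LINENO] = lineno;
        d[base + COL_OFFSET] = col;
        d[base + END_LINENO] = endLineno;
        d[base + END_COL_OFFSET] = endCol;
        this.count++;
    }

    field(i, f) {
        return this.data[i * FIELDS + f];
    }

    // Token text is sliced from the source instead of being stored.
    text(i, source) {
        return source.slice(this.field(i, START), this.field(i, END));
    }

    // Copy the used part out before the buffer is reset and reused.
    snapshot() {
        return this.data.slice(0, this.count * FIELDS);
    }
}

const source = "x = 1";
const buf = new TokenBuffer(2); // deliberately small to exercise growth
buf.push(1, 0, 1, 1, 0, 1, 1); // "x" (numeric type codes here are made up)
buf.push(4, 2, 3, 1, 2, 1, 3); // "="
buf.push(2, 4, 5, 1, 4, 1, 5); // "1"
console.log(buf.count, buf.text(1, source)); // prints "3 ="
const saved = buf.snapshot();
buf.reset(); // ready for the next script, same memory
```

Since all these values are small integers, an `Int32Array` would work just as well and use half the memory; Float64Array only matters if some field can exceed the 32-bit range.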

Cordially,
