Skip to content

[Performances] Use DOP principles for the AST generation ? #2545

Open
@denis-migdal

Description

@denis-migdal

Hi,

I recently learned about the DOP (Data Oriented Programming) principles.

I wonder if it could help to speed up the Py2JS process by changing how the AST is stored.

In the previous Brython version (not up-to-date), I have the following performances :

  • Py2Ast : 36ms
  • AST2js : 18ms
  • browser js parsing : 4.4ms
  • execution : 4.8ms

Py2JS is therefore ~5/6 of the total execution time.

DOP seems to promise quite good speed up. Zig parser was rewritten using it, with a significant speed gain:
https://www.youtube.com/watch?v=IroPQ150F6c

The main principle is to :

  • reduce object creations ;
  • reduce the amount of RAM -> L2 cache updates.
  • reduce the amount of copy / if / etc.

We could transform the ASTNode tree into a struct of arrays, greatly reducing object creations, and possibly access time.

For example, some of the AST node properties, e.g. lineno, could be stored in a :

const lineno = new Float64Array(); // we could pre-allocate such array depending on the script length (e.g. script.length / 3).

// get the lineno of the ASTNode with the id my_astnode_id.
lineno[my_astnode_id]

Children could also be stored in arrays :

const firstChildren = new Float64Array(); // we could pre-allocate such array depending on the script length (e.g. script.length / 3).
const nextBrother = new Float64Array(); 

// get the first children of the ASTNode with the id my_astnode_id.
let childID = firstChildren[my_astnode_id]
// get the next children :
let child     = nextBrother[childID]  // if childID === -1 => no more children.

toJS function could be fetched from a NODE_TYPE_ID :

const nodes_type = new Float64Array();

// get the toJS for my id my_astnode_id.
toJS[ nodes_type[my_astnode_id] ]( ... );

I don't really have time to work on it (still haven't finished SBrython), maybe I'll try it in a few years ;).

EDIT: maybe I could try on my SBrython ASTConv to see if I have some changes in performances with it.
Could be quicker to test if this is a promising solution. Won't be able to work on it for maybe a year, but that's something that I could try once I'll have some more free time.

Cordially,

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions