Why can't we compress the ATN? #4732

Open
p7r0x7 opened this issue Nov 17, 2024 · 3 comments

Comments

@p7r0x7

p7r0x7 commented Nov 17, 2024

Depending on how simply it's implemented, it could be incredibly beneficial. Personally, since I'm already using zstd in my compiler project, I wouldn't mind zstd, but a super simple compression implementation could work too.

@mike-lischke
Member

It should not be a third-party library, to avoid forcing every target to ship it in its specific environment. A simple RLE (run-length encoding) might make more sense, but it's unclear whether compressing the serialized ATN has any significant impact on code size or runtime speed.

Maybe a new serialization format would be the better choice? However, I don't think that will ever be considered for ANTLR4. Follow the ANTLRng project instead, where this might become a reality.
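For readers unfamiliar with the "simple RLE" idea mentioned above, here is a minimal sketch in Python. It is not ANTLR code and not a proposed format: it assumes the serialized ATN can be treated as a flat list of integers, and the function names `rle_encode` and `rle_decode` and the sample data are made up for illustration.

```python
from typing import List


def rle_encode(values: List[int]) -> List[int]:
    """Encode a list of ints as flat [count, value, count, value, ...] pairs."""
    out: List[int] = []
    i = 0
    while i < len(values):
        j = i
        # Advance j to the end of the current run of equal values.
        while j < len(values) and values[j] == values[i]:
            j += 1
        out.extend((j - i, values[i]))
        i = j
    return out


def rle_decode(encoded: List[int]) -> List[int]:
    """Invert rle_encode: expand each (count, value) pair."""
    if len(encoded) % 2 != 0:
        raise ValueError("encoded data must consist of (count, value) pairs")
    out: List[int] = []
    for k in range(0, len(encoded), 2):
        count, value = encoded[k], encoded[k + 1]
        out.extend([value] * count)
    return out


# Hypothetical sample data, not a real serialized ATN.
sample = [4, 0, 0, 0, 0, 7, 7, 1, 2, 3]
packed = rle_encode(sample)
assert rle_decode(packed) == sample
print(len(sample), len(packed))
```

With this sample the packed form is longer than the input (12 integers versus 10), because most runs have length one. That matches the caveat above: whether RLE helps at all depends on how many long runs the real data contains, which would have to be measured.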

@kaby76
Contributor

kaby76 commented Dec 11, 2024

> Should not be a 3rd party lib, to avoid forcing all targets to have that in their specific environment. Instead a simple RLE might make more sense, but it's unclear if compressing the serialized ATN has any significant impact (on code size or runtime speed).
>
> Instead maybe a new serialization format might be the better choice? However, I don't think that will ever be considered in ANTLR4. Instead follow the ANTLRng project, where this might become a reality.

I'm not sure whether every ATN actually needs to be fully unpacked, or whether unpacking should even happen in the parser constructor. For our largest grammar, sql/plsql, unpacking takes 0.2s in C#, but most of the grammar isn't even used for the parse. Across the entire test suite of 379 files, only 66% of the rules are used. You would expect the percentage used to be even lower for small tests.

But is there actually a problem? Does unpacking take too much time or space?
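The idea of not unpacking everything up front can be sketched as decode-on-first-use with a cache. This is only an illustration under strong assumptions: it pretends the data is already split into independent per-rule segments, which is not how the ANTLR4 serialized ATN is laid out, and the class `LazyRuleTable`, its methods, and the rule names are all hypothetical.

```python
from typing import Callable, Dict, List


class LazyRuleTable:
    """Decode per-rule segments on first use and cache the result."""

    def __init__(
        self,
        segments: Dict[str, List[int]],
        decode: Callable[[List[int]], List[int]],
    ) -> None:
        self._segments = segments  # still-packed data, keyed by rule name
        self._decode = decode      # stand-in for the real unpacking step
        self._cache: Dict[str, List[int]] = {}
        self.decode_calls = 0      # counts how often unpacking actually ran

    def get(self, rule: str) -> List[int]:
        """Return the unpacked data for a rule, unpacking it at most once."""
        if rule not in self._cache:
            self.decode_calls += 1
            self._cache[rule] = self._decode(self._segments[rule])
        return self._cache[rule]

    def used_fraction(self) -> float:
        """Fraction of rules that have been unpacked so far."""
        if not self._segments:
            return 0.0
        return len(self._cache) / len(self._segments)


# Hypothetical rule names and payloads; `list` stands in for a real decoder.
table = LazyRuleTable(
    {"select_stmt": [1, 2, 3], "insert_stmt": [4, 5], "merge_stmt": [6]},
    decode=list,
)
table.get("select_stmt")
table.get("select_stmt")  # served from the cache, no second decode
table.get("insert_stmt")
print(table.decode_calls, table.used_fraction())
```

A counter like `used_fraction` is also one way to collect the kind of rules-used percentage quoted above. Whether such a scheme pays off depends on the question already asked here: is the up-front unpacking cost actually a problem?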

@p7r0x7
Author

p7r0x7 commented Dec 20, 2024

It could be optional, but zstd is brilliant and ubiquitous.
