Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree sitter memory leak when using the YAML parser in an extension #24742

Closed
kartikvashistha opened this issue Feb 12, 2025 · 9 comments · Fixed by #25054
Closed

Tree sitter memory leak when using the YAML parser in an extension #24742

kartikvashistha opened this issue Feb 12, 2025 · 9 comments · Fixed by #25054
Assignees

Comments

@kartikvashistha
Copy link
Contributor

Summary

Hi team, I'm the author of the Ansible Extension.

For the past few months, we have been facing an issue caused by tree-sitter highlighting wherein it breaks highlighting for a given Ansible file. The memory issue specifically occurs when a text insert is initiated either just before a comment # or --- (eg: shift+o when using vim mode on the mentioned symbols). This results in highlighting getting broken, with a memory leak caused specifically by ts_malloc_default from tree_sitter, which is reset only after either the editor is killed by an OOM service or manually closed.

One rather odd behaviour that we noticed was that when we tried to re-use a yaml parser for Ansible files, the memory leak started affecting non Ansible YAML files as well. Flummoxed, I created an identical YAML parser for Ansible, with the only difference being the name, and lo-behold this solved the memory leak issue for all non Ansible YAMl files when the extension was installed (and limited the memory leak to only files detected as Ansible files, when inserting text above # or ---).

I ran heaptrack to better find out where this issue was occurring, and my best guess is that the syntax map gets broken when taking a snapshot is either because the language name isn't sent correctly to ts_parser_parse or due to some speed issue when retrieving the name of the grammar which might be mitigated by say ts_parser_set_timeout_micros or set_sync_parse_timeout (under Buffer implementation).

(Please note: this is only my current best guess, I have no idea how right this is just yet.)
Image

The issue goes away completely if we remove the grammar table from extension.toml and re-use the internal YAML grammar, but unfortunately as you might be aware, the extension cli doesn't allow for this. This works fine only when installing the extension locally.

Any and all help here will be greatly appreciated. Thanks!

Steps to trigger the problem:

  1. Install the Ansible Extension;
  2. Configure the extension settings as described in the README;
  3. Clone this repo locally;
  4. Open this file and try inserting text above a comment or ---, and observe the memory leak on a system monitor tool of your choice.

Actual Behavior:
Highlighting breaks & Zed gobbles up all of the memory.

Expected Behavior:
No memory leak occurs.

Link to the heaptrack zst file: https://www.dropbox.com/scl/fi/7m8q4qbvshrzdka60tbxg/heaptrack.zed.109104.zst?rlkey=oj0e92m7kbt3vof54em41kfwm&st=zangv4eq&dl=0

Zed Version and System Specs

Zed: v0.172.11 (Zed)
OS: Linux Wayland Fedora 41
Memory: 31.2 GiB
Architecture: x86_64
GPU: NVIDIA GeForce RTX 2080 SUPER || NVIDIA || 565.77

@kartikvashistha
Copy link
Contributor Author

Also jfyi: the memory leak has been confirmed to be occurring on both macOS and Linux.

@ConradIrwin
Copy link
Member

Thanks for reporting! We'll look into this, as there should be no way a rogue extension can bring Zed down

@maxbrunsfeld maxbrunsfeld self-assigned this Feb 17, 2025
@maxbrunsfeld
Copy link
Collaborator

maxbrunsfeld commented Feb 17, 2025

Ok, I can reproduce this. It looks like Tree-sitter is going into an infinite loop due to a buggy external scanner in the zed-tree-sitter-yaml grammar.

@kartikvashistha I think that your external scanner is producing zero-length comment tokens, which is probably not intended. Because comments can appear anywhere, the parser doesn't change its state when it sees a comment. Thus we get an infinite loop of empty comments.

I've added protection against this bug in tree-sitter/tree-sitter#4213; Tree-sitter will now ignore empty external tokens that are considered extra.

@kartikvashistha
Copy link
Contributor Author

Thanks a lot for having a look at and working on this @maxbrunsfeld!

Just to be clear, the external scanner currently used by the Ansible extension should be the same as the one being used by zed's tree sitter yaml. What I never understood is why this wasn't an issue with normal YAML files, which use the same exact scanner 🤔

Additionally, when you get the time, would you mind briefly telling how you proceeded to debug this? I have spent a lot of time trying to debug this and couldn't really understand how to even proceed lol 😅

@maxbrunsfeld
Copy link
Collaborator

Thanks for pointing that out. I didn't realize they were exactly the same code. I think there is some undefined behavior in the YAML parser such that it behaves differently when loaded via WASM. Investigating...

@maxbrunsfeld
Copy link
Collaborator

Ok, the YAML parser is a bit weird, but I was able to track down the difference in behavior between the WASM and non-WASM code paths, and will fix that in Tree-sitter too. That way, your wasm parser should behave identically to the native YAML one (and won't have the infinite loop described above).

@maxbrunsfeld
Copy link
Collaborator

tree-sitter/tree-sitter#4218

@maxbrunsfeld
Copy link
Collaborator

maxbrunsfeld commented Feb 18, 2025

This fix will go out in Zed 0.175. @kartikvashistha Thanks so much for the reproduction info, and sorry you had to deal with this for so long, while trying to maintain the Ansible extension.

@kartikvashistha
Copy link
Contributor Author

No worries at all @maxbrunsfeld! Thanks a bunch for the speedy resolution!🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants