Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm] postpone creation of handles for show/hide lists #60473

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

2ZeroSix
Copy link

@2ZeroSix 2ZeroSix commented Apr 3, 2025

fixes: #60470

for some reason it fixes deallocation of empty show/hide lists

before (ends with out of memory):
image

after:
image


  • I’ve reviewed the contributor guide and applied the relevant portions to this PR.
Contribution guidelines:
  • See our contributor guide for general expectations for PRs.
  • Larger or significant changes should be discussed in an issue before creating a PR.
  • Contributions to our repos should follow the Dart style guide and use dart format.

Note that this repository uses Gerrit for code reviews. Your pull request will be automatically converted into a Gerrit CL and a link to the CL written into this PR. The review will happen on Gerrit but you can also push additional commits to this PR to update the code review.

fixes: dart-lang#60470

for some reason it fixes deallocation of empty show/hide lists
@2ZeroSix 2ZeroSix changed the title postpone creation of handles for show/hide lists [vm] postpone creation of handles for show/hide lists Apr 3, 2025
Copy link

Thank you for your contribution! This project uses Gerrit for code reviews. Your pull request has automatically been converted into a code review at:

https://dart-review.googlesource.com/c/sdk/+/420301

Please wait for a developer to review your code review at the above link; you can speed up the review if you sign into Gerrit and manually add a reviewer that has recently worked on the relevant code. See CONTRIBUTING.md to learn how to upload changes to Gerrit directly.

Additional commits pushed to this PR will update both the PR and the corresponding Gerrit CL. After the review is complete on the CL, your reviewer will merge the CL (automatically closing this PR).

Copy link

https://dart-review.googlesource.com/c/sdk/+/420301 has been updated with the latest commits from this pull request.

@mraleph
Copy link
Member

mraleph commented Apr 4, 2025

Interesting, thanks. I think we can land it - though notice that you also need to add two checks xyz_list.IsNull() before xyz_list.Length() > 0 in if statements after the loop.

Could you try the patch1 from this CL instead of this one against your code base? I am curious how that performs with respect to memory usage.

I think we need to be more vigilant with handles when loading kernel, I suspect loading produces loads of those and all temporary objects and handles themselves are retained until the loading completes and surrounding HANDLESCOPE is destroyed. (cc @alexmarkov for visibility)

Footnotes

  1. If you never used Gerrit there is a Download button on right hand side right above the list of modified files. It shows different ways to apply the patch.

Copy link

https://dart-review.googlesource.com/c/sdk/+/420301 has been updated with the latest commits from this pull request.

1 similar comment
Copy link

https://dart-review.googlesource.com/c/sdk/+/420301 has been updated with the latest commits from this pull request.

@2ZeroSix
Copy link
Author

2ZeroSix commented Apr 4, 2025

@mraleph

Interesting, thanks. I think we can land it - though notice that you also need to add two checks xyz_list.IsNull() before xyz_list.Length() > 0 in if statements after the loop.

done

Could you try the patch1 from this CL instead of this one against your code base? I am curious how that performs with respect to memory usage.

my patch
image

your patch
image

both patches
image

@2ZeroSix
Copy link
Author

2ZeroSix commented Apr 4, 2025

execution time under profiler was consistently better with both patches applied, i'm running a benchmark to see if the difference is real (will be done in approx 3.5 hours)

2ZeroSix added a commit to 2ZeroSix/sdk that referenced this pull request Apr 4, 2025
Closes dart-lang#60473

GitOrigin-RevId: 4e731c9
Change-Id: I5dd9b179ff0ed4f546ca0fea16456ccda7545f10
@2ZeroSix
Copy link
Author

2ZeroSix commented Apr 4, 2025

execution time under profiler was consistently better with both patches applied, i'm running a benchmark to see if the difference is real (will be done in approx 3.5 hours)

diff between https://dart-review.googlesource.com/c/sdk/+/420321 and both fixes is not convincing enough yet, but overall these fixes seems to reduce gen_snapshot time by ~7% for our app

Will check against a commit that can be built with a version without these fixes and try to remove additional noise by measuring only the LoadProgram step tomorrow

result

script:
https://gist.github.com/2ZeroSix/9f448bd51b596432c30c7bd0b22d9066

output:

================================================================================
BENCHMARK RESULTS
================================================================================

gen_snapshot_both:
  Individual runs (seconds):
    Run 1: 345.40s
    Run 2: 336.66s
    Run 3: 330.03s
    Run 4: 337.52s
    Run 5: 421.26s # excluded
    Run 6: 351.93s
    Run 7: 333.67s
    Run 8: 335.12s
    Run 9: 332.88s
    Run 10: 320.78s
  Average: 335.99s (fastest)
  Median:  335.12s
  Min:     320.78s
  Max:     351.93s
  Range:   31.15s

gen_snapshot_mraleph:
  Individual runs (seconds):
    Run 1: 339.49s
    Run 2: 341.86s
    Run 3: 336.59s
    Run 4: 342.88s
    Run 5: 339.70s
    Run 6: 344.09s
    Run 7: 339.10s
    Run 8: 349.48s
    Run 9: 963.80s # excluded
    Run 10: 341.20s
  Average: 341.59s (1.66% slower than gen_snapshot_both)
  Median:  341.20s
  Min:     336.59s
  Max:     349.48s
  Range:   12.89s

gen_snapshot_2zerosix:
  Individual runs (seconds):
    Run 1: 353.91s
    Run 2: 388.05s
    Run 3: 387.13s
    Run 4: 343.74s
    Run 5: 348.80s
    Run 6: 360.28s
    Run 7: 356.25s
    Run 8: 345.39s
    Run 9: 442.23s # excluded
    Run 10: 361.02s
  Average: 360.50s (7.29% slower than gen_snapshot_both)
  Median:  356.25s
  Min:     343.74s
  Max:     388.05s
  Range:   44.31s

================================================================================

@2ZeroSix
Copy link
Author

2ZeroSix commented Apr 5, 2025

Will check against a commit that can be built with a version without these fixes and try to remove additional noise by measuring only the LoadProgram step tomorrow

new measurements: 2 patches combined reduce time spent before ProcessPendingClasses by 2/3

measurements

patch

diff --git a/runtime/vm/kernel_loader.cc b/runtime/vm/kernel_loader.cc
index 22bf6773536..0c9fa5fde5f 100644
--- a/runtime/vm/kernel_loader.cc
+++ b/runtime/vm/kernel_loader.cc
@@ -512,6 +512,7 @@ ObjectPtr KernelLoader::LoadProgram(bool process_pending_classes) {
     for (intptr_t i = 0; i < length; i++) {
       LoadLibrary(i);
     }
+    OS::Exit(0);
 
     // Finalize still pending classes if requested.
     if (process_pending_classes) {

masured by: https://gist.github.com/2ZeroSix/28a62636f87b3ed42028f3c53d128e1e

================================================================================
BENCHMARK RESULTS FOR app.dill
================================================================================

gen_snapshot_both_only_load:
  Original runs (seconds):
    Run 1: 10.41s
    Run 2: 10.09s
    Run 3: 10.27s
    Run 4: 11.34s
    Run 5: 10.51s
    Run 6: 10.10s
    Run 7: 10.28s
    Run 8: 10.20s
    Run 9: 10.12s
    Run 10: 10.14s
  Average: 10.35s (fastest)
  Median:  10.24s (fastest)
  Min:     10.09s
  Max:     11.34s
  Range:   1.25s

gen_snapshot_mraleph_only_load:
  Original runs (seconds):
    Run 1: 16.84s
    Run 2: 16.67s
    Run 3: 16.56s
    Run 4: 16.65s
    Run 5: 16.53s
    Run 6: 16.67s
    Run 7: 17.37s
    Run 8: 16.59s
    Run 9: 16.56s
    Run 10: 18.51s
  Average: 16.90s (63.30% slower than gen_snapshot_both_only_load)
  Median:  16.66s (62.72% slower than gen_snapshot_both_only_load)
  Min:     16.53s
  Max:     18.51s
  Range:   1.98s

gen_snapshot_2zerosix_only_load:
  Original runs (seconds):
    Run 1: 19.73s
    Run 2: 20.01s
    Run 3: 19.83s
    Run 4: 20.26s
    Run 5: 21.18s
    Run 6: 21.58s
    Run 7: 19.84s
    Run 8: 24.28s
    Run 9: 19.99s
    Run 10: 19.66s
  Filtered runs (excluding outliers):
    Run 1: 19.73s
    Run 2: 20.01s
    Run 3: 19.83s
    Run 4: 20.26s
    Run 5: 21.18s
    Run 6: 21.58s
    Run 7: 19.84s
    Run 8: 19.99s
    Run 9: 19.66s
  Average: 20.23s (95.55% slower than gen_snapshot_both_only_load)
  Median:  19.99s (95.32% slower than gen_snapshot_both_only_load)
  Min:     19.66s
  Max:     21.58s
  Range:   1.91s

gen_snapshot_original_only_load:
  Original runs (seconds):
    Run 1: 38.30s
    Run 2: 29.88s
    Run 3: 32.24s
    Run 4: 30.70s
    Run 5: 39.87s
    Run 6: 31.30s
    Run 7: 30.89s
    Run 8: 30.35s
    Run 9: 29.97s
    Run 10: 29.88s
  Filtered runs (excluding outliers):
    Run 1: 29.88s
    Run 2: 32.24s
    Run 3: 30.70s
    Run 4: 31.30s
    Run 5: 30.89s
    Run 6: 30.35s
    Run 7: 29.97s
    Run 8: 29.88s
  Average: 30.65s (196.25% slower than gen_snapshot_both_only_load)
  Median:  30.53s (198.22% slower than gen_snapshot_both_only_load)
  Min:     29.88s
  Max:     32.24s
  Range:   2.36s

================================================================================

================================================================================
BENCHMARK RESULTS FOR app_healthy.dill
================================================================================

gen_snapshot_both_only_load:
  Original runs (seconds):
    Run 1: 9.96s
    Run 2: 9.85s
    Run 3: 9.73s
    Run 4: 9.97s
    Run 5: 10.06s
    Run 6: 10.03s
    Run 7: 9.79s
    Run 8: 10.88s
    Run 9: 9.83s
    Run 10: 9.79s
  Average: 9.99s (fastest)
  Median:  9.90s (fastest)
  Min:     9.73s
  Max:     10.88s
  Range:   1.15s

gen_snapshot_mraleph_only_load:
  Original runs (seconds):
    Run 1: 15.94s
    Run 2: 16.91s
    Run 3: 16.48s
    Run 4: 15.90s
    Run 5: 16.93s
    Run 6: 16.01s
    Run 7: 17.00s
    Run 8: 17.46s
    Run 9: 16.46s
    Run 10: 18.02s
  Average: 16.71s (67.30% slower than gen_snapshot_both_only_load)
  Median:  16.69s (68.58% slower than gen_snapshot_both_only_load)
  Min:     15.90s
  Max:     18.02s
  Range:   2.12s

gen_snapshot_2zerosix_only_load:
  Original runs (seconds):
    Run 1: 18.59s
    Run 2: 19.08s
    Run 3: 18.96s
    Run 4: 20.06s
    Run 5: 19.72s
    Run 6: 19.79s
    Run 7: 20.31s
    Run 8: 20.38s
    Run 9: 19.18s
    Run 10: 18.86s
  Average: 19.49s (95.15% slower than gen_snapshot_both_only_load)
  Median:  19.45s (96.39% slower than gen_snapshot_both_only_load)
  Min:     18.59s
  Max:     20.38s
  Range:   1.79s

gen_snapshot_original_only_load:
  Original runs (seconds):
    Run 1: 30.82s
    Run 2: 36.09s
    Run 3: 30.63s
    Run 4: 34.65s
    Run 5: 34.95s
    Run 6: 28.49s
    Run 7: 29.06s
    Run 8: 28.39s
    Run 9: 28.65s
    Run 10: 29.06s
  Filtered runs (excluding outliers):
    Run 1: 30.82s
    Run 2: 30.63s
    Run 3: 28.49s
    Run 4: 29.06s
    Run 5: 28.39s
    Run 6: 28.65s
    Run 7: 29.06s
  Average: 29.30s (193.32% slower than gen_snapshot_both_only_load)
  Median:  29.06s (193.40% slower than gen_snapshot_both_only_load)
  Min:     28.39s
  Max:     30.82s
  Range:   2.43s

================================================================================

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[gen_snapshot] Out of memory during types canonicalization with compressed pointers
2 participants