[deploy preview] Fast inverted tree, take 2 #4806
Closed
This speeds up CallTree.getChildren.
The old code was saving time by sorting siblings only for the nodes that were displayed based on the current (preview) range selection. This made it cheaper to compute the flame graph for a small range, but it meant that it had to re-do the sort every time the selection changed. Now we do the sorting once, based on the entire call node table. This is expensive but it is a one-time cost. Then we cache the "ordered rows" and don't have to sort again on each range selection change.
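The one-time sort plus cache can be sketched like this (function and variable names are hypothetical, not the profiler's actual API):

```javascript
// Sort every node's children once, by total time descending, based on the
// whole call node table. This is the one-time cost described above.
function computeOrderedChildren(childrenPerNode, totalTimePerNode) {
  return childrenPerNode.map((children) =>
    children.slice().sort((a, b) => totalTimePerNode[b] - totalTimePerNode[a])
  );
}

// Example data: node 0 has children 1 and 2.
const childrenPerNode = [[1, 2], [], []];
const totalTimePerNode = [10, 3, 7];

// Computed once and cached; range selection changes don't invalidate it.
const orderedChildren = computeOrderedChildren(childrenPerNode, totalTimePerNode);

// getChildren is now a plain cache lookup.
function getChildren(nodeIndex) {
  return orderedChildren[nodeIndex];
}
```

The ordering is based on totals over the entire table rather than the current range, which is what makes it safe to compute once.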
We still build an inverted call node table. But the thread remains untouched. This means that the stackTable and the callNodeTable are upside down with respect to each other when the call tree is inverted. This can be a bit confusing. Some code needs to match non-inverted stacks to inverted call nodes. For these purposes, a StackToInvertedCallNodeMatcher class is introduced. Some address timings / line timings code is now unnecessary, specifically the code that was computing global information (and not information about a single call node) with special treatment for the inverted case. The unnecessary inverted case implementation was removed.
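A hypothetical sketch of the matching idea (the real StackToInvertedCallNodeMatcher is more involved; the map key scheme and names here are illustrative): in the inverted tree, the root corresponds to the stack's *leaf* frame, so matching walks the non-inverted stack from the leaf upward while descending the inverted tree.

```javascript
// Inverted tree child lookup: `${parentInvertedNode}:${func}` -> node index.
const invertedChildLookup = new Map([
  ['-1:C', 0], // inverted root for stacks ending in C
  ['0:B', 1],  // ... where C was called from B
  ['1:A', 2],  // ... where B was called from A
]);

function matchStackToInvertedNode(funcsRootToLeaf) {
  let node = -1; // -1 means "above the inverted roots"
  // Walk the non-inverted stack from its leaf to its root.
  for (let i = funcsRootToLeaf.length - 1; i >= 0; i--) {
    const next = invertedChildLookup.get(node + ':' + funcsRootToLeaf[i]);
    if (next === undefined) {
      return -1; // no inverted call node exists for this stack suffix
    }
    node = next;
  }
  return node;
}
```

This is the sense in which the stackTable and the inverted callNodeTable are "upside down with respect to each other": the stack's suffix becomes the inverted node's path.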
Compute the inverted selected call node in the action creator, not in the reducer. This means we don't have to pass the call tree as far down. Also move the computation into the CallTree class. And fix it to stop at nodes with the heaviest self time, even if they're not leaf nodes.
This puts the conversion functions between call node indexes and call node paths onto the new CallNodeInfo interface. It also provides accessor methods for the traditional members. This will allow us to have two implementations in the future: One for the regular call tree, and one for the inverted call tree.
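A minimal sketch of the non-inverted implementation, assuming a columnar call node table with `prefix` and `func` columns; the method names follow the description above but the exact signatures are illustrative:

```javascript
// One of the two planned CallNodeInfo implementations: the non-inverted one.
class CallNodeInfoNonInverted {
  // callNodeTable: { prefix: Int32Array, func: Int32Array, length: number }
  constructor(callNodeTable) {
    this._table = callNodeTable;
  }
  // Accessor for the traditional member.
  getCallNodeTable() {
    return this._table;
  }
  // For the non-inverted implementation this is the same table.
  getNonInvertedCallNodeTable() {
    return this._table;
  }
  // Walk prefix pointers up to the root to recover the path of funcs.
  getCallNodePathFromIndex(index) {
    const path = [];
    for (let i = index; i !== -1; i = this._table.prefix[i]) {
      path.unshift(this._table.func[i]);
    }
    return path;
  }
  // Descend from the root, matching one func per path element.
  getCallNodeIndexFromPath(path) {
    let node = -1;
    for (const func of path) {
      let child = -1;
      for (let i = 0; i < this._table.length; i++) {
        if (this._table.prefix[i] === node && this._table.func[i] === func) {
          child = i;
          break;
        }
      }
      if (child === -1) {
        return null; // no call node exists for this path
      }
      node = child;
    }
    return node;
  }
}

// Example: node 0 is a root (funcs 5); nodes 1 and 2 are its children.
const callNodeInfo = new CallNodeInfoNonInverted({
  prefix: Int32Array.of(-1, 0, 0),
  func: Int32Array.of(5, 6, 7),
  length: 3,
});
```

An inverted implementation could then satisfy the same interface while mapping paths through the inverted table instead.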
The flame graph is only drawn when we're not inverted, so getNonInvertedCallNodeTable always returns the same as getCallNodeTable.
This also removes an unused argument from some methods.
We are iterating over the filtered samples, which were already constrained to the zoomed range, so checking the time again is redundant. It could guard against error cases where samples are not sorted by time and the range filtering didn't work properly, but I don't think that's needed either; _accumulateInBuffer makes sure to never write outside of its bounds.
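The bounds argument can be illustrated with a simplified stand-in for _accumulateInBuffer (names and structure are illustrative): because the helper rejects out-of-range indexes itself, a per-sample time check adds nothing.

```javascript
// Accumulate `value` into the buffer bucket corresponding to `sampleTime`
// within [rangeStart, rangeEnd). Out-of-range samples are silently dropped,
// so the caller never needs its own time check.
function accumulateInBuffer(buffer, rangeStart, rangeEnd, sampleTime, value) {
  const pos =
    ((sampleTime - rangeStart) / (rangeEnd - rangeStart)) * buffer.length;
  const index = Math.floor(pos);
  if (index < 0 || index >= buffer.length) {
    return; // out-of-range samples can never write outside the buffer
  }
  buffer[index] += value;
}

const buffer = new Float64Array(4);
accumulateInBuffer(buffer, 0, 8, 3, 1);   // falls into bucket 1
accumulateInBuffer(buffer, 0, 8, 100, 1); // outside the range: ignored
```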
We were checking for transparent categories in order to optimize out the work for samples that were idle. However, this check was quite expensive in itself. These days, we have CPU usage information almost always, and checking for zero CPU is faster, so let's do that instead.
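A minimal sketch of the cheaper check, assuming a per-sample CPU delta column (array names are illustrative):

```javascript
// Skip idle samples via a zero-CPU integer check instead of looking up
// whether the sample's category is the transparent (idle) one.
function sumNonIdleWeight(weightPerSample, cpuDeltaPerSample) {
  let total = 0;
  for (let i = 0; i < weightPerSample.length; i++) {
    if (cpuDeltaPerSample[i] === 0) {
      continue; // idle sample: zero CPU used since the previous sample
    }
    total += weightPerSample[i];
  }
  return total;
}
```

The integer comparison stays in the hot loop, but unlike a category lookup it needs no indirection through a separate table.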
This doesn't seem to have made anything faster though.
This is quite effective in Firefox, unfortunately. It also means we can save repeated lookups of the implementation and categories in the "thisNodeIndex === needleNodeIndex" case.
This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable, because we'll now only be comparing integers. This speeds up _createInvertedRootCallNodeTable by almost 2x.
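The integer-sentinel trick can be sketched like this (the function name is illustrative): representing "no prefix" as -1 instead of null keeps every comparison integer-vs-integer, so the JIT can use a monomorphic integer compare instead of a generic CompareIC.

```javascript
// Find the root nodes of a call node table, where roots are marked with a
// -1 prefix rather than null.
function findRootNodes(prefixColumn) {
  const roots = [];
  for (let i = 0; i < prefixColumn.length; i++) {
    // Before: prefixColumn[i] === null (mixed null/number, generic compare)
    // After:  prefixColumn[i] === -1   (always an integer compare)
    if (prefixColumn[i] === -1) {
      roots.push(i);
    }
  }
  return roots;
}
```

Storing the column in an Int32Array (where null isn't even representable) enforces the invariant.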
This speeds up _accumulateSampleCategories by 28%! It avoids repeated conversions between floats and integers. We use a 1.15.16 fixed-point format (1 sign bit, 15 bits left of the binary point, 16 bits right of it) for the i32 values.
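A sketch of the 1.15.16 encoding described above (helper names are illustrative): values are scaled by 2^16 and rounded, after which the hot loop can stay in pure i32 arithmetic.

```javascript
const FRACTIONAL_BITS = 16;
const FIXED_POINT_SCALE = 1 << FRACTIONAL_BITS; // 65536

// Convert a float to 1.15.16 fixed point. |x| must stay below 2^15 for the
// result to fit in the i32 range.
function toFixed1_15_16(x) {
  return Math.round(x * FIXED_POINT_SCALE) | 0;
}

function fromFixed1_15_16(fixed) {
  return fixed / FIXED_POINT_SCALE;
}

// The accumulation loop performs no per-sample float<->int conversions;
// `| 0` keeps the running sum an i32.
function accumulateFixed(values) {
  let sum = 0;
  for (const v of values) {
    sum = (sum + v) | 0;
  }
  return sum;
}
```

Only the final result needs converting back to a float for display.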
This doesn't improve performance but simplifies the code a little bit.
Using firstChild / nextSibling / currentLastChild instead is 3.5x faster!
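The firstChild / nextSibling / currentLastChild layout can be sketched like this (the column names mirror the ones mentioned above, the surrounding structure is illustrative): each node stores its first child and its next sibling, so appending a child is O(1) and requires no per-node child arrays.

```javascript
function makeTree(capacity) {
  return {
    firstChild: new Int32Array(capacity).fill(-1),
    nextSibling: new Int32Array(capacity).fill(-1),
    currentLastChild: new Int32Array(capacity).fill(-1),
  };
}

function appendChild(tree, parent, child) {
  const last = tree.currentLastChild[parent];
  if (last === -1) {
    tree.firstChild[parent] = child; // first child of this parent
  } else {
    tree.nextSibling[last] = child; // link after the previous last child
  }
  tree.currentLastChild[parent] = child;
}

// Children are recovered by following the sibling chain.
function childrenOf(tree, parent) {
  const children = [];
  for (let c = tree.firstChild[parent]; c !== -1; c = tree.nextSibling[c]) {
    children.push(c);
  }
  return children;
}
```

Three flat Int32Array columns replace one growable array per node, which avoids per-node allocations entirely.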
This change massively speeds up symbolication.
Superseded by #4900.