Make inverting the call tree fast, by computing inverted call nodes lazily #4900
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4900 +/- ##
==========================================
- Coverage 88.49% 88.47% -0.03%
==========================================
Files 304 304
Lines 27461 27955 +494
Branches 7430 7510 +80
==========================================
+ Hits 24302 24732 +430
- Misses 2934 2991 +57
- Partials 225 232 +7
8fd4e74 to 533a704
This just adds a new interface that we can hang functionality off of which is specific to the inverted tree. No functional changes.
22b07b4 to b730acb
This is now ready for review! A few more comments are probably needed in the "Create inverted call nodes lazily" commit, but I think it's reviewable in its current state.
This is the main new concept in this PR that allows us to make the inverted tree fast. See the comment above CallNodeInfoInverted in src/types/profile-derived.js for details.

The PR is structured as follows:
- Implement the suffix order in a brute force manner (this commit).
- Use the suffix order to re-implement everything that was using the inverted call node table.
- Once nothing is using the inverted call node table directly anymore, make it fast.

We make it fast by rewriting the computation of the inverted call node table and of the suffix order so that we only materialize inverted call nodes that are displayed in the call tree, and not for every sample. And we only compute the suffix order to the level of precision needed to have correct ranges for all materialized inverted call nodes.
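To make the "brute force" suffix order concrete, here is a minimal sketch. It is not the code from this PR: the miniature table, the helper names, and the tie-breaking rule (a shorter inverted path sorts before a longer one that extends it) are assumptions made for illustration.

```javascript
// Hypothetical miniature non-inverted call node table.
// prefix[i] is the parent call node (-1 for roots), func[i] is the function.
// Funcs: 0 = A, 1 = B.
// cn0: A, cn1: A -> B, cn2: A -> B -> A, cn3: A -> A
const table = {
  prefix: [-1, 0, 1, 0],
  func: [0, 1, 0, 0],
};

// Returns the funcs of a call node, read from the self function up to the root.
function invertedPath(table, node) {
  const path = [];
  for (let n = node; n !== -1; n = table.prefix[n]) {
    path.push(table.func[n]);
  }
  return path;
}

// Compares two non-inverted call nodes by their inverted paths.
// Assumption: if one path is a prefix of the other, the shorter one sorts first.
function compareInSuffixOrder(table, a, b) {
  const pathA = invertedPath(table, a);
  const pathB = invertedPath(table, b);
  const len = Math.min(pathA.length, pathB.length);
  for (let i = 0; i < len; i++) {
    if (pathA[i] !== pathB[i]) {
      return pathA[i] - pathB[i];
    }
  }
  return pathA.length - pathB.length;
}

// Brute force: sort every non-inverted call node by its full inverted path.
function computeSuffixOrderBruteForce(table) {
  const order = table.func.map((_, i) => i);
  order.sort((a, b) => compareInSuffixOrder(table, a, b));
  return order;
}

const suffixOrder = computeSuffixOrderBruteForce(table);
console.log(suffixOrder); // [0, 3, 2, 1]
```

With this ordering, all non-inverted call nodes that share an inverted path prefix (that is, a call path suffix) sit next to each other, so an inverted call node can be described by a contiguous index range in the suffix order.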
This function is used by getNativeSymbolsForCallNodeInverted, getStackAddressInfoForCallNodeInverted, and getStackLineInfoForCallNodeInverted. This replaces a call to getStackIndexToCallNodeIndex() with a call to getStackIndexToNonInvertedCallNodeIndex(). It also mostly removes the use of the inverted call node table for this code. (There's still a place that accesses callNodeInfo.getCallNodeTable().depth, but this will be fixed in a later commit.) We want to eliminate all callers to getStackIndexToCallNodeIndex() because we don't want to compute a mapping from non-inverted stack index to inverted call node index upfront.
… order. This replaces a call to getStackIndexToCallNodeIndex() with a call to getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to getCallNodeTable(). And it replaces a SampleIndexToCallNodeIndex mapping with a SampleIndexToNonInvertedCallNodeIndex mapping.
This replaces a call to getStackIndexToCallNodeIndex() with a call to getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to getCallNodeTable().
…ted call nodes. This function is used when hovering or clicking the activity graph. This commit replaces a SampleIndexToCallNodeIndex mapping with a SampleIndexToNonInvertedCallNodeIndex mapping.
The stack chart is always non-inverted, so this commit is functionally neutral. This lets us remove the now-unused function getSampleIndexToCallNodeIndexForFilteredThread.
This removes a few more uses of getCallNodeTable().
This replaces lots of uses of getCallNodeTable() with uses of getNonInvertedCallNodeTable(). It also replaces lots of uses of getStackIndexToCallNodeIndex() with uses of getStackIndexToNonInvertedCallNodeIndex(). We now compute the call tree timings quite differently for inverted mode compared to non-inverted mode. There's one part of the work that's shared: The getCallNodeLeafAndSummary function computes the self time for each non-inverted node, and the result is used for both the inverted and the non-inverted call tree timings. The CallTreeTimings Flow type is turned into an enum, with a different type for CallTreeTimingsNonInverted and for CallTreeTimingsInverted. A new implementation for the CallTreeInternal interface is added.
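The shared step can be sketched roughly as follows. This is an illustration only: the function names, the unit sample weights, and the assumption that a parent always has a lower index than its children are all hypothetical, not taken from the PR.

```javascript
// Hypothetical miniature non-inverted call node table (funcs: 0 = A, 1 = B).
// cn0: A, cn1: A -> B, cn2: A -> B -> A, cn3: A -> A
const timingTable = {
  prefix: [-1, 0, 1, 0],
  func: [0, 1, 0, 0],
};

// Shared step: self time per non-inverted call node (each sample weighs 1 here).
function computeSelfPerNonInvertedNode(sampleCallNodes, callNodeCount) {
  const self = new Float64Array(callNodeCount);
  for (const callNode of sampleCallNodes) {
    self[callNode] += 1;
  }
  return self;
}

// Non-inverted totals: add each node's total to its parent.
// Assumes prefix[i] < i, so iterating backwards visits children first.
function computeNonInvertedTotals(table, self) {
  const total = Float64Array.from(self);
  for (let i = total.length - 1; i >= 0; i--) {
    const parent = table.prefix[i];
    if (parent !== -1) {
      total[parent] += total[i];
    }
  }
  return total;
}

// Inverted roots: the total of the root for func f is the summed self time
// of every non-inverted node whose func is f.
function computeInvertedRootTotals(table, self, funcCount) {
  const totals = new Float64Array(funcCount);
  for (let i = 0; i < self.length; i++) {
    totals[table.func[i]] += self[i];
  }
  return totals;
}

const sampleCallNodes = [1, 2, 2, 3]; // one non-inverted call node per sample
const selfTimes = computeSelfPerNonInvertedNode(sampleCallNodes, 4);
const nonInvertedTotals = computeNonInvertedTotals(timingTable, selfTimes);
const invertedRootTotals = computeInvertedRootTotals(timingTable, selfTimes, 2);
console.log(Array.from(selfTimes)); // [0, 1, 2, 1]
console.log(Array.from(nonInvertedTotals)); // [4, 3, 2, 1]
console.log(Array.from(invertedRootTotals)); // [3, 1]
```

Both modes start from the same per-node self times and only differ in how those values are aggregated.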
All these places now deal with non-inverted call nodes, and for those, what we meant by "leaf" and by "self" was always the same thing. And I prefer the word "self" because "leaf" usually means "has no children" and that's not the case here. We still use the word "leaf" in many parts of the documentation.
Whether a function recurses (directly or indirectly) is the same in the inverted call node table and in the non-inverted call node table.
This just stops exposing it from the interface. The way we compute it will change in the next commit.
This is the main commit of this PR. Now that nothing is relying on having an inverted call node for each sample, or on having a fully-computed inverted call node table, we can make it so that we only add entries to the inverted call node table when we actually need a node, for example because it was revealed in the call tree. This makes it a lot faster to click the "Invert call stack" checkbox - before this commit, we were computing a lot of inverted call nodes that were never shown to the user. After this commit, CallNodeInfoInvertedImpl no longer inherits from CallNodeInfoImpl - it is now a fully separate implementation.
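As a rough mental model of the lazy approach, the sketch below creates roots up front and creates the children of an inverted node only when they are requested. It is a simplified illustration under assumptions, not the implementation in CallNodeInfoInvertedImpl: the class name, the node record layout, and the re-sorting strategy are all hypothetical.

```javascript
// Hypothetical lazy inverted tree over a non-inverted call node table.
// Each inverted node owns a range [start, end) of the suffix order.
class LazyInvertedTree {
  constructor(table, funcCount) {
    this.table = table;
    // Initially the order is only sorted by self func, which is precise
    // enough to give every root a correct range.
    this.order = table.func
      .map((_, i) => i)
      .sort((a, b) => table.func[a] - table.func[b]);
    this.nodes = [];
    let pos = 0;
    for (let func = 0; func < funcCount; func++) {
      const start = pos;
      while (pos < this.order.length && table.func[this.order[pos]] === func) {
        pos++;
      }
      // Root index === func index.
      this.nodes.push({ func, depth: 0, start, end: pos, children: null });
    }
  }

  // Returns the k'th ancestor of a non-inverted node, or -1 if there is none.
  _ancestor(callNode, k) {
    let n = callNode;
    for (let i = 0; i < k && n !== -1; i++) {
      n = this.table.prefix[n];
    }
    return n;
  }

  // Creates the children of an inverted node on first request.
  getChildren(index) {
    const node = this.nodes[index];
    if (node.children !== null) {
      return node.children;
    }
    const k = node.depth + 1;
    const key = (callNode) => {
      const deepNode = this._ancestor(callNode, k);
      return deepNode === -1 ? -1 : this.table.func[deepNode];
    };
    // Refine the suffix order inside this node's range only.
    const slice = this.order.slice(node.start, node.end);
    slice.sort((a, b) => key(a) - key(b));
    slice.forEach((callNode, i) => {
      this.order[node.start + i] = callNode;
    });
    // One child per distinct func among the k'th ancestors.
    node.children = [];
    let pos = node.start;
    while (pos < node.end) {
      const func = key(this.order[pos]);
      const start = pos;
      while (pos < node.end && key(this.order[pos]) === func) {
        pos++;
      }
      if (func !== -1) {
        node.children.push(this.nodes.length);
        this.nodes.push({ func, depth: k, start, end: pos, children: null });
      }
    }
    return node.children;
  }
}

// cn0: A, cn1: A -> B, cn2: A -> B -> A, cn3: A -> A (funcs: 0 = A, 1 = B)
const lazyTable = { prefix: [-1, 0, 1, 0], func: [0, 1, 0, 0] };
const tree = new LazyInvertedTree(lazyTable, 2);
const initialNodeCount = tree.nodes.length; // only the two roots exist so far
const childrenOfRootA = tree.getChildren(0); // creates "A <- A" and "A <- B"
```

In this sketch, expanding a node only touches the slice of the suffix order that belongs to that node, so inverted nodes that are never expanded never cause any sorting or node creation below them.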
The new structure gives us a nice guarantee about roots of the inverted tree: There is an inverted root for every func, and their indexes are identical. This makes it really cheap to translate between the call node index and the func index (no conversion or lookup is necessary) and also makes it cheap to check if a node is a root. This commit also replaces a few maps and sets with typed arrays for performance. This is easier now that the root indexes are all contiguous.
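A small sketch of what this guarantee buys us. The helper names and the example data are hypothetical; only the "root index equals func index" property comes from the commit message above.

```javascript
// Assumed layout: inverted call node indexes 0..funcCount-1 are the roots,
// and every non-root node gets an index >= funcCount.
function isInvertedRoot(invertedCallNodeIndex, funcCount) {
  return invertedCallNodeIndex < funcCount;
}

// No lookup needed: the root's index is the func index.
function funcForInvertedRoot(rootIndex) {
  return rootIndex;
}

// Because root indexes are contiguous, per-root data fits in a typed array
// indexed by root index, where a Map keyed by func was needed before.
function countSelfNodesPerRoot(nonInvertedFuncColumn, funcCount) {
  const counts = new Uint32Array(funcCount);
  for (const func of nonInvertedFuncColumn) {
    counts[func]++;
  }
  return counts;
}

const selfNodeCounts = countSelfNodesPerRoot([0, 1, 0, 0], 2);
console.log(Array.from(selfNodeCounts)); // [3, 1]
console.log(isInvertedRoot(1, 2), isInvertedRoot(2, 2)); // true false
```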
This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable, because we'll now only be comparing integers. This speeds up _createInvertedRootCallNodeTable by almost 2x.
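The general pattern looks something like the following sketch, which uses -1 as the "no value" sentinel in an Int32Array so that every comparison is between integers. The function name and data are hypothetical, and the sketch makes no performance claim of its own; the speedup figure above is the measurement from the commit message.

```javascript
// Records, for each func, the first non-inverted call node that has this func.
// -1 means "none yet"; using an integer sentinel keeps the array monomorphic.
function findFirstCallNodePerFunc(funcColumn, funcCount) {
  const firstCallNode = new Int32Array(funcCount).fill(-1);
  for (let i = 0; i < funcColumn.length; i++) {
    const func = funcColumn[i];
    if (firstCallNode[func] === -1) {
      firstCallNode[func] = i;
    }
  }
  return firstCallNode;
}

const firstPerFunc = findFirstCallNodePerFunc([0, 1, 0, 0], 3);
console.log(Array.from(firstPerFunc)); // [0, 1, -1]
```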
submitted my current comments! Hopefully they still make sense, but I think they do :-)
* Tree Left aligned Right aligned Reordered by suffix
* - [cn0] A = A = A [so0] [so0] [cn0] A
* - [cn1] B = A -> B = A -> B [so3] [so1] [cn4] A <- A
* - [cn2] A = A -> B -> A = A -> B -> A [so2] ↘↗ [so2] [cn2] A <- B <- A
One difference from the previous situation is that, in the inverted call tree, we previously had both [A, B] and [A, B, A], even though [B, A] doesn't exist in the uninverted call tree.
Now several inverted call nodes (in this case [A, B] and [A, B, A]) could map to the same non-inverted call node. I guess the non-inverted call node maps to its deeper version?
* cnX: Non-inverted call node index X
* soX: Suffix order index X
* inX: Inverted call node index X
* so:X..Y: Suffix order index range soX..soY (soY excluded)
In the comment above, I think I would have preferred to read it with soY included. But I understand that the code also uses this convention of excluding the last range boundary, so that's fine with me.
Maybe we could make it better by just putting the legend above, right after the title "Example", so that when reading the examples all abbreviations are defined already.
src/utils/bisect.js
Outdated
*/
export function bisectLowerBound(
array: number[] | $TypedArray,
f: (number) => number, // < 0 if arg is before needle, > 0 if after, === 0 if same
You don't need to change it since the code will be removed, but it would have been good to actually call these functions aCompare, as mentioned in the comment (or at least some name that clearly says that this is a compare function).
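For readers unfamiliar with this style of bisection, here is a sketch of a lower bound, upper bound, and equal range search driven by a one-argument compare callback, following the convention in the diff above (negative if the element is before the needle, positive if after, zero if equal). The names bisectLowerBound and bisectEqualRange appear in the diff, but the bodies, the default parameters, the bisectUpperBound helper, and the parameter name compareToNeedle are assumptions for illustration.

```javascript
// Returns the first index in [low, high) whose element is not before the needle.
function bisectLowerBound(array, compareToNeedle, low = 0, high = array.length) {
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (compareToNeedle(array[mid]) < 0) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low;
}

// Returns the first index in [low, high) whose element is after the needle.
function bisectUpperBound(array, compareToNeedle, low = 0, high = array.length) {
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (compareToNeedle(array[mid]) <= 0) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low;
}

// Returns [start, end) such that exactly the elements in that range compare as 0.
function bisectEqualRange(array, compareToNeedle) {
  const start = bisectLowerBound(array, compareToNeedle);
  const end = bisectUpperBound(array, compareToNeedle, start);
  return [start, end];
}

const needle = 3;
const equalRange = bisectEqualRange([1, 3, 3, 5], (x) => x - needle);
console.log(equalRange); // [1, 3]
```

Passing a named compare function, rather than a bare f, makes it clearer at the call site that the callback encodes the needle.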
src/profile-logic/call-node-info.js
Outdated
const callPath = this.getCallNodePathFromIndex(callNodeIndex);
return bisectEqualRange(
this._suffixOrderedCallNodes,
(callNodeIndex: IndexIntoCallNodeTable) => {
Some more comments around the compare function would be useful. In particular, give a few different examples with call nodes of different lengths, and explain why this gives the intended result.
Also, it would probably be a good idea to name the two callNodeIndex variables differently: there's one as the parameter of the main function (the needleCallNodeIndex), and another one as the parameter of the compare function (the iteratingCallNodeIndex maybe?), and they are not the same.
I don't understand it fully. Is it similar to _compareNonInvertedCallNodesInSuffixOrder below, except that we compute the needle's callPath once at the start for faster access in the iteration, while it would be more costly than necessary to do that for all the iterating callNodeIndexes?
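One possible reading of such a compare function, written as a standalone sketch with distinct names for the needle and the iterated node: the needle's inverted call path is computed once, and each iterated non-inverted node is compared by walking up its prefix chain. This is an assumption about the intent, not the code under review; the table, the precomputed suffix order, and all names are hypothetical.

```javascript
// cn0: A, cn1: A -> B, cn2: A -> B -> A, cn3: A -> A (funcs: 0 = A, 1 = B)
const pathTable = { prefix: [-1, 0, 1, 0], func: [0, 1, 0, 0] };
// Assumed to be already sorted by inverted path: A, A <- A, A <- B <- A, B <- A
const suffixOrderedCallNodes = [0, 3, 2, 1];

// Compares one iterated node against the needle's inverted path
// (self func first). Returns 0 if the node's inverted path starts with
// the needle path, even when the node's path is longer.
function compareNodeToNeedlePath(table, iteratingCallNode, needlePath) {
  let current = iteratingCallNode;
  for (let i = 0; i < needlePath.length; i++) {
    if (current === -1) {
      // The node's path ended early, so it sorts before the needle.
      return -1;
    }
    const currentFunc = table.func[current];
    if (currentFunc !== needlePath[i]) {
      return currentFunc - needlePath[i];
    }
    current = table.prefix[current];
  }
  return 0;
}

// Finds the suffix order range [start, end) of nodes matching the needle path.
function suffixOrderRangeForPath(table, orderedNodes, needlePath) {
  const cmp = (node) => compareNodeToNeedlePath(table, node, needlePath);
  let low = 0;
  let high = orderedNodes.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (cmp(orderedNodes[mid]) < 0) low = mid + 1;
    else high = mid;
  }
  const start = low;
  high = orderedNodes.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (cmp(orderedNodes[mid]) <= 0) low = mid + 1;
    else high = mid;
  }
  return [start, low];
}

// Needle paths of different lengths:
const rangeA = suffixOrderRangeForPath(pathTable, suffixOrderedCallNodes, [0]);
const rangeAB = suffixOrderRangeForPath(pathTable, suffixOrderedCallNodes, [0, 1]);
const rangeB = suffixOrderRangeForPath(pathTable, suffixOrderedCallNodes, [1]);
console.log(rangeA, rangeAB, rangeB); // [0, 3] [2, 3] [3, 4]
```

In this sketch the needle path [A] matches cn0, cn3 and cn2, because all of them have A as their self function, while [A, B] only matches cn2, the one node that ends in B -> A.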
* cnX: Non-inverted call node index X
* soX: Suffix order index X
* inX: Inverted call node index X
* so:X..Y: Suffix order index range soX..soY (soY excluded)
*/
export interface CallNodeInfoInverted extends CallNodeInfo {
I'm not yet convinced of the advantage of using an interface over moving all the comments near the class implementation, especially since it's implemented by just one class. I feel like this puts all these explanations a bit far from where they could be useful.
(optional comment)
childrenSortedByFunc: InvertedCallNodeHandle[],
func: IndexIntoFuncTable
): InvertedCallNodeHandle | null {
// TODO: Use bisection
left for a future reader?
/**
* For an inverted call node whose children haven't been created yet, this
* returns the "deep nodes" corresponding to its suffix ordered call nodes.
* A deep node is the k'th parent node of a non-inverted call node, where k
nit: isn't this instead?
- * A deep node is the k'th parent node of a non-inverted call node, where k
+ * A deep node is the k'th parent node of an inverted call node, where k
*/
_createChildren(
parentNodeHandle: InvertedCallNodeHandle,
specialChildFunc: IndexIntoFuncTable | null
it would be good to mention what this "special child" is in the comment too
* of the self node. The deep node's function determines which one of the
* children in the *inverted* tree the non-inverted node is assigned to.
*/
_createChildren(
I think it would be good to have an overview of how things work, let's say depth by depth. Especially the "dance" between the deep nodes information and the children and the parent and all that. It would be good to be explicit about the order of operations that makes everything work.
const currentFunc =
this._nonInvertedCallNodeTable.func[currentCallNodeIndex];
return currentFunc - expectedFunc;
unsortedCallNodesSelfNodeCol.push(selfNode);
it's not clear to me why this operation isn't done when there's no parent.
Deploy preview
Fixes #467, fixes #337, fixes #3313.
Some changes that could still be made to this PR: