Make inverting the call tree fast, by computing inverted call nodes lazily #4900

Open
wants to merge 18 commits into
base: main


@mstange mstange commented Jan 23, 2024

Deploy preview

Fixes #467, fixes #337, fixes #3313.

Some changes that could still be made to this PR:

  • A commit at the end which removes bisectLowerBound(), bisectUpperBound(), and bisectEqualRange() again
  • More comments around the lazy inverted call node info:
    • How the suffix order is computed incrementally as call nodes are created
    • Something about "deep nodes", i.e. the n'th parent of a non-inverted node which corresponds to an inverted node at depth n
    • More about how children are created, and that a node either knows about all of its children or none of them; in other words, if a node has been created, all its sibling nodes have been created, too
  • A test which checks inverted diff profiles (call nodes with 0 diff time should be visible if they have non-zero diff descendants)
  • Find out why code coverage is saying that some methods of CallNodeInfoInvertedImpl are never being hit, and possibly add a test or remove code

┆Issue is synchronized with this Jira Task


codecov bot commented Jan 23, 2024

Codecov Report

Attention: Patch coverage is 90.20173% with 68 lines in your changes missing coverage. Please review.

Project coverage is 88.47%. Comparing base (9dd06a0) to head (086aec7).

| Files | Patch % | Lines |
|---|---|---|
| src/profile-logic/call-node-info.js | 85.34% | 44 Missing and 7 partials ⚠️ |
| src/profile-logic/profile-data.js | 92.64% | 10 Missing ⚠️ |
| src/profile-logic/call-tree.js | 96.75% | 5 Missing ⚠️ |
| src/components/flame-graph/FlameGraph.js | 66.66% | 1 Missing ⚠️ |
| src/components/stack-chart/index.js | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4900      +/-   ##
==========================================
- Coverage   88.49%   88.47%   -0.03%     
==========================================
  Files         304      304              
  Lines       27461    27955     +494     
  Branches     7430     7510      +80     
==========================================
+ Hits        24302    24732     +430     
- Misses       2934     2991      +57     
- Partials      225      232       +7     


This just adds a new interface that we can hang functionality off of
which is specific to the inverted tree. No functional changes.
@mstange mstange force-pushed the fast-invert4 branch 3 times, most recently from 22b07b4 to b730acb Compare August 7, 2024 23:35
@mstange mstange marked this pull request as ready for review August 7, 2024 23:35
@mstange mstange requested a review from julienw August 7, 2024 23:35

mstange commented Aug 7, 2024

This is now ready for review! A few more comments are probably needed in the "Create inverted call nodes lazily" commit, but I think it's reviewable in its current state.

This is the main new concept in this PR that allows us to make
the inverted tree fast. See the comment above CallNodeInfoInverted in
src/types/profile-derived.js for details.

The PR is structured as follows:
 - Implement the suffix order in a brute force manner (this commit).
 - Use the suffix order to re-implement everything that was using the
   inverted call node table.
 - Once nothing is using the inverted call node table directly anymore,
   make it fast. We make it fast by rewriting the computation of the
   inverted call node table and of the suffix order so that we only
   materialize inverted call nodes that are displayed in the call tree,
   and not for every sample. And we only compute the suffix order to the
   level of precision needed to have correct ranges for all materialized
   inverted call nodes.
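
As a rough illustration of the "brute force" step described above, the suffix order can be thought of as the non-inverted call nodes sorted by their call path read from the self function towards the root. The sketch below is not the PR's code: the toy `prefix`/`func` table, the helper names, and the choice to order by func index are all assumptions made for the example.

```javascript
// Hypothetical toy non-inverted call node table (not taken from the PR).
// Funcs: A = 0, B = 1, C = 2. prefix[i] === -1 marks a root.
const prefix = [-1, 0, 1, 1, 0, 4, 0];
const func = [0, 1, 0, 2, 0, 1, 2];

// Returns the call path of a node with the self function first,
// i.e. in the order in which it appears in the inverted tree.
function invertedPath(callNodeIndex) {
  const path = [];
  for (let node = callNodeIndex; node !== -1; node = prefix[node]) {
    path.push(func[node]);
  }
  return path;
}

// Lexicographic comparison of two inverted paths. A path that is a strict
// prefix of another path sorts before it.
function compareInvertedPaths(a, b) {
  const len = Math.min(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a[i] !== b[i]) {
      return a[i] - b[i];
    }
  }
  return a.length - b.length;
}

// Brute force: materialize every path and sort all call nodes.
function computeSuffixOrderBruteForce() {
  const order = Array.from(prefix.keys());
  order.sort((x, y) => compareInvertedPaths(invertedPath(x), invertedPath(y)));
  return order;
}

const suffixOrderedCallNodes = computeSuffixOrderBruteForce();
```

With this toy table, all call nodes whose self function is A end up next to each other, followed by those for B and then C, which is what lets an inverted node be described by a contiguous range.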
This function is used by getNativeSymbolsForCallNodeInverted,
getStackAddressInfoForCallNodeInverted, and
getStackLineInfoForCallNodeInverted.

This replaces a call to getStackIndexToCallNodeIndex() with a call to
getStackIndexToNonInvertedCallNodeIndex(). It also mostly removes the
use of the inverted call node table for this code. (There's still a
place that accesses callNodeInfo.getCallNodeTable().depth, but this will
be fixed in a later commit.)

We want to eliminate all callers to getStackIndexToCallNodeIndex() because
we don't want to compute a mapping from non-inverted stack index to
inverted call node index upfront.
… order.

This replaces a call to getStackIndexToCallNodeIndex() with a call to
getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to
getCallNodeTable(). And it replaces a SampleIndexToCallNodeIndex
mapping with a SampleIndexToNonInvertedCallNodeIndex mapping.
This replaces a call to getStackIndexToCallNodeIndex() with a call to
getStackIndexToNonInvertedCallNodeIndex(). It also removes a call to
getCallNodeTable().
…ted call nodes.

This function is used when hovering or clicking the activity graph.

This commit replaces a SampleIndexToCallNodeIndex mapping with a
SampleIndexToNonInvertedCallNodeIndex mapping.
The stack chart is always non-inverted, so this commit is functionally
neutral.

This lets us remove the now-unused function
getSampleIndexToCallNodeIndexForFilteredThread.
This removes a few more uses of getCallNodeTable().
This replaces lots of uses of getCallNodeTable() with uses of
getNonInvertedCallNodeTable().
It also replaces lots of uses of getStackIndexToCallNodeIndex() with
uses of getStackIndexToNonInvertedCallNodeIndex().

We now compute the call tree timings quite differently for inverted mode
compared to non-inverted mode. There's one part of the work that's shared:
The getCallNodeLeafAndSummary function computes the self time for each non-inverted
node, and the result is used for both the inverted and the non-inverted
call tree timings.
The CallTreeTimings Flow type is turned into an enum, with a different
type for CallTreeTimingsNonInverted and for CallTreeTimingsInverted.
A new implementation for the CallTreeInternal interface is added.
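
To make the shared part concrete, here is a minimal sketch of how a per-node self time could feed both kinds of timings. The function names, the toy table, and the assumption that every sample has weight 1 are hypothetical; this is not the PR's getCallNodeLeafAndSummary implementation.

```javascript
// Hypothetical toy non-inverted call node table: prefix[i] === -1 marks a
// root, and parents always have a smaller index than their children.
const prefix = [-1, 0, 1, 1, 0, 4, 0];
const func = [0, 1, 0, 2, 0, 1, 2];
const funcCount = 3;

// Hypothetical samples: each entry is the non-inverted self node of a sample.
const sampleCallNodes = [2, 2, 3, 5, 6, 1];

// Shared step: self time per non-inverted call node.
function computeSelfPerNonInvertedNode(samples, callNodeCount) {
  const self = new Float64Array(callNodeCount);
  for (const callNode of samples) {
    self[callNode] += 1;
  }
  return self;
}

// Non-inverted totals: add each node's total to its parent, children first.
function computeNonInvertedTotals(self) {
  const total = Float64Array.from(self);
  for (let i = total.length - 1; i >= 0; i--) {
    if (prefix[i] !== -1) {
      total[prefix[i]] += total[i];
    }
  }
  return total;
}

// Inverted root totals: the total of the inverted root for a func is the sum
// of the self times of all non-inverted nodes with that func.
function computeInvertedRootTotals(self) {
  const total = new Float64Array(funcCount);
  for (let i = 0; i < self.length; i++) {
    total[func[i]] += self[i];
  }
  return total;
}

const self = computeSelfPerNonInvertedNode(sampleCallNodes, func.length);
const nonInvertedTotals = computeNonInvertedTotals(self);
const invertedRootTotals = computeInvertedRootTotals(self);
```

Both results are derived from the same `self` array, so the per-sample work only has to happen once.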
All these places now deal with non-inverted call nodes, and for those,
what we meant by "leaf" and by "self" was always the same thing.
And I prefer the word "self" because "leaf" usually means "has no children"
and that's not the case here.
We still use the word "leaf" in many parts of the documentation.
Whether a function recurses (directly or indirectly) is the same
in the inverted call node table and in the non-inverted call node
table.
This just stops exposing it from the interface.

The way we compute it will change in the next commit.
This is the main commit of this PR. Now that nothing is relying on having
an inverted call node for each sample, or on having a fully-computed
inverted call node table, we can make it so that we only add entries to
the inverted call node table when we actually need a node, for example
because it was revealed in the call tree. This makes it a lot faster
to click the "Invert call stack" checkbox - before this commit, we were
computing a lot of inverted call nodes that were never shown to the user.

After this commit, CallNodeInfoInvertedImpl no longer inherits from
CallNodeInfoImpl - it is now a fully separate implementation.
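
The idea of only creating inverted call nodes on demand can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's CallNodeInfoInvertedImpl: the class name, the per-node `selfNodes` lists, and the toy table are hypothetical, and the PR describes using suffix order ranges rather than per-node lists.

```javascript
// Hypothetical toy non-inverted call node table: prefix[i] === -1 marks a root.
const prefix = [-1, 0, 1, 1, 0, 4, 0];
const func = [0, 1, 0, 2, 0, 1, 2];
const funcCount = 3;

class LazyInvertedTree {
  constructor() {
    // Roots are created eagerly, one per func, so root index === func index.
    this.nodes = [];
    for (let f = 0; f < funcCount; f++) {
      this.nodes.push({ func: f, depth: 0, selfNodes: [], children: null });
    }
    for (let callNode = 0; callNode < func.length; callNode++) {
      this.nodes[func[callNode]].selfNodes.push(callNode);
    }
  }

  // Creates all children of a node on first access, never just a subset.
  getChildren(nodeIndex) {
    const node = this.nodes[nodeIndex];
    if (node.children !== null) {
      return node.children;
    }
    const selfNodesByChildFunc = new Map();
    for (const selfNode of node.selfNodes) {
      // Walk (depth + 1) parents up from the self node.
      let deepNode = selfNode;
      for (let i = 0; i <= node.depth && deepNode !== -1; i++) {
        deepNode = prefix[deepNode];
      }
      if (deepNode === -1) {
        continue; // This call path ends at the current inverted node.
      }
      const childFunc = func[deepNode];
      if (!selfNodesByChildFunc.has(childFunc)) {
        selfNodesByChildFunc.set(childFunc, []);
      }
      selfNodesByChildFunc.get(childFunc).push(selfNode);
    }
    node.children = [];
    const childFuncs = [...selfNodesByChildFunc.keys()].sort((a, b) => a - b);
    for (const childFunc of childFuncs) {
      node.children.push(this.nodes.length);
      this.nodes.push({
        func: childFunc,
        depth: node.depth + 1,
        selfNodes: selfNodesByChildFunc.get(childFunc),
        children: null,
      });
    }
    return node.children;
  }
}

const tree = new LazyInvertedTree();
const childrenOfRootA = tree.getChildren(0);
```

Only the three roots exist until `getChildren` is called; expanding root A adds exactly its two children and nothing deeper.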
The new structure gives us a nice guarantee about roots
of the inverted tree: There is an inverted root for every
func, and their indexes are identical. This makes it really
cheap to translate between the call node index and the func index
(no conversion or lookup is necessary) and also makes it cheap
to check if a node is a root.

This commit also replaces a few maps and sets with typed arrays
for performance. This is easier now that the root indexes are
all contiguous.
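
A small sketch of what this guarantee buys, assuming inverted call node indexes `0..funcCount - 1` are the roots. The helper names and the `rootTotal` array are hypothetical and only illustrate the indexing idea.

```javascript
// Hypothetical func count for the example.
const funcCount = 3;

// Because the roots occupy indexes 0..funcCount - 1, the root check is a
// single integer comparison and needs no lookup table.
function isInvertedRoot(invertedCallNodeIndex) {
  return invertedCallNodeIndex < funcCount;
}

// The func of a root is the root's own index.
function funcForInvertedRoot(rootIndex) {
  return rootIndex;
}

// Per-root data can live in a typed array indexed directly by root index,
// where a Map keyed by call node index might have been used before.
const rootTotal = new Float64Array(funcCount);

function addSelfTimeToRoot(selfFunc, weight) {
  rootTotal[selfFunc] += weight;
}

addSelfTimeToRoot(1, 2.5);
addSelfTimeToRoot(1, 0.5);
```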
This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable,
because we'll now only be comparing integers. This speeds up
_createInvertedRootCallNodeTable by almost 2x.
mstange added a commit that referenced this pull request Aug 9, 2024
…erComparator (#5076)

In #4900 I'm going to add an alternative code path to compute the
samples selected states for the inverted view. I've broken this new test
out of that PR to reduce its scope a tiny bit.

@julienw julienw left a comment


submitted my current comments! Hopefully they still make sense, but I think they do :-)

*   Tree                Left aligned    Right aligned        Reordered by suffix
*   - [cn0] A      =    A            =            A [so0]    [so0] [cn0] A
*     - [cn1] B    =    A -> B       =       A -> B [so3]    [so1] [cn4] A <- A
*       - [cn2] A  =    A -> B -> A  =  A -> B -> A [so2] ↘↗ [so2] [cn2] A <- B <- A

One difference from the previous situation is that, in the inverted call tree, we previously had both [A, B] and [A, B, A], even though [B, A] doesn't exist in the non-inverted call tree.
Now several inverted call nodes (in this case [A, B] and [A, B, A]) could map to the same non-inverted call node. I guess the non-inverted call node maps to its deeper version?

* cnX: Non-inverted call node index X
* soX: Suffix order index X
* inX: Inverted call node index X
* so:X..Y: Suffix order index range soX..soY (soY excluded)

In the comment above, I think I would have preferred to read it with soY included. But I understand that the code also uses this convention of excluding the last range boundary, so that's fine with me.

Maybe we could make it better by just putting the legend above, right after the title Example, so that when reading the examples all abbreviations are defined already.
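
For readers weighing the two conventions, here is a short sketch of why half-open ranges (end excluded) tend to compose well. The toy `func` column and suffix order below are assumptions for the example, not data from the PR.

```javascript
// Hypothetical toy data: func per non-inverted call node, and a suffix order
// in which nodes are grouped by their self func.
const func = [0, 1, 0, 2, 0, 1, 2];
const suffixOrderedCallNodes = [0, 4, 2, 1, 5, 6, 3];
const funcCount = 3;

// boundaries[f]..boundaries[f + 1] is the half-open suffix order range of
// the inverted root for func f.
function computeRootRangeBoundaries() {
  const boundaries = new Uint32Array(funcCount + 1);
  for (const callNode of suffixOrderedCallNodes) {
    boundaries[func[callNode] + 1]++;
  }
  for (let f = 0; f < funcCount; f++) {
    boundaries[f + 1] += boundaries[f];
  }
  return boundaries;
}

const boundaries = computeRootRangeBoundaries();

function rangeForRoot(f) {
  return { start: boundaries[f], end: boundaries[f + 1] };
}

// With an excluded end, the length is a plain subtraction and an empty range
// is simply start === end.
function rangeLength(range) {
  return range.end - range.start;
}
```

With the end excluded, the end of one root's range is the start of the next one, so `funcCount + 1` boundaries describe all root ranges.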

*/
export function bisectLowerBound(
array: number[] | $TypedArray,
f: (number) => number, // < 0 if arg is before needle, > 0 if after, === 0 if same

You don't need to change it since the code will be removed, but it would have been good to actually name these functions aCompare, as mentioned in the comment (or at least use a name that clearly says this is a compare function).
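
As an illustration of the naming suggestion, a bisection helper with an explicitly named compare callback might look like the sketch below. The names `lowerBound`, `upperBound`, and `compareToNeedle` are hypothetical; only the sign convention is taken from the comment in the snippet above, and this is not the PR's bisectLowerBound.

```javascript
// compareToNeedle(element) returns < 0 if the element sorts before the
// needle, > 0 if it sorts after, and 0 if it matches.

// Returns the first index whose element does not sort before the needle.
function lowerBound(array, compareToNeedle) {
  let low = 0;
  let high = array.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (compareToNeedle(array[mid]) < 0) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low;
}

// Returns the first index whose element sorts after the needle.
function upperBound(array, compareToNeedle) {
  let low = 0;
  let high = array.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (compareToNeedle(array[mid]) <= 0) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low;
}

const sorted = [1, 3, 3, 5];
const compareTo3 = (element) => element - 3;
const start = lowerBound(sorted, compareTo3);
const end = upperBound(sorted, compareTo3);
```

Here `start..end` is the half-open range of elements equal to the needle.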

const callPath = this.getCallNodePathFromIndex(callNodeIndex);
return bisectEqualRange(
this._suffixOrderedCallNodes,
(callNodeIndex: IndexIntoCallNodeTable) => {

Some more comments around the compare function would be useful.
In particular, give a few different examples with call nodes of different lengths, and explain why this gives the intended result.

Also it would probably be a good idea to call the 2 callNodeIndex differently: there's one as the parameter of the main function (the needleCallNodeIndex), and another one as the parameter of the compare function (the iteratingCallNodeIndex maybe?), that are not the same.

I don't understand it fully. Is it similar to _compareNonInvertedCallNodesInSuffixOrder below, except that we computed the needle's callPath once at the start for a faster access in the iteration, while it would be more costly than necessary for all the iterating callNodeIndexes?
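
One possible reading of such a compare function, written out with distinct names for the needle and the iterated call node, is sketched below. It assumes the needle's inverted call path is computed once up front and each candidate is compared by walking its parents; the toy table, the suffix order, and all names are hypothetical, and the PR's actual comparison may differ.

```javascript
// Hypothetical toy non-inverted call node table and its suffix order.
const prefix = [-1, 0, 1, 1, 0, 4, 0];
const func = [0, 1, 0, 2, 0, 1, 2];
const suffixOrderedCallNodes = [0, 4, 2, 1, 5, 6, 3];

// Compares a candidate non-inverted call node against the needle's inverted
// path (self func first). Returns 0 if the candidate's inverted path starts
// with the needle path.
function compareCandidateToNeedle(candidateCallNodeIndex, needleInvertedPath) {
  let node = candidateCallNodeIndex;
  for (let depth = 0; depth < needleInvertedPath.length; depth++) {
    if (node === -1) {
      // The candidate's path is a strict prefix of the needle: sorts before.
      return -1;
    }
    const diff = func[node] - needleInvertedPath[depth];
    if (diff !== 0) {
      return diff;
    }
    node = prefix[node];
  }
  return 0;
}

// Returns the half-open suffix order range [start, end] of all call nodes
// whose inverted path starts with needleInvertedPath.
function equalRange(needleInvertedPath) {
  const compare = (candidate) =>
    compareCandidateToNeedle(candidate, needleInvertedPath);
  let low = 0;
  let high = suffixOrderedCallNodes.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (compare(suffixOrderedCallNodes[mid]) < 0) low = mid + 1;
    else high = mid;
  }
  const start = low;
  high = suffixOrderedCallNodes.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (compare(suffixOrderedCallNodes[mid]) <= 0) low = mid + 1;
    else high = mid;
  }
  return [start, low];
}
```

For example, with funcs A = 0 and B = 1, `equalRange([1, 0])` covers the call nodes whose path ends in A -> B, regardless of how long the rest of the path is.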

* cnX: Non-inverted call node index X
* soX: Suffix order index X
* inX: Inverted call node index X
* so:X..Y: Suffix order index range soX..soY (soY excluded)
*/
export interface CallNodeInfoInverted extends CallNodeInfo {

I'm not yet convinced of the advantage of using an interface over moving all the comments near the class implementation, especially since it's implemented by just one class. I feel like this makes all these explanations a bit far from where they could be useful.

(optional comment)

childrenSortedByFunc: InvertedCallNodeHandle[],
func: IndexIntoFuncTable
): InvertedCallNodeHandle | null {
// TODO: Use bisection

left for a future reader?
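
If someone picks up that TODO, a bisection over the sorted children could look roughly like this. The `funcOfNode` lookup array and the function name are hypothetical stand-ins for however the real code maps an inverted call node handle to its func.

```javascript
// Finds the child whose func equals `func`, or returns null.
// childrenSortedByFunc holds node handles sorted by ascending func;
// funcOfNode[handle] is a hypothetical handle-to-func lookup.
function findChildWithFunc(childrenSortedByFunc, funcOfNode, func) {
  let low = 0;
  let high = childrenSortedByFunc.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (funcOfNode[childrenSortedByFunc[mid]] < func) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  if (
    low < childrenSortedByFunc.length &&
    funcOfNode[childrenSortedByFunc[low]] === func
  ) {
    return childrenSortedByFunc[low];
  }
  return null;
}

// Hypothetical example: handles 10, 11, 12 have funcs 2, 5, 9.
const funcOfNode = new Int32Array(13);
funcOfNode[10] = 2;
funcOfNode[11] = 5;
funcOfNode[12] = 9;
const children = [10, 11, 12];
const found = findChildWithFunc(children, funcOfNode, 5);
```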

/**
* For an inverted call node whose children haven't been created yet, this
* returns the "deep nodes" corresponding to its suffix ordered call nodes.
* A deep node is the k'th parent node of a non-inverted call node, where k

nit: isn't this instead?

Suggested change
- * A deep node is the k'th parent node of a non-inverted call node, where k
+ * A deep node is the k'th parent node of an inverted call node, where k
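
To ground the discussion, here is a sketch of the "k'th parent" idea following the PR description's wording (the n'th parent of a non-inverted node, corresponding to an inverted node at depth n). The toy table and helper names are hypothetical.

```javascript
// Hypothetical toy non-inverted call node table: prefix[i] === -1 marks a root.
// Funcs: A = 0, B = 1, C = 2.
const prefix = [-1, 0, 1, 1, 0, 4, 0];
const func = [0, 1, 0, 2, 0, 1, 2];

// Returns the k'th parent of a non-inverted call node, or -1 if the node's
// call path has fewer than k + 1 frames.
function getDeepNode(selfNode, k) {
  let node = selfNode;
  for (let i = 0; i < k && node !== -1; i++) {
    node = prefix[node];
  }
  return node;
}

// The func of the deep node at depth k is the func at depth k of the
// inverted path that starts at selfNode. Returns null past the end.
function funcAtInvertedDepth(selfNode, k) {
  const deepNode = getDeepNode(selfNode, k);
  return deepNode === -1 ? null : func[deepNode];
}

// cn2 has the non-inverted path A -> B -> A, so its inverted path is
// A <- B <- A.
const invertedPathOfCn2 = [0, 1, 2, 3].map((k) => funcAtInvertedDepth(2, k));
```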

*/
_createChildren(
parentNodeHandle: InvertedCallNodeHandle,
specialChildFunc: IndexIntoFuncTable | null

it would be good to mention what this "special child" is in the comment too

* of the self node. The deep node's function determines which one of the
* children in the *inverted* tree the non-inverted node is assigned to.
*/
_createChildren(

I think it would be good to have an overview of how things work, let's say depth by depth. Especially the "dance" between the deep nodes information and the children and the parent and all that. It would be good to be explicit about the order of operations that makes everything work.

const currentFunc =
this._nonInvertedCallNodeTable.func[currentCallNodeIndex];
return currentFunc - expectedFunc;
unsortedCallNodesSelfNodeCol.push(selfNode);

it's not clear to me why this operation isn't done when there's no parent.
