[deploy preview] Fast inverted tree, take 2 #4806
Closed
This speeds up CallTree.getChildren.
The old code was saving time by sorting siblings only for the nodes that were displayed based on the current (preview) range selection. This made it cheaper to compute the flame graph for a small range, but it meant that it had to re-do the sort every time the selection changed. Now we do the sorting once, based on the entire call node table. This is expensive but it is a one-time cost. Then we cache the "ordered rows" and don't have to sort again on each range selection change.
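The one-time sort plus cache can be sketched like this (function and variable names are hypothetical, not the profiler's actual API):

```javascript
// Sort every node's children once, by total time descending, based on the
// whole call node table. This is the one-time cost described above.
function computeOrderedChildren(childrenPerNode, totalTimePerNode) {
  return childrenPerNode.map((children) =>
    children.slice().sort((a, b) => totalTimePerNode[b] - totalTimePerNode[a])
  );
}

// Example data: node 0 has children 1 and 2.
const childrenPerNode = [[1, 2], [], []];
const totalTimePerNode = [10, 3, 7];

// Computed once and cached; range selection changes don't invalidate it.
const orderedChildren = computeOrderedChildren(childrenPerNode, totalTimePerNode);

// getChildren is now a plain cache lookup.
function getChildren(nodeIndex) {
  return orderedChildren[nodeIndex];
}
```

The ordering is based on totals over the entire table rather than the current range, which is what makes it safe to compute once.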
We still build an inverted call node table. But the thread remains untouched. This means that the stackTable and the callNodeTable are upside down with respect to each other when the call tree is inverted. This can be a bit confusing. Some code needs to match non-inverted stacks to inverted call nodes. For these purposes, a StackToInvertedCallNodeMatcher class is introduced. Some address timings / line timings code is now unnecessary, specifically the code that was computing global information (and not information about a single call node) with special treatment for the inverted case. The unnecessary inverted case implementation was removed.
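A hypothetical sketch of the matching idea (the real StackToInvertedCallNodeMatcher is more involved; the map key scheme and names here are illustrative): in the inverted tree, the root corresponds to the stack's *leaf* frame, so matching walks the non-inverted stack from the leaf upward while descending the inverted tree.

```javascript
// Inverted tree child lookup: `${parentInvertedNode}:${func}` -> node index.
const invertedChildLookup = new Map([
  ['-1:C', 0], // inverted root for stacks ending in C
  ['0:B', 1],  // ... where C was called from B
  ['1:A', 2],  // ... where B was called from A
]);

function matchStackToInvertedNode(funcsRootToLeaf) {
  let node = -1; // -1 means "above the inverted roots"
  // Walk the non-inverted stack from its leaf to its root.
  for (let i = funcsRootToLeaf.length - 1; i >= 0; i--) {
    const next = invertedChildLookup.get(node + ':' + funcsRootToLeaf[i]);
    if (next === undefined) {
      return -1; // no inverted call node exists for this stack suffix
    }
    node = next;
  }
  return node;
}
```

This is the sense in which the stackTable and the inverted callNodeTable are "upside down with respect to each other": the stack's suffix becomes the inverted node's path.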
Compute the inverted selected call node in the action creator, not in the reducer. This means we don't have to pass the call tree as far down. Also move the computation into the CallTree class. And fix it to stop at nodes with the heaviest self time, even if they're not leaf nodes.
This puts the conversion functions between call node indexes and call node paths onto the new CallNodeInfo interface. It also provides accessor methods for the traditional members. This will allow us to have two implementations in the future: One for the regular call tree, and one for the inverted call tree.
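A minimal sketch of the non-inverted implementation, assuming a columnar call node table with `prefix` and `func` columns; the method names follow the description above but the exact signatures are illustrative:

```javascript
// One of the two planned CallNodeInfo implementations: the non-inverted one.
class CallNodeInfoNonInverted {
  // callNodeTable: { prefix: Int32Array, func: Int32Array, length: number }
  constructor(callNodeTable) {
    this._table = callNodeTable;
  }
  // Accessor for the traditional member.
  getCallNodeTable() {
    return this._table;
  }
  // For the non-inverted implementation this is the same table.
  getNonInvertedCallNodeTable() {
    return this._table;
  }
  // Walk prefix pointers up to the root to recover the path of funcs.
  getCallNodePathFromIndex(index) {
    const path = [];
    for (let i = index; i !== -1; i = this._table.prefix[i]) {
      path.unshift(this._table.func[i]);
    }
    return path;
  }
  // Descend from the root, matching one func per path element.
  getCallNodeIndexFromPath(path) {
    let node = -1;
    for (const func of path) {
      let child = -1;
      for (let i = 0; i < this._table.length; i++) {
        if (this._table.prefix[i] === node && this._table.func[i] === func) {
          child = i;
          break;
        }
      }
      if (child === -1) {
        return null; // no call node exists for this path
      }
      node = child;
    }
    return node;
  }
}

// Example: node 0 is a root (funcs 5); nodes 1 and 2 are its children.
const callNodeInfo = new CallNodeInfoNonInverted({
  prefix: Int32Array.of(-1, 0, 0),
  func: Int32Array.of(5, 6, 7),
  length: 3,
});
```

An inverted implementation could then satisfy the same interface while mapping paths through the inverted table instead.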
The flame graph is only drawn when we're not inverted, so getNonInvertedCallNodeTable always returns the same as getCallNodeTable.
This also removes an unused argument from some methods.
We are iterating over the filtered samples, which were already constrained to the zoomed range, so checking the time again is redundant. It could guard against error cases where samples are not sorted by time and the range filtering didn't work properly, but I don't think that's needed either; _accumulateInBuffer makes sure to never write outside of its bounds.
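The bounds argument can be illustrated with a simplified stand-in for _accumulateInBuffer (names and structure are illustrative): because the helper rejects out-of-range indexes itself, a per-sample time check adds nothing.

```javascript
// Accumulate `value` into the buffer bucket corresponding to `sampleTime`
// within [rangeStart, rangeEnd). Out-of-range samples are silently dropped,
// so the caller never needs its own time check.
function accumulateInBuffer(buffer, rangeStart, rangeEnd, sampleTime, value) {
  const pos =
    ((sampleTime - rangeStart) / (rangeEnd - rangeStart)) * buffer.length;
  const index = Math.floor(pos);
  if (index < 0 || index >= buffer.length) {
    return; // out-of-range samples can never write outside the buffer
  }
  buffer[index] += value;
}

const buffer = new Float64Array(4);
accumulateInBuffer(buffer, 0, 8, 3, 1);   // falls into bucket 1
accumulateInBuffer(buffer, 0, 8, 100, 1); // outside the range: ignored
```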
We were checking for transparent categories in order to optimize out the work for samples that were idle. However, this check was quite expensive in itself. These days, we have CPU usage information almost always, and checking for zero CPU is faster, so let's do that instead.
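A minimal sketch of the cheaper check, assuming a per-sample CPU delta column (array names are illustrative):

```javascript
// Skip idle samples via a zero-CPU integer check instead of looking up
// whether the sample's category is the transparent (idle) one.
function sumNonIdleWeight(weightPerSample, cpuDeltaPerSample) {
  let total = 0;
  for (let i = 0; i < weightPerSample.length; i++) {
    if (cpuDeltaPerSample[i] === 0) {
      continue; // idle sample: zero CPU used since the previous sample
    }
    total += weightPerSample[i];
  }
  return total;
}
```

The integer comparison stays in the hot loop, but unlike a category lookup it needs no indirection through a separate table.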
This doesn't seem to have made anything faster though.
This is quite effective in Firefox, unfortunately. It also means we can save repeated lookups of the implementation and categories in the "thisNodeIndex === needleNodeIndex" case.
This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable, because we'll now only be comparing integers. This speeds up _createInvertedRootCallNodeTable by almost 2x.
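The integer-sentinel trick can be sketched like this (the function name is illustrative): representing "no prefix" as -1 instead of null keeps every comparison integer-vs-integer, so the JIT can use a monomorphic integer compare instead of a generic CompareIC.

```javascript
// Find the root nodes of a call node table, where roots are marked with a
// -1 prefix rather than null.
function findRootNodes(prefixColumn) {
  const roots = [];
  for (let i = 0; i < prefixColumn.length; i++) {
    // Before: prefixColumn[i] === null (mixed null/number, generic compare)
    // After:  prefixColumn[i] === -1   (always an integer compare)
    if (prefixColumn[i] === -1) {
      roots.push(i);
    }
  }
  return roots;
}
```

Storing the column in an Int32Array (where null isn't even representable) enforces the invariant.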
This speeds up _accumulateSampleCategories by 28%! It avoids repeated conversions between floats and integers. We use a 1.15.16 fixed-point format (1 sign bit, 15 bits left of the binary point, 16 bits right of it) for the i32 values.
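A sketch of the 1.15.16 encoding described above (helper names are illustrative): values are scaled by 2^16 and rounded, after which the hot loop can stay in pure i32 arithmetic.

```javascript
const FRACTIONAL_BITS = 16;
const FIXED_POINT_SCALE = 1 << FRACTIONAL_BITS; // 65536

// Convert a float to 1.15.16 fixed point. |x| must stay below 2^15 for the
// result to fit in the i32 range.
function toFixed1_15_16(x) {
  return Math.round(x * FIXED_POINT_SCALE) | 0;
}

function fromFixed1_15_16(fixed) {
  return fixed / FIXED_POINT_SCALE;
}

// The accumulation loop performs no per-sample float<->int conversions;
// `| 0` keeps the running sum an i32.
function accumulateFixed(values) {
  let sum = 0;
  for (const v of values) {
    sum = (sum + v) | 0;
  }
  return sum;
}
```

Only the final result needs converting back to a float for display.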
This doesn't improve performance but simplifies the code a little bit.
Using firstChild / nextSibling / currentLastChild instead is 3.5x faster!
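The firstChild / nextSibling / currentLastChild layout can be sketched like this (the column names mirror the ones mentioned above, the surrounding structure is illustrative): each node stores its first child and its next sibling, so appending a child is O(1) and requires no per-node child arrays.

```javascript
function makeTree(capacity) {
  return {
    firstChild: new Int32Array(capacity).fill(-1),
    nextSibling: new Int32Array(capacity).fill(-1),
    currentLastChild: new Int32Array(capacity).fill(-1),
  };
}

function appendChild(tree, parent, child) {
  const last = tree.currentLastChild[parent];
  if (last === -1) {
    tree.firstChild[parent] = child; // first child of this parent
  } else {
    tree.nextSibling[last] = child; // link after the previous last child
  }
  tree.currentLastChild[parent] = child;
}

// Children are recovered by following the sibling chain.
function childrenOf(tree, parent) {
  const children = [];
  for (let c = tree.firstChild[parent]; c !== -1; c = tree.nextSibling[c]) {
    children.push(c);
  }
  return children;
}
```

Three flat Int32Array columns replace one growable array per node, which avoids per-node allocations entirely.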
This change massively speeds up symbolication.
Superseded by #4900.