Commit 43c5970

[doc] Add performance documentation (#2739)
Parent: bb1e08a

2 files changed (+123, -0 lines)

doc/README.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
## Documentation applicable to all interop

- [3 Layers in Dart Interop](interop-layers.md)
- [Performance analysis and improvements](performance.md)

## Guides

doc/performance.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
# Performance with Native Code

How to assess the performance of Dart and native code, and how to improve it.

## Profiling Performance

| Tool                                    | Platform  | Primary Use Case                        | Measures (Dart CPU)          | Measures (Native CPU)    | Measures (Dart Heap) | Measures (Native Heap)                                           |
| --------------------------------------- | --------- | --------------------------------------- | ---------------------------- | ------------------------ | -------------------- | ---------------------------------------------------------------- |
| [Dart DevTools]                         | All       | Profiles Dart VM, UI jank, Dart heap    | Yes                          | Opaque "Native" block    | Yes                  | Tracks "External" VM-aware memory only; misses native-heap leaks |
| [Xcode Instruments (Time Profiler)]     | iOS/macOS | Profiles native CPU call stacks         | No                           | Yes (full symbolication) | No                   | No                                                               |
| [Xcode Instruments (Leaks/Allocations)] | iOS/macOS | Profiles native heap (malloc, mmap)     | No                           | No                       | No                   | Yes                                                              |
| [Android Studio Profiler (CPU)]         | Android   | Profiles native C/C++ CPU execution     | No                           | Yes (traces C++ calls)   | No                   | No                                                               |
| [Perfetto (heapprofd)]                  | Android   | Advanced native heap profiling          | No                           | No                       | No                   | Yes (traces malloc/free call stacks)                             |
| [Linux perf]                            | Linux     | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes                      | No                   | No                                                               |
| [Visual Studio CPU Usage Profiler]      | Windows   | Profiles native C/C++ CPU execution     | No                           | Yes (traces C++ calls)   | No                   | No                                                               |
| [WPA (Heap Analysis)]                   | Windows   | Advanced native heap profiling          | No                           | No                       | No                   | Yes (traces malloc/free call stacks)                             |

<!-- TODO: Add documentation for the other tools. -->

### Dart DevTools

To assess the performance of only the Dart code, treating native code as a
black box, use the Dart performance tooling.

See the documentation at https://dart.dev/tools/dart-devtools and
https://docs.flutter.dev/perf. For FFI specifically, you can use
https://docs.flutter.dev/tools/devtools/cpu-profiler and
https://docs.flutter.dev/tools/devtools/performance#timeline-events-tab.
For synchronous FFI calls, you can add synchronous timeline events; for
asynchronous code (using async callbacks or helper isolates), you can use async
timeline events.
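
The following Dart sketch shows both kinds of timeline events around FFI calls,
using `Timeline.timeSync` and `TimelineTask` from `dart:developer`. It assumes a
POSIX libc loaded into the process (Linux or macOS), so that `getpid` can be
looked up with `DynamicLibrary.process()`; `getpid` only stands in for your own
native function, and the event names are arbitrary. The events should show up
in the DevTools timeline when you record one there.

```dart
import 'dart:developer';
import 'dart:ffi';
import 'dart:isolate';

// Assumption: libc is loaded into the process (Linux/macOS), so `getpid`
// can be resolved with DynamicLibrary.process(). It stands in for any
// native function you want to see in the timeline.
final int Function() getpid = DynamicLibrary.process()
    .lookupFunction<Int32 Function(), int Function()>('getpid');

/// Synchronous FFI call: wrap it in a synchronous timeline event.
int pidWithSyncEvent() => Timeline.timeSync('getpid (sync FFI call)', getpid);

/// Asynchronous work on a helper isolate: track it with an async timeline
/// event (a TimelineTask), which may span several event-loop turns.
Future<int> pidWithAsyncEvent() async {
  final task = TimelineTask()..start('getpid (helper isolate)');
  try {
    return await Isolate.run(() => getpid());
  } finally {
    task.finish();
  }
}

Future<void> main() async {
  final syncPid = pidWithSyncEvent();
  final asyncPid = await pidWithAsyncEvent();
  // Both isolates live in the same process, so they report the same PID.
  print(syncPid == asyncPid); // true
}
```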

### `perf` on Linux

To see both Dart and native symbols in a flame graph, you can use `perf` on
Linux.

To run the [FfiCall benchmark] in JIT mode with `perf`, from the root of a Dart
SDK checkout:

```
$ perf record -g dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart && \
  perf report --hierarchy
```

Note that Flutter apps run in AOT mode in profile and release builds, so prefer
profiling in AOT mode.

For AOT, there is [no single command
yet](https://github.com/dart-lang/sdk/issues/54254). Instead, use the
`precompiler2` tool from a Dart SDK checkout. See [building the Dart SDK] for
how to build the Dart SDK.

```
$ pkg/vm/tool/precompiler2 benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \
  perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart.bin && \
  perf report --hierarchy
```

To analyze a performance issue in a Flutter app, it is best to reproduce the
issue in standalone Dart.

## Improving performance

There are some typical patterns to improve performance:

* To avoid dropped frames, move long-running FFI calls to a helper isolate.
* To avoid copying data:
  * Keep data in native memory, operating on [`Pointer`][]s and using
    [`asTypedList`][] to view the native memory as [`TypedData`][] without
    copying it.
  * For short calls where the memory is in Dart, avoid copying by using leaf
    calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]) and [`address`].
    (Leaf calls prevent the Dart GC from running on all isolates, which makes
    it safe to give native code a pointer to a Dart object. For the same
    reason, keep such calls short.)
  * Use [`Isolate.exit`][] to send large data from a helper isolate to the main
    isolate after a large computation, without copying it.
* For many small calls, limit the overhead per call. This makes a significant
  difference for calls shorter than 1 µs (one millionth of a second), and is
  worth considering for calls of up to 10 µs.
  * Use leaf calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]).
  * Prefer [build hooks][] with [`Native`] `external` functions over
    [`DynamicLibrary.lookupFunction`][] and [`Pointer.asFunction`][].
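
A minimal Dart sketch of the helper-isolate and copy-avoidance patterns follows.
It assumes a POSIX libc loaded into the process (Linux or macOS), so that
`malloc`, `memset`, and `free` can be bound with `DynamicLibrary.process()` as
leaf calls; the helper names `fillNative` and `fillInHelperIsolate` are
hypothetical. The helper isolate works on native memory through an
`asTypedList` view, copies the result into Dart memory once (only because the
native buffer is freed afterwards), and hands it to the main isolate with
`Isolate.exit`.

```dart
import 'dart:ffi';
import 'dart:isolate';
import 'dart:typed_data';

// Assumption: libc is loaded into the process (Linux/macOS). None of these
// functions call back into Dart, so they are bound as leaf calls.
final _libc = DynamicLibrary.process();
final _malloc = _libc.lookupFunction<Pointer<Uint8> Function(Size),
    Pointer<Uint8> Function(int)>('malloc', isLeaf: true);
final _free = _libc.lookupFunction<Void Function(Pointer<Uint8>),
    void Function(Pointer<Uint8>)>('free', isLeaf: true);
final _memset = _libc.lookupFunction<
    Pointer<Uint8> Function(Pointer<Uint8>, Int, Size),
    Pointer<Uint8> Function(Pointer<Uint8>, int, int)>('memset', isLeaf: true);

/// Hypothetical native workload: fills [length] bytes of native memory with
/// [value] and returns them as Dart-owned bytes.
Uint8List fillNative(int length, int value) {
  final buffer = _malloc(length);
  if (buffer.address == 0) throw StateError('malloc failed');
  try {
    _memset(buffer, value, length);
    // asTypedList is a view on the native memory: no bytes are copied here.
    final view = buffer.asTypedList(length);
    // One copy into Dart memory, because the native buffer is freed below.
    return Uint8List.fromList(view);
  } finally {
    _free(buffer);
  }
}

/// Entry point of the helper isolate.
void _worker((SendPort, int, int) message) {
  final (replyPort, length, value) = message;
  // Isolate.exit hands the result to the receiving isolate without copying
  // it again, and terminates this isolate.
  Isolate.exit(replyPort, fillNative(length, value));
}

/// Runs the native work off the calling isolate, so it does not block it.
Future<Uint8List> fillInHelperIsolate(int length, int value) async {
  final port = ReceivePort();
  await Isolate.spawn(_worker, (port.sendPort, length, value));
  return await port.first as Uint8List;
}

Future<void> main() async {
  final data = await fillInHelperIsolate(1 << 20, 7);
  print('${data.length} bytes, first byte ${data.first}');
  // prints "1048576 bytes, first byte 7"
}
```

`Isolate.run` wraps the same spawn-and-exit pattern with less code and forwards
errors; the sketch spells out `Isolate.exit` only to make the mechanism
visible. As written, if `fillNative` throws, `port.first` never completes, so
real code would also pass an `onError` port to `Isolate.spawn`.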

For reference, the [FfiCall benchmark][] reports the following run times for
1000 FFI calls in AOT mode on Linux x64:

```
FfiCall.Uint8x01(RunTime): 234.61104068226345 us.
FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us.
FfiCall.Uint8x01Native(RunTime): 216.07292770828917 us.
FfiCall.Uint8x01NativeLeaf(RunTime): 27.64136415181509 us.
```

Dividing by the 1000 calls gives the cost per call: a single leaf call through
a [`Native`] function takes about 28 ns, while a non-leaf call through
`asFunction` takes about 235 ns. For a call that takes about 1000 ns in total
with a non-leaf `asFunction`, saving those roughly 207 ns is a speedup of
about 20%.
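
To get a rough feel for the leaf-call overhead on your own machine, the Dart
sketch below binds the same native function twice, once as a regular call and
once with `isLeaf: true`, and times both. It assumes libc's `abs` can be
resolved with `DynamicLibrary.process()` (Linux or macOS), and the helper
`nsPerCall` is hypothetical. It is not a substitute for the [FfiCall
benchmark][]: absolute numbers depend on the machine and on JIT versus AOT
(compile with `dart compile exe` to measure AOT), so no expected output is
given.

```dart
import 'dart:ffi';

// Assumption: libc is loaded into the process (Linux/macOS), so `abs` can be
// resolved with DynamicLibrary.process(). The same function is bound twice.
final _libc = DynamicLibrary.process();
final int Function(int) absNonLeaf =
    _libc.lookupFunction<Int Function(Int), int Function(int)>('abs');
final int Function(int) absLeaf = _libc.lookupFunction<Int Function(Int),
    int Function(int)>('abs', isLeaf: true);

/// Average time per call of [f] in nanoseconds, over [calls] calls.
double nsPerCall(int Function(int) f, int calls) {
  var checksum = 0;
  final stopwatch = Stopwatch()..start();
  for (var i = 0; i < calls; i++) {
    checksum += f(-i);
  }
  stopwatch.stop();
  // abs(-i) == i, so the sum of all results is known in advance.
  if (checksum != calls * (calls - 1) ~/ 2) throw StateError('unexpected');
  return stopwatch.elapsedMicroseconds * 1000 / calls;
}

void main() {
  const calls = 1000000;
  // Warm-up round so that compilation is not part of the measurement.
  nsPerCall(absNonLeaf, calls);
  nsPerCall(absLeaf, calls);
  final nonLeaf = nsPerCall(absNonLeaf, calls);
  final leaf = nsPerCall(absLeaf, calls);
  print('non-leaf: ${nonLeaf.toStringAsFixed(1)} ns/call');
  print('leaf:     ${leaf.toStringAsFixed(1)} ns/call');
}
```

Both bindings are called through the same function-typed parameter, so that
indirection adds a similar small cost to each. The sketch uses
`lookupFunction` only so that it runs without build hooks; according to the
numbers above, `@Native` leaf functions are cheaper still.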

## Community sources

* (Video) Using Dart FFI for Compute-Heavy Tasks:
  https://www.youtube.com/watch?v=eJR5C0VRCjU
* (Video) Maximize Speed with Dart FFI: Beginner’s Guide to High-Performance
  Integration: https://www.youtube.com/watch?v=HF8gHAakb1Q

[`address`]: https://api.dart.dev/dart-ffi/StructAddress/address.html
[`asTypedList`]: https://api.dart.dev/dart-ffi/Uint8Pointer/asTypedList.html
[`DynamicLibrary.lookupFunction`]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
[`isLeaf` (2)]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
[`isLeaf` (3)]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
[`isLeaf`]: https://api.dart.dev/dart-ffi/Native/isLeaf.html
[`Isolate.exit`]: https://api.dart.dev/dart-isolate/Isolate/exit.html
[`Native`]: https://api.dart.dev/dart-ffi/Native-class.html
[`Pointer.asFunction`]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
[`Pointer`]: https://api.dart.dev/dart-ffi/Pointer-class.html
[`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html
[Android Studio Profiler (CPU)]: https://developer.android.com/studio/profile
[build hooks]: https://dart.dev/tools/hooks
[building the Dart SDK]: https://github.com/dart-lang/sdk/blob/main/docs/Building.md
[Dart DevTools]: https://dart.dev/tools/dart-devtools
[FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart
[Linux perf]: https://perfwiki.github.io/main/
[Perfetto (heapprofd)]: https://perfetto.dev/
[Visual Studio CPU Usage Profiler]: https://learn.microsoft.com/en-us/visualstudio/profiling/cpu-usage
[WPA (Heap Analysis)]: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer
[Xcode Instruments (Leaks/Allocations)]: https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use
[Xcode Instruments (Time Profiler)]: https://developer.apple.com/tutorials/instruments
