|  | 1 | +# Performance with Native Code | 
|  | 2 | + | 
|  | 3 | +How to assess performance of Dart and native code, and how to improve it. | 
|  | 4 | + | 
|  | 5 | +## Profiling Performance | 
|  | 6 | + | 
|  | 7 | +| Tool                                    | Platform  | Primary Use Case                        | Measures (Dart CPU)          | Measures (Native CPU)    | Measures (Dart Heap) | Measures (Native Heap)                                           | | 
|  | 8 | +| --------------------------------------- | --------- | --------------------------------------- | ---------------------------- | ------------------------ | -------------------- | ---------------------------------------------------------------- | | 
|  | 9 | +| [Dart DevTools]                         | All       | Profiles Dart VM, UI jank, Dart heap    | Yes                          | Opaque "Native" block    | Yes                  | Tracks "External" VM-aware memory only; misses native-heap leaks | | 
|  | 10 | +| [Xcode Instruments (Time Profiler)]     | iOS/macOS | Profiles native CPU call stacks         | No                           | Yes (full symbolication) | No                   | No                                                               | | 
|  | 11 | +| [Xcode Instruments (Leaks/Allocations)] | iOS/macOS | Profiles native heap (malloc, mmap)     | No                           | No                       | No                   | Yes                                                              | | 
|  | 12 | +| [Android Studio Profiler (CPU)]         | Android   | Profiles native C/C++ CPU execution     | No                           | Yes (traces C++ calls)   | No                   | No                                                               | | 
|  | 13 | +| [Perfetto (heapprofd)]                  | Android   | Advanced native heap profiling          | No                           | No                       | No                   | Yes (traces malloc/free call stacks)                             | | 
|  | 14 | +| [Linux perf]                            | Linux     | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes                      | No                   | No                                                               | | 
|  | 15 | +| [Visual Studio CPU Usage Profiler]      | Windows   | Profiles native C/C++ CPU execution     | No                           | Yes (traces C++ calls)   | No                   | No                                                               | | 
|  | 16 | +| [WPA (Heap Analysis)]                   | Windows   | Advanced native heap profiling          | No                           | No                       | No                   | Yes (traces malloc/free call stacks)                             | | 
|  | 17 | + | 
|  | 18 | +<!-- TODO: Add documentation for the other tools. --> | 
|  | 19 | + | 
|  | 20 | +### Dart DevTools | 
|  | 21 | + | 
|  | 22 | +To assess only the performance of the Dart code, treating native code as a | 
|  | 23 | +black box, use the Dart performance tooling. | 
|  | 24 | + | 
|  | 25 | +See the documentation at https://dart.dev/tools/dart-devtools and | 
|  | 26 | +https://docs.flutter.dev/perf. For FFI specifically, the most relevant views | 
|  | 27 | +are the CPU profiler (https://docs.flutter.dev/tools/devtools/cpu-profiler) | 
|  | 28 | +and the timeline events tab | 
|  | 29 | +(https://docs.flutter.dev/tools/devtools/performance#timeline-events-tab). | 
|  | 30 | +Wrap synchronous FFI calls in synchronous timeline events, and use async events | 
|  | 31 | +for asynchronous code (async callbacks or helper isolates). | 
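
As a minimal sketch of such timeline events, the following uses `Timeline.timeSync` and `TimelineTask` from `dart:developer`. The functions `nativeChecksum` and `nativeChecksumAsync` are hypothetical stand-ins written in pure Dart so the example runs on its own; in a real app they would be functions bound through `dart:ffi`.

```dart
import 'dart:developer';

// Hypothetical stand-in for a synchronous FFI call (an assumption for this
// sketch); a real app would call a function bound with dart:ffi here.
int nativeChecksum(List<int> bytes) =>
    bytes.fold<int>(0, (sum, b) => (sum + b) & 0xff);

// Hypothetical stand-in for native work that completes asynchronously, for
// example through an async callback or a helper isolate.
Future<int> nativeChecksumAsync(List<int> bytes) async => nativeChecksum(bytes);

// Records a synchronous timeline event that spans the call.
int tracedChecksum(List<int> bytes) {
  return Timeline.timeSync('nativeChecksum', () => nativeChecksum(bytes));
}

// Records an async timeline event that spans the whole asynchronous operation.
Future<int> tracedChecksumAsync(List<int> bytes) async {
  final task = TimelineTask()..start('nativeChecksumAsync');
  try {
    return await nativeChecksumAsync(bytes);
  } finally {
    // Always close the event, even if the call throws.
    task.finish();
  }
}

Future<void> main() async {
  final bytes = List<int>.generate(256, (i) => i);
  print(tracedChecksum(bytes));
  print(await tracedChecksumAsync(bytes));
}
```

With DevTools attached, events named `nativeChecksum` and `nativeChecksumAsync` should then appear in the timeline events tab, which makes the time spent in each native call visible next to the Dart frames.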
|  | 32 | + | 
|  | 33 | +### `perf` on Linux | 
|  | 34 | + | 
|  | 35 | +To see both Dart and native symbols in a flame graph, you can use `perf` on | 
|  | 36 | +Linux. | 
|  | 37 | + | 
|  | 38 | +To run the [FfiCall benchmark] in JIT mode with `perf`: | 
|  | 39 | + | 
|  | 40 | +``` | 
|  | 41 | +$ perf record -g dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart && \ | 
|  | 42 | +perf report --hierarchy | 
|  | 43 | +``` | 
|  | 44 | + | 
|  | 45 | +Flutter apps are deployed in AOT mode, so prefer profiling in AOT mode rather | 
|  | 46 | +than JIT mode. | 
|  | 47 | + | 
|  | 48 | +For AOT, there is [no single command | 
|  | 49 | +yet](https://github.com/dart-lang/sdk/issues/54254). Instead, use the | 
|  | 50 | +`precompiler2` and `dart_precompiled_runtime2` tools from the Dart SDK. See | 
|  | 51 | +[building the Dart SDK] for build instructions. | 
|  | 52 | + | 
|  | 53 | +``` | 
|  | 54 | +$ pkg/vm/tool/precompiler2 benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \ | 
|  | 55 | +perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart.bin && \ | 
|  | 56 | +perf report --hierarchy | 
|  | 57 | +``` | 
|  | 58 | + | 
|  | 59 | +To analyze a performance issue seen in Flutter, it is best to reproduce the | 
|  | 60 | +issue in standalone Dart. | 
|  | 61 | + | 
|  | 62 | +## Improving performance | 
|  | 63 | + | 
|  | 64 | +There are some typical patterns to improve performance: | 
|  | 65 | + | 
|  | 66 | +* To avoid dropped frames, move long-running FFI calls to a helper isolate. | 
|  | 67 | +* To avoid copying data where possible: | 
|  | 68 | +  * Keep data in native memory, operating on [`Pointer`][]s and using | 
|  | 69 | +    [`asTypedList`][] to convert the pointers into [`TypedData`][]. | 
|  | 70 | +  * For short calls, if the memory is in Dart, avoid copying by using leaf calls | 
|  | 71 | +    ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]) and [`address`]. (Leaf | 
|  | 72 | +    calls prevent the Dart GC from running on all isolates for the duration of | 
|  | 73 | +    the call, which makes it safe to pass native code a pointer to a Dart object.) | 
|  | 74 | +  * Use [`Isolate.exit`][] to send large data from a helper isolate to the main | 
|  | 75 | +    isolate after a large computation. | 
|  | 76 | +* For many small calls, limit the overhead per call. This makes a significant | 
|  | 77 | +  difference for calls shorter than 1 us (one millionth of a second), and can be | 
|  | 78 | +  considered for calls of up to 10 us. | 
|  | 79 | +  * Use leaf calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]). | 
|  | 80 | +  * Prefer using [build hooks][] with [`Native`] `external` | 
|  | 81 | +    functions over [`DynamicLibrary.lookupFunction`][] and | 
|  | 82 | +    [`Pointer.asFunction`][]. | 
|  | 83 | +   | 
|  | 84 | +  For reference, the [FfiCall benchmark][] reports these times for 1000 FFI calls in AOT mode on Linux x64: | 
|  | 85 | +  ``` | 
|  | 86 | +  FfiCall.Uint8x01(RunTime): 234.61104068226345 us. | 
|  | 87 | +  FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us. | 
|  | 88 | +  FfiCall.Uint8x01Native(RunTime): 216.07292770828917 us. | 
|  | 89 | +  FfiCall.Uint8x01NativeLeaf(RunTime): 27.64136415181509 us. | 
|  | 90 | +  ``` | 
|  | 91 | +  Per call, a native leaf call takes about 28 ns and a non-leaf `asFunction` | 
|  | 92 | +  call about 235 ns, saving roughly 207 ns. For calls taking ~1000 ns, that's about a 20% speedup. | 
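
The first and third patterns in the list above (moving a long-running call to a helper isolate, and returning the result with `Isolate.exit`) can be sketched as follows. This is a minimal illustration, not a definitive implementation: `expensiveNativeWork`, `runOnHelperIsolate`, and `_worker` are hypothetical names, and `expensiveNativeWork` is a pure-Dart stand-in for a long-running FFI call so the example runs on its own. It assumes Dart 3 for the record syntax.

```dart
import 'dart:isolate';
import 'dart:typed_data';

// Hypothetical stand-in for a long-running FFI call that produces a large
// result (an assumption for this sketch); a real app would call native code.
Uint8List expensiveNativeWork(int size) {
  final result = Uint8List(size);
  result.fillRange(0, size, 7);
  return result;
}

// Entry point of the helper isolate. Isolate.exit terminates the helper and
// hands the message over to the receiving isolate.
void _worker((SendPort, int) args) {
  final (sendPort, size) = args;
  final result = expensiveNativeWork(size);
  Isolate.exit(sendPort, result);
}

// Runs the work on a helper isolate so the calling isolate stays responsive.
Future<Uint8List> runOnHelperIsolate(int size) async {
  final port = ReceivePort();
  await Isolate.spawn(_worker, (port.sendPort, size));
  // The first (and only) message is the result; reading it closes the port.
  return await port.first as Uint8List;
}

Future<void> main() async {
  final data = await runOnHelperIsolate(1 << 20);
  print(data.length);
}
```

The calling isolate only awaits a message, so it can keep rendering frames while the helper isolate is blocked in the long-running call.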
|  | 93 | + | 
|  | 94 | +## Community sources | 
|  | 95 | + | 
|  | 96 | +* (Video) Using Dart FFI for Compute-Heavy Tasks: | 
|  | 97 | +  https://www.youtube.com/watch?v=eJR5C0VRCjU | 
|  | 98 | +* (Video) Maximize Speed with Dart FFI: Beginner’s Guide to High-Performance | 
|  | 99 | +  Integration: https://www.youtube.com/watch?v=HF8gHAakb1Q | 
|  | 100 | + | 
|  | 101 | +[`address`]: https://api.dart.dev/dart-ffi/StructAddress/address.html | 
|  | 102 | +[`asTypedList`]: https://api.dart.dev/dart-ffi/Uint8Pointer/asTypedList.html | 
|  | 103 | +[`DynamicLibrary.lookupFunction`]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html | 
|  | 104 | +[`isLeaf` (2)]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html | 
|  | 105 | +[`isLeaf` (3)]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html | 
|  | 106 | +[`isLeaf`]: https://api.dart.dev/dart-ffi/Native/isLeaf.html | 
|  | 107 | +[`Isolate.exit`]: https://api.dart.dev/dart-isolate/Isolate/exit.html | 
|  | 108 | +[`Native`]: https://api.dart.dev/dart-ffi/Native-class.html | 
|  | 109 | +[`Pointer.asFunction`]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html | 
|  | 110 | +[`Pointer`]: https://api.dart.dev/dart-ffi/Pointer-class.html | 
|  | 111 | +[`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html | 
|  | 112 | +[Android Studio Profiler (CPU)]: https://developer.android.com/studio/profile | 
|  | 113 | +[build hooks]: https://dart.dev/tools/hooks | 
|  | 114 | +[building the Dart SDK]: https://github.com/dart-lang/sdk/blob/main/docs/Building.md | 
|  | 115 | +[Dart DevTools]: https://dart.dev/tools/dart-devtools | 
|  | 116 | +[FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart | 
|  | 117 | +[Linux perf]: https://perfwiki.github.io/main/ | 
|  | 118 | +[Perfetto (heapprofd)]: https://perfetto.dev/ | 
|  | 119 | +[Visual Studio CPU Usage Profiler]: https://learn.microsoft.com/en-us/visualstudio/profiling/cpu-usage | 
|  | 120 | +[WPA (Heap Analysis)]: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer | 
|  | 121 | +[Xcode Instruments (Leaks/Allocations)]: https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use | 
|  | 122 | +[Xcode Instruments (Time Profiler)]: https://developer.apple.com/tutorials/instruments | 