From 925b7e9cc03d4dea58019882fa9fa36dcd98f666 Mon Sep 17 00:00:00 2001
From: Daco Harkes
Date: Wed, 29 Oct 2025 10:30:15 -0300
Subject: [PATCH 1/6] [doc] Add performance documentation

---
 doc/performance.md | 97 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 doc/performance.md

diff --git a/doc/performance.md b/doc/performance.md
new file mode 100644
index 000000000..562830a88
--- /dev/null
+++ b/doc/performance.md
@@ -0,0 +1,97 @@
+# Performance with Native Code
+
+How to assess performance of Dart and native code, and how to improve it.
+
+## Profiling Performance
+
+
+| Tool | Platform | Primary Use Case | Measures (Dart CPU) | Measures (Native CPU) | Measures (Dart Heap) | Measures (Native Heap) |
+| ------------------------------------- | --------- | --------------------------------------- | ---------------------------- | ------------------------ | -------------------- | ---------------------------------------------------------------- |
+| Dart DevTools | All | Profiles Dart VM, UI jank, Dart heap | Yes | Opaque "Native" block | Yes | Tracks "External" VM-aware memory only; Misses native-heap leaks |
+| Xcode Instruments (Time Profiler) | iOS/macOS | Profiles native CPU call stacks | No | Yes (full symbolication) | No | No |
+| Xcode Instruments (Leaks/Allocations) | iOS/macOS | Profiles native heap (malloc, mmap) | No | No | No | Yes |
+| Android Studio Profiler (CPU) | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
+| Perfetto (heapprofd) | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
+| Linux perf | Linux | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes | No | No |
+
+
+
+### Dart DevTools
+
+To assess the performance of only the Dart code, treating native code as a
+black box, use the Dart performance tooling.
+
+See the documentation at https://dart.dev/tools/dart-devtools and
+https://docs.flutter.dev/perf. For FFI specifically, use the CPU profiler
+(https://docs.flutter.dev/tools/devtools/cpu-profiler) and timeline events
+(https://docs.flutter.dev/tools/devtools/performance#timeline-events-tab).
+For synchronous FFI calls you can add synchronous timeline events, and for
+asynchronous code (using async callbacks or helper isolates) you can use async
+events.
+
+### `perf` On Linux
+
+To see both Dart and native symbols in a flame graph, you can use `perf` on
+Linux.
+
+For JIT:
+
+```
+$ perf record -g out/DebugX64/dart-sdk/bin/dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart
+```
+
+For AOT, we currently don't have a [single command
+yet](https://github.com/dart-lang/sdk/issues/54254). You need to use
+`precompiler2` command from the Dart SDK:
+
+```
+$ pkg/vm/tool/precompiler2 --packages=.packages benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \
+perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols --profile-period=10000 benchmarks/FfiCall/dart/FfiCall.dart.bin
+```
+
+## Improving performance
+
+There are some typical patterns to improve performance:
+
+* To avoid dropped frames, move long-running FFI calls to a helper isolate.
+* To avoid copying data where possible:
+  * Keep data in native memory, operating on [`Pointer`][]s and using
+    [`asTypedList`][] to view that memory as [`TypedData`][] without copying.
+  * For short calls, if the memory is in Dart, avoid copying by using leaf calls
+    ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]) and [`address`][]. (Leaf
+    calls prevent the Dart GC from running on all isolates, so native code can
+    safely receive a pointer to an object in the Dart heap.)
+  * Use [`Isolate.exit`][] to send large data from a helper isolate to the main
+    isolate after a large computation.
+* For many small calls, limit the overhead per call. This makes a significant
+  difference for calls shorter than 1 us (one millionth of a second), and can be
+  considered for calls of up to 10 us.
+  * Use leaf calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]).
+  * Prefer using [build hooks][] with [`Native`] `external`
+    functions over [`DynamicLibrary.lookupFunction`][] and
+    [`Pointer.asFunction`][].
+
+  For reference, this benchmark reports a 1000 FFI calls in AOT on Linux x64.
+  ```
+  FfiCall.Uint8x01(RunTime): 234.61104068226345 us.
+  FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us.
+  FfiCall.Uint8x01Native(RunTime): 216.07292770828917 us.
+  FfiCall.Uint8x01NativeLeaf(RunTime): 27.64136415181509 us.
+  ```
+  A single call that is native-leaf takes 28 ns, while an `asFunction`-non-leaf
+  takes 235 ns. So for calls taking ~1000 ns that's a 20% speedup.
+
+[`address`]: https://api.dart.dev/dart-ffi/StructAddress/address.html
+[`asTypedList`]: https://api.dart.dev/dart-ffi/Uint8Pointer/asTypedList.html
+[`DynamicLibrary.lookupFunction`]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
+[`isLeaf` (2)]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
+[`isLeaf` (3)]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
+[`isLeaf`]: https://api.dart.dev/dart-ffi/Native/isLeaf.html
+[`Isolate.exit`]: https://api.dart.dev/dart-isolate/Isolate/exit.html
+[`Native`]: https://api.dart.dev/dart-ffi/Native-class.html
+[`Pointer.asFunction`]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
+[`Pointer`]: https://api.dart.dev/dart-ffi/Pointer-class.html
+[`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html
+[build hooks]: https://dart.dev/tools/hooks
+
+

From d254e758b8d792d72589781c233e5d99482876ae Mon Sep 17 00:00:00 2001
From: Daco Harkes
Date: Thu, 30 Oct 2025 08:17:28 -0300
Subject: [PATCH 2/6] address comments

---
 doc/performance.md | 39 ++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/doc/performance.md b/doc/performance.md
index 562830a88..5fc4c5df6 100644
--- a/doc/performance.md
+++ b/doc/performance.md
@@ -4,15 +4,14 @@ How to assess performance of Dart and native code, and how to improve it.
 
 ## Profiling Performance
 
-
-| Tool | Platform | Primary Use Case | Measures (Dart CPU) | Measures (Native CPU) | Measures (Dart Heap) | Measures (Native Heap) |
-| ------------------------------------- | --------- | --------------------------------------- | ---------------------------- | ------------------------ | -------------------- | ---------------------------------------------------------------- |
-| Dart DevTools | All | Profiles Dart VM, UI jank, Dart heap | Yes | Opaque "Native" block | Yes | Tracks "External" VM-aware memory only; Misses native-heap leaks |
-| Xcode Instruments (Time Profiler) | iOS/macOS | Profiles native CPU call stacks | No | Yes (full symbolication) | No | No |
-| Xcode Instruments (Leaks/Allocations) | iOS/macOS | Profiles native heap (malloc, mmap) | No | No | No | Yes |
-| Android Studio Profiler (CPU) | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
-| Perfetto (heapprofd) | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
-| Linux perf | Linux | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes | No | No |
+| Tool | Platform | Primary Use Case | Measures (Dart CPU) | Measures (Native CPU) | Measures (Dart Heap) | Measures (Native Heap) |
+| --------------------------------------- | --------- | --------------------------------------- | ---------------------------- | ------------------------ | -------------------- | ---------------------------------------------------------------- |
+| [Dart DevTools] | All | Profiles Dart VM, UI jank, Dart heap | Yes | Opaque "Native" block | Yes | Tracks "External" VM-aware memory only; Misses native-heap leaks |
+| [Xcode Instruments (Time Profiler)] | iOS/macOS | Profiles native CPU call stacks | No | Yes (full symbolication) | No | No |
+| [Xcode Instruments (Leaks/Allocations)] | iOS/macOS | Profiles native heap (malloc, mmap) | No | No | No | Yes |
+| [Android Studio Profiler (CPU)] | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
+| [Perfetto (heapprofd)] | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
+| [Linux perf] | Linux | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes | No | No |
 
 
 
@@ -29,12 +28,12 @@ For synchronous FFI calls you can add synchronous timeline events, and for
 asynchronous code (using async callbacks or helper isolates) you can use async
 events.
 
-### `perf` On Linux
+### `perf` on Linux
 
 To see both Dart and native symbols in a flame graph, you can use `perf` on
 Linux.
 
-For JIT:
+To run the [FfiCall benchmark] in JIT mode with `perf`:
 
 ```
 $ perf record -g out/DebugX64/dart-sdk/bin/dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart
@@ -49,6 +48,9 @@ $ pkg/vm/tool/precompiler2 --packages=.packages benchmarks/FfiCall/dart/FfiCall.
 perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols --profile-period=10000 benchmarks/FfiCall/dart/FfiCall.dart.bin
 ```
 
+To analyze a performance issue in Flutter, it is best to reproduce the issue in
+a standalone Dart program.
+
 ## Improving performance
 
 There are some typical patterns to improve performance:
@@ -68,10 +70,10 @@ There are some typical patterns to improve performance:
   considered for calls of up to 10 us.
   * Use leaf calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]).
   * Prefer using [build hooks][] with [`Native`] `external`
-    functions over [`DynamicLibrary.lookupFunction`][] and
-    [`Pointer.asFunction`][].
+    functions over [`DynamicLibrary.lookupFunction`][] and
+    [`Pointer.asFunction`][].
 
-  For reference, this benchmark reports a 1000 FFI calls in AOT on Linux x64.
+  For reference, the [FfiCall benchmark][] reports 1000 FFI calls in AOT on Linux x64:
   ```
   FfiCall.Uint8x01(RunTime): 234.61104068226345 us.
   FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us.
@@ -92,6 +94,11 @@ There are some typical patterns to improve performance:
 [`Pointer.asFunction`]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
 [`Pointer`]: https://api.dart.dev/dart-ffi/Pointer-class.html
 [`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html
+[Android Studio Profiler (CPU)]: https://developer.android.com/studio/profile
 [build hooks]: https://dart.dev/tools/hooks
-
-
+[Dart DevTools]: https://dart.dev/tools/dart-devtools
+[FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart
+[Linux perf]: https://perfwiki.github.io/main/
+[Perfetto (heapprofd)]: https://perfetto.dev/
+[Xcode Instruments (Leaks/Allocations)]: https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use
+[Xcode Instruments (Time Profiler)]: https://developer.apple.com/tutorials/instruments

From 2209199e638adb5891c96881471007f41771fa6a Mon Sep 17 00:00:00 2001
From: Daco Harkes
Date: Thu, 30 Oct 2025 08:37:41 -0300
Subject: [PATCH 3/6] Add windows perf tools

---
 doc/performance.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/performance.md b/doc/performance.md
index 5fc4c5df6..aba65ad54 100644
--- a/doc/performance.md
+++ b/doc/performance.md
@@ -12,6 +12,8 @@ How to assess performance of Dart and native code, and how to improve it.
 | [Android Studio Profiler (CPU)] | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
 | [Perfetto (heapprofd)] | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
 | [Linux perf] | Linux | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes | No | No |
+| [Visual Studio CPU Usage Profiler] | Windows | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
+| [WPA (Heap Analysis)] | Windows | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
 
 
 
@@ -100,5 +102,7 @@ There are some typical patterns to improve performance:
 [FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart
 [Linux perf]: https://perfwiki.github.io/main/
 [Perfetto (heapprofd)]: https://perfetto.dev/
+[Visual Studio CPU Usage Profiler]: https://learn.microsoft.com/en-us/visualstudio/profiling/cpu-usage
+[WPA (Heap Analysis)]: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer
 [Xcode Instruments (Leaks/Allocations)]: https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use
 [Xcode Instruments (Time Profiler)]: https://developer.apple.com/tutorials/instruments

From 7227c6d0835f8a520d4b915f2425fe6a1c27c4a8 Mon Sep 17 00:00:00 2001
From: Daco Harkes
Date: Thu, 30 Oct 2025 08:39:28 -0300
Subject: [PATCH 4/6] add link

---
 doc/README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/README.md b/doc/README.md
index d74586045..527b4505f 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -3,6 +3,7 @@
 ## Documentation applicable to all interop
 
 - [3 Layers in Dart Interop](interop-layers.md)
+- [Performance analysis and improvements](performance.md)
 
 ## Guides
 

From aac23dc0e0644e7418172488175eeee3777604b7 Mon Sep 17 00:00:00 2001
From: Daco Harkes
Date: Thu, 30 Oct 2025 09:02:05 -0300
Subject: [PATCH 5/6] update the `perf` documentation

---
 doc/performance.md | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/doc/performance.md b/doc/performance.md
index aba65ad54..01f718990 100644
--- a/doc/performance.md
+++ b/doc/performance.md
@@ -12,8 +12,8 @@ How to assess performance of Dart and native code, and how to improve it.
 | [Android Studio Profiler (CPU)] | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
 | [Perfetto (heapprofd)] | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
 | [Linux perf] | Linux | Unified Dart AOT + Native CPU profiling | Yes (requires special flags) | Yes | No | No |
-| [Visual Studio CPU Usage Profiler] | Windows | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
-| [WPA (Heap Analysis)] | Windows | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
+| [Visual Studio CPU Usage Profiler] | Windows | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
+| [WPA (Heap Analysis)] | Windows | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
 
 
 
@@ -38,16 +38,22 @@ Linux.
 To run the [FfiCall benchmark] in JIT mode with `perf`:
 
 ```
-$ perf record -g out/DebugX64/dart-sdk/bin/dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart
+$ perf record -g dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart && \
+perf report --hierarchy
 ```
 
+Note that Flutter apps are deployed in AOT mode, so prefer profiling in AOT
+mode.
+
 For AOT, we currently don't have a [single command
 yet](https://github.com/dart-lang/sdk/issues/54254). You need to use
-`precompiler2` command from the Dart SDK:
+`precompiler2` from the Dart SDK. See [building the Dart SDK] for how to
+build the Dart SDK.
 
 ```
-$ pkg/vm/tool/precompiler2 --packages=.packages benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \
-perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols --profile-period=10000 benchmarks/FfiCall/dart/FfiCall.dart.bin
+$ pkg/vm/tool/precompiler2 benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \
+perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart.bin && \
+perf report --hierarchy
 ```
 
 To analyze a performance issue in Flutter, it is best to reproduce the issue in
@@ -98,6 +104,7 @@ There are some typical patterns to improve performance:
 [`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html
 [Android Studio Profiler (CPU)]: https://developer.android.com/studio/profile
 [build hooks]: https://dart.dev/tools/hooks
+[building the Dart SDK]: https://github.com/dart-lang/sdk/blob/main/docs/Building.md
 [Dart DevTools]: https://dart.dev/tools/dart-devtools
 [FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart
 [Linux perf]: https://perfwiki.github.io/main/

From e9f94d6292a37e2f44b164d010323fa472fbfb8a Mon Sep 17 00:00:00 2001
From: Daco Harkes
Date: Thu, 30 Oct 2025 09:14:58 -0300
Subject: [PATCH 6/6] Add some community sources

---
 doc/performance.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/doc/performance.md b/doc/performance.md
index 01f718990..0cede093d 100644
--- a/doc/performance.md
+++ b/doc/performance.md
@@ -91,6 +91,13 @@ There are some typical patterns to improve performance:
   A single call that is native-leaf takes 28 ns, while an `asFunction`-non-leaf
   takes 235 ns. So for calls taking ~1000 ns that's a 20% speedup.
 
+## Community sources
+
+* (Video) Using Dart FFI for Compute-Heavy Tasks:
+  https://www.youtube.com/watch?v=eJR5C0VRCjU
+* (Video) Maximize Speed with Dart FFI: Beginner’s Guide to High-Performance
+  Integration: https://www.youtube.com/watch?v=HF8gHAakb1Q
+
 [`address`]: https://api.dart.dev/dart-ffi/StructAddress/address.html
 [`asTypedList`]: https://api.dart.dev/dart-ffi/Uint8Pointer/asTypedList.html
 [`DynamicLibrary.lookupFunction`]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
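
The per-call figures quoted in `doc/performance.md` can be re-derived from the FfiCall output. Below is a minimal POSIX shell sketch. It assumes, as the doc states, that each benchmark line reports the total time in microseconds for 1000 calls, so that total in microseconds equals the per-call cost in nanoseconds. `per_call_ns` and `speedup` are hypothetical helper names, not part of the Dart SDK.

```shell
#!/bin/sh
# Sketch: turn FfiCall benchmark output into per-call costs and speedups.
# Assumption: each input line looks like "Name(RunTime): <total> us.",
# where <total> is the time in microseconds for 1000 FFI calls.

# Prints "<name> <ns per call>", rounded to whole nanoseconds.
per_call_ns() {
  awk '{
    name = $1
    sub(/\(RunTime\):$/, "", name)   # strip the "(RunTime):" suffix
    # $2 * 1000 converts us to ns; / 1000 divides by the number of calls.
    printf "%s %.0f\n", name, $2 * 1000 / 1000
  }'
}

# Ratio of old to new time per call for a native function doing $1 ns of
# work, when the per-call overhead drops from $2 ns to $3 ns.
speedup() {
  awk -v work="$1" -v slow="$2" -v fast="$3" \
    'BEGIN { printf "%.2f\n", (work + slow) / (work + fast) }'
}

# Usage with the numbers reported in the doc.
per_call_ns <<'EOF'
FfiCall.Uint8x01(RunTime): 234.61104068226345 us.
FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us.
FfiCall.Uint8x01Native(RunTime): 216.07292770828917 us.
FfiCall.Uint8x01NativeLeaf(RunTime): 27.64136415181509 us.
EOF
speedup 1000 235 28    # about 1 us of native work per call
speedup 10000 235 28   # about 10 us of native work per call
```

With these inputs the per-call costs come out as 235, 72, 216 and 28 ns. For about 1 us of native work, switching from a non-leaf `asFunction` call to a native leaf call gives a ratio of 1.20, which matches the doc's 20% figure. For about 10 us of work the ratio is only 1.02, which is why the per-call overhead matters much less at that size.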