diff --git a/doc/README.md b/doc/README.md
index d74586045..527b4505f 100644
--- a/doc/README.md
+++ b/doc/README.md
@@ -3,6 +3,7 @@
 ## Documentation applicable to all interop
 
 - [3 Layers in Dart Interop](interop-layers.md)
+- [Performance analysis and improvements](performance.md)
 
 ## Guides
 
diff --git a/doc/performance.md b/doc/performance.md
new file mode 100644
index 000000000..0cede093d
--- /dev/null
+++ b/doc/performance.md
@@ -0,0 +1,122 @@
+# Performance with Native Code
+
+How to assess the performance of Dart and native code, and how to improve it.
+
+## Profiling performance
+
+| Tool | Platform | Primary Use Case | Measures (Dart CPU) | Measures (Native CPU) | Measures (Dart Heap) | Measures (Native Heap) |
+| --- | --- | --- | --- | --- | --- | --- |
+| [Dart DevTools] | All | Profiles Dart VM, UI jank, Dart heap | Yes | Opaque "Native" block | Yes | Tracks "External" VM-aware memory only; misses native-heap leaks |
+| [Xcode Instruments (Time Profiler)] | iOS/macOS | Profiles native CPU call stacks | No | Yes (full symbolication) | No | No |
+| [Xcode Instruments (Leaks/Allocations)] | iOS/macOS | Profiles native heap (malloc, mmap) | No | No | No | Yes |
+| [Android Studio Profiler (CPU)] | Android | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
+| [Perfetto (heapprofd)] | Android | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
+| [Linux perf] | Linux | Unified Dart + native CPU profiling (JIT and AOT) | Yes (requires `--generate-perf-events-symbols`) | Yes | No | No |
+| [Visual Studio CPU Usage Profiler] | Windows | Profiles native C/C++ CPU execution | No | Yes (traces C++ calls) | No | No |
+| [WPA (Heap Analysis)] | Windows | Advanced native heap profiling | No | No | No | Yes (traces malloc/free call stacks) |
+
+
+
+### Dart DevTools
+
+To assess only the performance of the Dart code, treating native code as a
+black box, use the Dart performance tooling.
+
+See the documentation at https://dart.dev/tools/dart-devtools and
+https://docs.flutter.dev/perf. For FFI specifically, use the CPU profiler
+(https://docs.flutter.dev/tools/devtools/cpu-profiler) and the timeline events
+tab (https://docs.flutter.dev/tools/devtools/performance#timeline-events-tab).
+For synchronous FFI calls, add synchronous timeline events (`Timeline.timeSync`
+from `dart:developer`); for asynchronous code (async callbacks or helper
+isolates), use async timeline events (`TimelineTask`).
+
+### `perf` on Linux
+
+To see Dart and native symbols together in one profile, use `perf` on Linux.
+
+To run the [FfiCall benchmark] in JIT mode with `perf`, from the root of a
+Dart SDK checkout:
+
+```
+$ perf record -g dart --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart && \
+perf report --hierarchy
+```
+
+Flutter apps are deployed in AOT mode, so prefer profiling in AOT mode.
+
+For AOT, there is [no single command
+yet](https://github.com/dart-lang/sdk/issues/54254). Instead, use the
+`precompiler2` tool from the Dart SDK. See [building the Dart SDK] for how to
+build the Dart SDK.
+
+```
+$ pkg/vm/tool/precompiler2 benchmarks/FfiCall/dart/FfiCall.dart benchmarks/FfiCall/dart/FfiCall.dart.bin && \
+perf record -g pkg/vm/tool/dart_precompiled_runtime2 --generate-perf-events-symbols benchmarks/FfiCall/dart/FfiCall.dart.bin && \
+perf report --hierarchy
+```
+
+To analyze a performance issue in a Flutter app, it is best to reproduce the
+issue in a standalone Dart program.
+
+## Improving performance
+
+Typical patterns for improving performance:
+
+* To avoid dropped frames, move long-running FFI calls to a helper isolate, for
+  example with `Isolate.run`.
+* Avoid copying data where possible:
+  * Keep data in native memory, operate on [`Pointer`][]s, and use
+    [`asTypedList`][] to get a [`TypedData`][] view of it without copying.
+  * For short calls with data in Dart memory, avoid copying by passing its
+    [`address`] to a leaf call ([`isLeaf`][]). The GC cannot run while a leaf
+    call is in progress, so the Dart object cannot move while native code uses it.
+  * Use [`Isolate.exit`][] to send the result of a large computation from a
+    helper isolate back to the main isolate without copying it.
+* For many small calls, reduce the per-call overhead. This makes a significant
+  difference for calls shorter than 1 µs (one microsecond), and is worth
+  considering for calls of up to 10 µs.
+  * Use leaf calls ([`isLeaf`][], [`isLeaf` (2)][], [`isLeaf` (3)][]). A leaf
+    function must not call back into Dart.
+  * Prefer [build hooks][] with [`Native`] `external` functions over
+    [`DynamicLibrary.lookupFunction`][] and [`Pointer.asFunction`][].
+
+  For reference, the [FfiCall benchmark][] reports these times for 1000 FFI calls in AOT on Linux x64:
+  ```
+  FfiCall.Uint8x01(RunTime): 234.61104068226345 us.
+  FfiCall.Uint8x01Leaf(RunTime): 71.9994712538334 us.
+  FfiCall.Uint8x01Native(RunTime): 216.07292770828917 us.
+  FfiCall.Uint8x01NativeLeaf(RunTime): 27.64136415181509 us.
+  ```
+  A single native-leaf call takes about 28 ns, while a non-leaf `asFunction`
+  call takes about 235 ns. For a native function whose own work takes ~1000 ns,
+  that is roughly a 20% speedup.
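+
+A back-of-the-envelope check of the 20% estimate above, derived only from the
+benchmark output and assuming the FFI overhead simply adds to a native function
+whose own work takes a fixed 1000 ns:
+
+$$
+\begin{aligned}
+\text{non-leaf via } \texttt{asFunction}:&\quad 234.61\,\mu\text{s} / 1000 \approx 234.6\,\text{ns per call} \\
+\text{Native leaf}:&\quad 27.64\,\mu\text{s} / 1000 \approx 27.6\,\text{ns per call} \\
+\text{speedup}:&\quad \frac{1000 + 234.6}{1000 + 27.6} = \frac{1234.6}{1027.6} \approx 1.20
+\end{aligned}
+$$
+
+That is about 20% more calls per second, or equivalently about 17% less time
+per call.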
+
+## Community sources
+
+* (Video) Using Dart FFI for Compute-Heavy Tasks:
+  https://www.youtube.com/watch?v=eJR5C0VRCjU
+* (Video) Maximize Speed with Dart FFI: Beginner’s Guide to High-Performance
+  Integration: https://www.youtube.com/watch?v=HF8gHAakb1Q
+
+[`address`]: https://api.dart.dev/dart-ffi/StructAddress/address.html
+[`asTypedList`]: https://api.dart.dev/dart-ffi/Uint8Pointer/asTypedList.html
+[`DynamicLibrary.lookupFunction`]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
+[`isLeaf` (2)]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
+[`isLeaf` (3)]: https://api.dart.dev/dart-ffi/DynamicLibraryExtension/lookupFunction.html
+[`isLeaf`]: https://api.dart.dev/dart-ffi/Native/isLeaf.html
+[`Isolate.exit`]: https://api.dart.dev/dart-isolate/Isolate/exit.html
+[`Native`]: https://api.dart.dev/dart-ffi/Native-class.html
+[`Pointer.asFunction`]: https://api.dart.dev/dart-ffi/NativeFunctionPointer/asFunction.html
+[`Pointer`]: https://api.dart.dev/dart-ffi/Pointer-class.html
+[`TypedData`]: https://api.dart.dev/dart-typed_data/TypedData-class.html
+[Android Studio Profiler (CPU)]: https://developer.android.com/studio/profile
+[build hooks]: https://dart.dev/tools/hooks
+[building the Dart SDK]: https://github.com/dart-lang/sdk/blob/main/docs/Building.md
+[Dart DevTools]: https://dart.dev/tools/dart-devtools
+[FfiCall benchmark]: https://github.com/dart-lang/sdk/blob/main/benchmarks/FfiCall/dart/FfiCall.dart
+[Linux perf]: https://perfwiki.github.io/main/
+[Perfetto (heapprofd)]: https://perfetto.dev/
+[Visual Studio CPU Usage Profiler]: https://learn.microsoft.com/en-us/visualstudio/profiling/cpu-usage
+[WPA (Heap Analysis)]: https://learn.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer
+[Xcode Instruments (Leaks/Allocations)]: https://developer.apple.com/documentation/xcode/gathering-information-about-memory-use
+[Xcode Instruments (Time Profiler)]: https://developer.apple.com/tutorials/instruments