
Add some basic benchmarks for mcap and websocketserver #277


Closed

Conversation

@eloff (Contributor) commented Mar 11, 2025

This adds three benchmark scenarios each for MCAP writing and the websocket server: one for JSON messages, one for a large protobuf message (SceneUpdate), and one for multiple threads. Each scenario has four or more parameterized benchmarks to show how it scales along the dimension of interest.

Multi-threaded runtime follows a U-shape, with the best performance in the middle: adding threads helps up to a point, after which contention starts to dominate the runtime.

We can use this to get a baseline when profiling and optimizing performance.
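
For orientation, each scenario follows divan's parameterized-benchmark shape, roughly like this (a simplified sketch, not the PR's exact code; `make_json_payload` and the argument values are stand-ins, and `MSG_CHANNEL` refers to the channel set up elsewhere in the bench file):

```rust
use divan::Bencher;

fn main() {
    // Collects and runs every #[divan::bench] in this file.
    divan::main();
}

// divan runs the benchmark once per value in `args`, so the output table
// shows how logging scales along the dimension of interest.
#[divan::bench(args = [50, 100, 200, 400])]
fn log_json_message(bencher: Bencher, payload_size: usize) {
    let message = make_json_payload(payload_size); // hypothetical payload helper
    bencher.bench(|| {
        // Log through the channel under test; black_box prevents the
        // compiler from optimizing the call away.
        MSG_CHANNEL.log(divan::black_box(&message));
    });
}
```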

Here are the results on my machine:

> cargo bench -p foxglove

     Running benches/mcap_bench.rs (target/release/deps/mcap_bench-8552836f6882eaf7)
Timer precision: 10 ns
mcap_bench            fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ log_json_message                 │               │               │               │         │
│  ├─ 50              173.5 µs      │ 3.171 ms      │ 197.8 µs      │ 239.9 µs      │ 100     │ 100
│  ├─ 100             183.6 µs      │ 2.472 ms      │ 204.4 µs      │ 243 µs        │ 100     │ 100
│  ├─ 200             178.4 µs      │ 1.944 ms      │ 196.8 µs      │ 244.5 µs      │ 100     │ 100
│  ╰─ 400             189.6 µs      │ 2.177 ms      │ 209 µs        │ 293.4 µs      │ 100     │ 100
├─ log_scene_update                 │               │               │               │         │
│  ├─ 1               187.8 µs      │ 2.696 ms      │ 209.3 µs      │ 258 µs        │ 100     │ 100
│  ├─ 2               200.7 µs      │ 2.057 ms      │ 244.2 µs      │ 321 µs        │ 100     │ 100
│  ├─ 4               248.6 µs      │ 2.734 ms      │ 284.8 µs      │ 428.8 µs      │ 100     │ 100
│  ╰─ 8               330.6 µs      │ 2.018 ms      │ 374.2 µs      │ 536.2 µs      │ 100     │ 100
╰─ mutlithreaded_log                │               │               │               │         │
   ├─ 1               8.342 ms      │ 34.25 ms      │ 17.8 ms       │ 18.58 ms      │ 100     │ 100
   ├─ 2               4.804 ms      │ 27.94 ms      │ 14.74 ms      │ 14.96 ms      │ 100     │ 100
   ├─ 4               4.777 ms      │ 11.83 ms      │ 6.694 ms      │ 6.653 ms      │ 100     │ 100
   ├─ 8               5.128 ms      │ 32.91 ms      │ 15.99 ms      │ 15.75 ms      │ 100     │ 100
   ├─ 16              8.569 ms      │ 32.62 ms      │ 29.33 ms      │ 28 ms         │ 100     │ 100
   ╰─ 32              15.31 ms      │ 34.91 ms      │ 32.27 ms      │ 31.58 ms      │ 100     │ 100

     Running benches/websocketserver_bench.rs (target/release/deps/websocketserver_bench-aadf4b8711f23b7c)
Timer precision: 46 ns
websocketserver_bench       fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ roundtrip_json_message                 │               │               │               │         │
│  ├─ 1                     1.478 ms      │ 42.46 ms      │ 41.36 ms      │ 41 ms         │ 100     │ 100
│  ├─ 2                     40.56 ms      │ 42.86 ms      │ 41.72 ms      │ 41.78 ms      │ 100     │ 100
│  ├─ 4                     40.89 ms      │ 43.99 ms      │ 42.22 ms      │ 42.28 ms      │ 100     │ 100
│  ╰─ 8                     41.86 ms      │ 44.72 ms      │ 43.23 ms      │ 43.35 ms      │ 100     │ 100
├─ roundtrip_mutlithreaded                │               │               │               │         │
│  ├─ 1                     12.52 ms      │ 65.63 ms      │ 37.81 ms      │ 41.03 ms      │ 100     │ 100
│  ├─ 2                     10.31 ms      │ 21.69 ms      │ 16.12 ms      │ 16.01 ms      │ 100     │ 100
│  ├─ 4                     5.695 ms      │ 22.45 ms      │ 12.38 ms      │ 13.62 ms      │ 100     │ 100
│  ├─ 8                     6.038 ms      │ 21.16 ms      │ 12.14 ms      │ 13.29 ms      │ 100     │ 100
│  ├─ 16                    4.911 ms      │ 23.31 ms      │ 16.28 ms      │ 14.7 ms       │ 100     │ 100
│  ╰─ 32                    13.49 ms      │ 26.26 ms      │ 20.29 ms      │ 20 ms         │ 100     │ 100
╰─ roundtrip_scene_update                 │               │               │               │         │
   ├─ 1                     1.399 ms      │ 43.29 ms      │ 41.39 ms      │ 41.28 ms      │ 100     │ 100
   ├─ 2                     41.23 ms      │ 44.5 ms       │ 42.26 ms      │ 42.27 ms      │ 100     │ 100
   ├─ 4                     41.51 ms      │ 44.68 ms      │ 42.75 ms      │ 42.92 ms      │ 100     │ 100
   ╰─ 8                     41.73 ms      │ 45.51 ms      │ 43.97 ms      │ 44.08 ms      │ 100     │ 100


@Copilot (Copilot AI) left a comment

Pull Request Overview

This pull request adds a suite of benchmarks to measure performance for both MCAP writing and WebSocket server roundtrips, and updates Cargo.toml to include the benchmark definitions and a dependency on divan.

  • Added benchmark scenarios for MCAP writing (json messages, large protobuf, multiple threads)
  • Added benchmark scenarios for WebSocket server roundtrip tests (json messages, scene update, multi-threaded)
  • Exposed a new channel method (“id”) for retrieving a channel’s unique identifier

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Changed files:

  • rust/foxglove/benches/mcap_bench.rs: Introduces new MCAP benchmarks with parameterized scenarios.
  • rust/foxglove/benches/websocketserver_bench.rs: Adds WebSocket server benchmarks for roundtrip messaging and threading.
  • rust/foxglove/Cargo.toml: Adds the bench definitions and the divan dependency.
  • rust/foxglove/src/encode.rs: Adds a new method to retrieve the channel ID, in support of the new benchmarks.
Comments suppressed due to low confidence (2)

rust/foxglove/benches/mcap_bench.rs:107

  • The function name 'mutlithreaded_log' appears to be misspelled; consider renaming it to 'multithreaded_log' for clarity.
fn mutlithreaded_log(bencher: divan::Bencher, num_threads: usize) {

rust/foxglove/benches/websocketserver_bench.rs:201

  • The function name 'roundtrip_mutlithreaded' seems to have a typo; consider renaming it to 'roundtrip_multithreaded' for consistency.
fn roundtrip_mutlithreaded(bencher: divan::Bencher, num_threads: usize) {

@bryfox (Contributor) left a comment

Looks good to me in general.

Locally, between runs, I get wildly different slowest values for the multithreaded log case, even with a single thread. Any idea why?

_ = ws_client.next().await.expect("No serverInfo sent");

// FG-10395 replace this with something more precise
tokio::time::sleep(std::time::Duration::from_millis(50)).await;
Contributor: Not a big deal, but why do we need to sleep before adding each client?

Contributor (author): Good question, I think this can be moved outside the loop.
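
Something like that could look as follows: connect all the clients first, then sleep once for subscriptions to settle (`connect_client` is a hypothetical stand-in for the existing connection setup):

```rust
// Connect every client up front...
for _ in 0..num_clients {
    let mut ws_client = connect_client(&addr).await; // hypothetical helper
    _ = ws_client.next().await.expect("No serverInfo sent");
    ws_clients.push(ws_client);
}
// ...then wait once, not once per client, for subscriptions to settle.
// FG-10395 replace this with something more precise
tokio::time::sleep(std::time::Duration::from_millis(50)).await;
```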

@@ -80,6 +80,11 @@ impl<T: Encode> TypedChannel<T> {
pub fn topic(&self) -> &str {
&self.inner.topic
}

/// Returns the channel ID.
pub fn id(&self) -> ChannelId {
    self.inner.id() // body truncated in the diff view; delegation to the inner channel is an assumption
}
Contributor: If this is only needed for clients, do we want it as part of the public API? Or can this be behind some feature?

Contributor (author): We can hide it if we don't want to expose it. I'll do that for now, until we have a reason to expose it.
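
One lightweight way to do that, sketched under the assumption that `id()` delegates to the inner channel: keep the method callable but hide it from the generated docs.

```rust
/// Returns the channel ID.
///
/// Kept out of the public docs until there is a reason to expose it.
#[doc(hidden)]
pub fn id(&self) -> ChannelId {
    self.inner.id() // assumed delegation, as in the diff above
}
```

A `#[cfg(feature = "...")]` gate would go further and remove the method from default builds entirely.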

@gasmith (Contributor) left a comment

The end-to-end measurements are nice to have, but I think we might want to target smaller chunks of functionality.

For example, when we're benchmarking the websocket server, I'm not really interested in different kinds of serialization. I want to see how many fixed-sized binary messages we can push per second, for various message sizes.

I'd like to set up 0..N "null" sinks that just drop messages, and measure how much latency we have on the Channel.log() path. It might also be nice to measure allocations on that path. Maybe we also measure allocations for the protobuf and json serializers, for certain schema types.

For particular high-throughput schema types (e.g., compressed video, point cloud), it might be nice to measure how long it takes to shove data into the protobuf struct, and then separately measure how long it takes to serialize.
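
A hedged sketch of that last idea, assuming the generated schema types implement prost::Message (`build_scene_update` is a hypothetical helper that fills a SceneUpdate with `n` entities):

```rust
use divan::{black_box, Bencher};
use prost::Message;

// Measure data prep and serialization separately, so neither cost hides
// the other.
#[divan::bench(args = [1, 2, 4, 8])]
fn fill_scene_update(bencher: Bencher, n: usize) {
    // Time only the construction of the protobuf struct.
    bencher.bench(|| black_box(build_scene_update(black_box(n))));
}

#[divan::bench(args = [1, 2, 4, 8])]
fn encode_scene_update(bencher: Bencher, n: usize) {
    let msg = build_scene_update(n); // built once, outside the timed closure
    // Time only the protobuf encoding.
    bencher.bench(|| black_box(msg.encode_to_vec()));
}
```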

Comment on lines +27 to +31
let tmpdir = tempfile::tempdir().unwrap();
let path = tmpdir.path().join("test.mcap");
let _writer = McapWriter::new()
.create_new_buffered_file(&path)
.expect("Failed to start mcap writer");
Contributor: Maybe /dev/null, so we're not measuring filesystem latency?

Contributor (author): Oh, nice idea!
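
A sketch of that change (a constructor accepting an arbitrary Write + Seek handle is an assumption; only create_new_buffered_file is shown above):

```rust
use std::{fs::File, io::BufWriter};

// Unix-only: write the MCAP stream to /dev/null so the benchmark measures
// serialization and chunk compression rather than filesystem latency.
let devnull = File::create("/dev/null").expect("Failed to open /dev/null");
let _writer = McapWriter::new()
    .create(BufWriter::new(devnull)) // hypothetical constructor over Write + Seek
    .expect("Failed to start mcap writer");
```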

MSG_CHANNEL.log(&message);

bencher.bench(|| {
for _ in 0..100 {
Contributor:
When I first saw the numbers I was a bit dismayed. Now it makes more sense.

Why 100 iterations inside the loop? To amortize the cost of compressing/flushing chunks to the underlying file?

I'm wondering whether we want to measure these costs as part of the SDK benchmark suite, or whether they would belong better in the mcap library itself. There's a wide range of configuration options for the mcap writer, which will have various impacts on its efficiency.

Contributor (author): It's worthwhile doing both.

Comment on lines +61 to +63
let mut entities = Vec::with_capacity(num_entities);
for i in 0..num_entities {
entities.push(SceneEntity {
Contributor: Do we want to move these allocations out of the bench() closure?

Contributor (author): Nope, this is the usage of the log function for scene update, so we want to measure it.

Contributor: If that's the case, let's measure it separately? Otherwise I don't know what part of the measurement is data prep vs. serialization vs. mcap writing.
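
One way to split it, using divan's with_inputs so entity construction happens outside the timed closure (`build_scene_update` and `SCENE_CHANNEL` are hypothetical names):

```rust
#[divan::bench(args = [1, 2, 4, 8])]
fn log_prepared_scene_update(bencher: divan::Bencher, num_entities: usize) {
    bencher
        // Build the SceneUpdate outside the timed region...
        .with_inputs(|| build_scene_update(num_entities))
        // ...so only serialization plus mcap writing is measured here.
        .bench_values(|update| SCENE_CHANNEL.log(&update));
}
```

Paired with the existing benchmark, the difference between the two isolates the data-prep cost.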

Comment on lines +106 to +107
#[divan::bench(args = [1, 2, 4, 8, 16, 32])]
fn mutlithreaded_log(bencher: divan::Bencher, num_threads: usize) {
Contributor: Divan has native support for threads, which might be useful here.
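
For reference, a minimal sketch of that support: the threads option replaces hand-rolled thread spawning, and divan reports each thread count separately.

```rust
// divan runs the benchmark body concurrently at each listed thread count;
// make_message() is a hypothetical stand-in for payload setup.
#[divan::bench(threads = [1, 2, 4, 8, 16, 32])]
fn multithreaded_log() {
    let message = make_message();
    MSG_CHANNEL.log(divan::black_box(&message));
}
```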

Comment on lines +47 to +49
.with_inputs(|| {
runtime.block_on(async move {
let mut ws_clients = Vec::with_capacity(num_clients);
@gasmith (Contributor) commented Mar 12, 2025
I think we're invoking this closure for every iteration of the benchmark, which ends up being extraordinarily time-consuming. Let's set up the clients in advance when we set up the server. You can package them up under a tokio Mutex for interior mutability.

Contributor (author): That's a good point.
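
A sketch of that restructuring (`connect_clients` is a hypothetical setup helper; the point is that connections are made once, not per iteration):

```rust
// Connect the clients once during setup and share them with the benchmark
// closure through a tokio Mutex for interior mutability.
let ws_clients = tokio::sync::Mutex::new(
    runtime.block_on(connect_clients(num_clients)), // hypothetical helper
);
bencher.bench(|| {
    runtime.block_on(async {
        let mut clients = ws_clients.lock().await;
        // ...drive one round of messages through the pre-connected clients.
        divan::black_box(&mut *clients);
    });
});
```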

})
})
.bench_values(|mut ws_clients| {
for _ in 0..50 {
Contributor: Why 50 messages instead of just one?

Contributor (author): So that if the cost of the first message is higher (which is the case with mcap), it gets amortized out.

Contributor: But this is the websocket benchmark.

@eloff (Contributor, author) commented Mar 12, 2025

> Looks good to me in general.
>
> Locally, between runs, I get wildly different slowest values for the multithreaded log case, even with a single thread. Any idea why?

The slowest case is likely dominated by poor scheduling in the kernel; it's basically noise to be discarded.

@eloff closed this May 8, 2025