
iotune: Add warmup period to measurements #2810


Open

StephanDollberg wants to merge 1 commit into master from stephan/iotune-warmup

Conversation

StephanDollberg (Contributor)

This adds a warmup period to each test during which measurements are
ignored. Disks are often fairly bursty in the initial part of the test.
This skews the measurements and makes us produce io-properties that are
potentially way too large.

It also heavily affects the reported deviation, which can look a lot worse
than it is.

The warmup period is the larger of 3 seconds and ten percent of the test
duration.

warmup_period(std::chrono::duration<double> duration) {
    auto min_warmup = 3s;
    auto ten_percent = duration / 10;
    return min_warmup.count() > ten_percent.count() ? min_warmup : ten_percent;
Member

No need to compare counts, durations can be compared directly.

Also, std::max().

StephanDollberg (Contributor, Author)

Fixed, needed some explicit casting on ten_percent as we use double durations there for whatever reason.
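
For reference, here is a minimal, self-contained sketch of what the fixed helper plausibly looks like after this exchange (std::max plus an explicit cast; it matches the excerpt quoted further down in the thread, where the ratio is still 10% before being reduced to 5% later):

#include <algorithm>
#include <chrono>

using namespace std::chrono_literals;

// Warmup window: the larger of a 3-second floor and 10% of the test duration.
// The explicit duration_cast brings both operands to the same type
// (std::chrono::seconds) so std::max can deduce its template argument;
// the test duration itself is a double-based duration.
std::chrono::duration<double>
warmup_period(std::chrono::duration<double> duration) {
    auto min_warmup = 3s;
    auto ten_percent = std::chrono::duration_cast<std::chrono::seconds>(duration) / 10;
    return std::max(min_warmup, ten_percent);
}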

StephanDollberg force-pushed the stephan/iotune-warmup branch from 66d8ee2 to c4c3a50 on June 19, 2025 10:32
StephanDollberg (Contributor, Author)

ping @xemul @robertbindar

robertbindar (Contributor) left a comment

Thanks for the patch @StephanDollberg 🚀! I left some inline comments; overall I think the code can get simpler, as rates are more basic than I think you assumed. Let me know if I got it wrong.
I'm guessing the patch comes from some practical real-world tests; I'd like to see some examples of such runs where you noticed higher deviation and higher io properties, and an improvement of these symptoms with a warmup period.
It would also be nice to determine whether 10% is needed or we can work with less.

// `issue_request` itself so the first bucket might not be a full
// measurement and cause skew otherwise.
size_t samples_to_drop = warmup_period(_duration) / period + 1;
_rates.erase(_rates.begin(), _rates.begin() + std::min(samples_to_drop, _rates.size()));
Contributor

Rates here are one number per shard, that is, the number of requests executed by one worker in the duration time; they have no relation to the period. I don't think you have anything else to do here other than adjusting the start and end measurement points in io_worker.

StephanDollberg (Contributor, Author)

I don't think this is true; elements are inserted into this array every second, based on the timer a few lines above.

This is (unfortunately) decoupled from the other measurement logic in io_worker::issue_request.

Contributor

Can you point me to the line that does that? I see the rates being updated only in the destructor of the meter. I also double checked yesterday that there's only one rate per shard by printing them in all workloads.

StephanDollberg (Contributor, Author), Jul 31, 2025

The timer is configured here: https://github.com/scylladb/seastar/blob/master/apps/iotune/iotune.cc#L335-L338

armed here: https://github.com/scylladb/seastar/blob/master/apps/iotune/iotune.cc#L342

E.g.: with the following debug print added (on top of my patch):

diff --git a/apps/iotune/iotune.cc b/apps/iotune/iotune.cc
index 2694d124..016b001b 100644
--- a/apps/iotune/iotune.cc
+++ b/apps/iotune/iotune.cc
@@ -363,6 +363,7 @@ class io_worker {
                 _rates.push_back(_requests);
             }

+           fmt::print("rates: {}\n", fmt::join(_rates, ", "));
             // Drop samples that got measured during the warm up period.
             // We drop an extra sample. This is because the one second timer
             // logic is really independent of when we start counting requests in

I get the following output:

./build/release/apps/iotune/iotune -c 4 -m 2G --evaluation-directory /mnt/xfs
...
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: rates: 0, 0, 0, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163
rates: 0, 0, 0, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162
rates: 0, 0, 0, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163
rates: 0, 0, 0, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162
1162 MiB/s
Measuring sequential read bandwidth: rates: 0, 0, 0, 2442, 2441, 2442, 2441, 2442, 2441, 2441, 2442, 2441, 2442, 2441, 2442
2441 MiB/s
Measuring random write IOPS: rates: 0, 0, 5, 67966, 67673, 67206, 67812, 67738, 67536, 67564, 67694, 67965, 67495, 67595, 67682
rates: 0, 0, 0, 67486, 67963, 67547, 67570, 67911, 67751, 67630, 67897, 68015, 67716, 67665, 67808
rates: 0, 0, 5, 67573, 67461, 67850, 67646, 67614, 67596, 68169, 67632, 67667, 67750, 67628, 67549
rates: 0, 0, 0, 65796, 65719, 66224, 65789, 65553, 65941, 65451, 65587, 65170, 65850, 65939, 65721
268823 IOPS
Measuring random read IOPS: rates: 0, 0, 0, 129175, 129428, 129530, 129675, 129593, 129331, 129487, 129446, 129433, 129327, 129282, 128691
rates: 0, 0, 0, 129233, 129748, 129772, 129672, 129609, 129427, 129688, 129731, 129712, 129483, 129266, 128756
rates: 0, 0, 0, 124915, 125636, 125422, 125735, 125496, 125070, 125390, 125508, 125507, 125154, 125201, 124977
rates: 0, 0, 0, 129388, 129950, 129887, 129651, 129806, 129341, 129827, 129788, 129825, 129263, 129335, 128951
513794 IOPS

One print per shard, each containing one measurement per second.
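
To make the mechanism concrete, here is a condensed, stand-alone sketch of what the one-second ticker and the warmup drop amount to. It is not the actual io_worker code; the free function and parameter names are illustrative only:

#include <algorithm>
#include <cstddef>
#include <vector>

// Each tick of a one-second timer appends the number of requests completed
// since the previous tick. Before aggregating, the samples that fall inside
// the warmup window are dropped, plus one extra: the ticker runs
// independently of when request counting starts, so the first bucket may be
// only a partial measurement.
std::vector<unsigned> drop_warmup_samples(std::vector<unsigned> rates,
                                          double warmup_seconds,
                                          double tick_seconds = 1.0) {
    size_t samples_to_drop = static_cast<size_t>(warmup_seconds / tick_seconds) + 1;
    rates.erase(rates.begin(),
                rates.begin() + std::min(samples_to_drop, rates.size()));
    return rates;
}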

Contributor

Nice and funny! I didn't spot the _rates.push_back(_requests - _prev_requests); in the constructor, and the ticker was running for a number of seconds equal to the number of shards, so my print was misleading 🤦

@@ -376,7 +398,7 @@ class io_worker {

     io_worker(size_t buffer_size, std::chrono::duration<double> duration, std::unique_ptr<request_issuer> reqs, std::unique_ptr<position_generator> pos, std::vector<unsigned>& rates)
         : _buffer_size(buffer_size)
-        , _start_measuring(iotune_clock::now() + std::chrono::duration<double>(10ms))
+        , _start_measuring(iotune_clock::now() + std::chrono::duration<double>(warmup_period(duration)))
         , _end_measuring(_start_measuring + duration)
Contributor

You need to adjust the end measuring point as well, to respect the proportions and the total duration passed by the user.

StephanDollberg (Contributor, Author)

_end_measuring is already implicitly moved as it adds on to _start_measuring.

Or are you saying that we should subtract the warmup period here? That seems unneeded to me. I think if someone passes --duration then we should measure for that period of time. The warmup is independent of that.

Note: I think there is maybe an implicit assumption in the code that wTP (70%) + rTP (10%) + wIOPS (10%) + rIOPS (10%) adds up to 100% of the duration, but it is already somewhat arbitrary that the latter tests run for shorter. It's really just needed this way so that the wTP test has actually written the full file, so it is fully extended for the read tests.

In the default config the wIOPS test, for example, only runs for 12 seconds, so if you subtract the warmup from that it's even shorter.
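
A small illustration of the timing relationship under discussion; the values are made up and the variable names mirror the diff above:

#include <chrono>
#include <fmt/core.h>

int main() {
    using namespace std::chrono_literals;
    // Hypothetical run: --duration 120 with a 12s warmup (10% of 120s).
    std::chrono::duration<double> duration = 120s;
    std::chrono::duration<double> warmup = 12s;
    // Relative to "now": the warmup shifts both endpoints by the same amount,
    // so the measured window is still exactly `duration`.
    auto start_measuring = warmup;
    auto end_measuring = start_measuring + duration;
    fmt::print("measured window: {}s, total wall time: {}s\n",
               (end_measuring - start_measuring).count(), end_measuring.count());
    // prints: measured window: 120s, total wall time: 132s
}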

Contributor

> _end_measuring is already implicitly moved as it adds on to _start_measuring.

you're correct, sorry.

> Or are you saying that we should subtract the warmup period here? That seems unneeded to me. I think if someone passes --duration then we should measure for that period of time. The warmup is independent of that.

> Note: I think there is maybe an implicit assumption in the code that wTP (70%) + rTP (10%) + wIOPS (10%) + rIOPS (10%) adds up to 100% of the duration, but it is already somewhat arbitrary that the latter tests run for shorter. It's really just needed this way so that the wTP test has actually written the full file, so it is fully extended for the read tests.

The first sequential write workload is responsible for creating the dataset on which the other tests run; that's why it runs for 70% of the duration: you need a big enough file to minimize cache hits. Unfortunately I don't know the reasoning for choosing 70% and not less; maybe measurements were done. I conducted experiments a few days ago where I didn't see IOPS artificially increasing even with files generated in 10% of the duration (IIRC the total file size generated in 10% was ~3GB).

> In the default config the wIOPS test, for example, only runs for 12 seconds, so if you subtract the warmup from that it's even shorter.

StephanDollberg (Contributor, Author)

Here is an example from m7gd.4xl (wTP: 577MiB/s, rTP: 1201MiB/s, wIOPS: 134168, rIOPS: 268336)

master:

Measuring sequential write bandwidth: 584 MiB/s (deviation 12%)
Measuring sequential read bandwidth: 1362 MiB/s (deviation 39%)
Measuring random write IOPS: 145383 IOPS (deviation 25%)
Measuring random read IOPS: 296471 IOPS (deviation 23%)

this PR:

Measuring sequential write bandwidth: 576 MiB/s
Measuring sequential read bandwidth: 1200 MiB/s
Measuring random write IOPS: 135082 IOPS
Measuring random read IOPS: 268818 IOPS

std::chrono::duration<double>
warmup_period(std::chrono::duration<double> duration) {
    auto min_warmup = 3s;
    auto ten_percent = std::chrono::duration_cast<std::chrono::seconds>(duration) / 10;
Contributor

Can you please run some experiments and see if we can work with less than 10%?

StephanDollberg (Contributor, Author)

I had previously looked into this on a few AWS instances and they all converge pretty fast, which is why I went with the 3-second minimum. This was true both for instance types with shared disks and for those with unshared disks.

My crappy personal consumer disk took a bit longer, which is why I went with the 10% component; I didn't consider 10% to be bad in terms of how much time it adds. But if you don't like it, we could get rid of it or do 5% or something.

Contributor

5% would be alright, I'll add an arg in the future if it becomes annoying to anyone.

StephanDollberg (Contributor, Author)

Cool, I have changed it to 5%.

Make iotune ignore the first three seconds of measurements. Disks are
often fairly bursty in the initial part of the test. This skews the
measurements and makes us produce io-properties that are
potentially way too large.

It also heavily affects the shown deviation, which can look a lot worse
than it is.
robertbindar (Contributor) left a comment

looks good, thanks @StephanDollberg.

robertbindar (Contributor)

@scylladb/scylla-maint please review and merge

mykaul commented Aug 4, 2025

> @scylladb/scylla-maint please review and merge

@scylladb/seastar-maint please review and merge
