-
Notifications
You must be signed in to change notification settings - Fork 1.6k
iotune: Add warmup period to measurements #2810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
iotune: Add warmup period to measurements #2810
Conversation
warmup_period(std::chrono::duration<double> duration) { | ||
auto min_warmup = 3s; | ||
auto ten_percent = duration / 10; | ||
return min_warmup.count() > ten_percent.count() ? min_warmup : ten_percent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to compare counts, durations can be compared directly.
Also, std::max().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, needed some explicit casting on ten_percent
as we use double
durations there for whatever reason.
66d8ee2
to
c4c3a50
Compare
ping @xemul @robertbindar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch @StephanDollberg 🚀 ! I left some inline comments, overall I think the code can get simpler, rates are more basic than I think you assumed in there, let me know if I got it wrong.
I'm guessing the patch comes from some practical real world tests, I'd like to see some examples of such runs where you noticed higher deviation and higher io properties and an amelioration of these symptoms with a warmup period.
Also it'd be nice if we determine if 10% is needed or we can work with less.
// `issue_request` itself so the first bucket might not be a full | ||
// measurement and cause skew otherwise. | ||
size_t samples_to_drop = warmup_period(_duration) / period + 1; | ||
_rates.erase(_rates.begin(), _rates.begin() + std::min(samples_to_drop, _rates.size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rates here are one number per shard, that is number of requests executed by one worker in duration time, they have no relation with the period. I don't think you have anything else to do here than adjust the start and end measurement points in io_worker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this true, you see that elements are inserted into this array every second based on the timer a few lines above.
This is (unfortunately) decoupled from the other measurement stuff in io_worker::issue_request
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to the line that does that? I see the rates being updated only in the destructor of the meter. I also double checked yesterday that there's only one rate per shard by printing them in all workloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer is configured here: https://github.com/scylladb/seastar/blob/master/apps/iotune/iotune.cc#L335-L338
armed here: https://github.com/scylladb/seastar/blob/master/apps/iotune/iotune.cc#L342
E.g.: with the following debug print added (on top of my patch):
diff --git a/apps/iotune/iotune.cc b/apps/iotune/iotune.cc
index 2694d124..016b001b 100644
--- a/apps/iotune/iotune.cc
+++ b/apps/iotune/iotune.cc
@@ -363,6 +363,7 @@ class io_worker {
_rates.push_back(_requests);
}
+ fmt::print("rates: {}\n", fmt::join(_rates, ", "));
// Drop samples that got measured during the warm up period.
// We drop an extra sample. This is because the one second timer
// logic is really independent of when we start counting requests in
I get the follow output:
./build/release/apps/iotune/iotune -c 4 -m 2G --evaluation-directory /mnt/xfs
...
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: rates: 0, 0, 0, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163
rates: 0, 0, 0, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162
rates: 0, 0, 0, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163
rates: 0, 0, 0, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162, 1163, 1162, 1163, 1163, 1162, 1163, 1162
1162 MiB/s
Measuring sequential read bandwidth: rates: 0, 0, 0, 2442, 2441, 2442, 2441, 2442, 2441, 2441, 2442, 2441, 2442, 2441, 2442
2441 MiB/s
Measuring random write IOPS: rates: 0, 0, 5, 67966, 67673, 67206, 67812, 67738, 67536, 67564, 67694, 67965, 67495, 67595, 67682
rates: 0, 0, 0, 67486, 67963, 67547, 67570, 67911, 67751, 67630, 67897, 68015, 67716, 67665, 67808
rates: 0, 0, 5, 67573, 67461, 67850, 67646, 67614, 67596, 68169, 67632, 67667, 67750, 67628, 67549
rates: 0, 0, 0, 65796, 65719, 66224, 65789, 65553, 65941, 65451, 65587, 65170, 65850, 65939, 65721
268823 IOPS
Measuring random read IOPS: rates: 0, 0, 0, 129175, 129428, 129530, 129675, 129593, 129331, 129487, 129446, 129433, 129327, 129282, 128691
rates: 0, 0, 0, 129233, 129748, 129772, 129672, 129609, 129427, 129688, 129731, 129712, 129483, 129266, 128756
rates: 0, 0, 0, 124915, 125636, 125422, 125735, 125496, 125070, 125390, 125508, 125507, 125154, 125201, 124977
rates: 0, 0, 0, 129388, 129950, 129887, 129651, 129806, 129341, 129827, 129788, 129825, 129263, 129335, 128951
513794 IOPS
One print per shard with each containing a secondly measurement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and funny, I didn't spot the _rates.push_back(_requests - _prev_requests);
in the constructor and the ticker was running for seconds equal to number of shards, my print was misleading 🤦
@@ -376,7 +398,7 @@ class io_worker { | |||
|
|||
io_worker(size_t buffer_size, std::chrono::duration<double> duration, std::unique_ptr<request_issuer> reqs, std::unique_ptr<position_generator> pos, std::vector<unsigned>& rates) | |||
: _buffer_size(buffer_size) | |||
, _start_measuring(iotune_clock::now() + std::chrono::duration<double>(10ms)) | |||
, _start_measuring(iotune_clock::now() + std::chrono::duration<double>(warmup_period(duration))) | |||
, _end_measuring(_start_measuring + duration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to adjust the end measuring point as well to respect the proportion and the total duration passed by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_end_measuring
is already implicitly moved as it adds on to _start_measuring
.
Or are you saying that we should subtract the warmup period here? That seems unneeded to me. I think if someone passes --duration
then we should measure for that period of time. The warmup is independent of that.
Note I think there is maybe some implicit assumption in the code that wTP (70%) + rTP(10%) + wIOPS(10%) + rIOPS(10%) adds up to 100% of the duration but this is already quite random why the latter tests run for shorter. It's really just needed this way such that the wTP test has actually written the full file for it to be fully extended for the read tests.
In the default config the wIOPS test for example only runs for 12 seconds so if you remove the warmup from that it's even shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_end_measuring
is already implicitly moved as it adds on to_start_measuring
.
you're correct, sorry.
Or are you saying that we should subtract the warmup period here? That seems unneeded to me. I think if someone passes
--duration
then we should measure for that period of time. The warmup is independent of that.Note I think there is maybe some implicit assumption in the code that wTP (70%) + rTP(10%) + wIOPS(10%) + rIOPS(10%) adds up to 100% of the duration but this is already quite random why the latter tests run for shorter. It's really just needed this way such that the wTP test has actually written the full file for it to be fully extended for the read tests.
the first sequential write workload is responsible for creating the dataset on which the other tests run, that's why it runs for 70% of the duration, you need a big enough file to minimize cache hits. Unfotunately I don't know the reasoning why 70% and not less, maybe there were measurements done. I've conducted experiments a few days ago where I didn't see iops artificially increasing even with files generated in 10% of the duration (IIRC the total file size generated in 10% was ~3GB).
In the default config the wIOPS test for example only runs for 12 seconds so if you remove the warmup from that it's even shorter.
Here is an example from m7gd.4xl (wTP: 577MiB/s, rTP: 1201MiB/s, wIOPS: 134168, rIOPS: 268336) master:
this PR:
|
apps/iotune/iotune.cc
Outdated
std::chrono::duration<double> | ||
warmup_period(std::chrono::duration<double> duration) { | ||
auto min_warmup = 3s; | ||
auto ten_percent = std::chrono::duration_cast<std::chrono::seconds>(duration) / 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please run some experiments and see if we can work with less than 10%?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had previously looked into this on a few AWS instances and they all converge pretty fast which is why I went with the 3 second minimum. This was true for both instance types with shared and unshared disks.
My crappy personal consumer disk took a bit longer which is why I went with the 10% thing as I didn't consider 10% to be bad in terms of how much time it adds. But if you don't like it then we could get rid of it or do 5% or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5% would be alright, I'll add an arg in the future if it becomes annoying to anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I have changed it to 5%.
Make iotune ignore the first three seconds of measurements. Disks are often fairly bursty in the initial part of the test. This skews the measured measurements and makes us produce io-properties that are potential way too large. It also heavily affects the shown deviation which can look a lot worse than it is.
c4c3a50
to
ce1cd57
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, thanks @StephanDollberg.
@scylladb/scylla-maint please review and merge |
@scylladb/seastar-maint please review and merge |
This adds a warmup period to each test during which measurements are
ignored. Disks are often fairly bursty in the initial part of the test.
This skews the measurements and makes us produce io-properties that are
potentially way too large.
It also heavily affects the shown deviation which can look a lot worse
than it is.
The warmup period is at least 3 seconds or ten percent of the test
duration. Whichever one is larger.