test: Fix scheduler for CPU/RAM starvation #22748
|
This initial run and a forced retry both have a few retries due to VM boot failures.
|
first run with fixed scheduler looks promising -- two real flakes, plus |
Force-pushed from a9e684f to a264683
|
This run was using 13 parallel spots, as I was refining the memory estimates. Turns out they are very good now! RAM analysis:
So that confirms my recent suspicion that with 8 parallel tests we are not RAM bound, contrary to what I had always assumed. Instead, booting many VMs in parallel is just too slow and times out, so we are CPU/load bound. The recent commit already throttled the destructive tests by load, but that wasn't enough, and 13 parallel ND tests are definitely too many. Load analysis:
The load throttling from the current top commit a264683 didn't trigger at all. Analysis: Peak loads:
Load by VM count during destructive:
|
This should cause 3x affected retries.
Parallelism is now purely dynamic, which makes so much more sense!
The load throttling blocked test 166 when the load was 12.41, above the 12.0 threshold, but then all running tests completed (0 tests running). The scheduler exited thinking it was done, even though 84 destructive tests were still left in the queue. When the load is too high but no tests are running, the scheduler has no way to make progress and exits prematurely.
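For illustration, here is a minimal sketch of a loop that avoids this premature exit. It is not the scheduler code from this PR: `run_queue`, `start`, `reap`, and `poll_interval` are hypothetical names, and waiting for the load to drop (rather than force-starting a test) is an assumed policy. The only point is that the loop condition checks the queue as well as the running set, so a throttled scheduler with zero running tests pauses and re-checks instead of exiting.

```python
import os
import time
from collections import deque

MAX_LOAD = 12.0  # threshold quoted in the comment above


def run_queue(queue, start, reap,
              get_load=lambda: os.getloadavg()[0],
              sleep=time.sleep, poll_interval=5.0):
    """Drain `queue`, starting a test only while the load is at most MAX_LOAD.

    `start(test)` launches a test and returns a handle; `reap(running)`
    returns the handles that are still running. Both are hypothetical hooks.
    """
    running = []
    started = []
    # Loop on BOTH conditions: an empty `running` list alone does not mean
    # the work is done, because tests may still be waiting in the queue.
    while queue or running:
        running = reap(running)  # drop finished tests
        if queue and get_load() <= MAX_LOAD:
            test = queue.popleft()
            running.append(start(test))
            started.append(test)
        else:
            # Throttled, or just waiting for running tests: pause, then re-check.
            sleep(poll_interval)
    return started
```

With a loop conditioned only on running tests, the 12.41 reading would end the run with tests still queued. Here the same reading just costs one polling interval.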
|
for $deity's sake, can this get green at all? Putting back the load test in addition to |
Hopefully reproduces this mess from #22373. No changes for the first run, I want to see it fail.
We are seeing these problems even in PRs without affected "expensive" tests.