(RHEL-9322) cgroup: drastically simplify caching of cgroups members mask #295
base: rhel-8.10.0
Conversation
I think we shouldn't just cherry-pick the single commit from the lengthy series in systemd/systemd#10894 and hope for the best. I have a gut feeling this will introduce regressions in our handling of cgroups. At the very least we shouldn't be introducing dead code.
src/core/unit.h (Outdated)
CGroupMask cgroup_members_mask;
CGroupMask cgroup_realized_mask; /* In which hierarchies does this unit's cgroup exist? (only relevant on cgroupsv1) */
CGroupMask cgroup_enabled_mask; /* Which controllers are enabled (or more correctly: enabled for the children) for this unit's cgroup? (only relevant on cgroupsv2) */
CGroupMask cgroup_invalidated_mask; /* A mask specifiying controllers which shall be considered invalidated, and require re-realization */
cgroup_invalidated_mask isn't used anywhere. Also, the comment contains a typo ("specifiying").
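For reference, upstream this field feeds the cgroup realization logic roughly as sketched below (an approximation, not the verbatim upstream code, and the helper names are the upstream ones, which may not all exist on this branch): the unit records which controllers need re-realization and is queued, instead of being re-realized on the spot.

/* Rough sketch of how cgroup_invalidated_mask is consumed upstream (approximate). */
void unit_invalidate_cgroup(Unit *u, CGroupMask m) {
        assert(u);

        if (m == 0)
                return;

        if ((u->cgroup_invalidated_mask & m) == m) /* nothing new to invalidate */
                return;

        u->cgroup_invalidated_mask |= m;
        unit_add_to_cgroup_realize_queue(u);
}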
I attempted to backport the whole series at first, but abandoned that approach in the end. The PR depends on previous PRs; it would likely take several days to sort it all out and backport everything needed.
This way we can correctly ensure that when a unit that requires some controller goes away, we propagate the removal of it all the way up, so that the controller is turned off in all the parents too. (cherry picked from commit b8b6f32) Related: RHEL-9322
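The propagation amounts to invalidating the cached members mask for the unit and for every slice above it, so the next realization recomputes what the subtree still needs. A minimal sketch of that idea (assuming the cgroup_members_mask_valid flag introduced by the upstream series; not necessarily the verbatim cherry-picked code):

void unit_invalidate_cgroup_members_masks(Unit *u) {
        assert(u);

        /* Drop this unit's cached members mask ... */
        u->cgroup_members_mask_valid = false;

        /* ... and, recursively, that of every parent slice up to the root, so
         * that controllers no longer needed anywhere in the subtree can be
         * turned off in the parents on the next realization. */
        if (UNIT_ISSET(u->slice))
                unit_invalidate_cgroup_members_masks(UNIT_DEREF(u->slice));
}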
Force-pushed from 78e2717 to 83ada01
(cherry picked from commit 43738e0) Related: RHEL-9322
Force-pushed from 83ada01 to b719f06
The CI failures looked related, but to be sure they are not just flukes, @lnykryn force-pushed the feature branch to rerun the checks.
Unit tests seem to be failing because of #434, but I'm not sure about CentOS CI.
The CentOS CI failure is concerning. @dtardon, can you please have a look?
Oh well, the CentOS CI for this repo is currently FUBAR, since CentOS Stream 8 is no more. I'll need to figure something out to make it work again...
@mrc0mmand Maybe we could arrange a "Red Hat Developer Subscription for Teams", i.e. a self-service subscription, and then generate an activation key that we would store as a GH secret and use to enable all the needed repos in the UBI RHEL container image (e.g. CRB).
I mean, yeah, that would work for the container stuff (and I plan to actually do that), but this wouldn't work for CentOS CI, since, as the name suggests, it only supports CentOS. So I'll move the C8S job to C9S, which should work just fine. The environment will be different, but it should be enough for making sure we haven't royally screwed something up (we run the same tests internally on actual RHEL 8 anyway).
CentOS CI should be back (albeit with some compromises, see systemd/systemd-centos-ci@f415a75). I'll look into the container stuff next.
A quick workaround for the current C8S containers is in #440.
Previously we tried to be smart: when a new unit appeared and it only
added controllers to the cgroup mask we'd update the cached members mask
in all parents by ORing in the controller flags in their cached values.
Unfortunately this was quite broken, as we missed some conditions when
this cache had to be reset (for example, when a unit got unloaded),
moreover the optimization doesn't work when a controller is removed
anyway (as in that case there's no way around the parent iterating
through all the children to check whether any other, remaining child
unit still needs it).
Hence, let's simplify the logic substantially: instead of updating the
cache on the right events (which we didn't get right), let's simply
invalidate the cache, and generate it lazily when we encounter it later.
This should actually result in better behaviour as we don't have to
calculate the new members mask for a whole subtree whenever we have the
suspicion something changed, but can delay it to the point where we
actually need the members mask.
This allows us to simplify things quite a bit, which is good, since
validating this cache for correctness is hard enough.
Fixes: #9512
(cherry picked from commit 5af8805)
Resolves: #2096371
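The lazy regeneration described above would look roughly like the sketch below (assuming the cgroup_members_mask_valid flag from the upstream series and the usual slice-membership iteration via the UNIT_BEFORE dependencies; the actual cherry-picked code may differ in detail):

CGroupMask unit_get_members_mask(Unit *u) {
        assert(u);

        /* Return the union of controllers this unit's children require. */

        if (u->cgroup_members_mask_valid)
                return u->cgroup_members_mask; /* the cache is still valid, reuse it */

        /* The cache was invalidated at some point; recompute it now that it
         * is actually needed. */
        u->cgroup_members_mask = 0;

        if (u->type == UNIT_SLICE) {
                void *v;
                Unit *member;
                Iterator i;

                HASHMAP_FOREACH_KEY(v, member, u->dependencies[UNIT_BEFORE], i)
                        if (UNIT_DEREF(member->slice) == u)
                                u->cgroup_members_mask |= unit_get_subtree_mask(member);
        }

        u->cgroup_members_mask_valid = true;
        return u->cgroup_members_mask;
}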