-
-
Notifications
You must be signed in to change notification settings - Fork 69
Avoid clobbering environment variables loaded from /etc/profile.d #246
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note that this results in a lot of new environment variables being loaded into systemd's user manager that weren't previously. I believe this is correct behavior, however it might change the behavior of other user services and qrexec services. Whatever testing needs to be done to make sure that doesn't break the world, should probably be done. |
This will now break some of the systemd-defined variables (including XDG_DATA_DIRS parts added there, so - flatpak). The problem with this approach is that /etc/profile.d scripts are usually handling already existing variables by appending value (that applies at least to PATH, but XDG_DATA_DIRS too). If you now override one with the other, you break /etc/profile.d expectation (it was meant to extend values, but now it will override them). What probably wants happening here, is loading systemd-defined variables earlier, before /etc/profile gets loaded. I'm not 100% sure where that actually should be. Maybe somewhere in https://github.com/QubesOS/qubes-gui-agent-linux/blob/main/gui-agent/qubes-gui-runuser.c (which is used in https://github.com/QubesOS/qubes-gui-agent-linux/blob/main/appvm-scripts/usrbin/qubes-run-xorg) ? Note also that /etc/profile.d is meant for shell variables. I don't think any spec mandates loading it for non-shell apps. So, if you want a variable to be set for various GUI apps (not just shell running in a terminal emulator), you should probably use a different mechanism (likely /etc/environment.d). For example on native Xfce, if you launch an app from the menu, it won't see variables set via /etc/profile.d (you can check via See also QubesOS/qubes-issues#6408 which demanded /etc/environment.d to be respected. And also #151 for some more background. |
Maybe moving to environment.d in the future can be a goal, but right now to my awareness all of the system-wide environment variable manipulation in Whonix (and quite a bit of it done in Qubes it appears) is done in /etc/profile.d or /etc/X11/Xsession.d, and Whonix does a lot of environment variable manipulation. Changing everything to environment.d this late in the game might not be entirely out of the question, but it would definitely fall into the "last resort" category. Loading things from /etc/environment.d first sounds like a good solution. qubes-gui-runuser sounds like kind of an awkward place to do it though since that requires lots of possibly messy string parsing in C. It seems like it would be easier (possibly) to do it in all scripts that eventually end up calling qubes-session. I don't know where all it is called though, and a string search in Github wasn't all that clear. |
I guess after further thought, maybe implementing this in qubes-gui-runser.c is better. Otherwise we risk this bug popping back up if someone creates a new script that calls qubes-session via qubes-gui-runuser and doesn't parse the environment variables from systemd. The environment variables are just going to be newline-separated key-value pairs, so whatever "parsing" is needed can't be that hard. |
I proposed qubes-gui-runuser, because that's what calls shell that will eventually call qubes-session, so it runs before shell startup scripts. |
There's code in qubes-gui-runuser that can work without PAM, it seems to me like a bugfix should ideally work in both scenarios. Probably better to use dbus. edit: There is |
e4c68aa
to
921f753
Compare
Well, turns out when https://dbus.freedesktop.org/doc/api/html/index.html says "If you use this low-level API directly, you're signing up for some pain", it is not joking! My latest commit is entirely untested, probably won't even compile, and doesn't handle the non-PAM case I wanted it to handle, but, for whatever it's worth, it's there. If this is an entirely unsuitable approach, I can give up on this, otherwise I'll finish polishing and testing this in the near future. |
Pain it is indeed... Aren't there any helpers for that? But also, I think it needs to be called only after pam_open_session - this is where session bus gets started. This may mean using |
There are helpers for this, but they are mainly parts of larger application frameworks like (quoting from the API's main page) GLib, Qt, Python, Mono, Java, etc. D-Bus is also intended to be an event-driven RPC framework where things are managed by callback functions and a main loop, whereas what I'm doing is more like calling an HTTP endpoint and getting a response back. That isn't quite the normal use case for D-Bus AFAICT, thus I don't expect there will be a well-supported helper in general for this kind of thing. Good point about pam_open_session, I'll move the call appropriately (and hopefully fully test this before the next push). |
It looks like the no-PAM code in qubes-gui-runuser is entirely broken - the |
921f753
to
046814c
Compare
Alright, this time it builds locally, and it actually seems to work on my end, though I've not tested it nearly as much needed to consider it rigorously tested. I think at this point we can be fairly certain this solution can work, so some security review on the mess required to talk to systemd over D-Bus would be much appreciated. Some TODOs that aren't already in the code:
|
After some further testing (in which I caught the above bugs and also a minor memory leak), it looks like the D-Bus interaction code itself works once those bugs are fixed, but for some reason in qubes-gui-runuser trying to even use D-Bus at all results in the error Edit: This turned out to be because |
ed9fc95
to
88186b6
Compare
Tested on both debian-13-xfce and kicksecure-18, this has the expected results on environment variables within the session, and the file manager button in the Qubes Domains widget now opens pcmanfm-qt in kicksecure-18 as expected. I tested the D-Bus code in isolation with good results, and used static analysis tools and Valgrind on the D-Bus code to try to shake out any hidden bugs. |
It does appear to be available over Varlink, according to
("Environment" is the same name as used by the D-Bus API, so I'm assuming this is the same thing.)
Sure, that would work too. I fear it might trigger this same issue again on slower hardware, but it would probably be easier than trying to use Varlink. I somewhat doubt people are using Qubes on hardware much slower than the VM we were working on yesterday, and I wasn't able to get a systemctl call in qubes-xorg-wrapper to break things yesterday, so this is probably fine. I'm happy to pursue whichever solution you prefer. |
Technically, this needs to still work on bookworm, so that likely rules out Varlink (systemd 254, or even 252 if backports are disabled), right? It would be interesting to try, but I'm afraid we need dbus version as a fallback anyway. |
Good point, it does seem the 'Environment' API is not available over Varlink in systemd 252. One final solution before giving up, maybe rather than using If that fails, I think it's probably best to just use a profile.d based solution like you suggested. |
The VM is up again (the same as before). |
10-4, will work on this now. Thank you! |
So, interesting finding... if I modify my changes so they are effectively a no-op (all added lines in qubes-gui-runuser.c are commented out, no additional libraries or include paths are added to the build process), then install that on debian-13-xfce, the minute-and-30-second hang issue still occurs every so often. It does not occur before that. I'm starting to wonder if maybe a previous change to qubes-gui-runuser.c (or something else in the gui agent) is at fault here and the D-Bus warnings are a red herring? |
Ah, no, there was one effective change I left, which was getting rid of the systemd environment import in /usr/bin/qubes-session. If I clone debian-13-xfce-test to debian-13-xfce, then comment out the environment import in that script, the issue resurfaces. No idea why, but... there it is. Edit to add, just for good measure, I uncommented the environment import lines in qubes-session, then tried another five reboots of debian-13-xfce. The issue went away again. |
Sigh, I'm lost. I narrowed down the bug to this removed block of code from qubes-session:
If that block is removed, whether using the old qubes-gui-runuser or the new one, the VM will hang on What is sorely confusing to me is that virtually any change to this block of the script results in the boot occasionally stalling, including:
So far the only way I've been able to tweak the block without it breaking things is to comment out the I'm done with the VM again for now, thanks for letting me use it again. |
I'm also quite confused... but what happens if you move this block to profile.d? Maybe it will work, even if we don't fully understand why?
As for your attempts at setting it some other way, note the script starts a bunch other programs after that, including the whole /etc/xdg/autostart. Variables need to be exported, not just set, for this to have effect. Maybe some of the programs started there misbehave without some of them?
Or maybe it breaks due to the very fix you try to do? As in, some application misbehave if env changes in profile.d are _not_ rolled back?
|
Possibly. It looks like the VM is running at the moment, so I'm in it again and will give that a shot.
Unlikely, since it breaks the same way whether none of the variables are set or exported, all of them are set and exported (unless they're set with |
88186b6
to
f75d5c6
Compare
Leaving qubes-gui-runuser unmodified and just moving the environment import into /etc/profile.d itself worked. Variables from systemd are getting imported, then augmented by the rest of profile.d, VM shutdown isn't stalled on qubes-gui-agent, and when testing locally on Whonix-Workstation 18, the correct file manager is launched when using the Qubes Domains widget. I did ten reboots of debian-13-xfce using Pushed the new fix, hopefully this one will work for good. |
Ok, starting new openQA run now |
f75d5c6
to
a38cb53
Compare
Recent test run fails at starting sys-gui (but not sys-gui-gpu). I see the following logs on sys-gui's console:
I'm not sure yet whether systemctl message is related to the failure nor whether this PR is to blame, but it's likely. |
Yes, it looks related. It's likely due to this part of qubes-run-xorg:
sys-gui does use the guivm-gui-agent branch. FWIW the test VM is running, with sys-gui created. It's enough to start it and then check |
Well this is curious, because I saw that same error when I tested this on debian-13-xfce yesterday. I ignored it because everything seemed to work anyway, I figured it must have fallen back to some other way of getting the info or maybe the error was unrelated. I'll try this solution and see if checking for a non-empty I'm in the VM and testing now. |
So, this didn't work (at least not my variant of it), doing this and changing the
Well, there's one thing about the block of code in qubes-session we're modifying that isn't happening in any of the attempts that fail, and that is happening in all of the attempts that work. That's the With all of the broken variants of the new code, if the D-Bus connection attempt fails, we just ignore it and move on, trying to start the session anyway, thus resulting in I dislike having an IMO, and assuming my hunch here is right, I think the right solution would be to go back to the fix in |
I think I figured it out!
This would result in exactly the symptoms we see here, I believe. This can be fixed by changing
to this:
I think we should also change this:
to this:
Otherwise we have a (probably very tiny) race window in between the |
… it through to the session qubes-session previously loaded environment variables from `systemctl --user show-environment`, so that variables set using systemd environment generators would be present in the user session. However, doing this meant any environment variables defined or augmented by scripts under /etc/profile.d could be clobbered by versions of those same variables from the systemd user instance, resulting in application misbehavior in some instances. To fix this, load all environment variables from systemd's user instance in qubes-gui-runner instead. This will augment the environment provided by PAM with the environment provided by systemd, then the /etc/profile.d scripts can augment the environment further. This should prevent any variables from being incorrectly clobbered, and allow all mechanisms of providing environment variables to the end-user's session to function as intended. Fixes: QubesOS/qubes-issues#10299
a38cb53
to
6291d8b
Compare
New solution appears to work - I was able to reboot debian-13-xfce with this code in place ten times in a row without any hang during shutdown. This shouldn't result in the same sys-gui issues as before, though I haven't tested it there (I'm hoping openQA can handle that for me). The only things that still worry me about this solution are:
|
PipelineRetry |
qubes-session loads environment variables from
systemctl --user show-environment
, so that variables set using systemd environment generators are present in the user session. Previously, when one of those generators provided an environment variable that was already set by a script under /etc/profile.d, the variable from the environment generator would clobber the variable from /etc/profile.d. This is backwards - variables from /etc/profile.d should take precedence over those defined in systemd environment generators.To fix this, update the systemd user manager's environment with the contents of the session's environment immediately before updating the session's environment with the environment from the systemd user manager. This two-way sync means that all variables defined in both environment generators and profile scripts end up in the user's environment, while making variable definitions from the profile scripts take priority over variable definitions from environment generators.
Fixes: QubesOS/qubes-issues#10299