
ArrayBolt3
Contributor

qubes-session loads environment variables from
systemctl --user show-environment, so that variables set using systemd environment generators are present in the user session. Previously, when one of those generators provided an environment variable that was already set by a script under /etc/profile.d, the variable from the environment generator would clobber the variable from /etc/profile.d. This is backwards - variables from /etc/profile.d should take precedence over those defined in systemd environment generators.

To fix this, update the systemd user manager's environment with the contents of the session's environment immediately before updating the session's environment with the environment from the systemd user manager. This two-way sync means that all variables defined in both environment generators and profile scripts end up in the user's environment, while making variable definitions from the profile scripts take priority over variable definitions from environment generators.
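
For illustration, the precedence this change aims for can be expressed as a merge of two NULL-terminated envp arrays: every variable from either source ends up in the session, and profile-script values win on conflict. This is a minimal sketch in C. env_merge and its helpers are hypothetical names, not code from the PR, and the sketch models only the intended outcome, not the systemctl-based two-way sync.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Length of the "KEY" part of a "KEY=VALUE" entry. */
static size_t key_len(const char *entry) {
    const char *eq = strchr(entry, '=');
    return eq ? (size_t)(eq - entry) : strlen(entry);
}

/* Return 1 if the NULL-terminated env contains an entry with the same key. */
static int has_key(char *const *env, const char *entry) {
    size_t n = key_len(entry);
    for (; *env; env++)
        if (key_len(*env) == n && strncmp(*env, entry, n) == 0)
            return 1;
    return 0;
}

/* Keep every entry from `preferred` (e.g. profile scripts) and add entries
 * from `fallback` (e.g. environment generators) only when `preferred` lacks
 * that key. Returns a new NULL-terminated array of pointers into the inputs
 * (strings are not copied), or NULL on allocation failure. */
static char **env_merge(char *const *preferred, char *const *fallback) {
    size_t np = 0, nf = 0;
    while (preferred[np]) np++;
    while (fallback[nf]) nf++;
    char **out = malloc((np + nf + 1) * sizeof *out);
    if (!out) return NULL;
    size_t k = 0;
    for (size_t i = 0; i < np; i++) out[k++] = preferred[i];
    for (size_t i = 0; i < nf; i++)
        if (!has_key(preferred, fallback[i])) out[k++] = fallback[i];
    out[k] = NULL;
    return out;
}
```

Note that this is a key-level override: it says nothing about variables whose values profile scripts build by appending.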

Fixes: QubesOS/qubes-issues#10299

@ArrayBolt3
Contributor Author

Note that this results in a lot of new environment variables being loaded into systemd's user manager that weren't loaded there previously. I believe this is correct behavior; however, it might change the behavior of other user services and qrexec services. Whatever testing is needed to make sure that doesn't break the world should probably be done.

@marmarek
Member

marmarek commented Oct 5, 2025

This will now break some of the systemd-defined variables (including the XDG_DATA_DIRS parts added there, so flatpak). The problem with this approach is that /etc/profile.d scripts usually handle already existing variables by appending to their value (that applies at least to PATH, but to XDG_DATA_DIRS too). If you now override one with the other, you break the /etc/profile.d expectation (it was meant to extend values, but now it will override them). What probably needs to happen here is loading the systemd-defined variables earlier, before /etc/profile gets loaded. I'm not 100% sure where that should actually be. Maybe somewhere in https://github.com/QubesOS/qubes-gui-agent-linux/blob/main/gui-agent/qubes-gui-runuser.c (which is used in https://github.com/QubesOS/qubes-gui-agent-linux/blob/main/appvm-scripts/usrbin/qubes-run-xorg)?
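
The appending problem can be shown with a toy model of a profile.d snippet that extends a colon-separated list. In this hypothetical sketch, profile_append stands in for a line such as XDG_DATA_DIRS="${XDG_DATA_DIRS:-/usr/local/share:/usr/share}:/opt/extra/share"; the /opt/extra/share and flatpak paths are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Model of a profile.d line that appends to an existing list, falling back
 * to a default when the variable is unset or empty.
 * Returns a newly allocated string (NULL on allocation failure). */
static char *profile_append(const char *current, const char *extra) {
    const char *base = (current && *current) ? current : "/usr/local/share:/usr/share";
    size_t len = strlen(base) + 1 + strlen(extra) + 1;
    char *out = malloc(len);
    if (out)
        snprintf(out, len, "%s:%s", base, extra);
    return out;
}
```

If the systemd value is loaded first, profile_append extends it and nothing is lost. If profile.d instead runs on an environment without the systemd value and its result then replaces the systemd value, the flatpak entry disappears, which is the breakage described above.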

Note also that /etc/profile.d is meant for shell variables. I don't think any spec mandates loading it for non-shell apps. So, if you want a variable to be set for various GUI apps (not just a shell running in a terminal emulator), you should probably use a different mechanism (likely /etc/environment.d). For example, on native Xfce, if you launch an app from the menu, it won't see variables set via /etc/profile.d (you can check via /proc/PID/environ, since of course opening a shell will load /etc/profile.d; check for example LESSOPEN, which seems to be set only in /etc/profile.d). /etc/profile.d works great for power users who open all applications from a terminal, which is likely why it's such a common habit...

See also QubesOS/qubes-issues#6408 which demanded /etc/environment.d to be respected. And also #151 for some more background.

@ArrayBolt3
Contributor Author

Maybe moving to environment.d can be a goal in the future, but right now, to my knowledge, all of the system-wide environment variable manipulation in Whonix (and quite a bit of it done in Qubes, it appears) is done in /etc/profile.d or /etc/X11/Xsession.d, and Whonix does a lot of environment variable manipulation. Changing everything to environment.d this late in the game might not be entirely out of the question, but it would definitely fall into the "last resort" category.

Loading things from /etc/environment.d first sounds like a good solution. qubes-gui-runuser sounds like kind of an awkward place to do it, though, since that requires lots of possibly messy string parsing in C. It seems like it might be easier to do it in all the scripts that eventually end up calling qubes-session. I don't know all the places it is called from, though, and a string search on GitHub wasn't all that clear.

@ArrayBolt3
Contributor Author

I guess after further thought, maybe implementing this in qubes-gui-runuser.c is better. Otherwise we risk this bug popping back up if someone creates a new script that calls qubes-session via qubes-gui-runuser and doesn't parse the environment variables from systemd. The environment variables are just going to be newline-separated key-value pairs, so whatever "parsing" is needed can't be that hard.
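
A naive splitter for that format might look like the sketch below. It assumes every line is a plain KEY=VALUE pair. systemctl may shell-quote values that contain special characters (which is presumably why qubes-session fed its output through eval), and this hypothetical split_env_lines does not decode such quoting.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Split "KEY=VALUE\nKEY2=VALUE2\n" into a NULL-terminated array of
 * heap-allocated "KEY=VALUE" strings. Lines without '=' are skipped.
 * Shell-quoted values are NOT decoded. Returns NULL on allocation failure;
 * the caller frees each entry and then the array. */
static char **split_env_lines(const char *text) {
    size_t cap = 8, n = 0;
    char **out = malloc(cap * sizeof *out);
    if (!out) return NULL;
    const char *p = text;
    while (*p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        if (len > 0 && memchr(p, '=', len)) {
            if (n + 1 >= cap) {            /* keep room for the NULL slot */
                char **tmp = realloc(out, cap * 2 * sizeof *out);
                if (!tmp) goto fail;
                out = tmp;
                cap *= 2;
            }
            if (!(out[n] = strndup(p, len))) goto fail;
            n++;
        }
        p += len;
        if (*p == '\n') p++;
    }
    out[n] = NULL;
    return out;
fail:
    while (n > 0) free(out[--n]);
    free(out);
    return NULL;
}
```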

@marmarek
Member

marmarek commented Oct 6, 2025

I proposed qubes-gui-runuser, because that's what calls shell that will eventually call qubes-session, so it runs before shell startup scripts.
As for parsing, I don't think you really need it. You can simply ask systemd for the variables using proper API (dbus), no need for asking systemctl --user show-environment to format it into string just to have it split into variables back. It will add dbus dependency to that tool, but it's kinda there already via pam_systemd. Oh, maybe there is a simpler solution if there is some pam module that would load those variables?

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 6, 2025

There's code in qubes-gui-runuser that can work without PAM, and it seems to me like a bugfix should ideally work in both scenarios. Probably better to use dbus.

edit: There is pam_env, but it can only load explicitly provided environment variables, or variables from a file. We'd have to dump the environment variables from systemd into a file first, then load them with pam_env, to do this in PAM. That sounds less secure and more difficult to me.

@ArrayBolt3 ArrayBolt3 force-pushed the arraybolt3/systemd-env-fix branch from e4c68aa to 921f753 on October 6, 2025 23:35
@ArrayBolt3 ArrayBolt3 marked this pull request as draft October 6, 2025 23:35
@ArrayBolt3
Contributor Author

Well, turns out when https://dbus.freedesktop.org/doc/api/html/index.html says "If you use this low-level API directly, you're signing up for some pain", it is not joking! My latest commit is entirely untested, probably won't even compile, and doesn't handle the non-PAM case I wanted it to handle, but, for whatever it's worth, it's there. If this is an entirely unsuitable approach, I can give up on this, otherwise I'll finish polishing and testing this in the near future.

@marmarek
Member

marmarek commented Oct 7, 2025

Pain it is indeed... Aren't there any helpers for that?
And indeed it doesn't compile (missing build dep).

But also, I think it needs to be called only after pam_open_session, since this is where the session bus gets started. This may mean that using pam_putenv is not okay (to be checked)? If so, get_systemd_env_from_dbus should probably take the already existing array (the one returned by pam_getenvlist) and extend it, instead of allocating a new one.
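
Extending the array from pam_getenvlist in place could look roughly like the following hypothetical helper (env_upsert is not from the PR). It assumes the Linux-PAM ownership model for that array: the array and each string are individually malloc()ed and the caller frees them. Whether a PAM-provided value or a systemd-provided value should win on conflict is left to the caller through the overwrite flag.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Insert or replace "KEY=VALUE" in a heap-allocated, NULL-terminated array
 * of heap-allocated strings. With overwrite == 0 an existing key is kept.
 * On success *envp may point to a reallocated array.
 * Returns 0 on success, -1 on allocation failure or an entry without '='. */
static int env_upsert(char ***envp, const char *entry, int overwrite) {
    const char *eq = strchr(entry, '=');
    if (!eq) return -1;
    size_t klen = (size_t)(eq - entry) + 1;   /* compare "KEY=" including '=' */
    size_t n = 0;
    for (; (*envp)[n]; n++) {
        if (strncmp((*envp)[n], entry, klen) == 0) {
            if (!overwrite) return 0;
            char *copy = strdup(entry);
            if (!copy) return -1;
            free((*envp)[n]);                 /* safe only for heap strings */
            (*envp)[n] = copy;
            return 0;
        }
    }
    char *copy = strdup(entry);
    if (!copy) return -1;
    char **grown = realloc(*envp, (n + 2) * sizeof **envp);
    if (!grown) { free(copy); return -1; }
    grown[n] = copy;
    grown[n + 1] = NULL;
    *envp = grown;
    return 0;
}
```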

@ArrayBolt3
Contributor Author

There are helpers for this, but they are mainly parts of larger application frameworks like (quoting from the API's main page) GLib, Qt, Python, Mono, Java, etc. D-Bus is also intended to be an event-driven RPC framework where things are managed by callback functions and a main loop, whereas what I'm doing is more like calling an HTTP endpoint and getting a response back. That isn't quite the normal use case for D-Bus AFAICT, thus I don't expect there will be a well-supported helper in general for this kind of thing.

Good point about pam_open_session, I'll move the call appropriately (and hopefully fully test this before the next push).

@ArrayBolt3
Contributor Author

It looks like the no-PAM code in qubes-gui-runuser is entirely broken - the execve line at the end references a variable env that doesn't even exist in the function or as a global. Not sure if it's worth trying to fix it now, I'll leave it broken and not add the systemd environment fixes. In the long run, should the code be repaired, or would it be better to just discard it?

@ArrayBolt3 ArrayBolt3 force-pushed the arraybolt3/systemd-env-fix branch from 921f753 to 046814c on October 8, 2025 03:34
@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 8, 2025

Alright, this time it builds locally, and it actually seems to work on my end, though I've not tested it nearly as much as needed to consider it rigorously tested. I think at this point we can be fairly certain this solution can work, so some security review of the mess required to talk to systemd over D-Bus would be much appreciated.

Some TODOs that aren't already in the code:

  • The error messages from the D-Bus interaction code should make their origin more clear, and it's possible the errors themselves could be clearer.
  • The code needs to use "D-Bus" as the term for the message bus, not "DBus".
  • There should be a comment somewhere documenting that one should never pass a char ** that contains any unsafe-to-free memory to get_systemd_env_from_dbus. The function reallocarray's the passed-in array numerous times, and can free elements within the array to replace them with elements copied from systemd.
  • This whole mechanism should be tested in isolation - right now all I know is my environment looks like I'd expect (without things from /etc/profile.d being clobbered) when I boot with this, I don't yet know if getting the environment variables from systemd actually succeeded or not.
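
For the third TODO, one way to make the ownership rule hard to violate is to document it next to a deep-copy helper, so callers that start from environ, a stack array, or string literals have an easy way to obtain a safe-to-free array. This is a sketch; env_dup and env_free are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/*
 * Ownership contract: an array handed to a function that reallocarray()s
 * it and free()s its elements must itself be heap-allocated, and every
 * element must be an individually heap-allocated string. Never pass
 * `environ`, a stack array, or an array holding string literals.
 *
 * env_dup() builds such an array from any NULL-terminated string array.
 * Returns NULL on allocation failure.
 */
static char **env_dup(char *const *src) {
    size_t n = 0;
    while (src[n]) n++;
    char **out = malloc((n + 1) * sizeof *out);
    if (!out) return NULL;
    for (size_t i = 0; i < n; i++) {
        if (!(out[i] = strdup(src[i]))) {
            while (i > 0) free(out[--i]);   /* undo partial copy */
            free(out);
            return NULL;
        }
    }
    out[n] = NULL;
    return out;
}

/* Free an array that satisfies the contract above. */
static void env_free(char **env) {
    if (!env) return;
    for (size_t i = 0; env[i]; i++) free(env[i]);
    free(env);
}
```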

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 8, 2025

After some further testing (in which I caught the above bugs and also a minor memory leak), it looks like the D-Bus interaction code itself works once those bugs are fixed, but for some reason in qubes-gui-runuser trying to even use D-Bus at all results in the error qubes-gui-runuser: Failed to initialize DBus, error name: 'org.freedesktop.DBus.Error.Spawn.ExecFailed', error contents: '/usr/bin/dbus-launch terminated abnormally without any error message'.

Edit: This turned out to be because DBUS_SESSION_BUS_ADDRESS wasn't properly set in the environment of qubes-gui-runuser. It needed to be taken from the environment returned by pam_getenvlist and putenv'd into the executable's real environment. Doing that fixed things.
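
A minimal sketch of that step, assuming the PAM environment list is available as a NULL-terminated KEY=VALUE array. It uses setenv() rather than putenv(), because putenv() makes the passed string itself part of the environment, so that string must stay valid for as long as the variable is set (which matters if the PAM list is freed later). export_from_list is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>

/* Find `name` in a NULL-terminated "KEY=VALUE" array (such as the result of
 * pam_getenvlist()) and copy it into this process's own environment.
 * Returns 1 if exported, 0 if not present, -1 if setenv() failed. */
static int export_from_list(char *const *list, const char *name) {
    size_t nlen = strlen(name);
    for (; list && *list; list++) {
        if (strncmp(*list, name, nlen) == 0 && (*list)[nlen] == '=') {
            /* setenv() copies the value, so `list` may be freed afterwards. */
            return setenv(name, *list + nlen + 1, 1) == 0 ? 1 : -1;
        }
    }
    return 0;
}
```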

@ArrayBolt3 ArrayBolt3 force-pushed the arraybolt3/systemd-env-fix branch 2 times, most recently from ed9fc95 to 88186b6 on October 9, 2025 00:10
@ArrayBolt3 ArrayBolt3 marked this pull request as ready for review October 9, 2025 00:10
@ArrayBolt3
Contributor Author

Tested on both debian-13-xfce and kicksecure-18, this has the expected results on environment variables within the session, and the file manager button in the Qubes Domains widget now opens pcmanfm-qt in kicksecure-18 as expected. I tested the D-Bus code in isolation with good results, and used static analysis tools and Valgrind on the D-Bus code to try to shake out any hidden bugs.

@ArrayBolt3
Contributor Author

Can you get the env variables via varlink from the user systemd instance? I can't find information about this API being available over Varlink...

It does appear to be available over Varlink, according to systemd/src/shared/varlink-io.systemd.Manager.c:

SD_VARLINK_FIELD_COMMENT("https://www.freedesktop.org/software/systemd/man/"PROJECT_VERSION_STR"/systemd-system.conf.html#ManagerEnvironment="),
                SD_VARLINK_DEFINE_FIELD(Environment, SD_VARLINK_STRING, SD_VARLINK_ARRAY|SD_VARLINK_NULLABLE),

("Environment" is the same name as used by the D-Bus API, so I'm assuming this is the same thing.)

Anyway, maybe we need to settle on an inferior solution - a file in /etc/profile.d with some early prefix (000-?) that would load the initial env from systemd? Maybe excluding few standard variables set in /etc/profile itself (PATH etc)?

Sure, that would work too. I fear it might trigger this same issue again on slower hardware, but it would probably be easier than trying to use Varlink. I somewhat doubt people are using Qubes on hardware much slower than the VM we were working on yesterday, and I wasn't able to get a systemctl call in qubes-xorg-wrapper to break things yesterday, so this is probably fine. I'm happy to pursue whichever solution you prefer.

@marmarek
Member

Technically, this needs to still work on bookworm, so that likely rules out Varlink (systemd 254, or even 252 if backports are disabled), right? It would be interesting to try, but I'm afraid we need dbus version as a fallback anyway.

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 12, 2025

Good point, it does seem the 'Environment' API is not available over Varlink in systemd 252.

One final solution before giving up, maybe rather than using dbus_bus_get(), I should handle finding and connecting to the D-Bus bus manually so that I can explicitly close the connection? I don't think that will work, but it at least rules out the possibility of a leaked connection. If I can have access to a VM again I can test that quickly, otherwise I'll just upload a commit that tries to fix that and we can see if CI passes.

If that fails, I think it's probably best to just use a profile.d based solution like you suggested.

@marmarek
Member

marmarek commented Oct 12, 2025

The VM is up again (the same as before).

@ArrayBolt3
Contributor Author

10-4, will work on this now. Thank you!

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 12, 2025

So, interesting finding... if I modify my changes so they are effectively a no-op (all added lines in qubes-gui-runuser.c are commented out, no additional libraries or include paths are added to the build process), then install that on debian-13-xfce, the minute-and-30-second hang issue still occurs every so often. It does not occur before that. I'm starting to wonder if maybe a previous change to qubes-gui-runuser.c (or something else in the gui agent) is at fault here and the D-Bus warnings are a red herring?

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 12, 2025

Ah, no, there was one effective change I left, which was getting rid of the systemd environment import in /usr/bin/qubes-session. If I clone debian-13-xfce-test to debian-13-xfce, then comment out the environment import in that script, the issue resurfaces. No idea why, but... there it is.

Edit to add, just for good measure, I uncommented the environment import lines in qubes-session, then tried another five reboots of debian-13-xfce. The issue went away again.

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 12, 2025

Sigh, I'm lost. I narrowed down the bug to this removed block of code from qubes-session:

set -a # export all variables
env=$(systemctl --user show-environment) && eval "$env" || exit
set +a
unset env

If that block is removed, whether using the old qubes-gui-runuser or the new one, the VM will hang on qubes-gui-agent.service for a minute and a half about once out of every three and a half boots (when starting and then immediately stopping the VM). If that block is present, whether using the old qubes-gui-runuser or the new one, the VM doesn't hang during shutdown except rarely on dovecot (which we already understand to be a non-issue). Thus it appears the C code has nothing to do with the hangs, only the shell script does.

What is sorely confusing to me is that virtually any change to this block of the script results in the boot occasionally stalling, including:

  • Commenting out the && eval "$env" || exit bit
  • Removing the block and just running systemctl --user show-environment >/dev/null 2>/dev/null in its place
  • Removing the env=... line and replacing it with a bunch of individual variable assignments that put in place the exact same variables that would have been set by eval'ing the string output of systemctl --user show-environment
  • Calling systemctl --user show-environment outside of a subshell and discarding its output, then setting all of the variables that would have been set if its output had been eval'd
  • Replacing the block with a sleep 1 (in the hopes that maybe it was just introducing a bit of a delay that allowed winning a race)
  • Explicitly setting only a subset of those environment variables that seem important, like XDG_CURRENT_DESKTOP and PATH
  • Throwing the whole block into a subshell in the hopes that whatever the block did could be done without polluting the system's environment

So far the only way I've been able to tweak the block without breaking things is to comment out the set -a and set +a lines, and that still results in some variable overwriting, so I can't say that really changed the behavior of the block. It's as if the system has to have the environment variables set right here by systemctl --user show-environment, and they have to be set using eval, other mechanisms don't work. This makes no sense to me: setting all the environment variables without using eval should have worked if it was environment related, running systemctl --user show-environment and discarding its output should have worked if D-Bus needed to be nudged in order to keep things running, and the sleep should have worked if it was just a matter of delaying things, but none of those work.

I'm done with the VM again for now, thanks for letting me use it again.

@marmarek
Member

marmarek commented Oct 13, 2025 via email

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 13, 2025

I'm also quite confused... but what happens if you move this block to profile.d? Maybe it will work, even if we don't fully understand why?

Possibly. It looks like the VM is running at the moment, so I'm in it again and will give that a shot.

Or maybe it breaks due to the very fix you try to do? As in, some application misbehave if env changes in profile.d are not rolled back?

Unlikely, since it breaks the same way whether none of the variables are set or exported, all of them are set and exported (unless they're set with eval), or if only some of them are set and exported. "Set everything with eval but don't export anything explicitly" worked... maybe that is a clue?

@ArrayBolt3 ArrayBolt3 force-pushed the arraybolt3/systemd-env-fix branch from 88186b6 to f75d5c6 on October 13, 2025 19:57
@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 13, 2025

Leaving qubes-gui-runuser unmodified and just moving the environment import into /etc/profile.d itself worked. Variables from systemd are getting imported, then augmented by the rest of profile.d, VM shutdown isn't stalled on qubes-gui-agent, and when testing locally on Whonix-Workstation 18, the correct file manager is launched when using the Qubes Domains widget. I did ten start/shutdown cycles of debian-13-xfce using qvm-start debian-13-xfce; qvm-shutdown debian-13-xfce, and every one of them worked as expected.

Pushed the new fix, hopefully this one will work for good.

@marmarek
Copy link
Member

Ok, starting new openQA run now

@ArrayBolt3 ArrayBolt3 force-pushed the arraybolt3/systemd-env-fix branch from f75d5c6 to a38cb53 on October 13, 2025 21:17
@marmarek
Member

Recent test run fails at starting sys-gui (but not sys-gui-gpu). I see the following logs on sys-gui's console:

[2025-10-14 13:34:56] [   16.632174] systemctl[568]: Failed to connect to user scope bus via local transport: $DBUS_SESSION_BUS_ADDRESS and $XDG_RUNTIME_DIR not defined (consider using --machine=<user>@.host --user to connect to bus of other user)
[2025-10-14 13:34:56] [   16.732858] qubes-gui[500]: Xorg exited in the meantime, aborting
[2025-10-14 13:34:56] [   16.735684] systemd[1]: qubes-gui-agent.service: Main process exited, code=exited, status=1/FAILURE
[2025-10-14 13:34:56] [   16.750769] runuser[529]: pam_unix(runuser:session): session closed for user user
[2025-10-14 13:34:56] [   16.751326] systemd[1]: qubes-gui-agent.service: Failed with result 'exit-code'.

I'm not sure yet whether systemctl message is related to the failure nor whether this PR is to blame, but it's likely.

@marmarek
Member

Yes, it looks related. It's likely due to this part of qubes-run-xorg:

if qsvc guivm-gui-agent; then
    DISPLAY_XORG=:1

    # Create Xorg. Xephyr will be started using qubes-start-xephyr later.
    exec runuser -u "$DEFAULT_USER" -- /bin/sh -l -c "exec /usr/lib/qubes/qubes-xorg-wrapper $DISPLAY_XORG -nolisten tcp vt07 -wr -config xorg-qubes.conf > ~/.xorg-errors 2>&1"
else
    # Use sh -l here to load all session startup scripts (/etc/profile, ~/.profile
    # etc) to populate environment. This is the environment that will be used for
    # all user applications and qrexec calls.
    exec /usr/bin/qubes-gui-runuser "$DEFAULT_USER" /bin/sh -l -c "exec /usr/bin/xinit $XSESSION -- /usr/lib/qubes/qubes-xorg-wrapper :0 -nolisten tcp vt07 -wr -config xorg-qubes.conf > ~/.xsession-errors 2>&1"
fi

sys-gui does use the guivm-gui-agent branch.
So, it uses runuser instead of qubes-gui-runuser. While PAM configuration includes pam_systemd, and logs say user's systemd instance was started, I don't see session bus started. And this is actually expected, because this X instance (:1) is only a thin wrapper to forward a single full-screen window to dom0, actual session bus is started later, under the other X server instance (Xephyr started later, as :0).
So, the fix might be as easy as checking if DBUS_SESSION_BUS_ADDRESS variable is set first.
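
Such a pre-check is simple in C. The sketch below also accepts XDG_RUNTIME_DIR, since the systemctl error above names both variables; that extension is an assumption beyond the suggestion, and the presence of either variable does not guarantee the bus is actually up. user_bus_env_present is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>

/* Return 1 if `name` is set to a non-empty value. */
static int env_nonempty(const char *name) {
    const char *v = getenv(name);
    return v && *v;
}

/* Rough pre-check before attempting to reach the user bus: systemctl gave
 * up on sys-gui because neither variable was defined. This only checks
 * that the variables are present, not that the bus is reachable. */
static int user_bus_env_present(void) {
    return env_nonempty("DBUS_SESSION_BUS_ADDRESS") ||
           env_nonempty("XDG_RUNTIME_DIR");
}
```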

FWIW the test VM is running, with sys-gui created. It's enough to start it and then check qubes-gui-agent service status (qvm-run -p --nogui sys-gui systemctl status qubes-gui-agent). You can also logout from dom0 and select GUI domain as session type in lightdm. But be careful with screenlocker in sys-gui, as user password is not set there, if you let it lock, you'd need to switch to tty2 and kill it via qvm-run or so.

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 15, 2025

Well this is curious, because I saw that same error when I tested this on debian-13-xfce yesterday. I ignored it because everything seemed to work anyway, I figured it must have fallen back to some other way of getting the info or maybe the error was unrelated. I'll try this solution and see if checking for a non-empty DBUS_SESSION_BUS_ADDRESS does the trick. Either way, I think the || exit in 00-qubes-systemd-env-import.sh should change to an || true. If this fails, there's no reason to block further processing of profile.d or similar.

I'm in the VM and testing now.

@ArrayBolt3
Contributor Author

ArrayBolt3 commented Oct 15, 2025

So, the fix might be as easy as checking if DBUS_SESSION_BUS_ADDRESS variable is set first.

So, this didn't work (at least not my variant of it): doing this and changing the || exit to || true resulted in the qubes-gui-agent.service unit locking up again... which I think gives a critical hint about what's happening. Earlier I noted:

It's as if the system has to have the environment variables set right here by systemctl --user show-environment, and they have to be set using eval, other mechanisms don't work.

Well, there's one thing about the block of code in qubes-session we're modifying that isn't happening in any of the attempts that fail, and that is happening in all of the attempts that work. That's the || exit bit, where if systemctl --user show-environment fails, we bail out. With the original code, if we start the VM and then shut it down before D-Bus comes all the way up, the systemctl --user show-environment command probably does fail, resulting in an early termination of the session and allowing qubes-gui-agent to exit normally.

With all of the broken variants of the new code, if the D-Bus connection attempt fails, we just ignore it and move on, trying to start the session anyway, thus resulting in qubes-session continuing to run despite the fact that the session wasn't able to start in the first place. I guess qubes-gui is stuck waiting for the X server that never started to connect, and for some reason it just locks up on SIGTERM? I don't fully understand how or why that happens; the only call in vmside.c's SIGTERM signal handler that could hang is a waitpid() call in terminate_and_cleanup_xorg(), and shouldn't that return immediately with an ECHILD error if X never started? And if X did start, why is it stubbornly refusing to exit? (I assume we're not hanging on the accept() call since accept() can be interrupted by a signal.)
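
The ECHILD expectation itself holds and can be checked in isolation: waitpid() fails immediately with ECHILD when the caller has no matching child, but blocks for as long as a matching child is still alive. A small sketch showing both cases (waitpid_reports_no_child is a hypothetical name):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if waitpid() reports "no such child" (ECHILD), 0 if it reaped
 * a child. If `pid` names a child that is still running, this call blocks
 * until that child exits. */
static int waitpid_reports_no_child(pid_t pid) {
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);   /* retry if a signal interrupted us */
    return r == -1 && errno == ECHILD;
}
```

So a hang in that waitpid() means some child of qubes-gui was still running.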

I dislike having an exit in a profile.d script; that seems like it will go wrong on systems where D-Bus or systemd isn't used (something you alluded to previously). What we really want is to be able to try to connect to D-Bus and systemd to import the environment, ignoring any failure (or at most warning about it). We also want the session to be able to be preempted by an attempted shutdown at any time. The code seems like it should do that, but evidently it doesn't.

IMO, and assuming my hunch here is right, I think the right solution would be to go back to the fix in qubes-gui-runuser.c, and then fix whatever's wrong in vmside.c that's resulting in it hanging when sent a SIGTERM. I don't know exactly what that would look like or if it's even possible.

@ArrayBolt3
Contributor Author

I don't fully understand how or why that happens; the only call in vmside.c's SIGTERM signal handler that could hang is a waitpid() call in terminate_and_cleanup_xorg(), and shouldn't that return immediately with an ECHILD error if X never started? And if X did start, why is it stubbornly refusing to exit? (I assume we're not hanging on the accept() call since accept() can be interrupted by a signal.)

I think I figured it out!

  • systemd attempts to stop qubes-gui-agent.service, and sends a SIGTERM to qubes-gui, which catches it and attempts to propagate the signal to whatever its child process is via terminate_and_cleanup_xorg().
  • The child process is qubes-run-xorg, which exec's qubes-gui-runuser, thus the SIGTERM is received by qubes-gui-runuser. qubes-gui then waitpid()'s on qubes-gui-runuser.
  • qubes-gui-runuser sets up a very simple signal handler for SIGTERM called propagate_signal(), which simply attempts to run kill(child_pid, signal) and does zero error checking. The idea seems to be that the child process will be killed, resulting in do_execute() returning, resulting in qubes-gui-runuser exiting.
  • However, if qubes-gui-runuser has not yet managed to actually start a child process by the time it receives the SIGTERM, propagate_signal() will act as a no-op, since child_pid will be 0 at that point. do_execute() will then proceed to set up the session as normal, leaving qubes-gui waiting forever for qubes-gui-runuser to exit, which it has no intention of doing.
  • systemd notices this deadlock after a minute and a half, and then SIGKILLs qubes-gui, qubes-gui-runuser, and the whole session (this is why we see things like xfsettingsd getting SIGKILL'd at this point).

This would result in exactly the symptoms we see here, I believe. This can be fixed by changing propagate_signal() from this:

static void propagate_signal(int signal) {
    if (child_pid)
        kill(child_pid, signal);
}

to this:

static void propagate_signal(int signal) {
    if (!child_pid) {
        exit(0);
    }
    kill(child_pid, signal);
}
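
One detail worth checking in the proposed handler: exit() is not on POSIX's list of async-signal-safe functions (it runs atexit() handlers and flushes stdio), while _exit() and kill() are. A minimal variant under that assumption, with the handler installed via sigaction(); install_propagate_signal is a hypothetical helper, and declaring child_pid volatile is this sketch's choice, not the PR's.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the main flow, read by the handler. POSIX only guarantees
 * atomic access for volatile sig_atomic_t; pid_t is an int on Linux. */
static volatile pid_t child_pid = 0;

static void propagate_signal(int sig) {
    if (child_pid <= 0) {
        /* No child yet: terminate immediately, async-signal-safely. */
        _exit(0);
    }
    kill(child_pid, sig);
}

static int install_propagate_signal(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = propagate_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}
```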

I think we should also change this:

    pid = fork();

    switch (pid) {
        ....
    }

    child_pid = pid;

to this:

    child_pid = fork();

    switch (child_pid) {
        ....
    }

Otherwise we have a (probably very tiny) race window in between the fork() and the child_pid = pid, which could result in qubes-gui-runuser and qubes-gui exiting, leaving X trying to connect to qubes-gui and never succeeding.
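
Assigning child_pid = fork() directly narrows the window but does not close it, since the store still happens after fork() returns. A common way to close it entirely is to block SIGTERM across fork() and the assignment, then unblock it so any pending signal is delivered once child_pid is valid. A sketch under that assumption (run_child is hypothetical, and a single-threaded process is assumed because sigprocmask() is used):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile pid_t child_pid = 0;

/* Fork and exec argv with SIGTERM blocked across fork() and the store to
 * child_pid. Returns the child's wait status, or -1 on error. */
static int run_child(char *const argv[]) {
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    if (sigprocmask(SIG_BLOCK, &block, &old) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: restore the original mask so the new program can be
         * terminated normally, then exec. */
        sigprocmask(SIG_SETMASK, &old, NULL);
        execvp(argv[0], argv);
        _exit(127);
    }
    if (pid > 0)
        child_pid = pid;                      /* stored while SIGTERM is blocked */
    sigprocmask(SIG_SETMASK, &old, NULL);     /* a pending SIGTERM arrives here */
    if (pid < 0)
        return -1;

    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```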

… it through to the session

qubes-session previously loaded environment variables from
`systemctl --user show-environment`, so that variables set using systemd
environment generators would be present in the user session. However,
doing this meant any environment variables defined or augmented by
scripts under /etc/profile.d could be clobbered by versions of those
same variables from the systemd user instance, resulting in application
misbehavior in some instances.

To fix this, load all environment variables from systemd's user instance
in qubes-gui-runuser instead. This will augment the environment provided
by PAM with the environment provided by systemd, then the /etc/profile.d
scripts can augment the environment further. This should prevent any
variables from being incorrectly clobbered, and allow all mechanisms of
providing environment variables to the end-user's session to function
as intended.

Fixes: QubesOS/qubes-issues#10299
@ArrayBolt3 ArrayBolt3 force-pushed the arraybolt3/systemd-env-fix branch from a38cb53 to 6291d8b on October 15, 2025 03:44
@ArrayBolt3
Contributor Author

New solution appears to work - I was able to reboot debian-13-xfce with this code in place ten times in a row without any hang during shutdown. This shouldn't result in the same sys-gui issues as before, though I haven't tested it there (I'm hoping openQA can handle that for me).

The only things that still worry me about this solution are:

  • In one instance, I booted debian-13-xfce and opened a terminal in it, then ran /sbin/shutdown now from within that terminal. During shutdown, qubes-gui-agent.service ended up failing with an exit code of 1 during deactivation. I don't know why, I'm hoping it's because systemd SIGTERM'd X separately from qubes-gui because there was an active session when the shutdown started, and that this is just normal behavior.
  • qubes-gui-agent.service is getting terminated well before I see a session closed for user user message from PAM, despite the fact that qubes-gui-runuser runs pam_close_session() after the child process terminates. It looks like systemd-logind ends up cleaning up the session. If I open a DispVM console in debian-13-xfce and run sudo systemctl stop qubes-gui-agent.service from there, I see the PAM session is closed properly, so I don't think my modified signal handler is broken. I guess one can argue that at least the session is getting cleaned up, so this isn't a big deal, but it is slightly weird.

@marmarek
Member

PipelineRetry

@marmarek marmarek merged commit 0a78ebf into QubesOS:main Oct 20, 2025
2 of 3 checks passed

Successfully merging this pull request may close these issues.

XDG_DATA_DIRS is reset to a minimal list of directories in applications opened from the Qubes app menu, breaking default application resolution

3 participants