Description
When trying to use secrets with the passdriver it fails with an opaque error code "exit status 2".
This happens both when creating the secret, and (assuming you manually create the secret) when you try to load the secret.
The crux of this issue seems to stem from this message
gpg: Note: database_open 134217901 waiting for lock (held by 6970) ...
Where PID 6970 is keyboxd
admin 6970 0.0 0.0 231492 4276 ? Ss 12:18 0:00 keyboxd --homedir /home/admin/.gnupg --daemon
admin 7041 0.0 0.0 305224 2376 ? Ss 12:18 0:00 keyboxd --homedir /home/admin/.gnupg --daemon
Repro
I am running Fedora 42 (Server Edition), on a PowerEdge 2950. It has 2 2-core CPUs, 32 GiB of RAM, and these issues are occurring with files on the boot XFS SSD. Podman is being run rootlessly, but these errors appear when running the command with sudo.
You of course will need a GPG key installed.
It's worth noting here that if you haven't trusted the key you get the same "exit status 2" error, but it happens immediately, and I haven't bothered diagnosing what error messages it produces in the background.
Once the key is trusted you can attempt to create the secret. In this case I used this test command
printf password123 | podman secret create test_secret --driver=pass --driver-opts=key=ABCD1234 -
This will then stall and eventually report
Error: creating secret test_secret: exit status 2
If we run ps auxf
while it is stalled we should see something to the effect of
admin 46016 1.4 0.1 1942312 44588 pts/0 Sl+ 16:07 0:00 | \\_ podman secret create test_secret --driver=pass --driver-opts=key=ABCD1234 -
admin 46028 0.0 0.0 11704 4500 pts/0 SL+ 16:07 0:00 | \\_ gpg --encrypt -r ABCD1234 -o /home/admin/02595e1876d7790afb3eedb42.gpg
This shows that the command that is stalling is the GPG command. Running this command manually works without a problem.
This repo discards StdErr, but we can still read it with the command
ps auxf | grep -E 'gpg +--' | awk '{print $2}' | xargs -I{} cat /proc/{}/fd/2
This command has 4 parts: ps auxf
prints out the list of processes; grep -E 'gpg +--'
finds the GPG process, we need some special RegEx here to exclude the grep
process that would otherwise also be returned; awk '{print $2}'
came from ChatGPT and extracts the PID; xargs -I{} /proc/{}/fd/2
also came from ChatGPT and passes the PID into the path for ls
and loads the File Descriptor for StdErr.
This will produce at least one log message along the lines of
gpg: Note: database_open 134217901 waiting for lock (held by 6970) ...
The PID 6970 that is holding the lock belongs to one of the instances of keyboxd
admin 6970 0.0 0.0 231492 4276 ? Ss 12:18 0:00 keyboxd --homedir /home/admin/.gnupg --daemon
admin 7041 0.0 0.0 305224 2376 ? Ss 12:18 0:00 keyboxd --homedir /home/admin/.gnupg --daemon
Code Sources
The vague error appears to come from secrets.go#L237
err = driver.Store(secr.ID, data)
if err != nil {
return "", fmt.Errorf("creating secret %s: %w", name, err)
}
There is another location that could also be the source, but it is unlikely since the default driver works.
This then traces to passdriver.go#L139
func (d *Driver) Store(id string, data []byte) error {
if _, err := d.Lookup(id); err == nil {
return fmt.Errorf("%s: %w", id, define.ErrSecretIDExists)
}
in := bytes.NewReader(data)
key, err := d.getPath(id)
if err != nil {
return err
}
return d.gpg(context.TODO(), in, nil, "--encrypt", "-r", d.KeyID, "-o", key)
}
which in turn executes the GPG command at passdriver.go#166.
func (d *Driver) gpg(ctx context.Context, in io.Reader, out io.Writer, args ...string) error {
if d.GPGHomedir != "" {
args = append([]string{"--homedir", d.GPGHomedir}, args...)
}
cmd := exec.CommandContext(ctx, "gpg", args...)
cmd.Env = os.Environ()
cmd.Stdin = in
cmd.Stdout = out
cmd.Stderr = io.Discard
return cmd.Run()
}
It seems important to note that shortly before the GPG encrypt command is called that Lookup is called
func (d *Driver) Lookup(id string) ([]byte, error) {
out := &bytes.Buffer{}
key, err := d.getPath(id)
if err != nil {
return nil, err
}
if err := d.gpg(context.TODO(), nil, out, "--decrypt", key); err != nil {
return nil, fmt.Errorf("%s: %w", id, define.ErrNoSuchSecret)
}
if out.Len() == 0 {
return nil, fmt.Errorf("%s: %w", id, define.ErrNoSuchSecret)
}
return out.Bytes(), nil
}
Additional Testing
I did clone this repo and do some manual testing, but it was limited and had to be done with unit tests.
I essentially recreated the exact command I would be using, but made sure that Stdout & Stderr were redirected. Additionally I logged the GPG commands before than ran. This results in the output
Running command: gpg --decrypt /home/admin/test_secret.gpg
gpg: can't open '/home/admin/test_secret.gpg': No such file or directory
gpg: decrypt_message failed: No such file or directory
Running command: gpg --encrypt -r ABCD1234 -o /home/admin/test_secret.gpg
Based on what I can find, this indicates to me that the decrypt GPG command could be locking the database and it is not getting unlocked by the time the encrypt command runs. I'm not entirely sure why this works in the unit test but not with Podman, if I had to guess, I'd wager that running go test
rebuilds the code without optimizations, and so there is enough of a delay that it just works. A pitfall with this hypothesis is that the "waiting for lock" message can appear multiple times, indicating that it is at least trying several times.
I also manually tried tossing the secret into secrets.json
, and running a container with
podman run --secret=test_secret debian cat "/run/secrets/test_secret"
Which likewise stalls and errors out with
Error: 02595e1876d7790afb3eedb42: no such secret
But intercepting StdErr shows that it is actually the same problem as when creating secrets.
Final Notes
I'm not the only one to have this issue. My friend is running a similar setup, albeit with very different server specs, and was encountering this issue, which led me here.
I've tried running the GPG commands manually many times and don't encounter this error. It only happens when they run as a result of the podman
command.
It would also be nice if Stdout & Stderr weren't simply discarded for the GPG command. I would prefer that StdErr is always written to the console, but having it only write with a verbose flag is still better than the debugging I had to do. I'd even accept an obscure --driver-opt
.