rootless podman can't bind-mount allocdir #388

Nomad considers filesystem permissions for the allocs directory to be outside of its own security model (https://developer.hashicorp.com/nomad/docs/concepts/security):
> *Access (read or write) to the Nomad data directory* - Information about the allocations scheduled to a Nomad client is persisted to its data directory. This would include any secrets in any of the allocation's file systems.

To protect the secrets written into job allocation directories from unprivileged local users on the Nomad client host, restrictive permissions such as `0700` must be set on the allocs directory or one of its parents. The important part is that the "other" permission bits must not include execute (`+x`, octal `1`), which is what allows directory traversal, because secrets are written into subdirectories with permissive modes (`nobody:nobody`, `0777`).
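One way to check whether a given alloc path is actually protected in this sense is to walk its ancestors and look for at least one directory whose mode lacks the "other" execute bit. This is a minimal sketch using only the Python standard library; the function name `other_traversal_blockers` is hypothetical, and it inspects classic mode bits only (POSIX ACLs, SELinux and similar are ignored).

```python
import os
import stat
from pathlib import Path


def other_traversal_blockers(path):
    """Return the directories on the way to `path` (top-down, including
    `path` itself) whose mode lacks the 'other' execute bit (S_IXOTH).

    If the list is empty, any local user can traverse all the way to
    `path`, even if intermediate directories are not world-readable.
    Only classic mode bits are inspected; ACLs, SELinux, etc. are ignored.
    """
    target = Path(path).resolve()
    blockers = []
    # Path.parents yields the nearest parent first, so reverse for top-down order.
    for d in [*reversed(target.parents), target]:
        mode = os.stat(d).st_mode
        if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
            blockers.append(str(d))
    return blockers


if __name__ == "__main__":
    import sys

    # e.g. python3 check.py /data/nomad/server/alloc/<alloc-id>/alloc
    for p in sys.argv[1:]:
        found = other_traversal_blockers(p)
        print(p, "protected by:" if found else "NOT protected", *found)
```

An empty result for a path below the allocs directory means the `0777` secret directories are reachable by every local user.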

This seems to be fundamentally incompatible with rootless containers, since the unprivileged user needs to traverse into the alloc dir in order to stat the directories that are bind-mounted into the container. Restrictive permissions produce Driver Failure errors such as the following on container startup:

```
rpc error: code = Unknown desc = failed to start task, could not create container: cannot create container, status code: 500: {"cause":"permission denied","message":"statfs /data/nomad/server/alloc/1be2b692-465d-a1ac-54ff-e6f7a43c9fa4/alloc: permission denied","response":500}
```
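The failure above can be reasoned about with a simplified model of the kernel's search-permission check: for each directory on the path, the owner bits apply if the caller's uid matches, else the group bits if one of its gids matches, else the "other" bits, and root bypasses the check. A rootless Podman service runs as the unprivileged socket owner, so it is checked as that uid. The ownership and modes below are assumptions for illustration (a root-owned `0700` allocs dir as described above); `Dir` and `first_blocking_dir` are hypothetical names, and ACLs and capabilities held inside user namespaces are not modelled.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Dir:
    path: str
    uid: int
    gid: int
    mode: int


def first_blocking_dir(dirs, uid, gids):
    """Return the path of the first directory in `dirs` (ordered top-down)
    that `uid` cannot search, or None if the whole path can be traversed.

    Simplified POSIX model: root bypasses the check; otherwise exactly one
    class of bits applies (owner, else group, else other).
    """
    if uid == 0:
        return None
    for d in dirs:
        if d.uid == uid:
            bit = stat.S_IXUSR
        elif d.gid in gids:
            bit = stat.S_IXGRP
        else:
            bit = stat.S_IXOTH
        if not d.mode & bit:
            return d.path
    return None


ALLOC = "/data/nomad/server/alloc"
ALLOC_ID = "1be2b692-465d-a1ac-54ff-e6f7a43c9fa4"
# Assumed layout: root-owned allocs dir locked down to 0700, with the
# per-alloc subdirectories world-accessible.
path = [
    Dir("/data/nomad/server", 0, 0, 0o755),
    Dir(ALLOC, 0, 0, 0o700),
    Dir(f"{ALLOC}/{ALLOC_ID}", 0, 0, 0o777),
    Dir(f"{ALLOC}/{ALLOC_ID}/alloc", 65534, 65534, 0o777),
]

print(first_blocking_dir(path, uid=0, gids={0}))        # None (Nomad as root)
print(first_blocking_dir(path, uid=1000, gids={1000}))  # /data/nomad/server/alloc
```

The rootless Podman user is stopped at the allocs directory itself, which matches the `permission denied` from `statfs` in the log.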

One of the benefits of rootless containers and multiple sockets would be stronger isolation between users on a host. However, using multiple sockets requires that every user who runs containers under Nomad has access to the allocs directory, and therefore to all the secrets written there for all jobs run by all users. This is sadly a dealbreaker for us, since it would allow secrets to leak across user boundaries.

The only workaround I can think of would be for Nomad to set more restrictive permissions on the alloc directory itself (i.e. the one named after the allocation ID), e.g. setting its ownership to match the Podman socket owner with `0700` permissions. Nomad itself, running as root, would still be able to bypass the restrictive permissions. POSIX ACLs on supported filesystems would be another option. I'm not sure whether this can be practically implemented in the task driver alone or would need support in Nomad core. At the very least, the driver would need to know which filesystem user the directory should be made accessible to; currently the multiple-socket implementation has no notion of which user "owns" a configured socket.
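A rough sketch of the ownership-based workaround, assuming it runs with root privileges in Nomad or the task driver: look up the owner of the configured Podman socket with `stat`, then hand the per-allocation directory to that uid with mode `0700`. For the socket owner to reach it at all, the parent allocs directory would presumably have to allow traversal again (e.g. `0711`), so the protection moves from the parent to each alloc dir. The helper name and socket path are hypothetical, and the POSIX ACL variant is not covered here.

```python
import os


def restrict_alloc_dir(alloc_dir, socket_path):
    """Give `alloc_dir` to the owner of the Podman socket at `socket_path`
    and restrict it to mode 0700. Returns the uid that now owns the dir.

    Hypothetical helper: in practice it must run as root, since only root
    may chown a directory to another user. Root keeps access regardless,
    so Nomad can still populate the directory afterwards.
    """
    # e.g. 1000 for a socket at /run/user/1000/podman/podman.sock
    owner_uid = os.stat(socket_path).st_uid
    os.chown(alloc_dir, owner_uid, -1)  # -1 leaves the group unchanged
    os.chmod(alloc_dir, 0o700)          # only the owner (and root) may traverse
    return owner_uid
```

For example, `restrict_alloc_dir("/data/nomad/server/alloc/<alloc-id>", "/run/user/1000/podman/podman.sock")` would leave that alloc dir owned by uid `1000` with mode `0700`, so other unprivileged users could no longer reach the secrets beneath it.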

Alternatively, could this task driver bind-mount the alloc dir into an alternate path accessible only to the Podman socket owner (e.g. beneath `/run/user/UID`), bypassing the more restrictive permissions on the parent allocs dir?
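The bind-mount idea relies on path lookup only checking search permission on the directories along the path actually used: if the alloc dir is also mounted beneath the socket owner's runtime directory (typically `0700` and owned by that user), Podman could reach it through the new path without search permission on the locked-down allocs dir. This sketch only computes the target path and the `mount --bind` / `umount` commands, which would have to be run as root; the `nomad-allocs` subdirectory name and the function itself are hypothetical. Whether such a mount is visible inside rootless Podman's own mount namespace depends on mount propagation and is not verified here.

```python
from pathlib import PurePosixPath


def bind_mount_plan(alloc_dir, alloc_id, uid, runtime_root="/run/user"):
    """Compute where `alloc_dir` could be bind-mounted for the Podman
    socket owner `uid`, plus the commands to set up and tear down the mount.

    Hypothetical layout: <runtime_root>/<uid>/nomad-allocs/<alloc_id>.
    Nothing is executed here; mounting requires root.
    """
    target = PurePosixPath(runtime_root) / str(uid) / "nomad-allocs" / alloc_id
    setup = [
        ["mkdir", "-p", str(target)],
        ["mount", "--bind", str(alloc_dir), str(target)],
    ]
    teardown = [["umount", str(target)], ["rmdir", str(target)]]
    return target, setup, teardown


target, setup, teardown = bind_mount_plan(
    "/data/nomad/server/alloc/1be2b692-465d-a1ac-54ff-e6f7a43c9fa4",
    "1be2b692-465d-a1ac-54ff-e6f7a43c9fa4",
    1000,
)
print(target)  # /run/user/1000/nomad-allocs/1be2b692-465d-a1ac-54ff-e6f7a43c9fa4
```

The driver would then presumably pass `target`, rather than the original alloc path, to Podman as the mount source.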
