Commit 173346d

Add Seccomp Notify support
This adds the specification for Seccomp Userspace Notification and the Golang bindings. This contains:

- New fields in the seccomp section to use with seccomp userspace notification.
- Additional SeccompState struct containing the container state and file descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th, 2020.

After review on github, this implementation was changed to the "Proposal with listenerPath and listenerExtraMetadata". For more information see:

- #1073 (comment)

Docs presented on the community meeting (for the old implementation using hooks):

- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:

- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog: https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature, the main difference is that this PR adds optional metadata to be sent to the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <[email protected]>
Signed-off-by: Rodrigo Campos <[email protected]>
1 parent e6143ca commit 173346d

File tree

5 files changed

+95
-11
lines changed

config-linux.md

Lines changed: 59 additions & 1 deletion
@@ -624,6 +624,19 @@ The following parameters can be specified to set up seccomp:
     * `SECCOMP_FILTER_FLAG_TSYNC`
     * `SECCOMP_FILTER_FLAG_LOG`
     * `SECCOMP_FILTER_FLAG_SPEC_ALLOW`
+    * `SECCOMP_FILTER_FLAG_NEW_LISTENER`
+    * `SECCOMP_FILTER_FLAG_TSYNC_ESRCH`
+
+* **`listenerPath`** *(string, OPTIONAL)* - specifies the path of the UNIX domain socket over which the runtime will send the [seccomp state](#seccompstate) data structure, using `SCM_RIGHTS` for file descriptors.
+    This socket MUST use the `AF_UNIX` domain and the `SOCK_STREAM` type.
+    The runtime MUST send exactly one [seccomp state](#seccompstate) per connection.
+    The connection MUST NOT be reused and it MUST be closed after sending a seccomp state.
+    If sending to this socket fails, the runtime MUST [generate an error](runtime.md#errors).
+    This field MUST be set if and only if the flag `SECCOMP_FILTER_FLAG_NEW_LISTENER` is used.
+
+* **`listenerMetadata`** *(string, OPTIONAL)* - specifies opaque data to pass to the seccomp agent.
+    This string will be sent as a field in the [seccomp state](#seccompstate).
+    This field MUST NOT be set if `listenerPath` is not set.
 
 * **`syscalls`** *(array of objects, OPTIONAL)* - match a syscall in seccomp.
     While this property is OPTIONAL, some values of `defaultAction` are not useful without `syscalls` entries.
@@ -633,7 +646,7 @@ The following parameters can be specified to set up seccomp:
     * **`names`** *(array of strings, REQUIRED)* - the names of the syscalls.
         `names` MUST contain at least one entry.
     * **`action`** *(string, REQUIRED)* - the action for seccomp rules.
-        A valid list of constants as of libseccomp v2.4.0 is shown below.
+        A valid list of constants as of libseccomp v2.5.0 is shown below.
 
         * `SCMP_ACT_KILL`
         * `SCMP_ACT_KILL_PROCESS`
@@ -642,6 +655,7 @@ The following parameters can be specified to set up seccomp:
         * `SCMP_ACT_TRACE`
         * `SCMP_ACT_ALLOW`
         * `SCMP_ACT_LOG`
+        * `SCMP_ACT_NOTIFY`
 
     * **`errnoRet`** *(uint, OPTIONAL)* - the errno return code to use.
         Some actions like `SCMP_ACT_ERRNO` and `SCMP_ACT_TRACE` allow to specify the errno
@@ -685,6 +699,50 @@ The following parameters can be specified to set up seccomp:
 }
 ```
 
+### <a name="seccompstate" />The Seccomp State
+
+The seccomp state is a data structure passed via a UNIX socket.
+The container runtime MUST send the seccomp state over the UNIX socket as regular payload serialized in JSON.
+The container runtime MUST also send the file descriptor(s) via `SCM_RIGHTS`: the seccomp file descriptor returned by the seccomp syscall and, optionally, the process file descriptor (e.g. as returned by `pidfd_open(2)` or by `clone(2)` with the `CLONE_PIDFD` flag).
+The container runtime MAY use several `sendmsg(2)` calls to send the aforementioned data.
+If more than one `sendmsg(2)` is used, the file descriptors MUST be sent only in the first call.
+
+The seccomp state includes the following properties:
+
+* **`ociVersion`** (string, REQUIRED) is the version of the Open Container Initiative Runtime Specification with which the seccomp state complies.
+* **`seccompFd`** (int, REQUIRED) is the index of the file descriptor in the `SCM_RIGHTS` array referring to the seccomp notify file descriptor.
+    The value MUST be 0.
+* **`pid`** (int, REQUIRED) is the process ID, as seen by the runtime, on which the seccomp filter is applied (target process).
+* **`pidFd`** (int, OPTIONAL) is the index of the file descriptor in the `SCM_RIGHTS` array referring to the target process file descriptor.
+    If present, this value MUST NOT be zero.
+* **`metadata`** (string, OPTIONAL) is the string set in `listenerMetadata`.
+    If the `listenerMetadata` is set, then the runtime MUST set this field too.
+* **`state`** (map, REQUIRED) is the [state](runtime.md#state) of the container.
+
+Example:
+
+```json
+{
+    "ociVersion": "0.2.0",
+    "seccompFd": 0,
+    "pid": 4422,
+    "pidFd": 1,
+    "state": {
+        "ociVersion": "0.2.0",
+        "id": "oci-container1",
+        "status": "creating",
+        "pid": 4422,
+        "bundle": "/containers/redis",
+        "annotations": {
+            "myKey": "myValue"
+        }
+    }
+}
+```
+
+Note that if `state.status` is `creating`, the seccomp filter is created following the [`start`](runtime.md#start) command and `.pid` has the same value as `.state.pid`.
+And if `state.status` is `running`, the seccomp filter is created following an `exec` command and `.pid` has a different value than `.state.pid`.
+
 ## <a name="configLinuxRootfsMountPropagation" />Rootfs Mount Propagation
 
 **`rootfsPropagation`** (string, OPTIONAL) sets the rootfs's mount propagation.
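
The sending side described in "The Seccomp State" can be sketched roughly as follows. This is a minimal, Linux-only illustration and not the code of any particular runtime: the `seccompState` type is a trimmed stand-in, `sendState` and `roundTrip` are hypothetical helpers, and the read end of a pipe stands in for the real seccomp notify file descriptor. It uses a single `sendmsg(2)` (via `WriteMsgUnix`) for both the JSON payload and the `SCM_RIGHTS` data, and one connection per state.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"syscall"
)

// seccompState is a trimmed, hypothetical stand-in for the seccomp state.
type seccompState struct {
	Version   string            `json:"ociVersion"`
	SeccompFd int               `json:"seccompFd"`
	Pid       int               `json:"pid"`
	State     map[string]string `json:"state"`
}

// sendState opens one connection to listenerPath, sends the JSON payload and
// the descriptors in a single sendmsg(2), then closes the connection.
func sendState(listenerPath string, st seccompState, fds ...int) error {
	payload, err := json.Marshal(st)
	if err != nil {
		return err
	}
	conn, err := net.DialUnix("unix", nil, &net.UnixAddr{Name: listenerPath, Net: "unix"})
	if err != nil {
		return err
	}
	defer conn.Close()
	_, _, err = conn.WriteMsgUnix(payload, syscall.UnixRights(fds...), nil)
	return err
}

// roundTrip starts a throwaway agent socket, sends one state with a
// placeholder descriptor, and returns the payload and descriptor count received.
func roundTrip() (string, int, error) {
	dir, err := os.MkdirTemp("", "seccomp-agent")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "agent.sock")
	ln, err := net.ListenUnix("unix", &net.UnixAddr{Name: path, Net: "unix"})
	if err != nil {
		return "", 0, err
	}
	defer ln.Close()

	// Placeholder for the seccomp notify fd: the read end of a pipe.
	r, w, err := os.Pipe()
	if err != nil {
		return "", 0, err
	}
	defer r.Close()
	defer w.Close()

	st := seccompState{Version: "0.2.0", Pid: 4422, State: map[string]string{"status": "creating"}}
	if err := sendState(path, st, int(r.Fd())); err != nil {
		return "", 0, err
	}

	conn, err := ln.AcceptUnix()
	if err != nil {
		return "", 0, err
	}
	defer conn.Close()
	buf := make([]byte, 4096)
	oob := make([]byte, syscall.CmsgSpace(2*4)) // room for two descriptors
	n, oobn, _, _, err := conn.ReadMsgUnix(buf, oob)
	if err != nil {
		return "", 0, err
	}
	msgs, err := syscall.ParseSocketControlMessage(oob[:oobn])
	if err != nil {
		return "", 0, err
	}
	nfds := 0
	for i := range msgs {
		rights, err := syscall.ParseUnixRights(&msgs[i])
		if err != nil {
			return "", 0, err
		}
		for _, fd := range rights {
			syscall.Close(fd) // the demo only counts the descriptors
			nfds++
		}
	}
	return string(buf[:n]), nfds, nil
}

func main() {
	payload, nfds, err := roundTrip()
	fmt.Println(payload, nfds, err)
}
```

A real agent would likely keep the received descriptor open and loop on reads until the peer closes the connection, since a stream socket does not guarantee that the whole payload arrives in one read.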

schema/config-linux.json

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -209,6 +209,12 @@
209209
"$ref": "defs-linux.json#/definitions/SeccompFlag"
210210
}
211211
},
212+
"listenerPath": {
213+
"type": "string"
214+
},
215+
"listenerMetadata": {
216+
"type": "string"
217+
},
212218
"architectures": {
213219
"type": "array",
214220
"items": {
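
The schema above only types `listenerPath` and `listenerMetadata` as strings, so the cross-field rules stated in config-linux.md would need a separate check. One possible sketch is shown here; `seccompListenerConfig` and `validateListener` are hypothetical names, and treating an empty string as "not set" is an assumption of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// flagNewListener is the flag name added to the SeccompFlag enum.
const flagNewListener = "SECCOMP_FILTER_FLAG_NEW_LISTENER"

// seccompListenerConfig is a hypothetical, trimmed view of the seccomp
// section that holds only the fields relevant to the listener rules.
type seccompListenerConfig struct {
	Flags            []string
	ListenerPath     string
	ListenerMetadata string
}

// validateListener checks the two cross-field rules from config-linux.md.
// An empty string is treated as "not set" (an assumption of this sketch).
func validateListener(c seccompListenerConfig) error {
	hasFlag := false
	for _, f := range c.Flags {
		if f == flagNewListener {
			hasFlag = true
			break
		}
	}
	// listenerPath MUST be set if and only if the NEW_LISTENER flag is used.
	if hasFlag != (c.ListenerPath != "") {
		return errors.New("listenerPath must be set if and only if " + flagNewListener + " is used")
	}
	// listenerMetadata MUST NOT be set if listenerPath is not set.
	if c.ListenerMetadata != "" && c.ListenerPath == "" {
		return errors.New("listenerMetadata must not be set without listenerPath")
	}
	return nil
}

func main() {
	ok := seccompListenerConfig{Flags: []string{flagNewListener}, ListenerPath: "/run/seccomp-agent.sock"}
	bad := seccompListenerConfig{ListenerMetadata: "opaque"}
	fmt.Println(validateListener(ok))
	fmt.Println(validateListener(bad))
}
```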

schema/defs-linux.json

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -60,15 +60,18 @@
6060
"SCMP_ACT_ERRNO",
6161
"SCMP_ACT_TRACE",
6262
"SCMP_ACT_ALLOW",
63-
"SCMP_ACT_LOG"
63+
"SCMP_ACT_LOG",
64+
"SCMP_ACT_NOTIFY"
6465
]
6566
},
6667
"SeccompFlag": {
6768
"type": "string",
6869
"enum": [
6970
"SECCOMP_FILTER_FLAG_TSYNC",
7071
"SECCOMP_FILTER_FLAG_LOG",
71-
"SECCOMP_FILTER_FLAG_SPEC_ALLOW"
72+
"SECCOMP_FILTER_FLAG_SPEC_ALLOW",
73+
"SECCOMP_FILTER_FLAG_NEW_LISTENER",
74+
"SECCOMP_FILTER_FLAG_TSYNC_ESRCH"
7275
]
7376
},
7477
"SeccompOperators": {

specs-go/config.go

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -598,10 +598,12 @@ type VMImage struct {
598598

599599
// LinuxSeccomp represents syscall restrictions
600600
type LinuxSeccomp struct {
601-
DefaultAction LinuxSeccompAction `json:"defaultAction"`
602-
Architectures []Arch `json:"architectures,omitempty"`
603-
Flags []LinuxSeccompFlag `json:"flags,omitempty"`
604-
Syscalls []LinuxSyscall `json:"syscalls,omitempty"`
601+
DefaultAction LinuxSeccompAction `json:"defaultAction"`
602+
Architectures []Arch `json:"architectures,omitempty"`
603+
Flags []LinuxSeccompFlag `json:"flags,omitempty"`
604+
ListenerPath string `json:"listenerPath,omitempty"`
605+
ListenerMetadata string `json:"listenerMetadata,omitempty"`
606+
Syscalls []LinuxSyscall `json:"syscalls,omitempty"`
605607
}
606608

607609
// Arch used for additional architectures
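
Because both new fields carry `omitempty`, a profile that does not use the listener should serialize exactly as before this change. The sketch below illustrates that with a simplified stand-in type: `linuxSeccomp` and `encode` are hypothetical, the action and flag types are reduced to plain strings, `Architectures` and `Syscalls` are left out, and the socket path is an arbitrary example value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// linuxSeccomp is a simplified stand-in for specs-go's LinuxSeccomp: the
// action and flag types are plain strings, and Architectures and Syscalls
// are omitted for brevity.
type linuxSeccomp struct {
	DefaultAction    string   `json:"defaultAction"`
	Flags            []string `json:"flags,omitempty"`
	ListenerPath     string   `json:"listenerPath,omitempty"`
	ListenerMetadata string   `json:"listenerMetadata,omitempty"`
}

// encode marshals the profile to its JSON form.
func encode(s linuxSeccomp) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err) // cannot happen for this plain struct
	}
	return string(b)
}

func main() {
	// Without listener fields, the output has no listener keys at all.
	fmt.Println(encode(linuxSeccomp{DefaultAction: "SCMP_ACT_ERRNO"}))

	// With the listener configured, both the flag and the path appear.
	fmt.Println(encode(linuxSeccomp{
		DefaultAction: "SCMP_ACT_ERRNO",
		Flags:         []string{"SECCOMP_FILTER_FLAG_NEW_LISTENER"},
		ListenerPath:  "/run/seccomp-agent.sock",
	}))
}
```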

specs-go/state.go

Lines changed: 19 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,17 +5,17 @@ type ContainerState string
55

66
const (
77
// StateCreating indicates that the container is being created
8-
StateCreating ContainerState = "creating"
8+
StateCreating ContainerState = "creating"
99

1010
// StateCreated indicates that the runtime has finished the create operation
11-
StateCreated ContainerState = "created"
11+
StateCreated ContainerState = "created"
1212

1313
// StateRunning indicates that the container process has executed the
1414
// user-specified program but has not exited
15-
StateRunning ContainerState = "running"
15+
StateRunning ContainerState = "running"
1616

1717
// StateStopped indicates that the container process has exited
18-
StateStopped ContainerState = "stopped"
18+
StateStopped ContainerState = "stopped"
1919
)
2020

2121
// State holds information about the runtime state of the container.
@@ -33,3 +33,18 @@ type State struct {
3333
// Annotations are key values associated with the container.
3434
Annotations map[string]string `json:"annotations,omitempty"`
3535
}
36+
37+
type SeccompState struct {
38+
// Version is the version of the specification that is supported.
39+
Version string `json:"ociVersion"`
40+
// SeccompFd is the index of the file descriptor in the `SCM_RIGHTS` array referring to the seccomp notify file descriptor. It is always zero.
41+
SeccompFd int `json:"seccompFd"`
42+
// Pid is the process ID, as seen by the runtime, on which the seccomp filter is applied (target process).
43+
Pid int `json:"pid"`
44+
// PidFd is is the index of the file descriptor in the `SCM_RIGHTS` array referring to the target process file descriptor (e.g as returned by `pidfd_open(2)` or by `clone(2)` with the `CLONE_PID` flag).
45+
PidFd int `json:"pidFd,omitempty"`
46+
// Opaque metadata copied from the listenerMetadata seccomp field.
47+
Metadata string `json:"metadata,omitempty"`
48+
// State of the container
49+
State State `json:"state"`
50+
}
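
On the receiving side, `SeccompFd` and `PidFd` are indices into the received `SCM_RIGHTS` array, not descriptor numbers. A seccomp agent might resolve them along these lines; this is only a sketch, in which `seccompState` is a local copy of the struct above with the container state kept as raw JSON, and `resolveFds` is a hypothetical helper that returns -1 for an absent process file descriptor.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// seccompState is a local copy of SeccompState; the nested container state
// is kept as raw JSON to keep the sketch short.
type seccompState struct {
	Version   string          `json:"ociVersion"`
	SeccompFd int             `json:"seccompFd"`
	Pid       int             `json:"pid"`
	PidFd     int             `json:"pidFd,omitempty"`
	Metadata  string          `json:"metadata,omitempty"`
	State     json.RawMessage `json:"state"`
}

// resolveFds decodes the JSON payload and maps the indices it carries to the
// descriptors received via SCM_RIGHTS. The returned pidFd is -1 when the
// optional process file descriptor was not sent.
func resolveFds(payload []byte, fds []int) (int, int, error) {
	var st seccompState
	if err := json.Unmarshal(payload, &st); err != nil {
		return -1, -1, err
	}
	if st.SeccompFd != 0 {
		return -1, -1, errors.New("seccompFd must be 0")
	}
	if len(fds) == 0 {
		return -1, -1, errors.New("no file descriptors received")
	}
	pidFd := -1
	if st.PidFd != 0 { // zero means the optional field was omitted
		if st.PidFd < 0 || st.PidFd >= len(fds) {
			return -1, -1, errors.New("pidFd index out of range")
		}
		pidFd = fds[st.PidFd]
	}
	return fds[0], pidFd, nil
}

func main() {
	payload := []byte(`{"ociVersion":"0.2.0","seccompFd":0,"pid":4422,"pidFd":1,"state":{}}`)
	// 7 and 9 are made-up descriptor numbers standing in for received fds.
	seccompFd, pidFd, err := resolveFds(payload, []int{7, 9})
	fmt.Println(seccompFd, pidFd, err)
}
```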
