Commit a8c4a9e

Merge pull request #1074 from kinvolk/rata_seccomp_listenerpath
Add Seccomp Notify support using UNIX sockets and container metadata
2 parents f6174e8 + 58798e7 commit a8c4a9e

File tree

5 files changed

+98
-12
lines changed

config-linux.md

Lines changed: 56 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -629,6 +629,21 @@ The following parameters can be specified to set up seccomp:
629629
* `SECCOMP_FILTER_FLAG_LOG`
630630
* `SECCOMP_FILTER_FLAG_SPEC_ALLOW`
631631

632+
* **`listenerPath`** *(string, OPTIONAL)* - specifies the path of the UNIX domain socket over which the runtime will send the [container process state](#containerprocessstate) data structure when the `SCMP_ACT_NOTIFY` action is used.
633+
This socket MUST use `AF_UNIX` domain and `SOCK_STREAM` type.
634+
The runtime MUST send exactly one [container process state](#containerprocessstate) per connection.
635+
The connection MUST NOT be reused and it MUST be closed after sending a seccomp state.
636+
If sending to this socket fails, the runtime MUST [generate an error](runtime.md#errors).
637+
If the `SCMP_ACT_NOTIFY` action is not used this value is ignored.
638+
639+
The runtime sends the following file descriptors using `SCM_RIGHTS` and sets their names in the `fds` array of the [container process state](#containerprocessstate):
640+
641+
* **`seccompFd`** (string, REQUIRED) is the seccomp file descriptor returned by the seccomp syscall.
642+
643+
* **`listenerMetadata`** *(string, OPTIONAL)* - specifies opaque data to pass to the seccomp agent.
644+
This string will be sent as the `metadata` field in the [container process state](#containerprocessstate).
645+
This field MUST NOT be set if `listenerPath` is not set.
646+
632647
* **`syscalls`** *(array of objects, OPTIONAL)* - match a syscall in seccomp.
633648
While this property is OPTIONAL, some values of `defaultAction` are not useful without `syscalls` entries.
634649
For example, if `defaultAction` is `SCMP_ACT_KILL` and `syscalls` is empty or unset, the kernel will kill the container process on its first syscall.
@@ -637,7 +652,7 @@ The following parameters can be specified to set up seccomp:
637652
* **`names`** *(array of strings, REQUIRED)* - the names of the syscalls.
638653
`names` MUST contain at least one entry.
639654
* **`action`** *(string, REQUIRED)* - the action for seccomp rules.
640-
A valid list of constants as of libseccomp v2.4.0 is shown below.
655+
A valid list of constants as of libseccomp v2.5.0 is shown below.
641656

642657
* `SCMP_ACT_KILL`
643658
* `SCMP_ACT_KILL_PROCESS`
@@ -647,6 +662,7 @@ The following parameters can be specified to set up seccomp:
647662
* `SCMP_ACT_TRACE`
648663
* `SCMP_ACT_ALLOW`
649664
* `SCMP_ACT_LOG`
665+
* `SCMP_ACT_NOTIFY`
650666

651667
* **`errnoRet`** *(uint, OPTIONAL)* - the errno return code to use.
652668
Some actions like `SCMP_ACT_ERRNO` and `SCMP_ACT_TRACE` allow specifying the errno code to return.
@@ -691,6 +707,45 @@ The following parameters can be specified to set up seccomp:
691707
}
692708
```
693709

710+
### <a name="containerprocessstate" />The Container Process State
711+
712+
The container process state is a data structure passed via a UNIX socket.
713+
The container runtime MUST send the container process state over the UNIX socket as regular payload serialized in JSON and file descriptors MUST be sent using `SCM_RIGHTS`.
714+
The container runtime MAY use several `sendmsg(2)` calls to send the aforementioned data.
715+
If more than one `sendmsg(2)` is used, the file descriptors MUST be sent only in the first call.
716+
717+
The container process state includes the following properties:
718+
719+
* **`ociVersion`** (string, REQUIRED) is the version of the Open Container Initiative Runtime Specification with which the container process state complies.
720+
* **`fds`** (array, OPTIONAL) is a string array containing the names of the file descriptors passed.
721+
The index of the name in this array corresponds to the index of the file descriptor in the `SCM_RIGHTS` array.
722+
* **`pid`** (int, REQUIRED) is the container process ID, as seen by the runtime.
723+
* **`metadata`** (string, OPTIONAL) is opaque metadata.
724+
* **`state`** ([state](runtime.md#state), REQUIRED) is the state of the container.
725+
726+
Example sending a single `seccompFd` file descriptor in the `SCM_RIGHTS` array:
727+
728+
```json
729+
{
730+
"ociVersion": "0.2.0",
731+
"fds": [
732+
"seccompFd"
733+
],
734+
"pid": 4422,
735+
"metadata": "MKNOD=/dev/null,/dev/net/tun;BPF_MAP_TYPES=hash,array",
736+
"state": {
737+
"ociVersion": "0.2.0",
738+
"id": "oci-container1",
739+
"status": "creating",
740+
"pid": 4422,
741+
"bundle": "/containers/redis",
742+
"annotations": {
743+
"myKey": "myValue"
744+
}
745+
}
746+
}
747+
```
748+
694749
## <a name="configLinuxRootfsMountPropagation" />Rootfs Mount Propagation
695750

696751
**`rootfsPropagation`** (string, OPTIONAL) sets the rootfs's mount propagation.
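
As an illustration of how a seccomp agent might consume the container process state added in the hunk above, here is a minimal Go sketch. It only decodes the JSON payload and looks up a name in the `fds` array, since that position is also the position of the descriptor in the `SCM_RIGHTS` array. The type and function names (`processState`, `containerState`, `decodeState`, `fdIndex`) are hypothetical and not part of the commit, and the payload is a shortened copy of the JSON example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containerState holds the subset of runtime state fields used in the example.
// Hypothetical type for this sketch, not the spec's State struct.
type containerState struct {
	Version     string            `json:"ociVersion"`
	ID          string            `json:"id"`
	Status      string            `json:"status"`
	Pid         int               `json:"pid,omitempty"`
	Bundle      string            `json:"bundle"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// processState mirrors the container process state properties listed above.
type processState struct {
	Version  string         `json:"ociVersion"`
	Fds      []string       `json:"fds"`
	Pid      int            `json:"pid"`
	Metadata string         `json:"metadata,omitempty"`
	State    containerState `json:"state"`
}

// decodeState parses the JSON payload received over the socket.
func decodeState(payload []byte) (processState, error) {
	var st processState
	err := json.Unmarshal(payload, &st)
	return st, err
}

// fdIndex returns the position of name in st.Fds. The same position indexes
// the descriptors received through SCM_RIGHTS. It returns -1 if name is absent.
func fdIndex(st processState, name string) int {
	for i, n := range st.Fds {
		if n == name {
			return i
		}
	}
	return -1
}

// examplePayload is a shortened copy of the JSON example in the spec text.
const examplePayload = `{
  "ociVersion": "0.2.0",
  "fds": ["seccompFd"],
  "pid": 4422,
  "metadata": "MKNOD=/dev/null,/dev/net/tun;BPF_MAP_TYPES=hash,array",
  "state": {
    "ociVersion": "0.2.0",
    "id": "oci-container1",
    "status": "creating",
    "pid": 4422,
    "bundle": "/containers/redis"
  }
}`

func main() {
	st, err := decodeState([]byte(examplePayload))
	if err != nil {
		panic(err)
	}
	fmt.Println("seccompFd index:", fdIndex(st, "seccompFd"))
	fmt.Println("container:", st.State.ID, st.State.Status)
}
```

Looking the name up instead of assuming index 0 keeps the agent correct if a runtime ever passes more than one descriptor.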

schema/config-linux.json

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -212,6 +212,12 @@
212212
"$ref": "defs-linux.json#/definitions/SeccompFlag"
213213
}
214214
},
215+
"listenerPath": {
216+
"type": "string"
217+
},
218+
"listenerMetadata": {
219+
"type": "string"
220+
},
215221
"architectures": {
216222
"type": "array",
217223
"items": {

schema/defs-linux.json

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -61,15 +61,17 @@
6161
"SCMP_ACT_ERRNO",
6262
"SCMP_ACT_TRACE",
6363
"SCMP_ACT_ALLOW",
64-
"SCMP_ACT_LOG"
64+
"SCMP_ACT_LOG",
65+
"SCMP_ACT_NOTIFY"
6566
]
6667
},
6768
"SeccompFlag": {
6869
"type": "string",
6970
"enum": [
7071
"SECCOMP_FILTER_FLAG_TSYNC",
7172
"SECCOMP_FILTER_FLAG_LOG",
72-
"SECCOMP_FILTER_FLAG_SPEC_ALLOW"
73+
"SECCOMP_FILTER_FLAG_SPEC_ALLOW",
74+
"SECCOMP_FILTER_FLAG_NEW_LISTENER"
7375
]
7476
},
7577
"SeccompOperators": {

specs-go/config.go

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -598,11 +598,13 @@ type VMImage struct {
598598

599599
// LinuxSeccomp represents syscall restrictions
600600
type LinuxSeccomp struct {
601-
DefaultAction LinuxSeccompAction `json:"defaultAction"`
602-
DefaultErrnoRet *uint `json:"defaultErrnoRet,omitempty"`
603-
Architectures []Arch `json:"architectures,omitempty"`
604-
Flags []LinuxSeccompFlag `json:"flags,omitempty"`
605-
Syscalls []LinuxSyscall `json:"syscalls,omitempty"`
601+
DefaultAction LinuxSeccompAction `json:"defaultAction"`
602+
DefaultErrnoRet *uint `json:"defaultErrnoRet,omitempty"`
603+
Architectures []Arch `json:"architectures,omitempty"`
604+
Flags []LinuxSeccompFlag `json:"flags,omitempty"`
605+
ListenerPath string `json:"listenerPath,omitempty"`
606+
ListenerMetadata string `json:"listenerMetadata,omitempty"`
607+
Syscalls []LinuxSyscall `json:"syscalls,omitempty"`
606608
}
607609

608610
// Arch used for additional architectures
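
The two new `LinuxSeccomp` fields carry constraints from the spec text: `listenerMetadata` MUST NOT be set without `listenerPath`, and `listenerPath` is ignored unless `SCMP_ACT_NOTIFY` is used. A runtime could check these roughly as sketched below. The trimmed types and the helpers `usesNotify`, `validateListener`, and `listenerInUse` are hypothetical, and counting `defaultAction` as a use of the notify action is an assumption of this sketch, not something the commit states.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed, hypothetical copies of the spec types with only the fields needed here.
type linuxSyscall struct {
	Names  []string
	Action string
}

type linuxSeccomp struct {
	DefaultAction    string
	ListenerPath     string
	ListenerMetadata string
	Syscalls         []linuxSyscall
}

const actNotify = "SCMP_ACT_NOTIFY"

// usesNotify reports whether any syscall rule uses SCMP_ACT_NOTIFY.
// Assumption: a defaultAction of SCMP_ACT_NOTIFY also counts.
func usesNotify(s linuxSeccomp) bool {
	if s.DefaultAction == actNotify {
		return true
	}
	for _, sc := range s.Syscalls {
		if sc.Action == actNotify {
			return true
		}
	}
	return false
}

// validateListener rejects listenerMetadata that is set without listenerPath.
func validateListener(s linuxSeccomp) error {
	if s.ListenerMetadata != "" && s.ListenerPath == "" {
		return errors.New("listenerMetadata is set but listenerPath is not")
	}
	return nil
}

// listenerInUse reports whether the runtime needs to contact the listener:
// the path is ignored when SCMP_ACT_NOTIFY is not used.
func listenerInUse(s linuxSeccomp) bool {
	return s.ListenerPath != "" && usesNotify(s)
}

func main() {
	cfg := linuxSeccomp{
		DefaultAction:    "SCMP_ACT_ALLOW",
		ListenerPath:     "/run/seccomp-agent.sock", // hypothetical path
		ListenerMetadata: "foo",
		Syscalls: []linuxSyscall{
			{Names: []string{"mknod"}, Action: actNotify},
		},
	}
	fmt.Println("valid:", validateListener(cfg) == nil)
	fmt.Println("listener in use:", listenerInUse(cfg))
}
```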

specs-go/state.go

Lines changed: 25 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,17 +5,17 @@ type ContainerState string
55

66
const (
77
// StateCreating indicates that the container is being created
8-
StateCreating ContainerState = "creating"
8+
StateCreating ContainerState = "creating"
99

1010
// StateCreated indicates that the runtime has finished the create operation
11-
StateCreated ContainerState = "created"
11+
StateCreated ContainerState = "created"
1212

1313
// StateRunning indicates that the container process has executed the
1414
// user-specified program but has not exited
15-
StateRunning ContainerState = "running"
15+
StateRunning ContainerState = "running"
1616

1717
// StateStopped indicates that the container process has exited
18-
StateStopped ContainerState = "stopped"
18+
StateStopped ContainerState = "stopped"
1919
)
2020

2121
// State holds information about the runtime state of the container.
@@ -33,3 +33,24 @@ type State struct {
3333
// Annotations are key values associated with the container.
3434
Annotations map[string]string `json:"annotations,omitempty"`
3535
}
36+
37+
const (
38+
// SeccompFdName is the name of the seccomp notify file descriptor.
39+
SeccompFdName string = "seccompFd"
40+
)
41+
42+
// ContainerProcessState holds information about the state of a container process.
43+
type ContainerProcessState struct {
44+
// Version is the version of the specification that is supported.
45+
Version string `json:"ociVersion"`
46+
// Fds is a string array containing the names of the file descriptors passed.
47+
// The index of the name in this array corresponds to the index of the file
48+
// descriptor in the `SCM_RIGHTS` array.
49+
Fds []string `json:"fds"`
50+
// Pid is the process ID as seen by the runtime.
51+
Pid int `json:"pid"`
52+
// Opaque metadata.
53+
Metadata string `json:"metadata,omitempty"`
54+
// State of the container.
55+
State State `json:"state"`
56+
}
