Seccomp stands for secure computing mode and has been a feature of the Linux kernel since version 2.6.12. It can be used to sandbox the privileges of a process, restricting the calls it is able to make from userspace into the kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a node to your Pods and containers.
Feature state: Kubernetes v1.19 [stable]
There are four ways to specify a seccomp profile for a Pod:

- spec.securityContext.seccompProfile
- spec.containers[*].securityContext.seccompProfile
- spec.initContainers[*].securityContext.seccompProfile
- spec.ephemeralContainers[*].securityContext.seccompProfile

For example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  securityContext:
    seccompProfile:
      type: Unconfined
  ephemeralContainers:
  - name: ephemeral-container
    image: debian
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  initContainers:
  - name: init-container
    image: debian
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  containers:
  - name: container
    image: docker.io/library/debian:stable
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: my-profile.json
```
The Pod in the example above runs as Unconfined, while ephemeral-container and
init-container explicitly define RuntimeDefault. If the ephemeral or init
container had not set the securityContext.seccompProfile field explicitly, the
value would have been inherited from the Pod. The same applies to the container
named container, which overrides the Pod-level value and runs the Localhost
profile my-profile.json.

Generally speaking, fields set on containers (including init and ephemeral containers) take precedence over the Pod-level value, while containers that do not set the seccomp field inherit the profile from the Pod.
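The precedence rule can be sketched as a small resolver. This is a minimal illustration, assuming Pods are modelled as plain Python dicts that mirror the YAML manifest; the names effective_profiles and seccomp_profile, and the extra "sidecar" container, are hypothetical and not part of any Kubernetes API. It does not model privileged containers.

```python
from typing import Dict, Optional

# Pods are modelled as plain dicts mirroring the YAML manifest (assumption);
# a real tool would more likely use a Kubernetes client library.
CONTAINER_FIELDS = ("initContainers", "containers", "ephemeralContainers")


def seccomp_profile(security_context: Optional[dict]) -> Optional[dict]:
    """Return securityContext.seccompProfile, or None when it is not set."""
    return (security_context or {}).get("seccompProfile")


def effective_profiles(pod: dict) -> Dict[str, Optional[dict]]:
    """Map each container name to the seccompProfile it ends up with.

    A container-level profile takes precedence; otherwise the Pod-level
    profile is inherited. None means neither level sets the field.
    """
    spec = pod["spec"]
    pod_level = seccomp_profile(spec.get("securityContext"))
    result: Dict[str, Optional[dict]] = {}
    for field in CONTAINER_FIELDS:
        for container in spec.get(field, []):
            own = seccomp_profile(container.get("securityContext"))
            result[container["name"]] = own if own is not None else pod_level
    return result


# Part of the example Pod, plus a hypothetical "sidecar" container that sets
# no profile of its own (the ephemeral container is omitted for brevity).
pod = {
    "spec": {
        "securityContext": {"seccompProfile": {"type": "Unconfined"}},
        "initContainers": [
            {"name": "init-container",
             "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}}},
        ],
        "containers": [
            {"name": "container",
             "securityContext": {"seccompProfile": {
                 "type": "Localhost", "localhostProfile": "my-profile.json"}}},
            {"name": "sidecar"},
        ],
    }
}

effective = effective_profiles(pod)
for name, profile in effective.items():
    print(name, profile)
```

Here the hypothetical sidecar picks up the Pod's Unconfined profile, while the other two keep their own values.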
It is not possible to apply a seccomp profile to a Pod or container running with
privileged: true set in the container's securityContext. Privileged
containers always run as Unconfined.

The following values are possible for seccompProfile.type:

- Unconfined: the workload runs without any seccomp restrictions.
- RuntimeDefault: the default seccomp profile defined by the container runtime is applied.
- Localhost: the profile named in localhostProfile is applied. It has to be available on the node
disk, relative to the kubelet's seccomp directory (on Linux it is /var/lib/kubelet/seccomp). The availability of the seccomp
profile is verified by the container runtime on container creation. If the profile does not exist, container
creation fails with a CreateContainerError.

Localhost profiles

Seccomp profiles are JSON files following the schema defined by the OCI runtime specification. A profile defines actions based on matched syscalls, and can also match on specific values of the arguments passed to those syscalls. For example:
```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "syscalls": [
    {
      "names": [
        "adjtimex",
        "alarm",
        "bind",
        "waitid",
        "waitpid",
        "write",
        "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```
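The defaultAction in the profile above is SCMP_ACT_ERRNO. It is the fallback
for every syscall that is not matched by an entry in syscalls: such calls fail
with an error instead of being executed. The error code is set to 38 via the
defaultErrnoRet field; on most Linux architectures, including x86-64 and arm64,
38 is ENOSYS ("function not implemented").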
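The fallback behaviour can be illustrated with a toy evaluator. This is a deliberately simplified model, not how libseccomp compiles a filter: the hypothetical decide function ignores args and architectures and simply takes the first rule whose names contain the syscall.

```python
import json
from typing import Optional, Tuple

# The example profile from above.
PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "syscalls": [
    {
      "names": ["adjtimex", "alarm", "bind", "waitid", "waitpid", "write", "writev"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
""")


def decide(profile: dict, syscall: str) -> Tuple[str, Optional[int]]:
    """Return (action, errno) for a syscall name under a simplified model.

    Ignores args and architectures and uses the first rule that lists the
    syscall. The errno is only reported when the action is SCMP_ACT_ERRNO.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            action = rule["action"]
            errno_ret = rule.get("errnoRet")
            break
    else:
        # No rule matched: defaultAction and defaultErrnoRet apply.
        action = profile["defaultAction"]
        errno_ret = profile.get("defaultErrnoRet")
    return action, (errno_ret if action == "SCMP_ACT_ERRNO" else None)


print(decide(PROFILE, "write"))  # → ('SCMP_ACT_ALLOW', None)
print(decide(PROFILE, "mount"))  # → ('SCMP_ACT_ERRNO', 38)
```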
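The following actions are generally possible:

- SCMP_ACT_ERRNO: return the specified error code.
- SCMP_ACT_ALLOW: allow the syscall to be executed.
- SCMP_ACT_KILL_PROCESS: kill the process.
- SCMP_ACT_KILL_THREAD and SCMP_ACT_KILL: kill only the thread.
- SCMP_ACT_TRAP: throw a SIGSYS signal.
- SCMP_ACT_NOTIFY and SECCOMP_RET_USER_NOTIF: notify a userspace process.
- SCMP_ACT_TRACE: notify a tracing process with the specified value.
- SCMP_ACT_LOG: allow the syscall to be executed after the action has been logged to syslog or auditd.

Some actions like SCMP_ACT_NOTIFY or SECCOMP_RET_USER_NOTIF may not be
supported, depending on the container runtime, OCI runtime, or Linux kernel
version being used. There may also be further limitations; for example,
SCMP_ACT_NOTIFY cannot be used as defaultAction or for certain syscalls like
write. All of those limitations are defined by either the OCI runtime
(runc, crun) or libseccomp.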
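A pre-flight check for the SCMP_ACT_NOTIFY limitations could look like the sketch below. It is hypothetical: the function name is invented, and the forbidden-syscall set holds only the write example from the text, whereas the authoritative rules live in the OCI runtime and libseccomp.

```python
from typing import List

# Hypothetical, incomplete list: only the "write" example from the text.
# The real restrictions are enforced by the OCI runtime / libseccomp.
NOTIFY_FORBIDDEN_SYSCALLS = {"write"}


def notify_problems(profile: dict) -> List[str]:
    """Flag uses of SCMP_ACT_NOTIFY that the text describes as unsupported."""
    problems = []
    if profile.get("defaultAction") == "SCMP_ACT_NOTIFY":
        problems.append("SCMP_ACT_NOTIFY cannot be the defaultAction")
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_NOTIFY":
            continue
        for name in rule.get("names", []):
            if name in NOTIFY_FORBIDDEN_SYSCALLS:
                problems.append(f"SCMP_ACT_NOTIFY cannot be used for {name}")
    return problems


bad = {
    "defaultAction": "SCMP_ACT_NOTIFY",
    "syscalls": [{"names": ["write", "mkdir"], "action": "SCMP_ACT_NOTIFY"}],
}
print(notify_problems(bad))
```

A check like this only catches known mistakes early; the runtime remains the final authority and may reject profiles that pass it.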
The syscalls JSON array contains a list of objects referencing syscalls by
their respective names. For example, the action SCMP_ACT_ALLOW can be used
to create an allowlist of permitted syscalls, as in the example above. It
would also be possible to define another list using the action SCMP_ACT_ERRNO
with a different return value, set via errnoRet.
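A profile with both kinds of list can be generated programmatically. The sketch below is an assumption-laden illustration: build_profile is a hypothetical helper, the syscall names are arbitrary examples, and the errno values are hard-coded Linux numbers (asm-generic numbering) so the output stays correct even when generated on a non-Linux machine.

```python
import json
from typing import Iterable

# Linux errno values (asm-generic numbering, e.g. x86-64 and arm64),
# hard-coded so the profile is correct even if generated on another OS.
EPERM = 1
ENOSYS = 38


def build_profile(allowed: Iterable[str], denied: Iterable[str]) -> dict:
    """Allow `allowed`, fail `denied` with EPERM and all other syscalls with ENOSYS."""
    allowed, denied = set(allowed), set(denied)
    overlap = allowed & denied
    if overlap:
        raise ValueError(f"syscalls in both lists: {sorted(overlap)}")
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "defaultErrnoRet": ENOSYS,
        "syscalls": [
            {"names": sorted(allowed), "action": "SCMP_ACT_ALLOW"},
            {"names": sorted(denied), "action": "SCMP_ACT_ERRNO", "errnoRet": EPERM},
        ],
    }


# Hypothetical choice of syscalls, for illustration only.
profile = build_profile(["read", "write", "exit_group"], ["mount", "umount2"])
print(json.dumps(profile, indent=2))
```

Rejecting overlapping names keeps each syscall's outcome unambiguous instead of depending on how a runtime resolves conflicting rules.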
It is also possible to restrict a rule to specific values of the arguments
(args) passed to certain syscalls. More information about those advanced use
cases can be found in the OCI runtime spec and the Seccomp Linux kernel
documentation.
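To give a feel for args, the sketch below evaluates the comparison operators that OCI args entries use (index, value, valueTwo, op). It is a simplified model that treats arguments as plain unsigned Python ints. It assumes all conditions in one rule must hold, which is how libseccomp combines comparisons within a single rule; runtimes may handle repeated conditions on the same argument index differently. rule_matches and the personality rule are hypothetical examples.

```python
from typing import Callable, Dict, List

# Comparison operators used by "args" entries in OCI seccomp profiles.
# For SCMP_CMP_MASKED_EQ, "value" is the mask and "valueTwo" the expected result.
OPS: Dict[str, Callable[[int, int, int], bool]] = {
    "SCMP_CMP_NE": lambda arg, v, v2: arg != v,
    "SCMP_CMP_LT": lambda arg, v, v2: arg < v,
    "SCMP_CMP_LE": lambda arg, v, v2: arg <= v,
    "SCMP_CMP_EQ": lambda arg, v, v2: arg == v,
    "SCMP_CMP_GE": lambda arg, v, v2: arg >= v,
    "SCMP_CMP_GT": lambda arg, v, v2: arg > v,
    "SCMP_CMP_MASKED_EQ": lambda arg, v, v2: (arg & v) == v2,
}


def rule_matches(rule: dict, syscall: str, args: List[int]) -> bool:
    """True if the rule names the syscall and every args condition holds."""
    if syscall not in rule.get("names", []):
        return False
    return all(
        OPS[cond["op"]](args[cond["index"]], cond["value"], cond.get("valueTwo", 0))
        for cond in rule.get("args", [])
    )


# Hypothetical rule: allow personality(2) only when its first argument is 0.
rule = {
    "names": ["personality"],
    "action": "SCMP_ACT_ALLOW",
    "args": [{"index": 0, "value": 0, "op": "SCMP_CMP_EQ"}],
}
print(rule_matches(rule, "personality", [0]))  # → True
print(rule_matches(rule, "personality", [8]))  # → False
```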