I am reporting a bug that has already been mentioned on the AppArmor bug tracker here:
Despite a profile permitting all mounts, AppArmor denies a mount call when running Buildah in Docker. When
using apparmor=unconfined, it works as expected.
Tested on:
Ubuntu 22.04.5 LTS (apparmor 3.0.4-2ubuntu2.4, Docker version 27.3.1, build ce12230)
Ubuntu 24.04.1 LTS (apparmor 4.0.1really4.0.1-0ubuntu0.24.04.3, Docker version 27.3.1, build ce12230)
To reproduce:
docker run \
  --device /dev/fuse \
  --volume $HOME/containers/:/home/build/.local/share/containers \
  --security-opt seccomp=seccomp.json \
  --security-opt apparmor=docker-buildah \
  quay.io/buildah/stable:v1.33.2 \
  bash -c 'buildah run $(buildah from busybox) echo test'
Output:
error running subprocess: remounting /dev in mount namespace read-only: permission denied
According to AppArmor maintainers, the bug comes from mount flags not allowed by the Linux kernel:
It turns out that buildah is actually passing an invalid set of mount flags to the kernel and that the kernel was ignoring the conflict while we were blocking the remount based on the conflicts.
strace reports the following for the denied mount call:
The man page for the mount syscall states that MS_SHARED, MS_PRIVATE, MS_SLAVE, and MS_UNBINDABLE can only be used with MS_REC (with MS_SILENT being ignored). While the kernel was ignoring the conflict, we were denying it.
They pushed a fix on AppArmor's side here. However, they recommend a fix on buildah's side:
Fix committed to the upstream AppArmor parser to allow the conflicting flags, but buildah should also stop passing in those conflicting flags if it hasn't been updated to do so already.
Furthermore, it is not always possible to upgrade AppArmor to the latest development version on a given Linux distro, while it would be fairly easy to upgrade buildah if a fix is pushed for this matter. It seems that most of the problems come from the flags specified in run_linux.go, where the propagation flags should be combined only with MS_REC rather than with other mount flags.
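To make the flag conflict concrete, here is a minimal Python sketch of the rule quoted from the man page. It is not buildah's code and not the AppArmor fix. The flag values are copied from the Linux headers, the helper names `conflicting_flags` and `split_mount_calls` are hypothetical, and the example flag set is illustrative only, since the strace output is not reproduced here.

```python
# Mount flag values as defined in <linux/mount.h>, redefined locally so the
# sketch runs on any platform.
MS_RDONLY = 0x1
MS_REMOUNT = 0x20
MS_BIND = 0x1000
MS_REC = 0x4000
MS_SILENT = 0x8000
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

# Flags that change the propagation type, and the only flags that may go with them.
PROPAGATION = MS_SHARED | MS_PRIVATE | MS_SLAVE | MS_UNBINDABLE
ALLOWED_WITH_PROPAGATION = MS_REC | MS_SILENT


def conflicting_flags(flags):
    """Return the bits that should not accompany a propagation change (0 if none)."""
    if (flags & PROPAGATION) == 0:
        return 0
    return flags & ~(PROPAGATION | ALLOWED_WITH_PROPAGATION)


def split_mount_calls(flags):
    """Hypothetical helper: split a mixed flag set into two conflict-free sets.

    The first set keeps everything except the propagation bits, the second
    keeps only the propagation bits plus MS_REC/MS_SILENT.
    """
    if conflicting_flags(flags) == 0:
        return [flags]
    remount = flags & ~PROPAGATION
    propagation = flags & (PROPAGATION | ALLOWED_WITH_PROPAGATION)
    return [remount, propagation]


# Illustrative flag set (an assumption, not taken from the strace output).
mixed = MS_REMOUNT | MS_BIND | MS_RDONLY | MS_PRIVATE
print(hex(conflicting_flags(mixed)))
print([hex(f) for f in split_mount_calls(mixed)])
```

Issuing the propagation change as its own mount call is one way a caller could avoid depending on the kernel silently ignoring the extra bits.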
Thanks for reading,
Steps to reproduce the issue
On Ubuntu 22.04, with docker version 28.1.1 installed:
To allow several calls blocked by default, add unshare and *mount* to allowed syscalls in a dedicated seccomp profile (it is linked below). Another solution can be to directly use /usr/share/containers/seccomp.json.
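As a rough illustration of the seccomp step, the sketch below appends an allow rule to a Docker-style seccomp profile. It assumes the usual profile layout (a top-level `syscalls` list of objects with `names` and `action`). The function name `allow_syscalls`, the tiny base profile, and the exact syscall list are assumptions made for the example; the real profile referenced in this issue may allow a different set.

```python
import copy
import json


def allow_syscalls(profile, names):
    """Return a copy of a Docker-style seccomp profile with one extra allow rule."""
    updated = copy.deepcopy(profile)  # leave the caller's profile untouched
    updated.setdefault("syscalls", []).append(
        {"names": sorted(names), "action": "SCMP_ACT_ALLOW"}
    )
    return updated


# Tiny stand-in for a real profile (hypothetical contents).
base_profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}

# Assumed subset of the syscalls matched by "unshare and *mount*".
patched_profile = allow_syscalls(base_profile, ["unshare", "mount", "umount2"])
print(json.dumps(patched_profile, indent=2))
```

The resulting JSON could then be saved and passed to `docker run` through `--security-opt seccomp=<file>`, as in the reproduction command.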
To allow mount operations, add mount operations to the default AppArmor profile:
#include <tunables/global>

profile runner flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/base>

  network,
  capability,
  file,
  umount,
  mount,

  # Host (privileged) processes may send signals to container processes.
  signal (receive) peer=unconfined,
  # runc may send signals to container processes (for "docker stop").
  signal (receive) peer=runc,
  # crun may send signals to container processes (for "docker stop" when used with crun OCI runtime).
  signal (receive) peer=crun,
  # dockerd may send signals to container processes (for "docker kill").
  signal (receive) peer=unconfined,
  # Container processes may send signals amongst themselves.
  signal (send,receive) peer=runner,

  deny @{PROC}/* w,   # deny write for all files directly in /proc (not in a subdir)
  # deny write to files not in /proc/<number>/** or /proc/sys/**
  deny @{PROC}/{[^1-9],[^1-9][^0-9],[^1-9s][^0-9y][^0-9s],[^1-9][^0-9][^0-9][^0-9/]*}/** w,
  deny @{PROC}/sys/[^k]** w,  # deny /proc/sys except /proc/sys/k* (effectively /proc/sys/kernel)
  deny @{PROC}/sys/kernel/{?,??,[^s][^h][^m]**} w,  # deny everything except shm* in /proc/sys/kernel/
  deny @{PROC}/sysrq-trigger rwklx,
  deny @{PROC}/kcore rwklx,

  deny /sys/[^f]*/** wklx,
  deny /sys/f[^s]*/** wklx,
  deny /sys/fs/[^c]*/** wklx,
  deny /sys/fs/c[^g]*/** wklx,
  deny /sys/fs/cg[^r]*/** wklx,
  deny /sys/firmware/** rwklx,
  deny /sys/devices/virtual/powercap/** rwklx,
  deny /sys/kernel/security/** rwklx,

  # suppress ptrace denials when using 'docker ps' or using 'ps' inside a container
  ptrace (trace,read,tracedby,readby) peer=runner,
}
latest: Pulling from buildah/stable
Digest: sha256:b0640e208f22a652c27b3996d40542366e7a3e05c4d4a2580bbfcf1ea155cccd
Status: Image is up to date for quay.io/buildah/upstream:latest
STEP 1/2: FROM alpine:latest
Resolved "alpine" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull docker.io/library/alpine:latest...
Getting image source signatures
Copying blob f18232174bc9 done |
Copying config aded1e1a5b done |
Writing manifest to image destination
STEP 2/2: RUN apk update
error running subprocess: remounting /dev in mount namespace read-only: permission denied
Error: building at STEP "RUN apk update": exit status 1
Did you mean `/usr/share/containers/containers.conf` ?
# The containers configuration file specifies all of the available configuration
# command-line options/flags for container engine tools like Podman & Buildah,
# but in a TOML format that can be easily modified and versioned.
# Please refer to containers.conf(5) for details of all configuration options.
# Not all container engines implement all of the options.
# All of the options have hard coded defaults and these options will override
# the built in defaults. Users can then override these options via the command
# line. Container engines will read containers.conf files in up to three
# locations in the following order:
# 1. /usr/share/containers/containers.conf
# 2. /etc/containers/containers.conf
# 3. $HOME/.config/containers/containers.conf (Rootless containers ONLY)
# Items specified in the latter containers.conf, if they exist, override the
# previous containers.conf settings, or the default settings.
[containers]
# List of annotation. Specified as
# "key = value"
# If it is empty or commented out, no annotations will be added
#
#annotations = []

# Used to change the name of the default AppArmor profile of container engine.
#
#apparmor_profile = "container-default"

# Default way to to create a cgroup namespace for the container
# Options are:
# `private` Create private Cgroup Namespace for the container.
# `host` Share host Cgroup Namespace with the container.
#
#cgroupns = "private"

# Control container cgroup configuration
# Determines whether the container will create CGroups.
# Options are:
# `enabled` Enable cgroup support within container
# `disabled` Disable cgroup support, will inherit cgroups from parent
# `no-conmon` Do not create a cgroup dedicated to conmon.
#
#cgroups = "enabled"

# List of default capabilities for containers. If it is empty or commented out,
# the default capabilities defined in the container engine will be added.
#
default_capabilities = [
  "CHOWN",
  "DAC_OVERRIDE",
  "FOWNER",
  "FSETID",
  "KILL",
  "NET_BIND_SERVICE",
  "SETFCAP",
  "SETGID",
  "SETPCAP",
  "SETUID",
  "SYS_CHROOT"
]

# A list of sysctls to be set in containers by default,
# specified as "name=value",
# for example:"net.ipv4.ping_group_range=0 0".
#
default_sysctls = [
  "net.ipv4.ping_group_range=0 0",
]
# A list of ulimits to be set in containers by default, specified as
# "<ulimit name>=<soft limit>:<hard limit>", for example:
# "nofile=1024:2048"
# See setrlimit(2) for a list of resource names.
# Any limit not specified here will be inherited from the process launching the
# container engine.
# Ulimits has limits for non privileged container engines.
#
#default_ulimits = [
#  "nofile=1280:2560",
#]

# List of devices. Specified as
# "<device-on-host>:<device-on-container>:<permissions>", for example:
# "/dev/sdc:/dev/xvdc:rwm".
# If it is empty or commented out, only the default devices will be used
#
#devices = []

# List of default DNS options to be added to /etc/resolv.conf inside of the container.
#
#dns_options = []

# List of default DNS search domains to be added to /etc/resolv.conf inside of the container.
#
#dns_searches = []

# Set default DNS servers.
# This option can be used to override the DNS configuration passed to the
# container. The special value "none" can be specified to disable creation of
# /etc/resolv.conf in the container.
# The /etc/resolv.conf file in the image will be used without changes.
#
#dns_servers = []

# Environment variable list for the conmon process; used for passing necessary
# environment variables to conmon or the runtime.
#
#env = [
#  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
#  "TERM=xterm",
#]

# Pass all host environment variables into the container.
#
#env_host = false

# Default proxy environment variables passed into the container.
# The environment variables passed in include:
# http_proxy, https_proxy, ftp_proxy, no_proxy, and the upper case versions of
# these. This option is needed when host system uses a proxy but container
# should not use proxy. Proxy environment variables specified for the container
# in any other way will override the values passed from the host.
#
#http_proxy = true

# Run an init inside the container that forwards signals and reaps processes.
#
#init = false

# Container init binary, if init=true, this is the init binary to be used for containers.
#
#init_path = "/usr/libexec/podman/catatonit"

# Default way to to create an IPC namespace (POSIX SysV IPC) for the container
# Options are:
# `private` Create private IPC Namespace for the container.
# `host` Share host IPC Namespace with the container.
#
#ipcns = "private"

# keyring tells the container engine whether to create
# a kernel keyring for use within the container.
#
#keyring = true

# label tells the container engine whether to use container separation using
# MAC(SELinux) labeling or not.
# The label flag is ignored on label disabled systems.
#
#label = true

# Logging driver for the container. Available options: k8s-file and journald.
#
#log_driver = "k8s-file"

# Maximum size allowed for the container log file. Negative numbers indicate
# that no size limit is imposed. If positive, it must be >= 8192 to match or
# exceed conmon's read buffer. The file is truncated and re-opened so the
# limit is never exceeded.
#
#log_size_max = -1

# Specifies default format tag for container log messages.
# This is useful for creating a specific tag for container log messages.
# Containers logs default to truncated container ID as a tag.
#
#log_tag = ""

# Default way to to create a Network namespace for the container
# Options are:
# `private` Create private Network Namespace for the container.
# `host` Share host Network Namespace with the container.
# `none` Containers do not use the network
#
#netns = "private"

# Create /etc/hosts for the container. By default, container engine manage
# /etc/hosts, automatically adding the container's own IP address.
#
#no_hosts = false

# Default way to to create a PID namespace for the container
# Options are:
# `private` Create private PID Namespace for the container.
# `host` Share host PID Namespace with the container.
#
#pidns = "private"

# Maximum number of processes allowed in a container.
#
#pids_limit = 2048

# Copy the content from the underlying image into the newly created volume
# when the container is created instead of when it is started. If false,
# the container engine will not copy the content until the container is started.
# Setting it to true may have negative performance implications.
#
#prepare_volume_on_create = false

# Indicates the networking to be used for rootless containers
#
#rootless_networking = "slirp4netns"

# Path to the seccomp.json profile which is used as the default seccomp profile
# for the runtime.
#
#seccomp_profile = "/usr/share/containers/seccomp.json"

# Size of /dev/shm. Specified as <number><unit>.
# Unit is optional, values:
# b (bytes), k (kilobytes), m (megabytes), or g (gigabytes).
# If the unit is omitted, the system uses bytes.
#
#shm_size = "65536k"

# Set timezone in container. Takes IANA timezones as well as "local",
# which sets the timezone in the container to match the host machine.
#
#tz = ""

# Set umask inside the container
#
#umask = "0022"

# Default way to to create a User namespace for the container
# Options are:
# `auto` Create unique User Namespace for the container.
# `host` Share host User Namespace with the container.
#
#userns = "host"

# Number of UIDs to allocate for the automatic container creation.
# UIDs are allocated from the "container" UIDs listed in
# /etc/subuid & /etc/subgid
#
#userns_size = 65536

# Default way to to create a UTS namespace for the container
# Options are:
# `private` Create private UTS Namespace for the container.
# `host` Share host UTS Namespace with the container.
#
#utsns = "private"

# List of volumes. Specified as
# "<directory-on-host>:<directory-in-container>:<options>", for example:
# "/db:/var/lib/db:ro".
# If it is empty or commented out, no volumes will be added
#
#volumes = []

# The network table contains settings pertaining to the management of
# CNI plugins.
[secrets]
#driver = "file"
[secrets.opts]
#root = "/example/directory"
[network]
# Path to directory where CNI plugin binaries are located.
#
#cni_plugin_dirs = [
#  "/usr/local/libexec/cni",
#  "/usr/libexec/cni",
#  "/usr/local/lib/cni",
#  "/usr/lib/cni",
#  "/opt/cni/bin",
#]

# The network name of the default CNI network to attach pods to.
#
#default_network = "podman"

# The default subnet for the default CNI network given in default_network.
# If a network with that name does not exist, a new network using that name and
# this subnet will be created.
# Must be a valid IPv4 CIDR prefix.
#
#default_subnet = "10.88.0.0/16"

# Path to the directory where CNI configuration files are located.
#
#network_config_dir = "/etc/cni/net.d/"
[engine]
# Index to the active service
#
#active_service = production

# Cgroup management implementation used for the runtime.
# Valid options "systemd" or "cgroupfs"
#
#cgroup_manager = "systemd"

# Environment variables to pass into conmon
#
#conmon_env_vars = [
#  "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
#]

# Paths to look for the conmon container manager binary
#
#conmon_path = [
#  "/usr/libexec/podman/conmon",
#  "/usr/local/libexec/podman/conmon",
#  "/usr/local/lib/podman/conmon",
#  "/usr/bin/conmon",
#  "/usr/sbin/conmon",
#  "/usr/local/bin/conmon",
#  "/usr/local/sbin/conmon"
#]

# Specify the keys sequence used to detach a container.
# Format is a single character [a-Z] or a comma separated sequence of
# `ctrl-<value>`, where `<value>` is one of:
# `a-z`, `@`, `^`, `[`, `\`, `]`, `^` or `_`
#
#detach_keys = "ctrl-p,ctrl-q"

# Determines whether engine will reserve ports on the host when they are
# forwarded to containers. When enabled, when ports are forwarded to containers,
# ports are held open by as long as the container is running, ensuring that
# they cannot be reused by other programs on the host. However, this can cause
# significant memory usage if a container has many ports forwarded to it.
# Disabling this can save memory.
#
#enable_port_reservation = true

# Environment variables to be used when running the container engine (e.g., Podman, Buildah).
# For example "http_proxy=internal.proxy.company.com".
# Note these environment variables will not be used within the container.
# Set the env section under [containers] table, if you want to set environment variables for the container.
#
#env = []

# Selects which logging mechanism to use for container engine events.
# Valid values are `journald`, `file` and `none`.
#
#events_logger = "journald"

# A is a list of directories which are used to search for helper binaries.
#
#helper_binaries_dir = [
#  "/usr/local/libexec/podman",
#  "/usr/local/lib/podman",
#  "/usr/libexec/podman",
#  "/usr/lib/podman",
#]

# Path to OCI hooks directories for automatically executed hooks.
#
#hooks_dir = [
#  "/usr/share/containers/oci/hooks.d",
#]

# Manifest Type (oci, v2s2, or v2s1) to use when pulling, pushing, building
# container images. By default image pulled and pushed match the format of the
# source image. Building/committing defaults to OCI.
#
#image_default_format = ""

# Default transport method for pulling and pushing for images
#
#image_default_transport = "docker://"

# Maximum number of image layers to be copied (pulled/pushed) simultaneously.
# Not setting this field, or setting it to zero, will fall back to containers/image defaults.
#
#image_parallel_copies = 0

# Default command to run the infra container
#
#infra_command = "/pause"

# Infra (pause) container image name for pod infra containers. When running a
# pod, we start a `pause` process in a container to hold open the namespaces
# associated with the pod. This container does nothing other then sleep,
# reserving the pods resources for the lifetime of the pod.
#
#infra_image = "k8s.gcr.io/pause:3.4.1"

# Specify the locking mechanism to use; valid values are "shm" and "file".
# Change the default only if you are sure of what you are doing, in general
# "file" is useful only on platforms where cgo is not available for using the
# faster "shm" lock type. You may need to run "podman system renumber" after
# you change the lock type.
#
#lock_type** = "shm"

# Indicates if Podman is running inside a VM via Podman Machine.
# Podman uses this value to do extra setup around networking from the
# container inside the VM to to host.
#
#machine_enabled = false

# MultiImageArchive - if true, the container engine allows for storing archives
# (e.g., of the docker-archive transport) with multiple images. By default,
# Podman creates single-image archives.
#
#multi_image_archive = "false"

# Default engine namespace
# If engine is joined to a namespace, it will see only containers and pods
# that were created in the same namespace, and will create new containers and
# pods in that namespace.
# The default namespace is "", which corresponds to no namespace. When no
# namespace is set, all containers and pods are visible.
#
#namespace = ""

# Path to the slirp4netns binary
#
#network_cmd_path = ""

# Default options to pass to the slirp4netns binary.
# For example "allow_host_loopback=true"
#
#network_cmd_options = []

# Whether to use chroot instead of pivot_root in the runtime
#
#no_pivot_root = false

# Number of locks available for containers and pods.
# If this is changed, a lock renumber must be performed (e.g. with the
# 'podman system renumber' command).
#
#num_locks = 2048

# Whether to pull new image before running a container
#
#pull_policy = "missing"

# Indicates whether the application should be running in remote mode. This flag modifies the
# --remote option on container engines. Setting the flag to true will default
# `podman --remote=true` for access to the remote Podman service.
#
#remote = false

# Default OCI runtime
#
#runtime = "crun"

# List of the OCI runtimes that support --format=json. When json is supported
# engine will use it for reporting nicer errors.
#
#runtime_supports_json = ["crun", "runc", "kata", "runsc"]

# List of the OCI runtimes that supports running containers with KVM Separation.
#
#runtime_supports_kvm = ["kata"]

# List of the OCI runtimes that supports running containers without cgroups.
#
#runtime_supports_nocgroups = ["crun"]

# Directory for persistent engine files (database, etc)
# By default, this will be configured relative to where the containers/storage
# stores containers
# Uncomment to change location from this default
#
#static_dir = "/var/lib/containers/storage/libpod"

# Number of seconds to wait for container to exit before sending kill signal.
#
#stop_timeout = 10

# map of service destinations
#
#[service_destinations]
# [service_destinations.production]
# URI to access the Podman service
# Examples:
# rootless "unix://run/user/$UID/podman/podman.sock" (Default)
# rootfull "unix://run/podman/podman.sock (Default)
# remote rootless ssh://engineering.lab.company.com/run/user/1000/podman/podman.sock
# remote rootfull ssh://[email protected]:22/run/podman/podman.sock
#
# uri = "ssh://[email protected]/run/user/1001/podman/podman.sock"
# Path to file containing ssh identity key
# identity = "~/.ssh/id_rsa"

# Directory for temporary files. Must be tmpfs (wiped after reboot)
#
#tmp_dir = "/run/libpod"

# Directory for libpod named volumes.
# By default, this will be configured relative to where containers/storage
# stores containers.
# Uncomment to change location from this default.
#
#volume_path = "/var/lib/containers/storage/volumes"

# Paths to look for a valid OCI runtime (crun, runc, kata, runsc, etc)
[engine.runtimes]
#crun = [
#  "/usr/bin/crun",
#  "/usr/sbin/crun",
#  "/usr/local/bin/crun",
#  "/usr/local/sbin/crun",
#  "/sbin/crun",
#  "/bin/crun",
#  "/run/current-system/sw/bin/crun",
#]

#kata = [
#  "/usr/bin/kata-runtime",
#  "/usr/sbin/kata-runtime",
#  "/usr/local/bin/kata-runtime",
#  "/usr/local/sbin/kata-runtime",
#  "/sbin/kata-runtime",
#  "/bin/kata-runtime",
#  "/usr/bin/kata-qemu",
#  "/usr/bin/kata-fc",
#]

#runc = [
#  "/usr/bin/runc",
#  "/usr/sbin/runc",
#  "/usr/local/bin/runc",
#  "/usr/local/sbin/runc",
#  "/sbin/runc",
#  "/bin/runc",
#  "/usr/lib/cri-o-runc/sbin/runc",
#]

#runsc = [
#  "/usr/bin/runsc",
#  "/usr/sbin/runsc",
#  "/usr/local/bin/runsc",
#  "/usr/local/sbin/runsc",
#  "/bin/runsc",
#  "/sbin/runsc",
#  "/run/current-system/sw/bin/runsc",
#]
[engine.volume_plugins]
#testplugin = "/run/podman/plugins/test.sock"
[machine]
# Number of CPU's a machine is created with.
#
#cpus=1

# The size of the disk in GB created when init-ing a podman-machine VM.
#
#disk_size=10

# The image used when creating a podman-machine VM.
#
#image = "testing"

# Memory in MB a machine is created with.
#
#memory=2048

# The [machine] table MUST be the last entry in this file.
# (Unless another table is added)
# TOML does not provide a way to end a table other than a further table being
# defined, so every key hereafter will be part of [machine] and not the
# main config.
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
Works with a profile in complain mode, but it disables most of AppArmor's security mechanisms.
Describe the results you expected
Should have built the image without failing, as it does in unconfined mode.