podman kube play does not respect size= attribute to io.podman.annotations.userns annotation #25896

Open
Lalufu opened this issue Apr 16, 2025 · 4 comments · May be fixed by #25948
Assignees
Labels
bugweek, kind/bug, triaged

Comments

Lalufu commented Apr 16, 2025

Issue Description

Using rootless podman, consider the following YAML:

---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.podman.annotations.userns/lucidclarke: "auto:size=2048"
  creationTimestamp: "2025-04-16T10:41:54Z"
  labels:
    app: lucidclarke-pod
  name: lucidclarke-pod
spec:
  containers:
  - image: docker.io/library/eclipse-mosquitto:2.0.21
    name: lucidclarke
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - mountPath: /mosquitto/log
      name: mosquitto-log-pvc
    - mountPath: /mosquitto/data
      name: mosquitto-data-pvc
  hostUsers: false
  volumes:
  - name: mosquitto-log-pvc
    persistentVolumeClaim:
      claimName: mosquitto-log
  - name: mosquitto-data-pvc
    persistentVolumeClaim:
      claimName: mosquitto-data

The container in question (docker.io/library/eclipse-mosquitto) will attempt to change to UID 1883 by default.
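A quick way to confirm that UID (a minimal sketch, assuming the image ships BusyBox id):

podman run --rm --entrypoint id docker.io/library/eclipse-mosquitto:2.0.21 mosquitto

which should report something like uid=1883(mosquitto) gid=1883(mosquitto).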

Steps to reproduce the issue

Running the above YAML with podman kube play results in the lucidclarke container crash-looping with the following log messages:

89f5178cb221 chown: /mosquitto/data: Invalid argument
89f5178cb221 chown: /mosquitto/data: Invalid argument
89f5178cb221 chown: /mosquitto/config/mosquitto.conf: Read-only file system
89f5178cb221 chown: /mosquitto/config: Read-only file system
89f5178cb221 chown: /mosquitto/config: Read-only file system
89f5178cb221 chown: /mosquitto/log: Invalid argument
89f5178cb221 chown: /mosquitto/log: Invalid argument
89f5178cb221 chown: /mosquitto: Read-only file system
89f5178cb221 chown: /mosquitto: Read-only file system
89f5178cb221 1744813902: Error setting groups whilst dropping privileges: Invalid argument.

The "invalid argument" messages are caused by insufficient UID/GID coverage: the pod only provides 1024 UIDs/GIDs, which is not enough to cover UID 1883.
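For illustration, the effect of the mapping size can be reproduced with a throwaway container (a minimal sketch, assuming an alpine image can be pulled; the host IDs shown will differ per system):

podman run --rm --userns=auto:size=1024 docker.io/library/alpine cat /proc/self/uid_map

The printed range(s) cover only container UIDs 0-1023, so a chown to UID 1883 has no host UID to map to and fails with EINVAL ("Invalid argument"). With --userns=auto:size=2048 the range includes UID 1883 and the chown can succeed.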

When running podman kube play --userns=auto:size=2048, the container starts successfully:

f70112324697 chown: /mosquitto/config/mosquitto.conf: Read-only file system
f70112324697 chown: /mosquitto/config: Read-only file system
f70112324697 chown: /mosquitto/config: Read-only file system
f70112324697 chown: /mosquitto: Read-only file system
f70112324697 chown: /mosquitto: Read-only file system
f70112324697 1744813971: mosquitto version 2.0.21 starting
f70112324697 1744813971: Config loaded from /mosquitto/config/mosquitto.conf.
f70112324697 1744813971: Starting in local only mode. Connections will only be possible from clients running on this machine.
f70112324697 1744813971: Create a configuration file which defines a listener to allow remote access.
f70112324697 1744813971: For more details see https://mosquitto.org/documentation/authentication-methods/
f70112324697 1744813971: Opening ipv4 listen socket on port 1883.
f70112324697 1744813971: Opening ipv6 listen socket on port 1883.
f70112324697 1744813971: mosquitto version 2.0.21 running

(The "read-only file system" messages are harmless.)

Describe the results you received

See above

Describe the results you expected

podman kube play should respect the io.podman.annotations.userns/lucidclarke: "auto:size=2048" annotation and allocate sufficient UIDs.

podman info output

host:                       
  arch: amd64               
  buildahVersion: 1.39.0    
  cgroupControllers:        
  - cpu                     
  - io                      
  - memory                  
  - pids                    
  cgroupManager: systemd    
  cgroupVersion: v2         
  conmon:                   
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon   
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:           
    idlePercent: 91.6       
    systemPercent: 4.68     
    userPercent: 3.71       
  cpus: 8                   
  databaseBackend: sqlite   
  distribution:             
    distribution: fedora    
    version: "40"           
  eventLogger: journald     
  freeLocks: 2043           
  hostname: ethan.home.dn.lalufu.net
  idMappings:               
    gidmap:                 
    - container_id: 0       
      host_id: 10007        
      size: 1               
    - container_id: 1       
      host_id: 2065536      
      size: 65536           
    uidmap:                 
    - container_id: 0       
      host_id: 10007        
      size: 1               
    - container_id: 1       
      host_id: 2065536      
      size: 65536           
  kernel: 6.13.10-100.fc40.x86_64
  linkmode: dynamic         
  logDriver: journald       
  memFree: 28626632704      
  memTotal: 134943465472    
  networkBackend: netavark  
  networkBackendInfo:       
    backend: netavark       
    dns:                    
      package: aardvark-dns-1.14.0-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.14.0
    package: netavark-1.14.1-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.14.1
  ociRuntime:               
    name: crun              
    package: crun-1.20-2.fc40.x86_64
    path: /usr/bin/crun     
    version: |-             
      crun version 1.20     
      commit: 9c9a76ac11994701dd666c4f0b869ceffb599a66
      rundir: /run/user/10007/crun
      spec: 1.0.0           
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux                 
  pasta:                    
    executable: /usr/bin/pasta
    package: passt-0^20250217.ga1e48a0-2.fc40.x86_64
    version: ""             
  remoteSocket:             
    exists: true            
    path: /run/user/10007/podman/podman.sock
  rootlessNetworkCmd: pasta 
  security:                 
    apparmorEnabled: false  
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true          
    seccompEnabled: true    
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true    
  serviceIsRemote: false    
  slirp4netns:              
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.fc40.x86_64
    version: |-             
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.7.0       
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5     
  swapFree: 33895788544     
  swapTotal: 34359734272    
  uptime: 44h 39m 37.00s (Approximately 1.83 days)
  variant: ""               
plugins:                    
  authorization: null       
  log:                      
  - k8s-file                
  - none                    
  - passthrough             
  - journald                
  network:                  
  - bridge                  
  - macvlan                 
  - ipvlan                  
  volume:                   
  - local                   
registries:                 
  search:                   
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io               
store:                      
  configFile: /stank/podman/users/pod-tasmota-mqtt/.config/containers/storage.conf
  containerStore:           
    number: 2               
    paused: 0               
    running: 2              
    stopped: 0              
  graphDriverName: overlay  
  graphOptions: {}          
  graphRoot: /stank/podman/users/pod-tasmota-mqtt/.local/share/containers/storage
  graphRootAllocated: 1528998002688
  graphRootUsed: 544735232  
  graphStatus:              
    Backing Filesystem: zfs 
    Native Overlay Diff: "true"
    Supports d_type: "true" 
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false" 
  imageCopyTmpDir: /var/tmp 
  imageStore:               
    number: 4               
  runRoot: /run/user/10007/containers
  transientStore: false     
  volumePath: /stank/podman/users/pod-tasmota-mqtt/.local/share/containers/storage/volumes
version:                    
  APIVersion: 5.4.0         
  BuildOrigin: Fedora Project
  Built: 1739232000         
  BuiltTime: Tue Feb 11 00:00:00 2025
  GitCommit: ""             
  GoVersion: go1.22.11      
  Os: linux                 
  OsArch: linux/amd64       
  Version: 5.4.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

No response

Additional information

No response

Lalufu added the kind/bug label Apr 16, 2025
Lalufu (Author) commented Apr 16, 2025

This works when using io.podman.annotations.userns instead of io.podman.annotations.userns/lucidclarke. The latter is what's generated by podman generate kube, so I thought that should be fine, but it's not.
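For reference, a minimal sketch of the metadata block with the working pod-level form (no container-name suffix):

apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.podman.annotations.userns: "auto:size=2048"

podman kube play respects this form and allocates the requested 2048 IDs.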

mheon (Member) commented Apr 16, 2025

Can you check whether the YAML generated by generate kube functions correctly (i.e., whether the size annotation is respected despite having the container name as a suffix)?

Definitely seems like a bug regardless.

mheon added the triaged label Apr 16, 2025
Lalufu (Author) commented Apr 16, 2025

No, that does not work.
Create a container:

podman container run -it --userns=auto:size=2048 --read-only -v mosquitto-log:/mosquitto/log -v mosquitto-data:/mosquitto/data docker.io/library/eclipse-mosquitto:2.0.21

This works, and the container starts.
Ctrl-C, then generate a kube YAML from it:

podman generate kube kind_wescoff > test.yaml

Content:

# Save the output of this file and use kubectl create -f to import
# it into Kubernetes.
#
# Created with podman-5.4.0

# NOTE: If you generated this yaml from an unprivileged and rootless podman container on an SELinux
# enabled system, check the podman generate kube man page for steps to follow to ensure that your pod/container
# has the right permissions to access the volumes added.
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.podman.annotations.userns/kindwescoff: auto:size=2048
  creationTimestamp: "2025-04-16T20:01:31Z"
  labels:
    app: kindwescoff-pod
  name: kindwescoff-pod
spec:
  containers:
  - args:
    - /usr/sbin/mosquitto
    - -c
    - /mosquitto/config/mosquitto.conf
    env:
    - name: TERM
      value: xterm
    image: docker.io/library/eclipse-mosquitto:2.0.21
    name: kindwescoff
    securityContext:
      readOnlyRootFilesystem: true
    stdin: true
    tty: true
    volumeMounts:
    - mountPath: /mosquitto/data
      name: mosquitto-data-pvc
    - mountPath: /mosquitto/log
      name: mosquitto-log-pvc
  hostUsers: false
  volumes:
  - name: mosquitto-data-pvc
    persistentVolumeClaim:
      claimName: mosquitto-data
  - name: mosquitto-log-pvc
    persistentVolumeClaim:
      claimName: mosquitto-log

Starting this results in a crash loop with the error message above.

mheon (Member) commented Apr 16, 2025

Alright, that's pretty serious. Generated YAML should always run. I'll mark this as a priority for bug week.

I think that what play kube is doing is probably correct (user namespace settings must be per-pod, so having a container name set in the annotation does not make sense IMO), so we need to update generate kube and the documentation to reflect this.

mheon added the bugweek label Apr 18, 2025
mheon self-assigned this Apr 21, 2025
mheon added a commit to mheon/libpod that referenced this issue Apr 22, 2025
The `podman generate kube` command on containers follows a
different codepath from pods. Pods store a lot of pod-level
configuration - including user namespace information - in
annotations, so it can be restored by `play kube`. Generating for
a container does not do the same thing, because we don't have a
pod.

However, per-container generation was still generating a nearly
identical user namespace annotation to a pod. Example:

In Pod:
  io.podman.annotations.userns: auto:size=40
Not in Pod:
  io.podman.annotations.userns/awesomegreider: auto:size=2048

The second annotation seems like it should apply a user namespace
config to the generated Kubernetes pod. Instead, it's just adding
an annotation to the awesomegreider container, that says said
container has a user namespace, when it does not in fact have a
user namespace configured because it is now in a pod.

After this PR, both containers in and out of pods generate
identical annotations (the In Pod version, missing container
name) and as such should generate pods with appropriately
configured user namespaces. I also added some conflict detection
to refuse to generate if you try to generate YAML containing two
containers with conflicting user namespace configuration.

Fixes containers#25896

Signed-off-by: Matt Heon <[email protected]>
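For illustration, the conflict case mentioned above could be set up along these lines (hypothetical container names; whether generation is refused depends on the linked change):

podman create --name small --userns=auto:size=1024 docker.io/library/alpine sleep 100
podman create --name large --userns=auto:size=2048 docker.io/library/alpine sleep 100
podman generate kube small large > combined.yaml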
mheon linked a pull request Apr 22, 2025 that will close this issue
mheon added a commit to mheon/libpod that referenced this issue Apr 22, 2025
mheon added a commit to mheon/libpod that referenced this issue Apr 23, 2025