systemd service pod fails to start after update to v5.4.1 #25786

Closed
wdouglascampbell opened this issue Apr 3, 2025 · 9 comments · Fixed by #25796
Labels
kind/bug, pods, quadlet, regression

Comments

@wdouglascampbell

Issue Description

I have a number of pods configured to start and run containers within them, using the Quadlet approach to define both the containers and the pods. After updating to Podman v5.4.1, I am no longer able to start the systemd services that run those pods.

Steps to reproduce the issue

First set things up and run using Podman v5.4.0

Create ~/.config/containers/systemd/pod-test.kube with the following content:

[Unit]
Description=pod-test

[Kube]
Yaml=pod-test.yaml

[Install]
# Start by default on boot
WantedBy=default.target

Create ~/.config/containers/systemd/pod-test.yaml with the following content:

apiVersion: v1
kind: Pod
metadata:
  name: test

Run the pod:

systemctl --user daemon-reload
systemctl --user start pod-test

This results in no errors and the output of podman ps is:

$ podman ps
CONTAINER ID  IMAGE                                    COMMAND     CREATED        STATUS        PORTS       NAMES
8d0dcb224493  localhost/podman-pause:5.4.0-1739232000              7 seconds ago  Up 7 seconds              446e9646ef4e-service
2b63b76f3a48  localhost/podman-pause:5.4.0-1739232000              7 seconds ago  Up 6 seconds              a6dff1485c95-infra

Update to Podman v5.4.1 and try again
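
For reference, on Fedora CoreOS the update and reboot might look roughly like this (a sketch only; the exact commands depend on how updates are rolled out on your systems):

sudo rpm-ostree upgrade
sudo systemctl reboot
# after the reboot, confirm the new version
podman --version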

After rebooting, you will notice that the output of podman ps is:

$ podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES

Attempting to restart the pod with

systemctl --user restart pod-test

results in the following error:

Job for pod-test.service failed because the service did not take the steps required by its unit configuration.
See "systemctl --user status pod-test.service" and "journalctl --user -xeu pod-test.service" for details.

and the results of journalctl --user -xeu pod-test.service --no-pager are as follows:

Apr 03 15:30:24 xxx.yyy.net systemd[1697]: Starting pod-test.service - pod-test...
░░ Subject: A start job for unit UNIT has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit UNIT has begun execution.
░░
░░ The job identifier is 60.
Apr 03 15:30:24 xxx.yyy.net pod-test[3200]: Pods stopped:
Apr 03 15:30:24 xxx.yyy.net pod-test[3200]: Pods removed:
Apr 03 15:30:24 xxx.yyy.net pod-test[3200]: Secrets removed:
Apr 03 15:30:24 xxx.yyy.net pod-test[3200]: Volumes removed:
Apr 03 15:30:24 xxx.yyy.net podman[3200]: 2025-04-03 15:30:24.27974382 -0400 EDT m=+0.054001751 network create de9b01cc6de87712999232388955d09e939c2d37914015e3b120205a11b920a4 (name=podman-default-kube-network, type=bridge)
Apr 03 15:30:25 xxx.yyy.net podman[3200]: 2025-04-03 15:30:25.624200151 -0400 EDT m=+1.398459270 container create 129d57f7c6fd87cc61d774fcc17a8f21f517220044853a82b718bb00f547dcd9 (image=localhost/podman-pause:5.4.1-1741651200, name=446e9646ef4e-service, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.180013188 -0400 EDT m=+1.954272307 container create 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56 (image=localhost/podman-pause:5.4.1-1741651200, name=ff9987c01c0a-infra, pod_id=ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555, PODMAN_SYSTEMD_UNIT=pod-test.service, io.buildah.version=1.39.2)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.202894678 -0400 EDT m=+1.977154007 pod create ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555 (image=, name=test)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.204814146 -0400 EDT m=+1.979073614 container restart 129d57f7c6fd87cc61d774fcc17a8f21f517220044853a82b718bb00f547dcd9 (image=localhost/podman-pause:5.4.1-1741651200, name=446e9646ef4e-service, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.426133398 -0400 EDT m=+2.200392098 container init 129d57f7c6fd87cc61d774fcc17a8f21f517220044853a82b718bb00f547dcd9 (image=localhost/podman-pause:5.4.1-1741651200, name=446e9646ef4e-service, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.437744209 -0400 EDT m=+2.212002350 container start 129d57f7c6fd87cc61d774fcc17a8f21f517220044853a82b718bb00f547dcd9 (image=localhost/podman-pause:5.4.1-1741651200, name=446e9646ef4e-service, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:26 xxx.yyy.net pasta[3217]: Couldn't get any nameserver address
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.791236338 -0400 EDT m=+2.565495737 container init 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56 (image=localhost/podman-pause:5.4.1-1741651200, name=ff9987c01c0a-infra, pod_id=ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555, PODMAN_SYSTEMD_UNIT=pod-test.service, io.buildah.version=1.39.2)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.795632472 -0400 EDT m=+2.569890194 container start 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56 (image=localhost/podman-pause:5.4.1-1741651200, name=ff9987c01c0a-infra, pod_id=ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555, PODMAN_SYSTEMD_UNIT=pod-test.service, io.buildah.version=1.39.2)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.867783011 -0400 EDT m=+2.642042199 pod start ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555 (image=, name=test)
Apr 03 15:30:26 xxx.yyy.net podman[3200]: 2025-04-03 15:30:26.954308915 -0400 EDT m=+2.728568313 container stop 129d57f7c6fd87cc61d774fcc17a8f21f517220044853a82b718bb00f547dcd9 (image=localhost/podman-pause:5.4.1-1741651200, name=446e9646ef4e-service, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:27 xxx.yyy.net podman[3200]: 2025-04-03 15:30:27.032210522 -0400 EDT m=+2.806468523 container died 129d57f7c6fd87cc61d774fcc17a8f21f517220044853a82b718bb00f547dcd9 (image=localhost/podman-pause:5.4.1-1741651200, name=446e9646ef4e-service, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:27 xxx.yyy.net pod-test[3200]: Pod:
Apr 03 15:30:27 xxx.yyy.net pod-test[3200]: ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3210 (conmon) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3239 (conmon) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3244 (podman) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3245 (n/a) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3246 (n/a) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3247 (n/a) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3248 (n/a) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3249 (n/a) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net systemd[1697]: pod-test.service: Killing process 3250 (podman) with signal SIGKILL.
Apr 03 15:30:27 xxx.yyy.net podman[3252]: 2025-04-03 15:30:27.227766743 -0400 EDT m=+0.068502623 pod stop ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555 (image=, name=test)
Apr 03 15:30:27 xxx.yyy.net podman[3252]: 2025-04-03 15:30:27.270594824 -0400 EDT m=+0.111330355 container stop 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56 (image=localhost/podman-pause:5.4.1-1741651200, name=ff9987c01c0a-infra, pod_id=ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555, PODMAN_SYSTEMD_UNIT=pod-test.service, io.buildah.version=1.39.2)
Apr 03 15:30:27 xxx.yyy.net podman[3252]: 2025-04-03 15:30:27.270726616 -0400 EDT m=+0.111462007 container died 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56 (image=localhost/podman-pause:5.4.1-1741651200, name=ff9987c01c0a-infra, PODMAN_SYSTEMD_UNIT=pod-test.service, io.buildah.version=1.39.2)
Apr 03 15:30:28 xxx.yyy.net podman[3252]: 2025-04-03 15:30:28.138347713 -0400 EDT m=+0.979084501 pod stop ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555 (image=, name=test)
Apr 03 15:30:28 xxx.yyy.net podman[3252]: 2025-04-03 15:30:28.511276028 -0400 EDT m=+1.352013026 container remove 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56 (image=localhost/podman-pause:5.4.1-1741651200, name=ff9987c01c0a-infra, pod_id=ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555, io.buildah.version=1.39.2, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:28 xxx.yyy.net podman[3252]: 2025-04-03 15:30:28.581574567 -0400 EDT m=+1.422311145 pod remove ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555 (image=, name=test)
Apr 03 15:30:29 xxx.yyy.net podman[3252]: 2025-04-03 15:30:29.123479132 -0400 EDT m=+1.964216129 container remove 129d57f7c6fd87cc61d774fcc17a8f21f517220044853a82b718bb00f547dcd9 (image=localhost/podman-pause:5.4.1-1741651200, name=446e9646ef4e-service, PODMAN_SYSTEMD_UNIT=pod-test.service)
Apr 03 15:30:29 xxx.yyy.net pod-test[3252]: Pods stopped:
Apr 03 15:30:29 xxx.yyy.net pod-test[3252]: Error: stopping container 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56: container 1464a5dfb287997d247aed3fff18da36a8ee58ca5e078db0b784314846ecec56 conmon exited prematurely, exit code could not be retrieved: internal libpod error
Apr 03 15:30:29 xxx.yyy.net pod-test[3252]: Pods removed:
Apr 03 15:30:29 xxx.yyy.net pod-test[3252]: ff9987c01c0a6f461e05c786b506a50e0b12e9a8c54dc458614634d5d769f555
Apr 03 15:30:29 xxx.yyy.net pod-test[3252]: Secrets removed:
Apr 03 15:30:29 xxx.yyy.net pod-test[3252]: Volumes removed:
Apr 03 15:30:29 xxx.yyy.net systemd[1697]: pod-test.service: Failed with result 'protocol'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit UNIT has entered the 'failed' state with result 'protocol'.
Apr 03 15:30:29 xxx.yyy.net systemd[1697]: Failed to start pod-test.service - pod-test.
░░ Subject: A start job for unit UNIT has failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ A start job for unit UNIT has finished with a failure.
░░
░░ The job identifier is 60 and the job result is failed.

Describe the results you received

Errors: the systemd service fails with result 'protocol' and the pod is torn down, as shown in the journal output above.

Describe the results you expected

A running pod with no errors, as with previous versions of Podman.

podman info output

host:
  arch: amd64
  buildahVersion: 1.39.2
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.13-1.fc41.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.13, commit: '
  cpuUtilization:
    idlePercent: 76.69
    systemPercent: 6.52
    userPercent: 16.79
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: coreos
    version: "41"
  eventLogger: journald
  freeLocks: 2027
  hostname: coreos.eacompany.net
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.13.6-200.fc41.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 4346818560
  memTotal: 8311504896
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.14.0-1.fc41.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.14.0
    package: netavark-1.14.0-1.fc41.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.14.0
  ociRuntime:
    name: crun
    package: crun-1.20-2.fc41.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.20
      commit: 9c9a76ac11994701dd666c4f0b869ceffb599a66
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20250217.ga1e48a0-2.fc41.x86_64
    version: ""
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.fc41.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.8.0
      SLIRP_CONFIG_VERSION_MAX: 5
      libseccomp: 2.5.5
  swapFree: 0
  swapTotal: 0
  uptime: 0h 4m 59.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /var/home/core/.config/containers/storage.conf
  containerStore:
    number: 12
    paused: 0
    running: 12
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/home/core/.local/share/containers/storage
  graphRootAllocated: 248828112896
  graphRootUsed: 112083517440
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 67
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /var/home/core/.local/share/containers/storage/volumes
version:
  APIVersion: 5.4.1
  BuildOrigin: Fedora Project
  Built: 1741651200
  BuiltTime: Tue Mar 11 08:00:00 2025
  GitCommit: b79bc8afe796cba51dd906270a7e1056ccdfcf9e
  GoVersion: go1.23.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.4.1

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

Ran into the issue on two separate Fedora CoreOS servers and a VM running Fedora CoreOS.

Additional information

I noticed that if I force the service Type= to forking instead of notify, I can mostly get things running. The service container for the pod fails, but everything else runs.
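
One way to apply that workaround is a user-level systemd drop-in that overrides the Type= of the generated unit (a minimal sketch; the drop-in path and file name are illustrative, and this only masks the symptom rather than fixing it):

# ~/.config/systemd/user/pod-test.service.d/override.conf
[Service]
Type=forking

followed by systemctl --user daemon-reload and a restart of the unit.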

@wdouglascampbell added the kind/bug label Apr 3, 2025
@wdouglascampbell
Author

I should note that the pod I am demonstrating above is not my actual usage scenario; I just wanted to provide an example that was as simple as possible while still showing the issue.

@Honny1 added the pods label Apr 4, 2025
@Luap99
Member

Luap99 commented Apr 4, 2025

Why do you run a k8s yaml without any containers? Is that even valid in the k8s context?

I would assume 945aade caused this regression so it is likely something we should fix.

@ygalblum
Contributor

ygalblum commented Apr 4, 2025

Why do you run a k8s yaml without any containers? Is that even valid in the k8s context?

One case could be a PVC as explained in the documentation: https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html#service-type
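
For illustration, a Yaml that only defines a volume claim (the name here is made up) gives the .kube unit nothing to run as a container:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-claim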

I would assume 945aade caused this regression so it is likely something we should fix.

Yes, I think the code should also check whether any containers were intended to be executed.

@Luap99
Member

Luap99 commented Apr 4, 2025

Why do you run a k8s yaml without any containers? Is that even valid in the k8s context?

One case could be a PVC as explained in the documentation: https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html#service-type

Well, yes, and I assume my patch still allows for it. I am just confused about the example yaml here, which declares a pod without any containers. The service container is only set up when there are pods to be started.

I would assume 945aade caused this regression so it is likely something we should fix.

Yes, I think the code should also check whether any containers were intended to be executed.

Yeah, though the entire kube play code is such a mess that it is rather hard to make simple changes without such unintended consequences.

@wdouglascampbell
Author

Why do you run a k8s yaml without any containers? Is that even valid in the k8s context?

One case could be a PVC as explained in the documentation: https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html#service-type

Yes. Another case is the way I have been using the combination of .kube and .container files. I feel this gives me a simpler configuration than having to stuff everything into the .yaml file, especially when the number of containers is large.

For example, an nginx container running in a sites pod:

pod-sites.kube with contents:

[Unit]
Description=pod-sites.service
Wants=container-nginx.service
Before=container-nginx.service

[Kube]
Yaml=pod-sites.yaml
PublishPort=192.168.214.220:80:80
PublishPort=192.168.214.220:443:443

[Install]
# Start by default on boot
WantedBy=default.target

container-nginx.container with contents:

[Unit]
Description=container-nginx.service

[Container]
Image=docker.io/library/nginx:alpine
ContainerName=nginx
PodmanArgs=--pod sites

[Install]
# Start by default on boot
WantedBy=default.target

pod-sites.yaml with contents:

apiVersion: v1
kind: Pod
metadata:
  name: sites
spec:
  dnsPolicy: Default
  dnsConfig:
    options:
      - name: single-request

@wdouglascampbell
Author

Is my approach wrong, and have I just been "lucky" up to this point that it has worked?

@Luap99
Member

Luap99 commented Apr 4, 2025

@wdouglascampbell Any reason you are not using [Pod] units for that? https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html#pod-units-pod

That is what they are designed for. Then, in the container unit, use

Pod=<pod-unit>.pod

That also takes care of the dependency management.
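
Adapting the earlier pod-sites example to that layout might look roughly like this (a sketch based on the podman-systemd.unit(5) documentation, not tested here):

pod-sites.pod:

[Pod]
PodName=sites
PublishPort=192.168.214.220:80:80
PublishPort=192.168.214.220:443:443

[Install]
WantedBy=default.target

container-nginx.container:

[Container]
Image=docker.io/library/nginx:alpine
ContainerName=nginx
Pod=pod-sites.pod

[Install]
WantedBy=default.target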


That said, if this worked before and my patch broke it, then we will fix it again. We are committed to providing a stable experience, as we follow semver (https://semver.org/).

I am not sure "lucky" is the right word. That is simply something I didn't know was possible, and we didn't seem to have any tests for it either. Once I fix it, we can add a regression test for an empty pod yaml.
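
A regression check along those lines could be as simple as playing an empty pod Yaml and asserting the pod comes up (a rough sketch, not the actual test added with the fix):

cat > /tmp/empty-pod.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test
EOF
podman kube play /tmp/empty-pod.yaml
podman pod exists test && echo "empty pod is running"
podman kube down /tmp/empty-pod.yaml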

@Luap99 self-assigned this Apr 4, 2025
@wdouglascampbell
Author

@wdouglascampbell Any reason you are not using [Pod] units for that? https://docs.podman.io/en/latest/markdown/podman-systemd.unit.5.html#pod-units-pod

That is what they are designed for. Then, in the container unit, use

Pod=<pod-unit>.pod

That also takes care of the dependency management.

I can only claim ignorance as I continue to learn, but I appreciate the link. I will definitely take a look.

That said, if this worked before and my patch broke it, then we will fix it again. We are committed to providing a stable experience, as we follow semver (https://semver.org/).

I am not sure "lucky" is the right word. That is simply something I didn't know was possible, and we didn't seem to have any tests for it either. Once I fix it, we can add a regression test for an empty pod yaml.

Awesome. Much appreciated.

Luap99 added a commit to Luap99/libpod that referenced this issue Apr 4, 2025
Since commit 945aade we tear down the kube units if all pods failed to start. However, this broke the use case of an empty pod, as we did not consider it to have started successfully, which is wrong and caused a regression for at least one user.

To fix this, special-case the empty pod and consider it running.

Fixes: containers#25786
Fixes: 945aade ("quadlet kube: correctly mark unit as failed")

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99
Member

Luap99 commented Apr 4, 2025

#25796 should fix it
