Podman machine gets stuck in STARTING state if interrupted during startup #24416

Closed
cbr7 opened this issue Oct 30, 2024 · 14 comments · Fixed by #25832

@cbr7

cbr7 commented Oct 30, 2024

Issue Description

It seems that the podman machine gets stuck permanently in the STARTING state if the user interrupts the startup sequence even once. Recovering requires either restarting the entire system or deleting the podman machine. This issue was initially reported in the Podman Desktop repo: podman-desktop/podman-desktop#9670.

podman_start_race_condition.mp4

Steps to reproduce the issue

  1. Create a new podman machine using the podman machine init command.
  2. Start the machine created in step 1 using the podman machine start command.
  3. Immediately after issuing the command in step 2, press CTRL-C in the terminal (sending SIGINT) to terminate the operation.
  4. Run podman machine ls and notice that the machine created in step 1 is permanently stuck in the STARTING state.
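
The steps above can be approximated in a small script. This is only a sketch, under several assumptions: it runs on a POSIX host (macOS or Linux, since Python cannot send SIGINT this way on Windows), it signals only the podman process rather than the whole foreground process group that a terminal CTRL-C would reach, and the one-second delay is an arbitrary guess. The helper names interrupt_after and reproduce are made up for this example.

```python
import signal
import subprocess
import time


def interrupt_after(cmd, delay):
    """Start cmd, wait `delay` seconds, send it SIGINT, and return its exit code."""
    proc = subprocess.Popen(cmd)
    time.sleep(delay)
    if proc.poll() is None:
        # The process is still running, so interrupt it (approximates CTRL-C).
        proc.send_signal(signal.SIGINT)
    return proc.wait()


def reproduce(delay=1.0):
    """Steps 2 to 4: start the machine, interrupt it, then list machine states."""
    interrupt_after(["podman", "machine", "start"], delay)
    subprocess.run(["podman", "machine", "ls"], check=False)
```

Calling reproduce() after podman machine init should, if the bug is present, leave the machine listed as starting. The delay may need tuning so that the signal arrives while startup is still in progress.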

Describe the results you received

Podman machine is stuck permanently in STARTING state.

Describe the results you expected

Presumably the state of the podman machine should be Stopped, or alternatively Running if the SIGINT is not sent fast enough to prevent the startup. Regardless of its state, the podman machine should not be frozen and should execute commands correctly afterwards.

podman info output

If you are unable to run podman info for any reason, please provide the podman version, operating system and its version and the architecture you are running.

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

@cbr7 cbr7 added the kind/bug Categorizes issue or PR as related to a bug. label Oct 30, 2024
@Luap99 Luap99 added machine windows issue/bug on Windows labels Oct 30, 2024
@baude
Member

baude commented Oct 30, 2024

Your video does not work for me. Please post an actual log or transcript. Why do you call this a race? On your fourth point, you say "will not be permanently stuck" and I think you mean to say "will be"? Can you confirm this and update what you are trying to show in the video? You also need to provide podman info as the template suggests.

@cbr7
Author

cbr7 commented Oct 30, 2024

@baude in what way is the video not working for you? I just watched it again and it works fine. Have you tried a different browser?

About point 4, you are correct: there was a "not" that should not have been there. I've edited the post.

@cbr7
Author

cbr7 commented Oct 30, 2024

Attached the output from podman info

info.txt

@cbr7
Author

cbr7 commented Oct 31, 2024

UPDATE: I've also managed to reproduce the issue on macOS today, so it's not limited to Windows only.

@baude
Member

baude commented Oct 31, 2024

When I click the video, it doesn't play. Either way, we prefer that people post text where possible rather than binary objects.

@cbr7
Author

cbr7 commented Oct 31, 2024

In text: after the podman machine start command is issued, the user sends SIGINT to the terminal, and that causes the problem. The issue is 100% reproducible.


github-actions bot commented Dec 1, 2024

A friendly reminder that this issue had no activity for 30 days.

@leolivier

More or less the same issue here, although I don't really know whether it happened because I interrupted the startup sequence. My machine was stuck in the STARTING state and I couldn't do anything with it in the desktop (I'm on Windows).
I had to use podman machine rm podman-machine-default to remove it, and then I was able to recreate it properly on the desktop.

> podman info
host:
  arch: amd64
  buildahVersion: 1.38.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 99.47
    systemPercent: 0.36
    userPercent: 0.17
  cpus: 8
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "40"
  eventLogger: journald
  freeLocks: 2048
  hostname: Bagheera
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.167.4-microsoft-standard-WSL2
  linkmode: dynamic
  logDriver: journald
  memFree: 7041286144
  memTotal: 8216662016
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.13.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.13.1
    package: netavark-1.13.0-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.13.0
  ociRuntime:
    name: crun
    package: crun-1.18.2-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.18.2
      commit: 00ab38af875ddd0d1a8226addda52e1de18339b5
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20241127.gc0fbc7e-1.fc40.x86_64
    version: |
      pasta 0^20241127.gc0fbc7e-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: unix:///run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2147483648
  swapTotal: 2147483648
  uptime: 0h 32m 0.00s
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 1081101176832
  graphRootUsed: 876167168
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.3.1
  Built: 1732147200
  BuiltTime: Thu Nov 21 01:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.7
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.1

@smallsaucepan

Seeing this today on mac (up to date, 15.3). New podman user.

Tried the podman machine rm podman-machine-default workaround, re-initing and re-starting without any joy.

podman machine start

Starting machine "podman-machine-default"

... never returns

podman machine ls

NAME                     VM TYPE     CREATED         LAST UP             CPUS        MEMORY      DISK SIZE
podman-machine-default*  applehv     40 minutes ago  Currently starting  4           2GiB        100GiB
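
In the listing above, the machine's LAST UP column reads "Currently starting". A minimal sketch for flagging such machines from the text of podman machine ls follows. It assumes the tabular layout shown here (columns separated by at least two spaces, a trailing * on the default machine), and stuck_machines is a made-up helper name. A machine that is legitimately still booting would be reported too, so the result is only a hint.

```python
import re


def stuck_machines(ls_output):
    """Return names of machines whose LAST UP column reads 'Currently starting'."""
    lines = [ln for ln in ls_output.splitlines() if ln.strip()]
    if not lines:
        return []
    # Columns are separated by runs of two or more spaces.
    header = re.split(r"\s{2,}", lines[0].strip())
    name_col = header.index("NAME")
    last_up_col = header.index("LAST UP")
    stuck = []
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) <= max(name_col, last_up_col):
            continue  # malformed or truncated row
        if cells[last_up_col] == "Currently starting":
            # A trailing "*" marks the default machine; drop it from the name.
            stuck.append(cells[name_col].rstrip("*"))
    return stuck
```

Fed the two lines above, the function returns a list containing podman-machine-default.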

podman info

OS: darwin/amd64
buildOrigin: pkginstaller
provider: applehv
version: 5.4.0

Cannot connect to Podman. Please verify your connection to the Linux system using `podman system connection list`, or try `podman machine init` and `podman machine start` to manage a new Linux VM
Error: unable to connect to Podman socket: failed to connect: dial tcp 127.0.0.1:61995: connect: connection refused

Any other ideas, or info I can provide to help debug?

@cbr7
Author

cbr7 commented Feb 14, 2025

@smallsaucepan try podman machine rm -f to delete the currently stuck podman machine.

@smallsaucepan

Thanks for the suggestion @cbr7. No luck though.

Tried with debug log level on machine start. The logging below appears and a window opens displaying grub bootloader for a second before displaying "Booting `Fedora CoreOS 41.20..." which then hangs. CPU goes to 400% for at least 20 minutes without any sign of progress in either window.

INFO[0000] boot parameters: &{EFIVariableStorePath:/Users/james/.local/share/containers/podman/machine/applehv/efi-bl-podman-machine-default CreateVariableStore:true}
INFO[0000]
INFO[0000] virtual machine parameters:
INFO[0000]      vCPUs: 4
INFO[0000]      memory: 2048 MiB
INFO[0000]
INFO[0000] Adding virtio-blk device (imagePath: /Users/james/.local/share/containers/podman/machine/applehv/podman-machine-default-amd64.raw)
INFO[0000] Adding virtio-rng device
INFO[0000] Adding virtio-vsock device
INFO[0000] Adding virtio-serial device (logFile: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default.log)
INFO[0000] Adding virtio-net device (nat: false macAddress: [5a:94:ef:e4:0c:ee])
INFO[0000] Using unix socket /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default-gvproxy.sock
INFO[0000] local: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/vfkit-15915-b2d0.sock remote: /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default-gvproxy.sock
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-fs device
INFO[0000] Adding virtio-gpu device
INFO[0000] Adding virtio-input pointing device
INFO[0000] Adding virtio-input keyboard device
INFO[0000] virtual machine is running
INFO[0000] Exposing vsock port 1025 on /var/folders/cq/2lcbk22j3j13215bz7y3nf2r0000gn/T/podman/podman-machine-default.sock (listening)
INFO[0000] Exposing vsock port 1024 on /Users/james/.local/share/containers/podman/machine/applehv/podman-machine-default-ignition.sock (listening)
INFO[0000] waiting for VM to stop
2025-02-15 00:56:42.092 vfkit[88341:305874] +[IMKClient subclass]: chose IMKClient_Modern
2025-02-15 00:56:42.092 vfkit[88341:305874] +[IMKInputSession subclass]: chose IMKInputSession_Modern

Appears to get stuck here while CoreOS fails to boot in the other window.

@benoitf
Contributor

benoitf commented Feb 14, 2025

@smallsaucepan I think you're hitting #25121

@cbr7
Author

cbr7 commented Feb 14, 2025

@smallsaucepan when all else fails, restart the MacBook/system. Upon restart the podman machine will be off and you will be able to delete it.

EDIT: also, as @benoitf said, you seem to have a different issue than what is reported in this ticket.

@smallsaucepan

Thanks @benoitf and @cbr7. That's a much better fit to what I'm seeing.

@jakecorrenti jakecorrenti self-assigned this Mar 26, 2025
@baude baude added the jira label Mar 28, 2025
jakecorrenti added a commit to jakecorrenti/podman that referenced this issue Apr 8, 2025
In the instance where the user sends a signal, such as SIGINT (Ctl-c)
when a Podman Machine is in the middle of starting, make sure the state
doesn't get stuck in the "Currently Starting" status.

Resolves: containers#24416

Signed-off-by: Jake Correnti <[email protected]>
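
The commit message above describes the fix only in prose. As a rough illustration of the general pattern (not the actual Go change made for this issue), a start routine can mark the machine as starting and then roll the state back if the boot step is interrupted. The sketch below is in Python. Machine, Status, and start are hypothetical names invented for this example, and it relies on Python's default SIGINT handling, which raises KeyboardInterrupt in the main thread.

```python
from enum import Enum


class Status(Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    RUNNING = "running"


class Machine:
    """Hypothetical stand-in for a machine record; not Podman's real type."""

    def __init__(self, name):
        self.name = name
        self.status = Status.STOPPED


def start(machine, boot):
    """Run boot(); if it is interrupted or fails, reset the state to STOPPED."""
    machine.status = Status.STARTING
    try:
        boot()
    except BaseException:
        # KeyboardInterrupt (raised on SIGINT) derives from BaseException,
        # so an interrupted boot is rolled back here as well.
        machine.status = Status.STOPPED
        raise
    machine.status = Status.RUNNING
```

The key design point is that the transitional state is never left behind: every exit path from start ends in either STOPPED or RUNNING, and the original exception is re-raised so the caller still sees the interruption.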
jakecorrenti added a commit to jakecorrenti/podman that referenced this issue Apr 28, 2025
jakecorrenti added a commit to jakecorrenti/podman that referenced this issue Apr 29, 2025
mheon pushed a commit to mheon/libpod that referenced this issue Apr 30, 2025