Podman ExecIDs report inaccurate Running state. #18424

Closed
AndroidKitKat opened this issue May 2, 2023 · 1 comment · Fixed by #18437
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

AndroidKitKat (Contributor) commented May 2, 2023

Issue Description

It seems that podman ExecIDs persist and report as "running" for roughly 5 minutes after the exec'd command exits, before being removed.

I've only tested this using the podman Golang bindings. Here is a demonstration program: https://gist.github.com/AndroidKitKat/2e1233b17316d96173fe1cf9f3e8aa48

Steps to reproduce the issue


  1. Create a new container:
podman create --name alpine-test --tty alpine:latest
  2. Start the container:
podman start alpine-test
  3. Start an exec in the container using the REST API.
    I did this using the Go program referenced in the GitHub Gist above (a sketch of the bindings calls it makes follows these steps).
    I compiled and ran it with:
go mod init inspectbug
curl "https://gist.githubusercontent.com/AndroidKitKat/2e1233b17316d96173fe1cf9f3e8aa48/raw/40c2b071a53275d0e270d71aee34051140094e46/main.go" > main.go
go get
go build
./inspectbug
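
For reference, here is a minimal sketch of what the demonstration program does with the podman Go bindings. This is an illustrative reconstruction, not the exact contents of the Gist; it assumes the v4 bindings module path (matching the Podman 4.2.0 shown in podman info below), and reuses the alpine-test container name and the /run/user/1000/podman/podman.sock socket path from this setup.

package main

import (
    "context"
    "log"
    "time"

    "github.com/containers/podman/v4/pkg/api/handlers"
    "github.com/containers/podman/v4/pkg/bindings"
    "github.com/containers/podman/v4/pkg/bindings/containers"
)

func main() {
    // Connect to the rootless Podman REST API socket.
    ctx, err := bindings.NewConnection(context.Background(), "unix:///run/user/1000/podman/podman.sock")
    if err != nil {
        log.Fatal(err)
    }

    // Create an exec session in the running container that runs `sleep 5`.
    execConfig := new(handlers.ExecCreateConfig)
    execConfig.Cmd = []string{"sleep", "5"}
    execID, err := containers.ExecCreate(ctx, "alpine-test", execConfig)
    if err != nil {
        log.Fatal(err)
    }
    log.Println("Exec ID: ", execID)

    // Start the exec session without attaching to it.
    if err := containers.ExecStart(ctx, execID, new(containers.ExecStartOptions)); err != nil {
        log.Fatal(err)
    }

    // Poll the exec session once per second and report its Running state.
    for {
        inspect, err := containers.ExecInspect(ctx, execID, new(containers.ExecInspectOptions))
        if err != nil {
            // Once the session is cleaned up server-side, the inspect call fails;
            // the original program did not handle this error and panicked instead.
            log.Fatal(err)
        }
        if !inspect.Running {
            log.Println("Not running:", execID)
            return
        }
        log.Println("Still running:", execID)
        time.Sleep(time.Second)
    }
}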

Describe the results you received

In that program, I create an Exec with the command sleep 5 and run a loop that checks the status of the Exec every second. For the 5 seconds of the sleep command plus about 5 more minutes, the inspectResult struct's Running member stays true. Eventually the program crashes, because there is no error handling around the containers.ExecInspect call and the ExecID seems to no longer exist at all by that point.

Here's the output of the program:

[developer@guthix inspectbug]$ ./inspectbug
2023/05/02 14:12:21 Exec ID:  20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
2023/05/02 14:12:21 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
...
2023/05/02 14:17:23 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
2023/05/02 14:17:24 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
2023/05/02 14:17:25 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x43 pc=0xe79322]

Describe the results you expected

After 5 seconds, the Exec should show as "Not running" because sleep has exited.

podman info output

[developer@guthix inspectbug]$ podman info
host:
  arch: amd64
  buildahVersion: 1.27.3
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.4-1.module+el8.7.0+1154+147ffa21.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: ddbeffc1e2a247aef04a1be0bc9b1b5ef5f1cd09'
  cpuUtilization:
    idlePercent: 99.9
    systemPercent: 0.04
    userPercent: 0.06
  cpus: 8
  distribution:
    distribution: '"rocky"'
    version: "8.7"
  eventLogger: file
  hostname: guthix
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-425.19.2.el8_7.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 27185180672
  memTotal: 33146671104
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.4-1.module+el8.7.0+1154+147ffa21.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      spec: 1.0.2-dev
      go: go1.18.9
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.module+el8.7.0+1154+147ffa21.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 16741560320
  swapTotal: 16741560320
  uptime: 100h 45m 33.00s (Approximately 4.17 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/developer/.config/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 7
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.9-1.module+el8.7.0+1154+147ffa21.x86_64
      Version: |-
        fusermount3 version: 3.3.0
        fuse-overlayfs: version 1.9
        FUSE library version 3.3.0
        using FUSE kernel interface version 7.26
  graphRoot: /storage/containers/storage
  graphRootAllocated: 502921392128
  graphRootUsed: 1801228288
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 9
  runRoot: /run/user/1000
  volumePath: /storage/containers/storage/volumes
version:
  APIVersion: 4.2.0
  Built: 1677003394
  BuiltTime: Tue Feb 21 13:16:34 2023
  GitCommit: ""
  GoVersion: go1.18.9
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Running on bare metal on Intel NUCs with 11th gen Intel processors.

I am accessing Podman using the golang bindings.

Additional information

I verified that the sleep command exits by looking at the process list of the container.
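
For example, the container's process list can be checked with podman's process listing (illustrative command; alpine-test is the container created in the reproduction steps above):

podman top alpine-test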

AndroidKitKat added the kind/bug label May 2, 2023
Luap99 self-assigned this May 3, 2023
Luap99 added a commit to Luap99/libpod that referenced this issue May 3, 2023
The remote API will wait 300s by default before conmon calls the cleanup. In the meantime, when you inspect an exec session that was started with ExecStart() (so not attached) and that has already exited, we do not know that it exited, so a caller who inspects it thinks it is still running. To prevent this we should sync the session based on the exec PID and update the state accordingly.

For a reproducer see the test in this commit or the issue.

Fixes containers#18424

Signed-off-by: Paul Holzinger <[email protected]>
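
For illustration only, here is a minimal sketch of the technique the commit message describes: deciding whether an exec session is still running by probing its recorded PID with signal 0. This is not the actual libpod code, and the helper name execPIDAlive is made up for this example.

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

// execPIDAlive reports whether a process with the given PID still exists.
// Sending signal 0 performs the existence check without delivering a signal.
// (Hypothetical helper; not part of libpod.)
func execPIDAlive(pid int) bool {
    err := unix.Kill(pid, 0)
    if err == nil {
        return true
    }
    // ESRCH means "no such process": the exec'd command has exited, so the
    // session state can be synced to "not running". Other errors (e.g. EPERM)
    // mean the process still exists.
    return err != unix.ESRCH
}

func main() {
    fmt.Println(execPIDAlive(1)) // PID 1 always exists, so this prints true
}
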
Luap99 (Member) commented May 3, 2023

PR #18437 should fix it.

Luap99 added a commit to Luap99/libpod that referenced this issue May 23, 2023
github-actions bot added the locked - please file new issue/PR label Aug 25, 2023
github-actions bot locked as resolved and limited conversation to collaborators Aug 25, 2023