Podman ExecIDs report inaccurate Running state. #18424

Closed
AndroidKitKat opened this issue May 2, 2023 · 1 comment · Fixed by #18437
Assignees
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

AndroidKitKat (Contributor) commented May 2, 2023

Issue Description

It seems that podman ExecIDs persist and report as "running" for roughly 5 minutes after the exec'd command exits, before being removed.

I've only tested this using the podman Golang bindings. Here is a demonstration program: https://gist.github.com/AndroidKitKat/2e1233b17316d96173fe1cf9f3e8aa48

Steps to reproduce the issue


  1. Create a new container:
podman create --name alpine-test --tty alpine:latest
  2. Start the container:
podman start alpine-test
  3. Start an exec in the container using the REST API.
    I did this using the Go program referenced in the GitHub Gist above (a sketch of the bindings calls it makes follows these steps).
    I compiled and ran it with:
go mod init inspectbug
curl "https://gist.githubusercontent.com/AndroidKitKat/2e1233b17316d96173fe1cf9f3e8aa48/raw/40c2b071a53275d0e270d71aee34051140094e46/main.go" > main.go
go get
go build
./inspectbug
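
For reference, here is a minimal sketch of what the demonstration program does with the podman Go bindings. This is an illustrative reconstruction, not the exact contents of the Gist; it assumes the v4 bindings module path (matching the Podman 4.2.0 shown in podman info below), and reuses the alpine-test container name and the /run/user/1000/podman/podman.sock socket path from this setup.

package main

import (
    "context"
    "log"
    "time"

    "github.com/containers/podman/v4/pkg/api/handlers"
    "github.com/containers/podman/v4/pkg/bindings"
    "github.com/containers/podman/v4/pkg/bindings/containers"
)

func main() {
    // Connect to the rootless Podman REST API socket.
    ctx, err := bindings.NewConnection(context.Background(), "unix:///run/user/1000/podman/podman.sock")
    if err != nil {
        log.Fatal(err)
    }

    // Create an exec session in the running container that runs `sleep 5`.
    execConfig := new(handlers.ExecCreateConfig)
    execConfig.Cmd = []string{"sleep", "5"}
    execID, err := containers.ExecCreate(ctx, "alpine-test", execConfig)
    if err != nil {
        log.Fatal(err)
    }
    log.Println("Exec ID: ", execID)

    // Start the exec session without attaching to it.
    if err := containers.ExecStart(ctx, execID, new(containers.ExecStartOptions)); err != nil {
        log.Fatal(err)
    }

    // Poll the exec session once per second and report its Running state.
    for {
        inspect, err := containers.ExecInspect(ctx, execID, new(containers.ExecInspectOptions))
        if err != nil {
            // Once the session is cleaned up server-side, the inspect call fails;
            // the original program did not handle this error and panicked instead.
            log.Fatal(err)
        }
        if !inspect.Running {
            log.Println("Not running:", execID)
            return
        }
        log.Println("Still running:", execID)
        time.Sleep(time.Second)
    }
}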

Describe the results you received

In that program, I create an Exec with the command sleep 5 and run a loop that checks the status of the Exec every second. For the 5 seconds of the sleep command plus about 5 more minutes, the inspectResult struct's Running member stays true. Eventually the program crashes, because there is no error handling around the containers.ExecInspect call and the ExecID seems to no longer exist at all by that point.

Here's the output of the program:

[developer@guthix inspectbug]$ ./inspectbug
2023/05/02 14:12:21 Exec ID:  20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
2023/05/02 14:12:21 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
...
2023/05/02 14:17:23 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
2023/05/02 14:17:24 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
2023/05/02 14:17:25 Still running: 20ed50eb719cd691259d66de361be5793ac1f6d70179d5ef6978fbb72adf0c76
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x43 pc=0xe79322]

Describe the results you expected

After 5 seconds, the Exec should show as "Not running" because sleep has exited.

podman info output

[developer@guthix inspectbug]$ podman info
host:
  arch: amd64
  buildahVersion: 1.27.3
  cgroupControllers: []
  cgroupManager: cgroupfs
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.4-1.module+el8.7.0+1154+147ffa21.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.4, commit: ddbeffc1e2a247aef04a1be0bc9b1b5ef5f1cd09'
  cpuUtilization:
    idlePercent: 99.9
    systemPercent: 0.04
    userPercent: 0.06
  cpus: 8
  distribution:
    distribution: '"rocky"'
    version: "8.7"
  eventLogger: file
  hostname: guthix
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 4.18.0-425.19.2.el8_7.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 27185180672
  memTotal: 33146671104
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.4-1.module+el8.7.0+1154+147ffa21.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      spec: 1.0.2-dev
      go: go1.18.9
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.module+el8.7.0+1154+147ffa21.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 16741560320
  swapTotal: 16741560320
  uptime: 100h 45m 33.00s (Approximately 4.17 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /home/developer/.config/containers/storage.conf
  containerStore:
    number: 7
    paused: 0
    running: 7
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.9-1.module+el8.7.0+1154+147ffa21.x86_64
      Version: |-
        fusermount3 version: 3.3.0
        fuse-overlayfs: version 1.9
        FUSE library version 3.3.0
        using FUSE kernel interface version 7.26
  graphRoot: /storage/containers/storage
  graphRootAllocated: 502921392128
  graphRootUsed: 1801228288
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 9
  runRoot: /run/user/1000
  volumePath: /storage/containers/storage/volumes
version:
  APIVersion: 4.2.0
  Built: 1677003394
  BuiltTime: Tue Feb 21 13:16:34 2023
  GitCommit: ""
  GoVersion: go1.18.9
  Os: linux
  OsArch: linux/amd64
  Version: 4.2.0

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

Running on bare metal on Intel NUCs with 11th gen Intel processors.

I am accessing Podman using the golang bindings.

Additional information

I verified that the sleep command exits by looking at the process list of the container.
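
For example, the container's process list can be checked with podman's process listing (illustrative command; alpine-test is the container created in the reproduction steps above):

podman top alpine-test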

AndroidKitKat added the kind/bug label May 2, 2023
Luap99 self-assigned this May 3, 2023
Luap99 added a commit to Luap99/libpod that referenced this issue May 3, 2023
The remote API will wait 300s by default before conmon calls the cleanup. In the meantime, when you inspect an exec session that was started with ExecStart() (so not attached) and that has already exited, we do not know that it exited, so a caller who inspects it thinks it is still running. To prevent this we should sync the session based on the exec PID and update the state accordingly.

For a reproducer see the test in this commit or the issue.

Fixes containers#18424

Signed-off-by: Paul Holzinger <[email protected]>
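
For illustration only, here is a minimal sketch of the technique the commit message describes: deciding whether an exec session is still running by probing its recorded PID with signal 0. This is not the actual libpod code, and the helper name execPIDAlive is made up for this example.

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

// execPIDAlive reports whether a process with the given PID still exists.
// Sending signal 0 performs the existence check without delivering a signal.
// (Hypothetical helper; not part of libpod.)
func execPIDAlive(pid int) bool {
    err := unix.Kill(pid, 0)
    if err == nil {
        return true
    }
    // ESRCH means "no such process": the exec'd command has exited, so the
    // session state can be synced to "not running". Other errors (e.g. EPERM)
    // mean the process still exists.
    return err != unix.ESRCH
}

func main() {
    fmt.Println(execPIDAlive(1)) // PID 1 always exists, so this prints true
}
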
Luap99 (Member) commented May 3, 2023

PR #18437 should fix it.

Luap99 added a commit to Luap99/libpod that referenced this issue May 23, 2023
github-actions bot added the locked - please file new issue/PR label Aug 25, 2023
github-actions bot locked as resolved and limited conversation to collaborators Aug 25, 2023