e2e: podman top with ps(1): race between run -d and top #19504

edsantiago · 2023-08-04T02:03:02Z

Infrequent, but seen twice in the last two days:

[It] podman top with ps(1) options
$ podman [options] run -d registry.fedoraproject.org/fedora-minimal:34 sleep inf
522994a6b9a7d80063d324615deed494ecf3bd0053102c83b6c15b7d85a6736c
$ podman [options] top 522994a6b9a7d80063d324615deed494ecf3bd0053102c83b6c15b7d85a6736c aux
[no output, which causes test to fail]

Should be a simple fix: maybe add podman wait --condition=running, or maybe add an echo READY + WaitForReady. Filing as placeholder because I won't have time to get to it until next week.

Also need to add annotations to the len > 1 assertions, something like "number of output lines from top", because those are really hard to debug. (Alternatively, figure out a way to use string...).To(HaveLen(Numerically(>1))).

fedora-38 : int podman fedora-38 rootless host sqlite
- 08-03 20:34 in Podman top podman top with ps(1) options
rawhide : int podman rawhide rootless host sqlite
- 08-03 13:22 in Podman top podman top with ps(1) options


AKARSHITJOSHI · 2023-08-04T12:42:07Z

Hi @edsantiago I would like to give this a try.

Luap99 · 2023-08-04T12:43:49Z

I am not so sure the answer is that simple. podman run -d should block until the container is started, which means it is already in a running state. Also, podman top would error out if the container state were not running.

Therefore IMO the problem is not that the process is not running. I think there is an actual bug in podman top where it loses output.

edsantiago · 2023-08-10T23:52:12Z

@Luap99 you're right. This flake today triggered in the second ps command:

$ podman [options] top 31393bb1d50e57ebc4a51b6238d68ffaece551856e1b5377c94ec96ec3630f12 aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   2400  1152 ?        Ss   16:55   0:00 sleep inf
  ^^^^^^^^^^^^^^^ this is where it usually flakes, immediately after the container launches

$ podman [options] top 31393bb1d50e57ebc4a51b6238d68ffaece551856e1b5377c94ec96ec3630f12 ax -o args

[FAILED] Expected
    <[]string | len:0, cap:0>: nil
to equal
    <[]string | len:2, cap:2>: ["COMMAND", "sleep inf"]

That suggests that adding some sort of wait will not fix anything.

And, because we don't have enough stress in our lives, there's this remote flake, also today:

# podman [options] system service --time 0 unix:/run/podman/podman-8842e554755a91e9657f0811f22f0e07bca1fa5721e470431b96d399f183e0e1.sock
? Exit  [BeforeEach] TOP-LEVEL - /var/tmp/go/src/github.com/containers/podman/test/e2e/common_test.go:109 (https://github.com/containers/podman/blob/3c9bb44bd45360d5b79f2a437d5baf4ca5336514/test/e2e/common_test.go#L109) @ 08/10/23 16:34:32.41 (204ms)
time="2023-08-10T16:34:32-05:00" level=warning msg="IdleTracker: StateClosed transition by connection marked un-managed" X-Reference-Id=0xc000015490
...
# podman-remote [options] top a334a8c9f7ae2fc80b7c1bac68ce571d6f0e97cd6b0fb4820c3e8b1a13a9a0a6 aux
Error: unmarshalling into &handlers.ContainerTopOKBody{ContainerTopOKBody:container.ContainerTopOKBody{Processes:[][]string(nil), Titles:[]string(nil)}}, data "": unexpected end of JSON input
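The `data ""` in that error shows the client tried to decode an empty response body. As a minimal sketch of why that produces "unexpected end of JSON input", the following uses only the Go standard library. `topBody` and `decodeTop` are hypothetical stand-ins that mirror the `Processes` and `Titles` fields visible in the message. They are not podman's actual types or client code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// topBody is a hypothetical stand-in with the two fields named in the
// error message above. It is not podman's handlers.ContainerTopOKBody.
type topBody struct {
	Processes [][]string
	Titles    []string
}

// decodeTop (hypothetical helper) decodes a response body and wraps any
// decoding error together with the raw data that was received.
func decodeTop(data []byte) (topBody, error) {
	var body topBody
	if err := json.Unmarshal(data, &body); err != nil {
		return topBody{}, fmt.Errorf("unmarshalling top response, data %q: %w", data, err)
	}
	return body, nil
}

func main() {
	// An empty body is not valid JSON, so decoding fails.
	_, err := decodeTop([]byte(""))
	fmt.Println(err)

	// A complete body decodes without error.
	body, err := decodeTop([]byte(`{"Processes":[["root","1"]],"Titles":["USER","PID"]}`))
	fmt.Println(body.Titles, err)
}
```

In this sketch the empty body fails at decode time, which would fit a server that closed the response before writing any output. That reading is an assumption based on the log, not something the log proves.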

I've removed the good first issue label. And I'm going to pour myself a stiff glass of lemonade and call it a day.

Luap99 · 2023-08-11T11:16:08Z

I tried to reproduce this for hours without luck, but I think I see the bug in the code, so I am just opening a PR with a fix. Time will tell whether it fixes the flake.

Sometimes there is no output displayed from the podman top command but no error is shown either. Looking at the code I think the issue here is that we do not wait for the output reader to end as it runs in a different goroutine. Thus the last lines of output might be missing. The fix is simply to wait for said goroutine to finish before returning.

While at it also fix the missing scanner error check and return the read errors back to the caller.

[NO NEW TESTS NEEDED] It is a flake.

Fixes containers#19504

Signed-off-by: Paul Holzinger <[email protected]>

edsantiago added the Good First Issue and flakes labels Aug 4, 2023

edsantiago removed the Good First Issue label Aug 10, 2023

Luap99 mentioned this issue Aug 11, 2023

fix podman top missing output flake #19595

Merged

openshift-merge-robot closed this as completed in #19595 Aug 14, 2023

github-actions bot added the locked - please file new issue/PR label Nov 13, 2023

github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2023
