e2e: podman top with ps(1): race between run -d and top #19504

Closed
edsantiago opened this issue Aug 4, 2023 · 4 comments · Fixed by #19595
Labels
flakes, locked - please file new issue/PR

Comments

@edsantiago
Member

Infrequent, but seen twice in the last two days:

[It] podman top with ps(1) options
$ podman [options] run -d registry.fedoraproject.org/fedora-minimal:34 sleep inf
522994a6b9a7d80063d324615deed494ecf3bd0053102c83b6c15b7d85a6736c
$ podman [options] top 522994a6b9a7d80063d324615deed494ecf3bd0053102c83b6c15b7d85a6736c aux
[no output, which causes test to fail]

Should be a simple fix: maybe add podman wait --condition=running, or maybe add an echo READY + WaitForReady. Filing as placeholder because I won't have time to get to it until next week.

Also need to add annotations to the len > 1 assertions, something like "number of output lines from top", because those are really hard to debug. (Alternatively, figure out a way to use string...).To(HaveLen(Numerically(>1))).

  • fedora-38 : int podman fedora-38 rootless host sqlite
    • 08-03 20:34 in Podman top podman top with ps(1) options
  • rawhide : int podman rawhide rootless host sqlite
    • 08-03 13:22 in Podman top podman top with ps(1) options
@edsantiago added the Good First Issue and flakes labels Aug 4, 2023
@AKARSHITJOSHI

Hi @edsantiago I would like to give this a try.

@Luap99
Member

Luap99 commented Aug 4, 2023

I am not so sure the answer is that simple: podman run -d should block until the container is started, which means it is already in a running state. Also, podman top would error out if the container state were not running.

Therefore, IMO the problem is not that the process is not running. I think there is an actual bug in podman top where it loses output.

@edsantiago removed the Good First Issue label Aug 10, 2023
@edsantiago
Member Author

@Luap99 you're right. This flake today triggered in the second ps command:

$ podman [options] top 31393bb1d50e57ebc4a51b6238d68ffaece551856e1b5377c94ec96ec3630f12 aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   2400  1152 ?        Ss   16:55   0:00 sleep inf
  ^^^^^^^^^^^^^^^ this is where it usually flakes, immediately after the container launches

$ podman [options] top 31393bb1d50e57ebc4a51b6238d68ffaece551856e1b5377c94ec96ec3630f12 ax -o args

[FAILED] Expected
    <[]string | len:0, cap:0>: nil
to equal
    <[]string | len:2, cap:2>: ["COMMAND", "sleep inf"]

That suggests that adding some sort of wait will not fix anything.

And, because we don't have enough stress in our lives, there's this remote flake, also today:

# podman [options] system service --time 0 unix:/run/podman/podman-8842e554755a91e9657f0811f22f0e07bca1fa5721e470431b96d399f183e0e1.sock
? Exit  [BeforeEach] TOP-LEVEL - /var/tmp/go/src/github.com[/containers/podman/test/e2e/common_test.go:109](https://github.com/containers/podman/blob/3c9bb44bd45360d5b79f2a437d5baf4ca5336514/test/e2e/common_test.go#L109) @ 08/10/23 16:34:32.41 (204ms)
time="2023-08-10T16:34:32-05:00" level=warning msg="IdleTracker: StateClosed transition by connection marked un-managed" X-Reference-Id=0xc000015490
...
# podman-remote [options] top a334a8c9f7ae2fc80b7c1bac68ce571d6f0e97cd6b0fb4820c3e8b1a13a9a0a6 aux
Error: unmarshalling into &handlers.ContainerTopOKBody{ContainerTopOKBody:container.ContainerTopOKBody{Processes:[][]string(nil), Titles:[]string(nil)}}, data "": unexpected end of JSON input
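
For context, "unexpected end of JSON input" is the error Go's encoding/json returns when asked to decode zero bytes, so the remote failure above (data "") is consistent with the server sending back an empty body. A minimal sketch of that behavior follows; topBody and decodeTop are hypothetical stand-ins modelled only on the two fields visible in the error message, not the real handler types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// topBody is a hypothetical stand-in for the response type named in the
// error message; only the two fields visible there are modelled.
type topBody struct {
	Processes [][]string
	Titles    []string
}

// decodeTop unmarshals a response body into topBody.
func decodeTop(data []byte) (topBody, error) {
	var body topBody
	err := json.Unmarshal(data, &body)
	return body, err
}

func main() {
	// An empty body is not valid JSON, so decoding fails outright.
	_, err := decodeTop([]byte(""))
	fmt.Println("empty body:", err)

	// A well-formed body decodes into titles and process rows.
	body, err := decodeTop([]byte(`{"Titles":["COMMAND"],"Processes":[["sleep inf"]]}`))
	fmt.Println("valid body:", body.Titles, body.Processes, err)
}
```

If that reading is right, the remote flake and the local "no output" flake could share a cause: output that was never collected on the server side.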

I've removed the good first issue label. And I'm going to pour myself a stiff glass of lemonade and call it a day.

@Luap99
Member

Luap99 commented Aug 11, 2023

I tried to reproduce this for hours without luck, but I think I see the bug in the code, so I am just opening a PR with a fix. Time will tell whether the flake is fixed with that.

Luap99 added a commit to Luap99/libpod that referenced this issue Aug 11, 2023
Sometimes there is no output displayed from the podman top command but
no error is shown either. Looking at the code I think the issue here is
that we do not wait for the output reader to end as it runs in a
different goroutine. Thus the last lines of output might be missing.

The fix is simply to wait for said goroutine to finish before returning.
While at it also fix the missing scanner error check and return the read
errors back to the caller.

[NO NEW TESTS NEEDED] It is a flake.

Fixes containers#19504

Signed-off-by: Paul Holzinger <[email protected]>
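
The pattern the commit message describes can be sketched roughly as below. This is not the actual podman code: collectLines is a hypothetical function, and an io.Pipe stands in for the output stream of ps(1). It shows output being scanned in a separate goroutine, the caller waiting for that goroutine before returning, and the scanner error being checked and handed back.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// collectLines writes input to a pipe (standing in for a command's stdout)
// while a separate goroutine scans it line by line. Hypothetical helper.
func collectLines(input string) ([]string, error) {
	pr, pw := io.Pipe()

	var lines []string
	var scanErr error
	done := make(chan struct{})

	// Reader goroutine: collects lines until the pipe is closed.
	go func() {
		defer close(done)
		scanner := bufio.NewScanner(pr)
		for scanner.Scan() {
			lines = append(lines, scanner.Text())
		}
		// Record read errors instead of silently dropping them.
		scanErr = scanner.Err()
	}()

	// Producer: write the output, then close so the reader sees EOF.
	_, writeErr := io.WriteString(pw, input)
	pw.Close()

	// Wait for the reader goroutine to finish. Returning before this
	// point would race with the appends above and could lose lines.
	<-done

	if writeErr != nil {
		return nil, writeErr
	}
	if scanErr != nil {
		return nil, scanErr
	}
	return lines, nil
}

func main() {
	lines, err := collectLines("COMMAND\nsleep inf\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d lines: %q\n", len(lines), lines)
}
```

Without the `<-done` receive, the function could return while the goroutine is still appending, which would match the symptom seen in the flake: empty or truncated output and no error.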
@github-actions bot added the locked - please file new issue/PR label Nov 13, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2023