
cdi_spec_dirs from containers.conf is not respected #25691


Closed
hmenke opened this issue Mar 26, 2025 · 6 comments · Fixed by #25717
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@hmenke

hmenke commented Mar 26, 2025

In my ~/.config/containers/containers.conf I have

[engine]
cdi_spec_dirs = ["/tmp/test/cdi"]

but Podman doesn't even try to open that directory. It only looks at the hardcoded default /etc/cdi.

$ strace -f -e trace=openat podman run --gpus all debian:bookworm sh -c exit |& grep cdi
[pid 403372] openat(AT_FDCWD, "/etc/cdi", O_RDONLY|O_CLOEXEC) = 8
[pid 403372] openat(AT_FDCWD, "/etc/cdi", O_RDONLY|O_CLOEXEC) = 8

Originally posted by @hmenke in containers/common#1834 (comment)

@Luap99
Member

Luap99 commented Mar 26, 2025

Note this is a Podman issue, as the support was simply never merged into Podman:
#21448

@Luap99 Luap99 transferred this issue from containers/common Mar 26, 2025
@Luap99 Luap99 added the kind/bug Categorizes issue or PR as related to a bug. label Mar 26, 2025
@hmenke
Author

hmenke commented Mar 26, 2025

I thought Podman uses config.Default() from c/common/pkg/config. At the same time, helper_binaries_dir, which plays a similar role in the config hierarchy as cdi_spec_dirs, does appear to work for me.

Quick verification:

$ mkdir -p /tmp/test/libexec
$ cat ~/.config/containers/containers.conf 
[engine]
helper_binaries_dir = ["/tmp/test/libexec"]
$ podman run debian:bookworm sh -c exit
Error: could not find "netavark" in one of [/tmp/test/libexec].  To resolve this error, set the helper_binaries_dir key in the `[engine]` section of containers.conf to the directory containing your helper binaries.

Of course, there is now an error that netavark cannot be found because the directory /tmp/test/libexec is empty, but this shows that helper_binaries_dir is correctly forwarded.
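
For comparison, this is roughly how helper_binaries_dir gets picked up through c/common's config API. A minimal sketch on my side, assuming the containers/common/pkg/config functions I remember (config.Default, FindHelperBinary), so double-check the exact signatures:

package main

import (
	"fmt"

	"github.com/containers/common/pkg/config"
)

func main() {
	// config.Default() merges the built-in defaults with the system-wide and
	// per-user containers.conf, so helper_binaries_dir from ~/.config is seen.
	cfg, err := config.Default()
	if err != nil {
		panic(err)
	}
	// FindHelperBinary searches the configured helper_binaries_dir entries,
	// which is why the error above lists [/tmp/test/libexec].
	path, err := cfg.FindHelperBinary("netavark", false)
	fmt.Println(path, err)
}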

@hmenke
Author

hmenke commented Mar 26, 2025

I strongly suspect that we need to somehow forward the CdiSpecDirs when creating a new CDI cache here, because this is also where the error Error: setting up CDI devices: unresolvable CDI devices is triggered:

if len(c.config.CDIDevices) > 0 {
	registry, err := cdi.NewCache(
		cdi.WithAutoRefresh(false),
	)
	if err != nil {
		return nil, nil, fmt.Errorf("creating CDI registry: %w", err)
	}
	if err := registry.Refresh(); err != nil {
		logrus.Debugf("The following error was triggered when refreshing the CDI registry: %v", err)
	}
	if _, err := registry.InjectDevices(g.Config, c.config.CDIDevices...); err != nil {
		return nil, nil, fmt.Errorf("setting up CDI devices: %w", err)
	}
}
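
As an aside, here is a small standalone sketch of why only /etc/cdi shows up in the strace above: without cdi.WithSpecDirs, the cache falls back to the CDI library's compiled-in default directories. This is just my reading of the vendored tags.cncf.io/container-device-interface/pkg/cdi package, so treat the exact calls (e.g. GetSpecDirectories) as assumptions:

package main

import (
	"fmt"

	"tags.cncf.io/container-device-interface/pkg/cdi"
)

func main() {
	// Without the WithSpecDirs option the cache only scans the library
	// defaults (/etc/cdi and /var/run/cdi), never cdi_spec_dirs from
	// containers.conf.
	cache, err := cdi.NewCache(cdi.WithAutoRefresh(false))
	if err != nil {
		panic(err)
	}
	fmt.Println(cache.GetSpecDirectories())
}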

What I have in mind is something like this:

diff --git a/libpod/container_internal_common.go b/libpod/container_internal_common.go
index 017a01e5b..8674e477a 100644
--- a/libpod/container_internal_common.go
+++ b/libpod/container_internal_common.go
@@ -640,6 +640,7 @@ func (c *Container) generateSpec(ctx context.Context) (s *spec.Spec, cleanupFunc
 	if len(c.config.CDIDevices) > 0 {
 		registry, err := cdi.NewCache(
 			cdi.WithAutoRefresh(false),
+			cdi.WithSpecDirs(CdiSpecDirs...),
 		)
 		if err != nil {
 			return nil, nil, fmt.Errorf("creating CDI registry: %w", err)

but where to get CdiSpecDirs from? It is not part of the ContainerConfig referred to by c.config.
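
My best guess, as a rough sketch only (the field and accessor names below are assumptions on my part), is that the spec dirs would have to come from the runtime's merged containers.conf rather than from the per-container config, roughly:

 	if len(c.config.CDIDevices) > 0 {
 		registry, err := cdi.NewCache(
 			cdi.WithAutoRefresh(false),
+			// c.runtime.config is the merged containers.conf; "CdiSpecDirs" and
+			// ".Get()" are assumed names for the engine field behind cdi_spec_dirs.
+			cdi.WithSpecDirs(c.runtime.config.Engine.CdiSpecDirs.Get()...),
 		)

The actual fix may well do this differently, though.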

@Luap99
Member

Luap99 commented Mar 26, 2025

#21448 already shows how to do it.

It is just a matter of someone finishing that work, i.e. writing tests and adding documentation.

@hmenke
Author

hmenke commented Mar 26, 2025

I applied #21448 on top of Podman 5.4.1, but still nothing. My custom CDI spec dir is not considered; Podman only looks at /etc/cdi.

$ mkdir /tmp/cdi
$ nvidia-ctk cdi generate -output /tmp/cdi/nvidia.yaml
$ cat ~/.config/containers/containers.conf
[engine]
cdi_spec_dirs = ["/tmp/cdi"]
$ strace -f podman --log-level=debug --cdi-spec-dir /tmp/cdi run --gpus all --rm alpine nvidia-smi |& grep -i cdi
execve("/soft/bin/podman", ["podman", "--log-level=debug", "--cdi-spec-dir", "/tmp/cdi", "run", "--gpus", "all", "--rm", "alpine", "nvidia-smi"], 0x7ffc12cbba50 /* 210 vars */) = 0
read(4, "podman\0--log-level=debug\0--cdi-s"..., 512) = 87
[pid 112223] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 168time="2025-03-26T19:46:04+01:00" level=debug msg="Called run.PersistentPreRunE(podman --log-level=debug --cdi-spec-dir /tmp/cdi run --gpus all --rm alpine nvidia-smi)"
[pid 112237] <... read resumed>"podman\0--log-level=debug\0--cdi-s"..., 512) = 87
[pid 112237] execve("/proc/self/exe", ["podman", "--log-level=debug", "--cdi-spec-dir", "/tmp/cdi", "run", "--gpus", "all", "--rm", "alpine", "nvidia-smi"], 0x10f8c0c0 /* 214 vars */ <unfinished ...>
[pid 112237] <... read resumed>"podman\0--log-level=debug\0--cdi-s"..., 512) = 87
[pid 112237] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 168time="2025-03-26T19:46:04+01:00" level=debug msg="Called run.PersistentPreRunE(podman --log-level=debug --cdi-spec-dir /tmp/cdi run --gpus all --rm alpine nvidia-smi)"
[pid 112243] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 92time="2025-03-26T19:46:04+01:00" level=debug msg="Identified CDI device nvidia.com/gpu=all"
[pid 112249] newfstatat(AT_FDCWD, "/etc/cdi", 0xc0005c1ca8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
[pid 112249] newfstatat(AT_FDCWD, "/etc/cdi", 0xc0005c1d78, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
[pid 112241] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 137time="2025-03-26T19:46:04+01:00" level=debug msg="ExitCode msg: \"setting up cdi devices: unresolvable cdi devices nvidia.com/gpu=all\""
[pid 112241] write(2, "Error: setting up CDI devices: u"..., 75Error: setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all

@jankaluza
Member

Yeah, I think the pull request needs more work. I'm now working on it.
