
cdi_spec_dirs from containers.conf is not respected #25691


Closed
hmenke opened this issue Mar 26, 2025 · 6 comments · Fixed by #25717
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@hmenke

hmenke commented Mar 26, 2025

In my ~/.config/containers/containers.conf I have

[engine]
cdi_spec_dirs = ["/tmp/test/cdi"]

but Podman doesn't even try to open that directory. It only looks at the hardcoded default /etc/cdi.

$ strace -f -e trace=openat podman run --gpus all debian:bookworm sh -c exit |& grep cdi
[pid 403372] openat(AT_FDCWD, "/etc/cdi", O_RDONLY|O_CLOEXEC) = 8
[pid 403372] openat(AT_FDCWD, "/etc/cdi", O_RDONLY|O_CLOEXEC) = 8

Originally posted by @hmenke in containers/common#1834 (comment)

@Luap99
Member

Luap99 commented Mar 26, 2025

Note this is a Podman issue, as the support was simply never merged into Podman:
#21448

@Luap99 Luap99 transferred this issue from containers/common Mar 26, 2025
@Luap99 Luap99 added the kind/bug Categorizes issue or PR as related to a bug. label Mar 26, 2025
@hmenke
Author

hmenke commented Mar 26, 2025

I thought Podman uses config.Default() from c/common/pkg/config. At the same time, helper_binaries_dir, which plays a similar role in the config hierarchy as cdi_spec_dirs, does appear to work for me.

Quick verification:

$ mkdir -p /tmp/test/libexec
$ cat ~/.config/containers/containers.conf 
[engine]
helper_binaries_dir = ["/tmp/test/libexec"]
$ podman run debian:bookworm sh -c exit
Error: could not find "netavark" in one of [/tmp/test/libexec].  To resolve this error, set the helper_binaries_dir key in the `[engine]` section of containers.conf to the directory containing your helper binaries.

Of course, there is now an error that netavark cannot be found because the directory /tmp/test/libexec is empty, but this shows that helper_binaries_dir is correctly forwarded.
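
For comparison, this is roughly how helper_binaries_dir gets picked up through c/common's config API. A minimal sketch on my side, assuming the containers/common/pkg/config functions I remember (config.Default, FindHelperBinary), so double-check the exact signatures:

package main

import (
	"fmt"

	"github.com/containers/common/pkg/config"
)

func main() {
	// config.Default() merges the built-in defaults with the system-wide and
	// per-user containers.conf, so helper_binaries_dir from ~/.config is seen.
	cfg, err := config.Default()
	if err != nil {
		panic(err)
	}
	// FindHelperBinary searches the configured helper_binaries_dir entries,
	// which is why the error above lists [/tmp/test/libexec].
	path, err := cfg.FindHelperBinary("netavark", false)
	fmt.Println(path, err)
}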

@hmenke
Author

hmenke commented Mar 26, 2025

I strongly suspect that we need to somehow forward the CdiSpecDirs when creating a new CDI cache here, because this is also where the error Error: setting up CDI devices: unresolvable CDI devices is triggered:

if len(c.config.CDIDevices) > 0 {
	registry, err := cdi.NewCache(
		cdi.WithAutoRefresh(false),
	)
	if err != nil {
		return nil, nil, fmt.Errorf("creating CDI registry: %w", err)
	}
	if err := registry.Refresh(); err != nil {
		logrus.Debugf("The following error was triggered when refreshing the CDI registry: %v", err)
	}
	if _, err := registry.InjectDevices(g.Config, c.config.CDIDevices...); err != nil {
		return nil, nil, fmt.Errorf("setting up CDI devices: %w", err)
	}
}
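
As an aside, here is a small standalone sketch of why only /etc/cdi shows up in the strace above: without cdi.WithSpecDirs, the cache falls back to the CDI library's compiled-in default directories. This is just my reading of the vendored tags.cncf.io/container-device-interface/pkg/cdi package, so treat the exact calls (e.g. GetSpecDirectories) as assumptions:

package main

import (
	"fmt"

	"tags.cncf.io/container-device-interface/pkg/cdi"
)

func main() {
	// Without the WithSpecDirs option the cache only scans the library
	// defaults (/etc/cdi and /var/run/cdi), never cdi_spec_dirs from
	// containers.conf.
	cache, err := cdi.NewCache(cdi.WithAutoRefresh(false))
	if err != nil {
		panic(err)
	}
	fmt.Println(cache.GetSpecDirectories())
}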

What I have in mind is something like this:

diff --git a/libpod/container_internal_common.go b/libpod/container_internal_common.go
index 017a01e5b..8674e477a 100644
--- a/libpod/container_internal_common.go
+++ b/libpod/container_internal_common.go
@@ -640,6 +640,7 @@ func (c *Container) generateSpec(ctx context.Context) (s *spec.Spec, cleanupFunc
 	if len(c.config.CDIDevices) > 0 {
 		registry, err := cdi.NewCache(
 			cdi.WithAutoRefresh(false),
+			cdi.WithSpecDirs(CdiSpecDirs...),
 		)
 		if err != nil {
 			return nil, nil, fmt.Errorf("creating CDI registry: %w", err)

but where to get CdiSpecDirs from? It is not part of the ContainerConfig referred to by c.config.
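
My best guess, as a rough sketch only (the field and accessor names below are assumptions on my part), is that the spec dirs would have to come from the runtime's merged containers.conf rather than from the per-container config, roughly:

 	if len(c.config.CDIDevices) > 0 {
 		registry, err := cdi.NewCache(
 			cdi.WithAutoRefresh(false),
+			// c.runtime.config is the merged containers.conf; "CdiSpecDirs" and
+			// ".Get()" are assumed names for the engine field behind cdi_spec_dirs.
+			cdi.WithSpecDirs(c.runtime.config.Engine.CdiSpecDirs.Get()...),
 		)

The actual fix may well do this differently, though.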

@Luap99
Member

Luap99 commented Mar 26, 2025

#21448 already shows how to do it.

It is just a matter of someone finishing that work, i.e. writing tests and adding documentation.

@hmenke
Author

hmenke commented Mar 26, 2025

I applied #21448 on top of Podman 5.4.1, but still nothing. My custom CDI spec dir is not considered; Podman only looks at /etc/cdi.

$ mkdir /tmp/cdi
$ nvidia-ctk cdi generate -output /tmp/cdi/nvidia.yaml
$ cat ~/.config/containers/containers.conf
[engine]
cdi_spec_dirs = ["/tmp/cdi"]
$ strace -f podman --log-level=debug --cdi-spec-dir /tmp/cdi run --gpus all --rm alpine nvidia-smi |& grep -i cdi
execve("/soft/bin/podman", ["podman", "--log-level=debug", "--cdi-spec-dir", "/tmp/cdi", "run", "--gpus", "all", "--rm", "alpine", "nvidia-smi"], 0x7ffc12cbba50 /* 210 vars */) = 0
read(4, "podman\0--log-level=debug\0--cdi-s"..., 512) = 87
[pid 112223] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 168time="2025-03-26T19:46:04+01:00" level=debug msg="Called run.PersistentPreRunE(podman --log-level=debug --cdi-spec-dir /tmp/cdi run --gpus all --rm alpine nvidia-smi)"
[pid 112237] <... read resumed>"podman\0--log-level=debug\0--cdi-s"..., 512) = 87
[pid 112237] execve("/proc/self/exe", ["podman", "--log-level=debug", "--cdi-spec-dir", "/tmp/cdi", "run", "--gpus", "all", "--rm", "alpine", "nvidia-smi"], 0x10f8c0c0 /* 214 vars */ <unfinished ...>
[pid 112237] <... read resumed>"podman\0--log-level=debug\0--cdi-s"..., 512) = 87
[pid 112237] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 168time="2025-03-26T19:46:04+01:00" level=debug msg="Called run.PersistentPreRunE(podman --log-level=debug --cdi-spec-dir /tmp/cdi run --gpus all --rm alpine nvidia-smi)"
[pid 112243] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 92time="2025-03-26T19:46:04+01:00" level=debug msg="Identified CDI device nvidia.com/gpu=all"
[pid 112249] newfstatat(AT_FDCWD, "/etc/cdi", 0xc0005c1ca8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
[pid 112249] newfstatat(AT_FDCWD, "/etc/cdi", 0xc0005c1d78, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
[pid 112241] write(2, "time=\"2025-03-26T19:46:04+01:00\""..., 137time="2025-03-26T19:46:04+01:00" level=debug msg="ExitCode msg: \"setting up cdi devices: unresolvable cdi devices nvidia.com/gpu=all\""
[pid 112241] write(2, "Error: setting up CDI devices: u"..., 75Error: setting up CDI devices: unresolvable CDI devices nvidia.com/gpu=all

@jankaluza
Member

Yeah, I think the pull request needs more work. I'm now working on it.
