It turns out this was quite easy actually. When using Podman with the libkrun provider and running an AI workload container like ramalama, the GPU is exposed inside the VM at /dev/dri. I tried mounting this filepath onto my single-node Kind cluster in a custom `kind: Cluster` config:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kind-cluster
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev/dri
    containerPath: /dev/dri
```
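To apply a config like this, the cluster is created with the Podman provider (a minimal sketch; the filename is hypothetical):

```sh
# Tell Kind to use Podman instead of Docker as its provider
export KIND_EXPERIMENTAL_PROVIDER=podman

# Create the single-node cluster with the /dev/dri extraMount
kind create cluster --config kind-cluster.yaml
```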
In addition, I added a Deployment that mounts /dev/dri into the ramalama container:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ramalama
  labels:
    app: ramalama
spec:
  selector:
    matchLabels:
      app: ramalama
  template:
    metadata:
      labels:
        app: ramalama
    spec:
      volumes:
      - name: gpudir
        hostPath:
          path: /dev/dri
      containers:
      - image: quay.io/ramalama/ramalama
        name: ramalama
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: gpudir
          mountPath: /dev/dri
```

Then I just shelled into the ramalama container, ran a model (llama2, to be specific), gave it a prompt, and saw in macOS Activity Monitor that the GPU was being utilised as expected. Hope this helps someone!
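For anyone following along, a sketch of that verification step, assuming the Deployment above runs in the default namespace (the exact model-run command is an assumption; `ramalama --nocontainer run` is shown since the shell is already inside a container):

```sh
# Open a shell in the running pod
kubectl exec -it deploy/ramalama -- /bin/bash

# Inside the container: the mounted GPU device nodes should be visible
ls -l /dev/dri

# Run the model and give it a prompt; --nocontainer keeps ramalama
# from trying to start a nested container (assumed invocation)
ramalama --nocontainer run llama2
```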
Hello!

I have successfully followed instructions to get Podman running with `libkrun`/`krunkit` to get GPU access for containers running via Podman (ggml-org/llama.cpp#12985). I've found that using `ramalama` is the easiest way to run AI models with GPU acceleration in a container running via Podman, easier than Ollama. I'm not exactly sure why this is, but I think it is related to the patched Mesa driver that's needed (if you don't know what I mean, please read the link above).
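For reference, the libkrun setup referred to above boils down to selecting the machine provider before creating the Podman machine (a minimal sketch, assuming macOS and a recent Podman; see the linked issue for the patched Mesa details):

```sh
# Pick libkrun as the provider for new Podman machines
export CONTAINERS_MACHINE_PROVIDER=libkrun

# Create and boot a machine backed by libkrun/krunkit
podman machine init
podman machine start
```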
I am now trying to get GPU access for containers running in a local Kind cluster using Podman as the container engine. I am struggling to find any information on how to do this other than by using something like `nvidia-container-toolkit` and adding something like this to the resources requests/limits.
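A sketch of such a snippet, assuming the standard NVIDIA device plugin and its `nvidia.com/gpu` extended resource:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1  # extended resource advertised by the NVIDIA device plugin
```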
But this is not possible on an Apple Silicon machine, as it does not have explicit GPU compute units; the whole thing is treated as 1 CPU.
At the moment, I believe I am stuck running an AI container (ramalama, Ollama, whatever really) on Podman next to my Kind cluster, and then setting up the networking so that the Kind cluster can communicate with that separate container running directly in Podman.
However, if there is a way to expose the GPU cores through Kind (as has been done for Podman), then I definitely want to go down that route.
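For completeness, a minimal sketch of that fallback (the port and model name are placeholders; how pods in the Kind cluster reach the published port depends on the Podman network setup):

```sh
# Serve a model directly via Podman/ramalama, next to the Kind cluster;
# ramalama publishes an OpenAI-compatible API on the given port
ramalama serve --port 8080 llama2
```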