It turns out this was quite easy actually. When using Podman with the libkrun provider and running an AI workload container like ramalama, the GPU is exposed inside the VM at /dev/dri. I tried mounting this filepath onto my single-node Kind cluster in a custom `kind: Cluster` config:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: kind-cluster
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /dev/dri
    containerPath: /dev/dri
```
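To apply a config like this, the cluster is created with the Podman provider (a minimal sketch; the filename is hypothetical):

```sh
# Tell Kind to use Podman instead of Docker as its provider
export KIND_EXPERIMENTAL_PROVIDER=podman

# Create the single-node cluster with the /dev/dri extraMount
kind create cluster --config kind-cluster.yaml
```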
In addition, I added a Deployment that mounts /dev/dri into the ramalama container:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ramalama
  labels:
    app: ramalama
spec:
  selector:
    matchLabels:
      app: ramalama
  template:
    metadata:
      labels:
        app: ramalama
    spec:
      volumes:
      - name: gpudir
        hostPath:
          path: /dev/dri
      containers:
      - image: quay.io/ramalama/ramalama
        name: ramalama
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: gpudir
          mountPath: /dev/dri
```

Then I just shelled into the ramalama container, ran a model (llama2, to be specific), gave it a prompt, and saw in macOS Activity Monitor that the GPU was being utilised as expected. Hope this helps someone!
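For anyone following along, a sketch of that verification step, assuming the Deployment above runs in the default namespace (the exact model-run command is an assumption; `ramalama --nocontainer run` is shown since the shell is already inside a container):

```sh
# Open a shell in the running pod
kubectl exec -it deploy/ramalama -- /bin/bash

# Inside the container: the mounted GPU device nodes should be visible
ls -l /dev/dri

# Run the model and give it a prompt; --nocontainer keeps ramalama
# from trying to start a nested container (assumed invocation)
ramalama --nocontainer run llama2
```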
Hello!

I have successfully followed instructions to get Podman running with `libkrun`/`krunkit` to get GPU access for containers running via Podman (ggml-org/llama.cpp#12985). I've found that using `ramalama` is the easiest way to run AI models with GPU acceleration in a container running via Podman, easier than Ollama. I'm not exactly sure why this is, but I think it is related to the patched Mesa driver that's needed (if you don't know what I mean, please read the link above).
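For reference, the libkrun setup referred to above boils down to selecting the machine provider before creating the Podman machine (a minimal sketch, assuming macOS and a recent Podman; see the linked issue for the patched Mesa details):

```sh
# Pick libkrun as the provider for new Podman machines
export CONTAINERS_MACHINE_PROVIDER=libkrun

# Create and boot a machine backed by libkrun/krunkit
podman machine init
podman machine start
```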
I am now trying to get GPU access for containers running in a local Kind cluster using Podman as the container engine. I am struggling to find any information on how to do this other than by using something like `nvidia-container-toolkit` and adding something like this to the resources requests/limits.
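A sketch of such a snippet, assuming the standard NVIDIA device plugin and its `nvidia.com/gpu` extended resource:

```yaml
resources:
  limits:
    nvidia.com/gpu: 1  # extended resource advertised by the NVIDIA device plugin
```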
But this is not possible on an Apple Silicon machine, as it does not have explicit GPU compute units; the whole thing is treated as 1 CPU.
At the moment, I believe I am stuck running an AI container (ramalama, Ollama, whatever really) on Podman next to my Kind cluster, and then setting up the networking so that the Kind cluster can communicate with that separate container running directly in Podman.
However, if there is a way to expose the GPU cores through Kind (as has been done for Podman), then I definitely want to go down that route.
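For completeness, a minimal sketch of that fallback (the port and model name are placeholders; how pods in the Kind cluster reach the published port depends on the Podman network setup):

```sh
# Serve a model directly via Podman/ramalama, next to the Kind cluster;
# ramalama publishes an OpenAI-compatible API on the given port
ramalama serve --port 8080 llama2
```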