CUDA error: invalid device function #1372
Comments
Interestingly, I am using an Nvidia RTX 4090, and I get the exact same error during inference! How can this be possible? I've created a brand new venv to reproduce this bug:
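For reference, a sketch of what that fresh-venv setup might look like on a CUDA card; the venv name and the LLAMA_CUBLAS build flag are assumptions, not the exact commands from this comment:
# Sketch only: fresh virtual environment and a CUDA build of llama-cpp-python
python -m venv venv && source venv/bin/activate
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --no-cache-dir --force-reinstall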
Then I try to run this:
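(Again only a sketch; the model path, layer count, and prompt are placeholders, not the commenter's actual script:)
# Minimal inference call that would exercise the CUDA backend
python -c "from llama_cpp import Llama; llm = Llama(model_path='model.gguf', n_gpu_layers=-1); print(llm('Hello', max_tokens=16))"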
The output is:
I tried other models; all give the same error. I thought it was an upstream issue, but, also strangely, I can compile and run llama.cpp itself with no problems.
Did you find a solution to this?
I have the same issue with inference on 4090 GPUs.
Prerequisites
ROCm 6
Expected Behavior
Attempting to utilize llama_cpp_python in the OobaBooga WebUI.
Current Behavior
It loads the model into VRAM. Then, upon trying to infer:
ggml_cuda_compute_forward: GET_ROWS failed
CUDA error: invalid device function
current device: 0, in function ggml_cuda_compute_forward at /tmp/pip-install-7xdln0go/llama-cpp-python_0bc0f935a20b4d68b0bf4ef217f92000/vendor/llama.cpp/ggml-cuda.cu:2300
err
GGML_ASSERT: /tmp/pip-install-7xdln0go/llama-cpp-python_0bc0f935a20b4d68b0bf4ef217f92000/vendor/llama.cpp/ggml-cuda.cu:60: !"CUDA error"
And I notice tensorcores=true in the output.
I have the latest llama.cpp compiled and running with no problems. It also says tensorcores=true in its output.
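A quick way to check whether the installed wheel was actually built with GPU offload enabled (a sketch; it assumes the installed llama-cpp-python version exposes the llama_supports_gpu_offload binding, which recent versions do):
# Print the bound package version and whether the bundled libllama reports GPU offload support
python -c "import llama_cpp; print(llama_cpp.__version__, llama_cpp.llama_supports_gpu_offload())"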
Environment and Context
Python 3.11 venv. Manjaro. Torch for ROCm 6.0
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 3800X 8-Core Processor
CPU family: 23
Model: 113
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 48%
CPU max MHz: 4558.8862
CPU min MHz: 2200.0000
BogoMIPS: 7803.32
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 4 MiB (8 instances)
L3: 32 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Spec rstack overflow: Mitigation; Safe RET
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
$ uname -a
6.6.26-1-MANJARO #1 SMP PREEMPT_DYNAMIC Wed Apr 10 20:11:08 UTC 2024 x86_64 GNU/Linux
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Step 1: Install with the following command (a sketch that additionally pins the GPU architecture follows these steps):
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ HSA_OVERRIDE_GFX_VERSION=10.3.0 CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python --no-cache-dir --force-reinstall --no-cache
Step 2: Fire up OobaBooga, load a model, and attempt inference; the process aborts (core dumped).
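Not from the original report, but a sketch of the same install with the GPU architecture pinned explicitly at build time; the AMDGPU_TARGETS variable and the gfx1030 value are assumptions chosen to match the HSA_OVERRIDE_GFX_VERSION=10.3.0 override above, so check rocminfo for the ISA your card actually reports:
# Confirm which gfx ISA the GPU reports (assumes rocminfo ships with the ROCm install)
rocminfo | grep -i gfx
# Rebuild the wheel for that architecture; flag names are a sketch, not verified against this exact version
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on -DAMDGPU_TARGETS=gfx1030" pip install llama-cpp-python --no-cache-dir --force-reinstall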
Note: Many issues seem to be regarding functional or performance issues / differences with llama.cpp. In these cases we need to confirm that you're comparing against the version of llama.cpp that was built with your Python package, and which parameters you're passing to the context. Try the following:
1. git clone https://github.com/abetlen/llama-cpp-python
2. cd llama-cpp-python
3. rm -rf _skbuild/ # delete any old builds
4. python -m pip install .
5. cd ./vendor/llama.cpp
6. cmake llama.cpp
7. Run ./main with the same arguments you previously passed to llama-cpp-python and see if you can reproduce the issue. If you can, log an issue with llama.cpp.
Failure Logs
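A sketch of that comparison run against the vendored llama.cpp (the model path, offload count, and prompt are placeholders; depending on the build, the binary may live at ./main or ./build/bin/main):
# Build upstream llama.cpp and run the same prompt with full GPU offload
cmake -B build && cmake --build build --config Release
./build/bin/main -m /path/to/model.gguf -ngl 99 -p "Hello"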
Please include any relevant log snippets or files. If it works under one configuration but not under another, please provide logs for both configurations and their corresponding outputs so it is easy to see where behavior changes.
Also, please try to avoid using screenshots if at all possible. Instead, copy/paste the console output and use GitHub's markdown to cleanly format your logs for easy readability.
Example environment info:
llama-cpp-python$ python3 --version
Python 3.11.9
llama-cpp-python$ pip list | egrep "uvicorn|fastapi|sse-starlette|numpy"
fastapi 0.110.2
numpy 1.26.4
sse-starlette 1.6.5
uvicorn 0.29.0
Same result if llama_cpp_python is installed from the cloned git repo with pip.
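For reference, a sketch of that from-source install using the same ROCm toolchain and flags as above (the --recursive clone matters because llama.cpp is vendored as a git submodule):
git clone --recursive https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python
# Sketch: same environment variables as the pip install above, pointed at the local checkout
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install . --no-cache-dir --force-reinstall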