llama-cpp-python-0.1.65 and below crashes (memory issue?) and v0.1.66-0.1.70 errors out with GPU #477
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
On v0.1.65, I expect GPU offloading to work.
Current Behavior
On v0.1.65 and below, my kernel crashes, presumably due to a memory issue. On v0.1.66-0.1.70, the model fails to load with a GPU-related error.
Environment and Context
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU @ 2.30GHz
Stepping: 0
CPU MHz: 2299.998
BogoMIPS: 4599.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 2 MiB
L3 cache: 45 MiB
NUMA node0 CPU(s): 0-15
$ uname -a
Linux username-tensorflow-gpu 5.10.0-23-cloud-amd64 #1 SMP Debian 5.10.179-1 (2023-05-12) x86_64 GNU/Linux
$ python3 --version
Python 3.10.10
$ make --version
GNU Make 4.3
Built for x86_64-pc-linux-gnu
$ g++ --version
g++ (Debian 10.2.1-6) 10.2.1 20210110
Failure Information (for bugs)
The kernel crashes (v0.1.65 and below), or the model fails to load (v0.1.66-0.1.70).
Crash:
Fails to load:
Steps to Reproduce
Use CUDA 12.1 and try to run the code below:
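The original reproduction script was not included, so here is a minimal sketch of the steps, assuming a cuBLAS build of llama-cpp-python; the model filename and the `n_gpu_layers` value are assumptions, not taken from the report:

```shell
# Build llama-cpp-python against cuBLAS (CMAKE_ARGS/FORCE_CMAKE are the
# documented way to enable GPU support in this package).
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 \
  pip install llama-cpp-python==0.1.65 --force-reinstall --no-cache-dir

# Load the 7B GGML model with some layers offloaded to the GPU.
# Model path and layer count below are placeholders for illustration.
python3 -c '
from llama_cpp import Llama
llm = Llama(model_path="./models/llama/7B/ggml-model-q4_0.bin",
            n_gpu_layers=32)
print(llm("Q: What is the capital of France? A:", max_tokens=16))
'
```

On the affected versions, the crash or load failure should occur during the `Llama(...)` construction, before any tokens are generated.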
Please help! I am using the model from https://huggingface.co/frankenstyle/ggml-q4-models/tree/main/models/llama/7B.
Thank you!