Name and Version
b7502
Operating systems
Windows
GGML backends
Vulkan
Hardware
Ryzen AI 370HX with 64GB LPDDR5X, and 890m iGPU
Models
Command: ${llamasvr} -m ${mpath}\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf --jinja --no-mmap --ctx-size 20000 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.00
The issue also occurs with GLM-4.5-Air and Llama-3.3-70B.
Problem description & steps to reproduce
From version b7502 through the current b7642 (as of today), any model loaded into memory throws:
.........srv log_server_r: request: GET /health 127.0.0.1 503
................srv log_server_r: request: GET /health 127.0.0.1 503
..............srv log_server_r: request: GET /health 127.0.0.1 503
..........llama_model_load: error loading model: vk::Queue::submit: ErrorOutOfDeviceMemory
llama_model_load_from_file_impl: failed to load model
srv log_server_r: request: GET /health 127.0.0.1 503
common_init_from_params: failed to load model 'C:\Users\PC\llama-cpp\models\\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf'
srv load_model: failed to load model, 'C:\Users\PC\llama-cpp\models\\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
I have 32GB of my system RAM allocated in BIOS to the 890m iGPU. Models that fit inside the 32GB run fine; models larger than that fail as above. If I remove --no-mmap (re-enabling mmap), the model loads, but it reads from disk for whatever reason, which takes a really long time. Prior versions do not have this issue.
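As a diagnostic sketch (not a fix), it may help to confirm whether the Vulkan allocation is simply exceeding the 32GB iGPU carve-out by offloading only part of the model; --n-gpu-layers is a standard llama-server flag, and the layer count below is a guess that would need tuning per model:

```shell
# Hypothetical workaround sketch: offload only some layers to the 890m iGPU so the
# Vulkan device allocation stays under the 32GB BIOS carve-out; the remaining
# layers run on CPU. The value 24 is an assumption, not a tested number.
${llamasvr} -m ${mpath}\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf --jinja --no-mmap --ctx-size 20000 --n-gpu-layers 24 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.00
```

If the model loads with a reduced layer count but still fails fully offloaded on builds after b7502, that would point at a change in how the Vulkan backend sizes or submits its device allocations rather than at the model itself.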
First Bad Commit
b7502
Relevant log output
Logs