Eval bug: OOM Error for Vulkan Backend on models that don't strictly fit in VRAM #18642

@jhemmond

Description


Name and Version

b7502

Operating systems

Windows

GGML backends

Vulkan

Hardware

Ryzen AI 370HX with 64GB LPDDR5X, and 890m iGPU

Models

Command: ${llamasvr} -m ${mpath}\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf --jinja --no-mmap --ctx-size 20000 --temp 0.7 --top-p 0.8 --top-k 20 --min-p 0.00

The issue also happens with GLM-4.5-Air and Llama-3.3-70B.

Problem description & steps to reproduce

Starting with version b7502, and up through the current b7642 today, loading any of the affected models throws:

.........srv  log_server_r: request: GET /health 127.0.0.1 503

................srv  log_server_r: request: GET /health 127.0.0.1 503

..............srv  log_server_r: request: GET /health 127.0.0.1 503

..........llama_model_load: error loading model: vk::Queue::submit: ErrorOutOfDeviceMemory

llama_model_load_from_file_impl: failed to load model

srv  log_server_r: request: GET /health 127.0.0.1 503

common_init_from_params: failed to load model 'C:\Users\PC\llama-cpp\models\\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf'

srv    load_model: failed to load model, 'C:\Users\PC\llama-cpp\models\\Qwen3-Next-80B-A3B-Instruct-UD-Q4_K_XL.gguf'

srv    operator(): operator(): cleaning up before exit...

main: exiting due to model loading error

I have 32 GB of system RAM allocated in the BIOS to the 890M iGPU. Models that fit inside the 32 GB run fine; models larger than that fail. If I remove --no-mmap, the model loads, but for whatever reason it falls back to reading from disk, which takes a really long time. Prior versions do not have this issue.
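As a rough sanity check of why this model trips the 32 GB carve-out, here is a hedged back-of-envelope sketch. The bits-per-weight figure (~4.5 for Q4_K-style quants) and the 80e9 parameter count are assumptions for illustration, not values taken from the report or from the GGUF file itself:

```python
GIB = 1024 ** 3

def quantized_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of a quantized model (ignores KV cache
    and activation buffers, which only make the situation worse)."""
    return n_params * bits_per_weight / 8

# Assumed: ~80e9 params, ~4.5 bits/weight for a Q4_K_XL-style quant.
model_bytes = quantized_size_bytes(80e9, 4.5)
vram_budget = 32 * GIB  # BIOS allocation to the iGPU

print(f"model ~= {model_bytes / GIB:.1f} GiB, budget = 32 GiB, "
      f"fits: {model_bytes <= vram_budget}")
```

Under these assumptions the weights alone come to roughly 42 GiB, well over the 32 GiB budget, which matches the observation that only models under the carve-out size load cleanly.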

First Bad Commit

b7502
