Name and Version
TheTom/llama-cpp-turboquant
Both builds include PR #82 (fix(cuda): allow f16/bf16 + q8_0 KV without GGML_CUDA_FA_ALL_QUANTS) cherry-picked and were configured with -DGGML_CUDA_FA_ALL_QUANTS=ON.
Operating systems
Linux
GGML backends
CUDA
Hardware
Reproduced on two hardware configurations:
- NVIDIA GB10 Grace Blackwell (sm_121, 128GB unified memory, CUDA 13.0.2, aarch64 Ubuntu 24.04)
- NVIDIA RTX 4090 (sm_89, 24GB discrete VRAM, CUDA 12.4, x86_64 Ubuntu 22.04, RunPod hosts in Iowa US and Timisoara Romania)
Models
Qwen3-30B-A3B Q4_K_M and UD-Q4_K_XL variants from https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF
Problem description & steps to reproduce
Running llama-bench with flash attention enabled and any turbo V-cache type paired with f16 K-cache crashes consistently across both architectures.
Reproduce on any CUDA hardware (sm_89 or sm_121 both affected):
```
./build/bin/llama-bench \
  -m Qwen3-30B-A3B-Q4_K_M.gguf \
  -fa 1 -t 1 -ngl 99 \
  -p 0 -n 128 -pg 8192,128 \
  -ctk f16 -ctv turbo3
```
Expected: benchmark runs on GPU.
Actual: crash at fattn.cu:339 with ggml_cuda_flash_attn_ext then GGML_ABORT("fatal error").
Root cause analysis:
Line 339 in ggml/src/ggml-cuda/fattn.cu is the GGML_ABORT that fires when no K/V type case matches in the dispatch. Checking the cases above it, the F16-K pairings exercised in the log below (f16/f16, f16/q8_0) are handled, but the following are missing:
- F16 + TURBO2_0
- F16 + TURBO3_0
- F16 + TURBO4_0
Symmetric turbo (turbo3/turbo3, turbo4/turbo4) cases exist. Turbo/q8_0 asymmetric exists. But F16-K + turbo-V asymmetric has no dispatch case, so it hits the abort.
Same class of missing-case bug that PR #82 fixed for F16+Q8_0, just not extended to the turbo V-cache variants.
Proposed fix:
Add FATTN_VEC_CASES_ALL_D entries for F16 + TURBO2_0, F16 + TURBO3_0, F16 + TURBO4_0 in fattn.cu around line 284 where the existing F16-K asymmetric cases live. Happy to submit a PR if the fix is as simple as it looks and there is no deeper reason these were intentionally excluded.
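If the exclusion is not intentional, the additions would presumably mirror the existing F16-K asymmetric entries. A sketch only: the TURBO type names are this fork's, and the exact macro name and arity should be checked against the neighbouring cases before submitting.

```cpp
// in ggml/src/ggml-cuda/fattn.cu, next to the existing F16-K asymmetric cases
// (sketch; verify macro arity against the surrounding entries)
FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO2_0)
FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO3_0)
FATTN_VEC_CASES_ALL_D(GGML_TYPE_F16, GGML_TYPE_TURBO4_0)
```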
Context: This was found during validation of PR #82 on DGX Spark GB10 unified memory, posted in Discussion ggml-org#20969.
First Bad Commit
Not a regression. These dispatch cases appear to have never been added.
Relevant log output
Logs
```
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24090 MiB):
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes, VRAM: 24090 MiB
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | ... | f16 | f16 | 1 | pp8192+tg128 | 4872.72 ± 15.83 |
| qwen3moe 30B.A3B Q4_K - Medium | 17.28 GiB | ... | f16 | q8_0 | 1 | pp32768+tg128 | 4306.00 ± 5.17 |
/workspace/llama-cpp-turboquant/ggml/src/ggml-cuda/fattn.cu:339: fatal error
/workspace/llama-cpp-turboquant/build/bin/libggml-base.so.0(ggml_print_backtrace+0x21f)
/workspace/llama-cpp-turboquant/build/bin/libggml-base.so.0(ggml_abort+0x152)
/workspace/llama-cpp-turboquant/build/bin/libggml-cuda.so.0(_Z24ggml_cuda_flash_attn_extR25ggml_backend_cuda_contextP11ggml_tensor+0x6fe)
/workspace/llama-cpp-turboquant/build/bin/libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x81f)
/workspace/llama-cpp-turboquant/build/bin/libllama.so.0(llama_decode+0x10)
```
Same signature reproduced on sm_121 (DGX Spark GB10) with HEAD 107362298.