Misc. bug: Qwen 3.5 over RPC+Vulkan generates gibberish #22235

@anakayub

Name and Version

8870

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

On one node out of 4:
GGML_RPC_DEBUG=1 ./build/bin/rpc-server -H 192.168.68.66 -d Vulkan0 -p 50020 -c

The server:
/Users/amir/llama.cpp/build/bin/llama-server \
-m /Users/amir/Library/Caches/llama.cpp/Qwen3.5-27B-UD-Q5_K_XL.gguf \
-mg 0 -t 8 -tb 8 --port 8080 --host 0.0.0.0 -np 1 -ngl 99 \
-fa 0 -c 170000 -n -1 \
--rpc 192.168.68.73:50010,192.168.68.73:50011,192.168.68.66:50021,192.168.68.66:50020 \
-ts 51,7,3,3,23,13 \
-sps 0.1 -to 1200 -b 128 -ub 32 \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.00 \
--repeat-penalty 1.0 \
--cache-ram 40960
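
For reference, the `-ts` list in the command above has six entries while `--rpc` lists four endpoints. The sketch below only normalises the split values into fractions; which entry maps to which device is not stated in this report and is not assumed here. The variable names are illustrative.

```python
# Values copied verbatim from the llama-server command above.
rpc_endpoints = (
    "192.168.68.73:50010,192.168.68.73:50011,"
    "192.168.68.66:50021,192.168.68.66:50020"
).split(",")
tensor_split = [float(v) for v in "51,7,3,3,23,13".split(",")]

# Tensor-split values are proportions, so normalise them by their sum.
total = sum(tensor_split)
fractions = [v / total for v in tensor_split]

print(f"{len(rpc_endpoints)} RPC endpoints, {len(tensor_split)} split entries")
for i, frac in enumerate(fractions):
    print(f"split entry {i}: {frac:.0%} of the split")
```

The two extra split entries presumably belong to devices local to the machine running llama-server, but that is a guess and should be confirmed against the device list printed at startup.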

Problem description & steps to reproduce

My current working setup is a server with 4 RPC nodes running Qwen 3.5 27B, on build 8334 with the PR20518 edits.

With the latest version, my standard prompt produces gibberish (for daily work I use AnythingLLM; the llama.cpp web server is used here only to demonstrate the bug):

Image

Tests on one of the nodes (the oldest one, the trashcan Mac Pro with D700) show that it can run both Qwen3.5-27B-UD-Q5_K_XL.gguf and Llama 3:
Qwen 3.5:

Image

Llama 3:

Image

From the RPC server on one of the iMac Pros (Vega 56):
[get_tensor] buffer: 0x600003a64300, data: 0xa1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600003a64300, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600003a64300, data: 0xa1000, offset: 0, size: 4
[set_tensor] buffer: 0x600003a64300, data: 0xa1010, offset: 0, size: 16
[set_tensor] buffer: 0x600003a64300, data: 0xa1210, offset: 0, size: 8
[set_tensor] buffer: 0x600003a64300, data: 0xa1310, offset: 0, size: 8192
[set_tensor] buffer: 0x600003a64300, data: 0xe1310, offset: 0, size: 1024
[graph_recompute] device: 0
[get_tensor] buffer: 0x600003a64300, data: 0xa1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600003a64300, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600003a64300, data: 0xa1000, offset: 0, size: 4
[set_tensor] buffer: 0x600003a64300, data: 0xa1010, offset: 0, size: 16
[set_tensor] buffer: 0x600003a64300, data: 0xa1210, offset: 0, size: 8
[set_tensor] buffer: 0x600003a64300, data: 0xa1310, offset: 0, size: 8192
[set_tensor] buffer: 0x600003a64300, data: 0xe1310, offset: 0, size: 1024
[graph_recompute] device: 0
[get_tensor] buffer: 0x600003a64300, data: 0xa1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600003a64300, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600003a64300, data: 0xa1000, offset: 0, size: 4
[set_tensor] buffer: 0x600003a64300, data: 0xa1010, offset: 0, size: 16
[set_tensor] buffer: 0x600003a64300, data: 0xa1210, offset: 0, size: 8
[set_tensor] buffer: 0x600003a64300, data: 0xa1310, offset: 0, size: 8192
[set_tensor] buffer: 0x600003a64300, data: 0xe1310, offset: 0, size: 1024
[graph_recompute] device: 0
[get_tensor] buffer: 0x600003a64300, data: 0xa1000, offset: 0, size: 20480

From one of the D700's:
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480

From the other D700 (the other Mac Pro; 2 D700's, remember):
[graph_recompute] device: 0
[get_tensor] buffer: 0x600000bdc000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600000bdc000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600000bdc000, data: 0xa1000, offset: 0, size: 4
[set_tensor] buffer: 0x600000bdc000, data: 0xa1010, offset: 0, size: 16
[set_tensor] buffer: 0x600000bdc000, data: 0xa1210, offset: 0, size: 8
[set_tensor] buffer: 0x600000bdc000, data: 0xa1310, offset: 0, size: 8192
[set_tensor] buffer: 0x600000bdc000, data: 0xe1310, offset: 0, size: 1024
[graph_recompute] device: 0
[get_tensor] buffer: 0x600000bdc000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600000bdc000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600000bdc000, data: 0xa1000, offset: 0, size: 4
[set_tensor] buffer: 0x600000bdc000, data: 0xa1010, offset: 0, size: 16
[set_tensor] buffer: 0x600000bdc000, data: 0xa1210, offset: 0, size: 8
[set_tensor] buffer: 0x600000bdc000, data: 0xa1310, offset: 0, size: 8192
[set_tensor] buffer: 0x600000bdc000, data: 0xe1310, offset: 0, size: 1024
[graph_recompute] device: 0
[get_tensor] buffer: 0x600000bdc000, data: 0x1000, offset: 0, size: 20480
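
In the logs above, the Vega 56 node and the last D700 node show six `set_tensor` calls per `graph_recompute` (sizes 20480, 4, 16, 8, 8192, 1024), while the first D700 log shows only two (20480, 4). Whether that difference is related to the gibberish is not established. A minimal sketch for comparing nodes, assuming the `GGML_RPC_DEBUG=1` line format shown above; the function name `cycle_signatures` and the sample text are illustrative, not part of llama.cpp:

```python
import re
from collections import Counter

# Matches "[op] ... size: N"; the leading "[" and the size field are optional.
LINE_RE = re.compile(r"\[?(\w+)\](?:.*size: (\d+))?")

def cycle_signatures(log_text):
    """Count distinct (op, size) sequences between [graph_recompute] lines."""
    cycles = Counter()
    current = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        op, size = m.group(1), m.group(2)
        if op == "graph_recompute":
            # A recompute closes the current cycle.
            if current:
                cycles[tuple(current)] += 1
            current = []
        else:
            current.append((op, int(size) if size else None))
    if current:
        # Trailing lines after the last recompute form a partial cycle.
        cycles[tuple(current)] += 1
    return cycles

# Sample lines copied from the first D700 log above.
sample = """\
[get_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0x1000, offset: 0, size: 20480
[set_tensor] buffer: 0x600002ec4000, data: 0xa1000, offset: 0, size: 4
[graph_recompute] device: 0
"""

for signature, count in cycle_signatures(sample * 2).items():
    print(count, signature)
```

Running this over the full log of each node and comparing the resulting signatures would show which nodes receive which inputs per graph evaluation.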

Please let me know if you need more information.

First Bad Commit

I'm unable to identify the first bad commit

Relevant log output

Logs
