
Flat KV cache layout #1106

Merged
kdamaszk merged 16 commits into habana_main from dev/kdamaszke/flat_kv_cache on May 21, 2025

Conversation

@kdamaszk

@kdamaszk kdamaszk commented Apr 16, 2025

Change the KV cache layout to (num_blocks * block_size, num_kv_heads, head_size). This improves performance because the cache update is done with 1D indices, which removes unnecessary transposes and memcopies caused by a low FCD (fastest-changing dimension).
Corresponding change: HabanaAI/vllm-hpu-extension#152
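The indexing scheme can be illustrated with a small NumPy sketch (hypothetical toy sizes and names; the actual code lives in vllm-hpu-extension and uses Gaudi-optimized kernels, so this only shows the flat-slot idea):

```python
import numpy as np

# Hypothetical toy sizes, for illustration only.
num_blocks, block_size = 4, 8
num_kv_heads, head_size = 2, 16

# Flat layout: a single 1D "slot" dimension replaces (num_blocks, block_size).
kv_cache = np.zeros((num_blocks * block_size, num_kv_heads, head_size),
                    dtype=np.float32)

# New keys for 3 tokens; each token maps to one flat slot:
# slot = block_id * block_size + offset_in_block.
new_keys = np.random.rand(3, num_kv_heads, head_size).astype(np.float32)
slot_mapping = np.array([5, 17, 30])

# The whole update is one index_copy-style write along dim 0:
# contiguous rows, no transpose, no scatter.
kv_cache[slot_mapping] = new_keys
```

In PyTorch the same write maps to a single `cache.index_copy_(0, slot_mapping, new_keys)` along the leading dimension.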

The effect of this change can be seen in the profilings below, where the Transpose op is no longer present and index_copy kernels replace the slower scatter kernels:

  • default: [profiling screenshot]
  • this PR: [profiling screenshot]

The gain from this feature is even greater when combined with split_qkv, as the whole index_copy is hidden under MME activity:
[profiling screenshot]
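For intuition on why the 1D write avoids the transpose, here is a NumPy sketch contrasting a hypothetical blocked layout (num_blocks, num_kv_heads, block_size, head_size) with the flat one. The exact pre-PR layout may differ; this is only an illustration of the indexing, not the HPU kernels:

```python
import numpy as np

num_blocks, block_size, num_kv_heads, head_size = 4, 8, 2, 16
tokens = 3
new_keys = np.arange(tokens * num_kv_heads * head_size, dtype=np.float32).reshape(
    tokens, num_kv_heads, head_size)
block_ids = np.array([0, 2, 3])   # which block each token lands in
offsets = np.array([5, 1, 6])     # position of each token inside its block

# Hypothetical blocked layout: the token position is split across dims 0 and 2,
# so a token-major write must permute axes (a transpose / scatter on device).
blocked = np.zeros((num_blocks, num_kv_heads, block_size, head_size), np.float32)
blocked[block_ids, :, offsets] = new_keys

# Flat layout: one 1D slot index, written directly along the leading dim.
flat = np.zeros((num_blocks * block_size, num_kv_heads, head_size), np.float32)
flat[block_ids * block_size + offsets] = new_keys

# Both layouts hold the same data; only the flat write needs no axis shuffling.
assert np.allclose(
    blocked.transpose(0, 2, 1, 3).reshape(-1, num_kv_heads, head_size), flat)
```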

@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 9aae441 to 52e356a on April 16, 2025 12:13
@kdamaszk kdamaszk marked this pull request as ready for review April 17, 2025 07:00
@kdamaszk kdamaszk marked this pull request as draft April 17, 2025 07:16
Comment thread vllm/attention/backends/hpu_attn.py Outdated
Comment thread vllm/attention/backends/hpu_attn.py Outdated
Comment thread vllm/attention/backends/hpu_attn.py Outdated
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch 2 times, most recently from be02bd1 to 7c66881 on April 17, 2025 10:33
@michalkuligowski

/run-gaudi-tests

@michalkuligowski

/run-gaudi-tests

@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch 2 times, most recently from d111a87 to a7de3d2 on May 7, 2025 09:03
kdamaszk added 4 commits May 8, 2025 13:33
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from a7de3d2 to 9737693 on May 8, 2025 11:11
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 9737693 to 55c69c2 on May 8, 2025 11:29
Comment thread vllm/v1/worker/hpu_model_runner.py Outdated
kdamaszk added 3 commits May 8, 2025 16:30
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk
Author

kdamaszk commented May 9, 2025

The gain is also visible on prompts:

  • default: [profiling screenshot]
  • this PR: [profiling screenshot]

Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk
Author

kdamaszk commented May 9, 2025

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk
Author

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk kdamaszk marked this pull request as ready for review May 20, 2025 11:04
@kdamaszk
Author

/run-gaudi-tests

@kdamaszk kdamaszk merged commit 1dc9b66 into habana_main May 21, 2025
46 checks passed
@kdamaszk kdamaszk deleted the dev/kdamaszke/flat_kv_cache branch May 21, 2025 09:11
5 participants