
Flat KV cache layout #1106

Merged
kdamaszk merged 16 commits into habana_main from dev/kdamaszke/flat_kv_cache on May 21, 2025

Conversation

@kdamaszk

@kdamaszk kdamaszk commented Apr 16, 2025

Change the KV cache layout to (num_blocks * block_size, num_kv_heads, head_size). This improves performance because the cache update is done with 1D indices, which removes unnecessary transposes and memcopies caused by a low FCD (fastest-changing dimension).
Corresponding change: HabanaAI/vllm-hpu-extension#152
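The indexing scheme can be illustrated with a small NumPy sketch (hypothetical toy sizes and names; the actual code lives in vllm-hpu-extension and uses Gaudi-optimized kernels, so this only shows the flat-slot idea):

```python
import numpy as np

# Hypothetical toy sizes, for illustration only.
num_blocks, block_size = 4, 8
num_kv_heads, head_size = 2, 16

# Flat layout: a single 1D "slot" dimension replaces (num_blocks, block_size).
kv_cache = np.zeros((num_blocks * block_size, num_kv_heads, head_size),
                    dtype=np.float32)

# New keys for 3 tokens; each token maps to one flat slot:
# slot = block_id * block_size + offset_in_block.
new_keys = np.random.rand(3, num_kv_heads, head_size).astype(np.float32)
slot_mapping = np.array([5, 17, 30])

# The whole update is one index_copy-style write along dim 0:
# contiguous rows, no transpose, no scatter.
kv_cache[slot_mapping] = new_keys
```

In PyTorch the same write maps to a single `cache.index_copy_(0, slot_mapping, new_keys)` along the leading dimension.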

The effect of this change can be seen in the profilings below, where the Transpose op is no longer present and index_copy kernels replace the slower scatter kernels:

  • default: [profiling screenshot]
  • this PR: [profiling screenshot]

The gain from this feature is even greater when combined with split_qkv, as the whole index_copy is hidden under MME activity:
[profiling screenshot]
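For intuition on why the 1D write avoids the transpose, here is a NumPy sketch contrasting a hypothetical blocked layout (num_blocks, num_kv_heads, block_size, head_size) with the flat one. The exact pre-PR layout may differ; this is only an illustration of the indexing, not the HPU kernels:

```python
import numpy as np

num_blocks, block_size, num_kv_heads, head_size = 4, 8, 2, 16
tokens = 3
new_keys = np.arange(tokens * num_kv_heads * head_size, dtype=np.float32).reshape(
    tokens, num_kv_heads, head_size)
block_ids = np.array([0, 2, 3])   # which block each token lands in
offsets = np.array([5, 1, 6])     # position of each token inside its block

# Hypothetical blocked layout: the token position is split across dims 0 and 2,
# so a token-major write must permute axes (a transpose / scatter on device).
blocked = np.zeros((num_blocks, num_kv_heads, block_size, head_size), np.float32)
blocked[block_ids, :, offsets] = new_keys

# Flat layout: one 1D slot index, written directly along the leading dim.
flat = np.zeros((num_blocks * block_size, num_kv_heads, head_size), np.float32)
flat[block_ids * block_size + offsets] = new_keys

# Both layouts hold the same data; only the flat write needs no axis shuffling.
assert np.allclose(
    blocked.transpose(0, 2, 1, 3).reshape(-1, num_kv_heads, head_size), flat)
```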

@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 9aae441 to 52e356a on April 16, 2025 12:13
@kdamaszk kdamaszk marked this pull request as ready for review April 17, 2025 07:00
@kdamaszk kdamaszk marked this pull request as draft April 17, 2025 07:16
Comment thread vllm/attention/backends/hpu_attn.py Outdated
Comment thread vllm/attention/backends/hpu_attn.py Outdated
Comment thread vllm/attention/backends/hpu_attn.py Outdated
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch 2 times, most recently from be02bd1 to 7c66881 on April 17, 2025 10:33
@michalkuligowski

/run-gaudi-tests

@michalkuligowski

/run-gaudi-tests

@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch 2 times, most recently from d111a87 to a7de3d2 on May 7, 2025 09:03
kdamaszk added 4 commits May 8, 2025 13:33
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from a7de3d2 to 9737693 on May 8, 2025 11:11
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 9737693 to 55c69c2 on May 8, 2025 11:29
Comment thread vllm/v1/worker/hpu_model_runner.py Outdated
kdamaszk added 3 commits May 8, 2025 16:30
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk
Author

kdamaszk commented May 9, 2025

The gain is also visible on prompts:

  • default: [profiling screenshot]
  • this PR: [profiling screenshot]

Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk
Author

kdamaszk commented May 9, 2025

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk
Author

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

@kdamaszk
Author

/run-gaudi-tests

Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>
@kdamaszk kdamaszk marked this pull request as ready for review May 20, 2025 11:04
@kdamaszk
Author

/run-gaudi-tests

@kdamaszk kdamaszk merged commit 1dc9b66 into habana_main May 21, 2025
46 checks passed
@kdamaszk kdamaszk deleted the dev/kdamaszke/flat_kv_cache branch May 21, 2025 09:11
5 participants