Flat KV cache layout #152

Merged: kdamaszk merged 7 commits into main from dev/kdamaszke/flat_kv_cache on May 20, 2025

kdamaszk (Contributor) commented Apr 16, 2025

Change the layout of the KV cache to `(num_blocks * block_size, num_kv_heads, head_size)`. This improves performance because updates are done on 1D indices, which removes the unnecessary transposes and memory copies caused by a low FCD (fastest changing dimension).
Corresponding change: HabanaAI/vllm-fork#1106
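
To illustrate the indexing idea, here is a minimal pure-Python sketch, not the code from this PR. It assumes a token's position is given as a (block index, offset within block) pair and maps it to a single row of the flat cache. All function names (`flat_slot_index`, `make_flat_cache`, `update_flat_cache`) are hypothetical, and nested lists stand in for device tensors.

```python
# Hypothetical sketch: names are illustrative, not the vllm-hpu-extension API.
# Nested lists stand in for a tensor of shape
# (num_blocks * block_size, num_kv_heads, head_size).


def flat_slot_index(block_idx: int, block_offset: int, block_size: int) -> int:
    """Map a (block, offset-within-block) pair to one row of the flat cache."""
    return block_idx * block_size + block_offset


def make_flat_cache(num_blocks, block_size, num_kv_heads, head_size):
    """Allocate a zero-filled flat cache as nested lists."""
    return [
        [[0.0] * head_size for _ in range(num_kv_heads)]
        for _ in range(num_blocks * block_size)
    ]


def update_flat_cache(cache, slots, new_tokens):
    """Write one (num_kv_heads, head_size) entry per token at its 1D slot."""
    for slot, token in zip(slots, new_tokens):
        cache[slot] = [list(head) for head in token]


num_blocks, block_size, num_kv_heads, head_size = 4, 2, 1, 3
cache = make_flat_cache(num_blocks, block_size, num_kv_heads, head_size)

# Two new tokens: one at block 1, offset 0; one at block 3, offset 1.
slots = [flat_slot_index(1, 0, block_size), flat_slot_index(3, 1, block_size)]
new_tokens = [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]]
update_flat_cache(cache, slots, new_tokens)
```

Because every token resolves to a single row index, the update is one scatter along the first dimension and needs no reshaping of the cache beforehand.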

Comment thread vllm_hpu_extension/utils.py Outdated
kdamaszk requested a review from mswiniarsk as a code owner May 8, 2025 13:31
kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 16e4622 to e14cc97 May 9, 2025 13:22
kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 277f5fe to 6ebec4b May 14, 2025 07:59
kdamaszk requested a review from madamczyk-intel May 19, 2025 08:02
kdamaszk merged commit 987d9d2 into main May 20, 2025
kdamaszk deleted the dev/kdamaszke/flat_kv_cache branch May 20, 2025 11:03
kwisniewski98 pushed a commit that referenced this pull request Jun 2, 2025
4 participants