Flat KV cache layout by kdamaszk · Pull Request #152 · HabanaAI/vllm-hpu-extension

kdamaszk · 2025-04-16T11:46:19Z

Change the layout of the KV cache to (num_blocks * block_size, num_kv_heads, head_size). This improves performance because cache updates are done with 1D indices, which removes the unnecessary transposes and memory copies caused by a low FCD (fastest-changing dimension).
Corresponding change: HabanaAI/vllm-fork#1106
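
To illustrate the idea, here is a minimal pure-Python sketch of a flat cache update. It is not the code from this PR: the helper names (`flat_slot_indices`, `make_flat_cache`, `update_flat_cache`) and the use of nested lists instead of device tensors are assumptions made for illustration only. It shows how a (block index, offset) pair collapses into a single index along the first dimension of a (num_blocks * block_size, num_kv_heads, head_size) cache.

```python
# Hypothetical sketch: nested lists stand in for device tensors.

def flat_slot_indices(block_indices, block_offsets, block_size):
    """Collapse (block, offset) pairs into 1D slot indices."""
    return [b * block_size + o for b, o in zip(block_indices, block_offsets)]


def make_flat_cache(num_blocks, block_size, num_kv_heads, head_size):
    """Allocate a zeroed cache shaped
    (num_blocks * block_size, num_kv_heads, head_size)."""
    return [
        [[0.0] * head_size for _ in range(num_kv_heads)]
        for _ in range(num_blocks * block_size)
    ]


def update_flat_cache(cache, slots, new_entries):
    """Write one (num_kv_heads, head_size) entry per slot.

    Every write targets dimension 0 only, so no transpose of the
    cache is needed before or after the update.
    """
    for slot, entry in zip(slots, new_entries):
        cache[slot] = [list(head) for head in entry]


# Two tokens: one goes to block 2 at offset 1, one to block 0 at offset 3.
slots = flat_slot_indices([2, 0], [1, 3], block_size=4)
cache = make_flat_cache(num_blocks=3, block_size=4, num_kv_heads=2, head_size=2)
update_flat_cache(
    cache,
    slots,
    [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]],
)
print(slots)  # [9, 3]
```

With a tensor library, the loop in `update_flat_cache` would presumably collapse into a single indexed write along dimension 0.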

issue fixed

kdamaszk mentioned this pull request Apr 16, 2025

Flat KV cache layout HabanaAI/vllm-fork#1106

Merged

kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 8084403 to 9238f7a Compare April 16, 2025 12:10

kdamaszk marked this pull request as ready for review April 17, 2025 06:59

kdamaszk requested review from afierka-intel, kzawora-intel, mgawarkiewicz, michalkuligowski and tzielinski-habana as code owners April 17, 2025 06:59

michalkuligowski approved these changes Apr 17, 2025

kdamaszk requested review from jikunshang, madamczyk-intel, mgawarkiewicz-intel and xuechendi as code owners May 6, 2025 12:33

kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 6244ad1 to 6a1204c Compare May 7, 2025 08:55

madamczyk-intel previously requested changes May 8, 2025

Comment thread vllm_hpu_extension/utils.py Outdated

kdamaszk requested a review from mswiniarsk as a code owner May 8, 2025 13:31

kdamaszk added 4 commits May 9, 2025 16:21

Flat KV cache layout

e736d2b

Fix APC

2312d33

Rename to block_indices_with_offsets

a02f652

Apply review comments

e14cc97

kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 16e4622 to e14cc97 Compare May 9, 2025 13:22

kdamaszk added 3 commits May 12, 2025 15:50

Fix flat_pa_mla

7542185

Merge remote-tracking branch 'origin/main' into dev/kdamaszke/flat_kv_cache

8d92f78

Move flatten outside of fetch_from_cache

6ebec4b

kdamaszk force-pushed the dev/kdamaszke/flat_kv_cache branch from 277f5fe to 6ebec4b Compare May 14, 2025 07:59

kdamaszk requested a review from madamczyk-intel May 19, 2025 08:02

kzawora-intel approved these changes May 20, 2025

kdamaszk merged commit 987d9d2 into main May 20, 2025

kdamaszk deleted the dev/kdamaszke/flat_kv_cache branch May 20, 2025 11:03

kwisniewski98 pushed a commit that referenced this pull request Jun 2, 2025

Flat KV cache layout (#152)

a4e4541

kdamaszk merged 7 commits into main from dev/kdamaszke/flat_kv_cache


4 participants
