Update extend/decode attention kernel for CPU in sgl-kernel and add UTs#6405

Merged
zhyncs merged 9 commits into sgl-project:main from yanbing-j:yanbing/attention_kernel on May 20, 2025
Conversation

@yanbing-j
Copy link
Contributor

Motivation

This PR is a follow-up to #2807 and #5150, updating the extend/decode attention kernels for CPU. We fuse set_kv_buffer into decode attention to reduce overhead. We also add corresponding UTs (test_extend.py/test_decode.py) for the CPU extend/decode attention kernels.
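To illustrate the fusion described above, here is a minimal sketch in plain PyTorch (not the actual sgl-kernel C++ implementation): instead of calling a separate set_kv_buffer op to write the new token's K/V into the cache and then running decode attention over the cache, the cache write happens inside the attention call, avoiding an extra pass over the data. The function name, argument layout, and shapes below are illustrative assumptions, not the real kernel API.

```python
import torch

def decode_attention_fused(q, k_new, v_new, k_cache, v_cache, loc, seq_len):
    """Hypothetical sketch of fused decode attention.

    q:              [num_heads, head_dim] query for the current decode step
    k_new, v_new:   [num_heads, head_dim] K/V for the new token
    k_cache/v_cache:[max_len, num_heads, head_dim] preallocated KV buffers
    loc:            cache slot for the new token
    seq_len:        number of valid positions after inserting the new token
    """
    # Fused set_kv_buffer step: write the new K/V into the cache in the
    # same call that reads the cache for attention.
    k_cache[loc] = k_new
    v_cache[loc] = v_new
    k = k_cache[:seq_len]  # [seq_len, num_heads, head_dim]
    v = v_cache[:seq_len]
    scale = q.shape[-1] ** -0.5
    # Attend over all cached positions, per head.
    scores = torch.einsum("hd,shd->hs", q, k) * scale
    probs = torch.softmax(scores, dim=-1)
    return torch.einsum("hs,shd->hd", probs, v)
```

In the unfused version, set_kv_buffer and the attention kernel each traverse the KV data separately; fusing them removes that extra memory traffic, which matters on CPU where decode attention is typically memory-bound.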

Modifications

Checklist

@yanbing-j yanbing-j force-pushed the yanbing/attention_kernel branch 2 times, most recently from 4666cf8 to 816b4f7 Compare May 19, 2025 08:22
@mingfeima mingfeima marked this pull request as ready for review May 20, 2025 01:51
@zhyncs zhyncs merged commit 32cc66e into sgl-project:main May 20, 2025
@mingfeima mingfeima added sgl-kernel intel cpu cpu backend performance optimization labels May 21, 2025
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025

3 participants