[CPU][INT4] Add INT4 kernels for CPU #8226
Conversation
Summary of Changes
Hello @jianan-gu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces new CPU kernels for performing linear operations with 4-bit quantized weights (INT4). It provides two main approaches: A16W4 (16-bit activations, 4-bit weights) and A8W4 (8-bit activations, 4-bit weights), including utilities for weight packing and activation quantization. This significantly expands CPU support for efficient inference with quantized models.
Highlights
- **A16W4 Linear Kernel:** Implemented a `tinygemm_kernel` for `BFloat16` and `Half` activations with `quint4x2` (packed INT4) weights, supporting both a custom `tinygemm_kernel_nn` (for smaller M) and a `brgemm` path (for larger M) leveraging Intel's optimized GEMM library. This is found in `sgl-kernel/csrc/cpu/gemm_int4_w4a16.cpp`. A scalar sketch of the underlying INT4 dequantization follows this list.
- **A8W4 Linear Kernel:** Introduced a new set of kernels for 8-bit quantized activations and 4-bit weights, including per-token symmetric quantization for activations (also sketched after this list) and specialized `_dequant_gemm_accum` functions for the matrix multiplication and dequantization. This is found in `sgl-kernel/csrc/cpu/gemm_int4_w4a8.cpp`.
- **Weight Packing Utility:** Added `convert_int4_weight_packed` to pre-process INT4 weights, scales, and zero points into a packed format suitable for efficient CPU execution, including VNNI4 reordering for AVX512 (see the repacking sketch after this list). This utility is part of the A8W4 implementation in `sgl-kernel/csrc/cpu/gemm_int4_w4a8.cpp`.
- **CPU Parallelism Enhancements:** Introduced generic `parallel_2d` and `adjust_num_threads` utilities in `sgl-kernel/csrc/cpu/common.h` to improve thread blocking and utilization for 2D parallel computations; these are leveraged by the new GEMM kernels (a sketch of the blocking idea follows this list).
- **AVX512 Optimizations:** The new kernels heavily utilize AVX512 intrinsics (e.g., `_mm512_dpbf16_ps`, `_mm512_dpbusd_epi32`) for high-performance computation on supported CPUs, particularly evident in both `gemm_int4_w4a16.cpp` and `gemm_int4_w4a8.cpp` (the `_mm512_dpbusd_epi32` usage pattern is sketched after the code-review summary below).
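To make the A16W4 dequantization concrete, here is a minimal scalar sketch. The layout (two K-adjacent nibbles per byte), the group-wise scale/zero-point scheme, and the function name are illustrative assumptions, not the kernel's actual format:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative scalar reference for unpacking quint4x2-style weights:
// packed[k / 2] holds weight k in the low nibble and weight k + 1 in the
// high nibble; every group of `group_size` consecutive k indices shares
// one scale and zero point, and w = (q - zero_point) * scale.
// Assumes K and group_size are even. Layout is hypothetical.
void dequant_int4_column(const uint8_t* packed, const float* scales,
                         const uint8_t* zero_points, float* out,
                         size_t K, size_t group_size) {
  for (size_t k = 0; k < K; k += 2) {
    uint8_t byte = packed[k / 2];
    uint8_t q0 = byte & 0x0F;         // low nibble  -> weight k
    uint8_t q1 = (byte >> 4) & 0x0F;  // high nibble -> weight k + 1
    size_t g = k / group_size;        // both nibbles fall in the same group
    out[k]     = (float(q0) - zero_points[g]) * scales[g];
    out[k + 1] = (float(q1) - zero_points[g]) * scales[g];
  }
}
```

In the actual kernels this unpacking is presumably fused into the GEMM inner loop (converting to `BFloat16`/`Half` on the fly) rather than materializing a full float buffer.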
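The per-token symmetric activation quantization on the A8W4 path can be illustrated with a scalar reference like the one below; the function name, rounding mode, and clamping range are assumptions for illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Per-token symmetric INT8 quantization (scalar sketch): each row
// ("token") of x[M][K] gets scale = max|x| / 127 and is quantized
// without a zero point. Names and rounding details are illustrative.
void quantize_per_token_symmetric(const float* x, int8_t* q, float* scales,
                                  size_t M, size_t K) {
  for (size_t m = 0; m < M; ++m) {
    const float* row = x + m * K;
    float amax = 0.0f;
    for (size_t k = 0; k < K; ++k) amax = std::max(amax, std::fabs(row[k]));
    float scale = amax > 0.0f ? amax / 127.0f : 1.0f;  // avoid div by zero
    scales[m] = scale;
    for (size_t k = 0; k < K; ++k) {
      int v = static_cast<int>(std::nearbyint(row[k] / scale));
      q[m * K + k] = static_cast<int8_t>(std::clamp(v, -127, 127));
    }
  }
}
```

In schemes like this, the int32 GEMM accumulators are then rescaled by `scale_a * scale_w` per (token, output-channel) pair during the dequantize-accumulate step.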
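The VNNI4 reordering exists because AVX512-VNNI instructions such as `_mm512_dpbusd_epi32` consume 4 consecutive K bytes per 32-bit accumulator lane. A plain repacking sketch, ignoring the nibble packing that `convert_int4_weight_packed` additionally performs:

```cpp
#include <cstddef>
#include <cstdint>

// VNNI4 repacking sketch: B[K][N] (row-major) -> B_vnni[K/4][N][4], so
// the 4 K-values feeding one 32-bit accumulator lane become contiguous.
// Assumes K % 4 == 0; the real convert_int4_weight_packed also packs
// two INT4 values per byte and reorders scales/zero points.
void repack_vnni4(const int8_t* B, int8_t* B_vnni, size_t K, size_t N) {
  for (size_t k = 0; k < K; k += 4)
    for (size_t n = 0; n < N; ++n)
      for (size_t d = 0; d < 4; ++d)
        B_vnni[(k / 4) * N * 4 + n * 4 + d] = B[(k + d) * N + n];
}
```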
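The idea behind a `parallel_2d` helper is to assign each thread one contiguous 2D tile of an (MB x NB) grid of cache blocks, rather than slicing a single flattened dimension. A hedged OpenMP sketch of that blocking scheme (not the actual `common.h` implementation):

```cpp
#include <omp.h>

#include <algorithm>
#include <cstdint>

// Hypothetical parallel_2d-style helper: factor the thread count into a
// thr_m x thr_n grid (more threads along the larger dimension) and give
// each thread a contiguous tile of the (MB x NB) block grid. Empty tiles
// are passed through as empty ranges.
template <typename F>
void parallel_2d_sketch(int64_t MB, int64_t NB, const F& f) {
#pragma omp parallel
  {
    int nthr = omp_get_num_threads();
    int tid = omp_get_thread_num();
    // Most-square factorization: thr_m <= thr_n.
    int thr_m = 1, thr_n = nthr;
    for (int t = 2; t * t <= nthr; ++t)
      if (nthr % t == 0) { thr_m = t; thr_n = nthr / t; }
    if (MB > NB) std::swap(thr_m, thr_n);  // more threads on the larger dim
    int tm = tid / thr_n, tn = tid % thr_n;
    int64_t mb_per = (MB + thr_m - 1) / thr_m;
    int64_t nb_per = (NB + thr_n - 1) / thr_n;
    int64_t mb0 = std::min(tm * mb_per, MB), mb1 = std::min(mb0 + mb_per, MB);
    int64_t nb0 = std::min(tn * nb_per, NB), nb1 = std::min(nb0 + nb_per, NB);
    f(mb0, mb1, nb0, nb1);  // this thread owns blocks [mb0, mb1) x [nb0, nb1)
  }
}
```

A caller would loop over `[mb0, mb1) x [nb0, nb1)` inside `f`; `adjust_num_threads` presumably caps the thread count when the block grid is too small to keep every thread busy.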
Code Review
This pull request introduces AWQ linear kernels for CPU, including 2D parallelization helpers and kernels for INT4-weight GEMM with both 16-bit (w4a16) and 8-bit (w4a8) activations. It is recommended to address the potential runtime error with the `at::Half` instantiation and to verify the correctness of the compensation calculation and the symmetric-quantization data type.
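On the compensation point: `_mm512_dpbusd_epi32` multiplies unsigned bytes from its first operand by signed bytes from its second, so signed int8 activations cannot be fed in directly. A common workaround, assumed here for illustration (the PR's exact scheme may differ), is to bias activations by +128 into u8 and subtract the bias contribution afterwards, since `sum((a + 128) * b) = sum(a * b) + 128 * sum(b)`:

```cpp
#include <immintrin.h>

#include <cstdint>

// 64-element int8 dot product via AVX512-VNNI with bias compensation.
// Requires AVX512F + AVX512BW + AVX512-VNNI (e.g. -mavx512f -mavx512bw
// -mavx512vnni). Illustrative only; the PR's kernels fuse this into
// blocked GEMM loops and precompute the compensation per output column.
int32_t dot_u8s8_with_compensation(const int8_t* a, const int8_t* b) {
  const __m512i bias = _mm512_set1_epi8((char)0x80);  // +128 per byte (mod 256)
  __m512i va = _mm512_loadu_si512(a);
  __m512i vb = _mm512_loadu_si512(b);
  __m512i ua = _mm512_add_epi8(va, bias);             // signed -> unsigned
  __m512i acc = _mm512_dpbusd_epi32(_mm512_setzero_si512(), ua, vb);
  int32_t sum = _mm512_reduce_add_epi32(acc);
  // Compensation: remove the 128 * sum(b) term introduced by the bias.
  int32_t comp = 0;
  for (int i = 0; i < 64; ++i) comp += b[i];
  return sum - 128 * comp;
}
```

In a real w4a8 kernel the `128 * sum(weights)` term can be precomputed once per output column at weight-packing time, which is presumably where a compensation calculation would live.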
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/rerun-failed-ci
2 similar comments
This PR implements CPU INT4 kernels, which are called by the CPU AWQ frontend: https://github.com/sgl-project/sglang/pull/8225/files