
Improve docs & Rename Gemini -> VertexAI #19

Merged

merrymercy merged 6 commits into main from docs on Jan 17, 2024

Conversation

@merrymercy
Contributor

No description provided.

merrymercy merged commit bf51ddc into main on Jan 17, 2024
merrymercy deleted the docs branch on January 17, 2024 at 10:54
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
yanbing-j pushed a commit to yanbing-j/sglang that referenced this pull request Mar 25, 2025
* support W8A8 for DeepSeek-R1
* add assertion on cpu_has_amx_support()
* refactor device check in MoE
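The W8A8 scheme named in this commit message quantizes both weights and activations to int8 and accumulates in int32. A minimal NumPy sketch of the idea, with hypothetical helper names (this is illustrative, not sglang's CPU kernel):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    amax = np.abs(x).max()
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def w8a8_matmul(a: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Quantize activations and weights to int8, multiply in int32, dequantize."""
    qa, sa = quantize_int8(a)
    qw, sw = quantize_int8(w)
    acc = qa.astype(np.int32) @ qw.astype(np.int32)  # int32 accumulation
    return acc.astype(np.float32) * (sa * sw)

a = np.random.default_rng(0).standard_normal((4, 8)).astype(np.float32)
w = np.random.default_rng(1).standard_normal((8, 3)).astype(np.float32)
out = w8a8_matmul(a, w)
ref = a @ w  # float reference; the int8 result differs only by quantization error
```

Real implementations use per-channel or per-group scales and hardware int8 GEMM (e.g. AMX, which the `cpu_has_amx_support()` assertion above guards), but the dequantization algebra is the same.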
pi314ever pushed a commit to pi314ever/sglang that referenced this pull request May 16, 2025
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request May 28, 2025
* support W8A8 for DeepSeek-R1
* add assertion on cpu_has_amx_support()
* refactor device check in MoE
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Jun 6, 2025
* support W8A8 for DeepSeek-R1
* add assertion on cpu_has_amx_support()
* refactor device check in MoE
pengxin99 pushed a commit to pengxin99/sglang that referenced this pull request Jun 19, 2025
Support V2 V3 R1 style MLA decode
chunyuan-w added a commit to chunyuan-w/sglang that referenced this pull request Jun 25, 2025
* support W8A8 for DeepSeek-R1
* add assertion on cpu_has_amx_support()
* refactor device check in MoE
key4ng added a commit to key4ng/sglang that referenced this pull request Nov 9, 2025
Co-authored-by: Robert Ru <key4ng@Roberts-Mac-Studio.local>
wzrf pushed a commit to wzrf/sglang-fusionrag that referenced this pull request Feb 8, 2026
sywangyi pushed a commit to sywangyi/sglang that referenced this pull request Feb 26, 2026
* port optimization for flash_attn_varlen_func
* apply flash_attn_varlen_func
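The "varlen" interface referenced here packs sequences of different lengths into one tensor, delimited by cumulative sequence offsets (`cu_seqlens`). A plain NumPy reference of what that interface computes, ignoring heads and the fused-kernel optimizations:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerically stable
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attn_varlen(q, k, v, cu_seqlens):
    """q, k, v: (total_tokens, head_dim) packed across sequences.
    cu_seqlens: offsets, so sequence i spans rows [cu_seqlens[i], cu_seqlens[i+1])."""
    d = q.shape[-1]
    out = np.empty_like(q)
    for i in range(len(cu_seqlens) - 1):
        s, e = cu_seqlens[i], cu_seqlens[i + 1]
        scores = q[s:e] @ k[s:e].T / np.sqrt(d)  # attention within one sequence only
        out[s:e] = softmax(scores) @ v[s:e]
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((5, 4)) for _ in range(3))
out = attn_varlen(q, k, v, [0, 2, 5])  # two packed sequences: lengths 2 and 3
```

The optimized kernel computes the same result without materializing `scores`; this loop is only a semantic reference.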
sywangyi added a commit to sywangyi/sglang that referenced this pull request Feb 27, 2026
* port layernorm 3d
* apply layernorm
* support for bias
* fix
* intf fix
* add support for CPU
* fix tp=3/6 padding issue in encoder vision
* fix tp=3/6 padding issue in qwen3-omni
* refactor code
* add mrope
* change attention_mask shape to use flash attn
* add kernel apply_rotary_pos_emb_cpu
* replace nn.Linear with ReplicatedLinear
* enable torch.compile
* construct mask using query.dtype instead of bool on CPU
* add fast path for sparse attention
* fix double-free segfault caused by wrong setting of BLOCK_M
* improve extend kernel performance for long context length
* update test_extend.py
* update comment
* fix topk softmax performance issue
* port optimization for image preprocessor in Qwen2VLImageProcessorFast
* apply optimization for image preprocessor
* update Dockerfile
* optimize conv3d used in patch embedding
* resolve conflict
* apply optimized conv3d
* apply optimization for flash_attn_varlen_func (sgl-project#19)
* port optimization for flash_attn_varlen_func
* apply flash_attn_varlen_func
* remove contiguous before rope (sgl-project#20)
* Revert "resolve conflict" (reverts commit 7622f6d)
* fix after rebase
* Update pyproject_cpu.toml
* Update xeon.Dockerfile
* minor fix after rebase
* rope: add support for bf16 sincos (sgl-project#102)
* format
* Update xeon.Dockerfile
* odd tp for cpu
* Apply linear_gelu_linear and fix numa memory bind (sgl-project#22)
* [CPU] Optimize small oc GEMM for Qwen3-next on CPU (sgl-project#12446)
* port linear_gelu_linear kernel
* apply linear_gelu_linear for TP=1
* fix numa memory bind
* apply parallel partition patch
* Revert "Fix: test_vlm_offline_throughput output throughput (sgl-project#13279)" (sgl-project#101) (reverts commit 7ee3e36)
* fix input dtype mismatch issue
* apply optimized layernorm

Co-authored-by: Zheng, Beilei <beilei.zheng@intel.com>
Co-authored-by: ZailiWang <zaili.wang@intel.com>
Co-authored-by: mingfeima <mingfei.ma@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
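The layernorm bullets in the message above port a fused CPU kernel; functionally, it computes standard layer normalization over the last dimension. A plain NumPy reference of that computation (illustrative only, not the ported kernel):

```python
import numpy as np

def layernorm(x, weight, bias, eps=1e-5):
    """Normalize each row over the last dimension, then scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * weight + bias

x = np.random.default_rng(0).standard_normal((2, 8))
y = layernorm(x, np.ones(8), np.zeros(8))  # rows come out zero-mean, unit-variance
```

A fused kernel performs the reduction, normalization, scale, and bias in one pass over memory; the math is identical.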