Conversation
Collaborator
Nicely done!
zhaochenyang20 requested changes on Nov 7, 2025
```diff
 # Nightly tests
-DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_TP1 = "meta-llama/Llama-3.1-8B-Instruct,mistralai/Mistral-7B-Instruct-v0.3,deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct,google/gemma-2-27b-it"
+DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_TP1 = "meta-llama/Llama-3.1-8B-Instruct,mistralai/Mistral-7B-Instruct-v0.3,deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct,google/gemma-2-27b-it,jet-ai/Jet-Nemotron-2B"
```
Collaborator
I tend not to change this 😂
zhaochenyang20 approved these changes on Nov 8, 2025
Ying1123 approved these changes on Nov 8, 2025
Collaborator
Only two CI jobs left. Let's wait and see.
Collaborator
Congrats!
Motivation
To add support for Jet-Nemotron.
Modifications
Accuracy Tests
Benchmarking and Profiling
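The serving benchmark below assumes an SGLang server is already running locally with the new model. As a point of reference, here is a minimal launch sketch for the jet-ai/Jet-Nemotron-2B checkpoint added in the diff above; the exact launch flags used for this run are not shown in the PR, so the port and other settings here are assumptions.

```
# Hypothetical launch step (not the author's exact setup): the model path is
# taken from the nightly-eval diff above; port and trust-remote-code are assumptions.
$ python3 -m sglang.launch_server \
    --model-path jet-ai/Jet-Nemotron-2B \
    --trust-remote-code \
    --port 30000
```

With the server up, the bench_serving command below sends 100 prompts against it.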
```
$ python3 -m sglang.bench_serving --backend sglang --num-prompt 100

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     100
Benchmark duration (s):                  36.61
Total input tokens:                      33839
Total input text tokens:                 33839
Total input vision tokens:               0
Total generated tokens:                  21640
Total generated tokens (retokenized):    12768
Request throughput (req/s):              2.73
Input token throughput (tok/s):          924.20
Output token throughput (tok/s):         591.03
Total token throughput (tok/s):          1515.23
Concurrency:                             39.95
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   14626.36
Median E2E Latency (ms):                 13954.04
---------------Time to First Token----------------
Mean TTFT (ms):                          1397.39
Median TTFT (ms):                        1266.72
P99 TTFT (ms):                           2969.48
---------------Inter-Token Latency----------------
Mean ITL (ms):                           61.49
Median ITL (ms):                         65.26
P95 ITL (ms):                            93.59
P99 ITL (ms):                            205.48
Max ITL (ms):                            1660.42
==================================================
```

Checklist