Conversation
Collaborator
Nicely done!
zhaochenyang20 requested changes on Nov 7, 2025
```diff
 # Nightly tests
-DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_TP1 = "meta-llama/Llama-3.1-8B-Instruct,mistralai/Mistral-7B-Instruct-v0.3,deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct,google/gemma-2-27b-it"
+DEFAULT_MODEL_NAME_FOR_NIGHTLY_EVAL_TP1 = "meta-llama/Llama-3.1-8B-Instruct,mistralai/Mistral-7B-Instruct-v0.3,deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct,google/gemma-2-27b-it,jet-ai/Jet-Nemotron-2B"
```
Collaborator
I tend not to change this 😂
zhaochenyang20 approved these changes on Nov 8, 2025
Ying1123 approved these changes on Nov 8, 2025
Collaborator
Only two CI jobs left. Let's wait and see.
Collaborator
Congrats!
Motivation
To add support for Jet-Nemotron.
Modifications
Accuracy Tests
Benchmarking and Profiling
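The serving benchmark below assumes an SGLang server is already running locally with the new model. As a point of reference, here is a minimal launch sketch for the jet-ai/Jet-Nemotron-2B checkpoint added in the diff above; the exact launch flags used for this run are not shown in the PR, so the port and other settings here are assumptions.

```
# Hypothetical launch step (not the author's exact setup): the model path is
# taken from the nightly-eval diff above; port and trust-remote-code are assumptions.
$ python3 -m sglang.launch_server \
    --model-path jet-ai/Jet-Nemotron-2B \
    --trust-remote-code \
    --port 30000
```

With the server up, the bench_serving command below sends 100 prompts against it.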
```
$ python3 -m sglang.bench_serving --backend sglang --num-prompt 100

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     100
Benchmark duration (s):                  36.61
Total input tokens:                      33839
Total input text tokens:                 33839
Total input vision tokens:               0
Total generated tokens:                  21640
Total generated tokens (retokenized):    12768
Request throughput (req/s):              2.73
Input token throughput (tok/s):          924.20
Output token throughput (tok/s):         591.03
Total token throughput (tok/s):          1515.23
Concurrency:                             39.95
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   14626.36
Median E2E Latency (ms):                 13954.04
---------------Time to First Token----------------
Mean TTFT (ms):                          1397.39
Median TTFT (ms):                        1266.72
P99 TTFT (ms):                           2969.48
---------------Inter-Token Latency----------------
Mean ITL (ms):                           61.49
Median ITL (ms):                         65.26
P95 ITL (ms):                            93.59
P99 ITL (ms):                            205.48
Max ITL (ms):                            1660.42
==================================================
```

Checklist