Benchmark information is required. Thanks a lot!
The following content is necessary for this PR. @pathfinder-pf

- Cache Miss: Please add tests in test_features.py; you can follow the existing tests. Note: please set logprob_start_len to -1 or 1, top_logprobs_num to a value greater than 1, and token_ids_logprob to a non-None value.
- Feature: Please add these tests to CI to ensure the feature works. Note: please cover both batch_size = 1 and batch_size > 1.
- Accuracy: How do we ensure the returned logprobs are as expected? This needs a discussion.
- Profile and Benchmark (Baseline): Please add baselines for the three scenarios in the blog.
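A minimal, hypothetical sketch of the cases the checklist above asks for, assuming sgl_jax keeps SGLang's native /generate request fields (return_logprob, logprob_start_len, top_logprobs_num, token_ids_logprob); the server address, sampling params, and token ids are placeholders, and the real tests should live in test_features.py:

```python
import requests

BASE_URL = "http://localhost:30000"  # assumed server address

def generate_with_logprobs(prompts, logprob_start_len, top_logprobs_num, token_ids_logprob):
    """Send one batched /generate request and return the parsed JSON response."""
    payload = {
        "text": prompts,
        "sampling_params": {"temperature": 0.7, "max_new_tokens": 16},
        "return_logprob": True,
        "logprob_start_len": logprob_start_len,
        "top_logprobs_num": top_logprobs_num,
        "token_ids_logprob": token_ids_logprob,
    }
    resp = requests.post(f"{BASE_URL}/generate", json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()

# Cover both batch_size = 1 and batch_size > 1, as the checklist requests.
for prompts in (["Hello!"], ["Hello!", "What is 2 + 2?"]):
    out = generate_with_logprobs(
        prompts, logprob_start_len=1, top_logprobs_num=2, token_ids_logprob=[0, 1]
    )
    print(out)
```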
Co-authored-by: pathfinder-fp <slackexplorer@gmail.com>
This PR returns input and output logprobs as float32 when return_logprobs=true is set. Note: performance is not optimized yet, because the many conditional branches in logits_process can lead to cache-miss cases.
func test
environment
v6e-1
test
server mode
JAX_COMPILATION_CACHE_DIR=/tmp/jit_cache python3 -u -m sgl_jax.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --dist-init-addr=0.0.0.0:10011 --nnodes=1 --tp-size=1 --device=tpu --random-seed=27 --node-rank=0 --mem-fraction-static=0.8 --chunked-prefill-size=8192 --download-dir=/tmp --dtype=bfloat16 --precompile-bs-paddings 1 64 --max-running-requests 64 --max-total-tokens 257536 --skip-server-warmup --attention-backend=fa --precompile-token-paddings 8192 --page-size=64 --disable-overlap-schedule
curl localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen",
        "messages": [
          {
            "role": "user",
            "content": "Hello!"
          }
        ],
        "temperature": 0.7,
        "logprobs": true,
        "top_logprobs": 1
      }'
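For reference, the same request from Python with a loop over the returned per-token logprobs; this is a sketch assuming the response follows the OpenAI-compatible chat completions logprobs layout:

```python
import requests

resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "qwen",
        "messages": [{"role": "user", "content": "Hello!"}],
        "temperature": 0.7,
        "logprobs": True,
        "top_logprobs": 1,
    },
    timeout=120,
)
resp.raise_for_status()
for item in resp.json()["choices"][0]["logprobs"]["content"]:
    # Each entry carries the sampled token, its logprob, and the top_logprobs list.
    print(item["token"], item["logprob"], item["top_logprobs"])
```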
engine mode
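A hypothetical sketch of the engine-mode path, assuming sgl_jax mirrors SGLang's offline Engine API; the import path, constructor arguments, and output field names below are assumptions, not the confirmed sgl_jax interface:

```python
import sgl_jax  # import path is an assumption

engine = sgl_jax.Engine(
    model_path="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    device="tpu",
    tp_size=1,
    dtype="bfloat16",
)
outputs = engine.generate(
    ["Hello!"],
    sampling_params={"temperature": 0.7, "max_new_tokens": 16},
    return_logprob=True,
    top_logprobs_num=1,
)
for out in outputs:
    # In SGLang, output_token_logprobs is a list of (logprob, token_id, token_text) tuples.
    print(out["meta_info"]["output_token_logprobs"])
engine.shutdown()
```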
performance & accuracy
environment
v6e-4
performance & accuracy
JAX_COMPILATION_CACHE_DIR=/tmp/jit_cache python3 -u -m sgl_jax.launch_server --model-path Qwen/Qwen3-8B --trust-remote-code --dist-init-addr=0.0.0.0:10011 --nnodes=1 --tp-size=1 --device=tpu --random-seed=3 --node-rank=0 --mem-fraction-static=0.8 --chunked-prefill-size=8192 --download-dir=/tmp --dtype=bfloat16 --precompile-bs-paddings 1 64 --max-running-requests 64 --max-total-tokens 257536 --skip-server-warmup --attention-backend=fa --precompile-token-paddings 8192 --page-size=64
performance
python3 -m sgl_jax.bench_serving --backend sgl-jax --dataset-name random --num-prompts 24 --random-input 1024 --random-output 1024 --random-range-ratio 1 --max-concurrency 8 --warmup-requests 0
accuracy
evalscope eval --model Qwen/Qwen3-8B --api-url http://127.0.0.1:30000/v1 --api-key EMPTY --eval-type server --datasets gsm8k --eval-batch-size 64