add return logprobs #310

Merged
pathfinder-pf merged 1 commit into sgl-project:main from primatrix:feat/return_logprobs
Nov 18, 2025

Conversation

@pathfinder-pf
Collaborator

pathfinder-pf commented Nov 4, 2025

This PR returns input and output logprobs in float32 when return_logprobs=true is set. Note: performance is not optimized yet. logits_process contains many conditional branches, so cache misses can occur.

func test

environment

v6e-1

test

server mode

  • server command
    JAX_COMPILATION_CACHE_DIR=/tmp/jit_cache python3 -u -m sgl_jax.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --dist-init-addr=0.0.0.0:10011 --nnodes=1 --tp-size=1 --device=tpu --random-seed=27 --node-rank=0 --mem-fraction-static=0.8 --chunked-prefill-size=8192 --download-dir=/tmp --dtype=bfloat16 --precompile-bs-paddings 1 64 --max-running-requests 64 --max-total-tokens 257536 --skip-server-warmup --attention-backend=fa --precompile-token-paddings 8192 --page-size=64 --disable-overlap-schedule
  • eval & result
    curl localhost:30000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "qwen",
        "messages": [
          {
            "role": "user",
            "content": "Hello!"
          }
        ],
        "temperature": 0.7,
        "logprobs": true,
        "top_logprobs": 1
      }'
{"id":"94dd97af5eab43b5a474d866ee808683","object":"chat.completion","created":1761877995,"model":"qwen","choices":[{"index":0,"message":{"role":"assistant","content":"Alright, the user just said \"Hello!\" so I should respond in a friendly and approachable way. I want to make sure they feel welcome to continue the conversation. Maybe I can greet them back and offer my help. Something like, \"Hello! How can I assist you today?\" That should cover it nicely.\n</think>\n\nHello! How can I assist you today?","reasoning_content":null,"tool_calls":null},"logprobs":{"content":[{"token":"Alright","bytes":[65,108,114,105,103,104,116],"logprob":-0.22265625,"top_logprobs":[{"token":"Alright","bytes":[65,108,114,105,103,104,116],"logprob":-0.22265625},{"token":"Okay","bytes":[79,107,97,121],"logprob":-1.7265625},{"token":"Hello","bytes":[72,101,108,108,111],"logprob":-3.96875}]},{"token":",","bytes":[44],"logprob":0.0,"top_logprobs":[{"token":",","bytes":[44],"logprob":0.0},{"token":"!","bytes":[33],"logprob":-9.875},{"token":" hello","bytes":[32,104,101,108,108,111],"logprob":-17.25}]},{"token":" the","bytes":[32,116,104,101],"logprob":-0.060546875,"top_logprobs":[{"token":" the","bytes":[32,116,104,101],"logprob":-0.060546875},{"token":" someone","bytes":[32,115,111,109,101,111,110,101],"logprob":-2.9375},{"token":" so","bytes":[32,115,111],"logprob":-5.0625}]},{"token":" user","bytes":[32,117,115,101,114],"logprob":0.0,"top_logprobs":[{"token":" user","bytes":[32,117,115,101,114],"logprob":0.0},{"token":" person","bytes":[32,112,101,114,115,111,110],"logprob":-13.25},{"token":"用户","bytes":[231,148,168,230,136,183],"logprob":-14.625}]},{"token":" just","bytes":[32,106,117,115,116],"logprob":-0.87109375,"top_logprobs":[{"token":" just","bytes":[32,106,117,115,116],"logprob":-0.87109375},{"token":" said","bytes":[32,115,97,105,100],"logprob":-0.99609375},{"token":" greeted","bytes":[32,103,114,101,101,116,101,100],"logprob":-2.25}]},{"token":" 
said","bytes":[32,115,97,105,100],"logprob":-0.007781982421875,"top_logprobs":[{"token":" said","bytes":[32,115,97,105,100],"logprob":-0.007781982421875},{"token":" sent","bytes":[32,115,101,110,116],"logprob":-4.875},{"token":" greeted","bytes":[32,103,114,101,101,116,101,100],"logprob":-5.875}]},{"token":" \"","bytes":[32,34],"logprob":-0.007781982421875,"top_logprobs":[{"token":" \"","bytes":[32,34],"logprob":-0.007781982421875},{"token":" Hello","bytes":[32,72,101,108,108,111],"logprob":-5.375},{"token":" hello","bytes":[32,104,101,108,108,111],"logprob":-6.375}]},{"token":"Hello","bytes":[72,101,108,108,111],"logprob":0.0,"top_logprobs":[{"token":"Hello","bytes":[72,101,108,108,111],"logprob":0.0},{"token":"hello","bytes":[104,101,108,108,111],"logprob":-10.75},{"token":">Hello","bytes":[62,72,101,108,108,111],"logprob":-13.625}]},{"token":"!\"","bytes":[33,34],"logprob":-0.03076171875,"top_logprobs":[{"token":"!\"","bytes":[33,34],"logprob":-0.03076171875},{"token":"!\".","bytes":[33,34,46],"logprob":-3.78125},{"token":"!\",","bytes":[33,34,44],"logprob":-5.78125}]},{"token":" so","bytes":[32,115,111],"logprob":-2.78125,"top_logprobs":[{"token":" and","bytes":[32,97,110,100],"logprob":-0.53515625},{"token":" So","bytes":[32,83,111],"logprob":-2.28125},{"token":" I","bytes":[32,73],"logprob":-2.78125}]},{"token":" I","bytes":[32,73],"logprob":-0.023193359375,"top_logprobs":[{"token":" I","bytes":[32,73],"logprob":-0.023193359375},{"token":" that","bytes":[32,116,104,97,116],"logprob":-4.15625},{"token":" they","bytes":[32,116,104,101,121],"logprob":-5.28125}]},{"token":" should","bytes":[32,115,104,111,117,108,100],"logprob":-0.0155029296875,"top_logprobs":[{"token":" should","bytes":[32,115,104,111,117,108,100],"logprob":-0.0155029296875},{"token":" need","bytes":[32,110,101,101,100],"logprob":-4.375},{"token":"'m","bytes":[39,109],"logprob":-5.625}]},{"token":" 
respond","bytes":[32,114,101,115,112,111,110,100],"logprob":-0.007781982421875,"top_logprobs":[{"token":" respond","bytes":[32,114,101,115,112,111,110,100],"logprob":-0.007781982421875},{"token":" keep","bytes":[32,107,101,101,112],"logprob":-5.5},{"token":" start","bytes":[32,115,116,97,114,116],"logprob":-6.0}]},{"token":" in","bytes":[32,105,110],"logprob":-0.06787109375,"top_logprobs":[{"token":" in","bytes":[32,105,110],"logprob":-0.06787109375},{"token":" warmly","bytes":[32,119,97,114,109,108,121],"logprob":-3.6875},{"token":" appropriately","bytes":[32,97,112,112,114,111,112,114,105,97,116,101,108,121],"logprob":-4.1875}]},{"token":" a","bytes":[32,97],"logprob":0.0,"top_logprobs":[{"token":" a","bytes":[32,97],"logprob":0.0},{"token":" kind","bytes":[32,107,105,110,100],"logprob":-7.125},{"token":" the","bytes":[32,116,104,101],"logprob":-9.0}]},{"token":" friendly","bytes":[32,102,114,105,101,110,100,108,121],"logprob":0.0,"top_logprobs":[{"token":" friendly","bytes":[32,102,114,105,101,110,100,108,121],"logprob":0.0},{"token":" warm","bytes":[32,119,97,114,109],"logprob":-7.125},{"token":" welcoming","bytes":[32,119,101,108,99,111,109,105,110,103],"logprob":-9.0}]},{"token":" and","bytes":[32,97,110,100],"logprob":-0.007781982421875,"top_logprobs":[{"token":" and","bytes":[32,97,110,100],"logprob":-0.007781982421875},{"token":" manner","bytes":[32,109,97,110,110,101,114],"logprob":-5.875},{"token":" way","bytes":[32,119,97,121],"logprob":-6.5}]},{"token":" approach","bytes":[32,97,112,112,114,111,97,99,104],"logprob":-1.7265625,"top_logprobs":[{"token":" welcoming","bytes":[32,119,101,108,99,111,109,105,110,103],"logprob":-0.22265625},{"token":" approach","bytes":[32,97,112,112,114,111,97,99,104],"logprob":-1.7265625},{"token":" 
open","bytes":[32,111,112,101,110],"logprob":-3.96875}]},{"token":"able","bytes":[97,98,108,101],"logprob":0.0,"top_logprobs":[{"token":"able","bytes":[97,98,108,101],"logprob":0.0},{"token":"-oriented","bytes":[45,111,114,105,101,110,116,101,100],"logprob":-15.75},{"token":"ful","bytes":[102,117,108],"logprob":-16.25}]},{"token":" way","bytes":[32,119,97,121],"logprob":-0.03076171875,"top_logprobs":[{"token":" way","bytes":[32,119,97,121],"logprob":-0.03076171875},{"token":" manner","bytes":[32,109,97,110,110,101,114],"logprob":-3.53125},{"token":" tone","bytes":[32,116,111,110,101],"logprob":-12.0}]},{"token":".","bytes":[46],"logprob":-0.12451171875,"top_logprobs":[{"token":".","bytes":[46],"logprob":-0.12451171875},{"token":".\n\n","bytes":[46,10,10],"logprob":-2.125},{"token":".\n","bytes":[46,10],"logprob":-7.75}]},{"token":" I","bytes":[32,73],"logprob":-0.73046875,"top_logprobs":[{"token":" I","bytes":[32,73],"logprob":-0.73046875},{"token":" Maybe","bytes":[32,77,97,121,98,101],"logprob":-0.73046875},{"token":" Let","bytes":[32,76,101,116],"logprob":-3.734375}]},{"token":" want","bytes":[32,119,97,110,116],"logprob":-0.023193359375,"top_logprobs":[{"token":" want","bytes":[32,119,97,110,116],"logprob":-0.023193359375},{"token":" need","bytes":[32,110,101,101,100],"logprob":-4.28125},{"token":"'ll","bytes":[39,108,108],"logprob":-4.53125}]},{"token":" to","bytes":[32,116,111],"logprob":0.0,"top_logprobs":[{"token":" to","bytes":[32,116,111],"logprob":0.0},{"token":" them","bytes":[32,116,104,101,109],"logprob":-8.5},{"token":" it","bytes":[32,105,116],"logprob":-17.25}]},{"token":" make","bytes":[32,109,97,107,101],"logprob":-0.0155029296875,"top_logprobs":[{"token":" make","bytes":[32,109,97,107,101],"logprob":-0.0155029296875},{"token":" let","bytes":[32,108,101,116],"logprob":-4.625},{"token":" keep","bytes":[32,107,101,101,112],"logprob":-6.125}]},{"token":" sure","bytes":[32,115,117,114,101],"logprob":-0.023193359375,"top_logprobs":[{"token":" 
sure","bytes":[32,115,117,114,101],"logprob":-0.023193359375},{"token":" them","bytes":[32,116,104,101,109],"logprob":-3.890625},{"token":" it","bytes":[32,105,116],"logprob":-12.0}]},{"token":" they","bytes":[32,116,104,101,121],"logprob":-0.0155029296875,"top_logprobs":[{"token":" they","bytes":[32,116,104,101,121],"logprob":-0.0155029296875},{"token":" I","bytes":[32,73],"logprob":-4.25},{"token":" to","bytes":[32,116,111],"logprob":-7.625}]},{"token":" feel","bytes":[32,102,101,101,108],"logprob":0.0,"top_logprobs":[{"token":" feel","bytes":[32,102,101,101,108],"logprob":0.0},{"token":"'re","bytes":[39,114,101],"logprob":-8.375},{"token":" have","bytes":[32,104,97,118,101],"logprob":-8.75}]},{"token":" welcome","bytes":[32,119,101,108,99,111,109,101],"logprob":-0.007781982421875,"top_logprobs":[{"token":" welcome","bytes":[32,119,101,108,99,111,109,101],"logprob":-0.007781982421875},{"token":" comfortable","bytes":[32,99,111,109,102,111,114,116,97,98,108,101],"logprob":-4.75},{"token":" heard","bytes":[32,104,101,97,114,100],"logprob":-10.0}]},{"token":" to","bytes":[32,116,111],"logprob":-0.486328125,"top_logprobs":[{"token":" to","bytes":[32,116,111],"logprob":-0.486328125},{"token":" and","bytes":[32,97,110,100],"logprob":-1.234375},{"token":".","bytes":[46],"logprob":-2.484375}]},{"token":" continue","bytes":[32,99,111,110,116,105,110,117,101],"logprob":-2.578125,"top_logprobs":[{"token":" ask","bytes":[32,97,115,107],"logprob":-0.08251953125},{"token":" continue","bytes":[32,99,111,110,116,105,110,117,101],"logprob":-2.578125},{"token":" share","bytes":[32,115,104,97,114,101],"logprob":-5.46875}]},{"token":" the","bytes":[32,116,104,101],"logprob":-0.42578125,"top_logprobs":[{"token":" the","bytes":[32,116,104,101],"logprob":-0.42578125},{"token":" talking","bytes":[32,116,97,108,107,105,110,103],"logprob":-1.1796875},{"token":" having","bytes":[32,104,97,118,105,110,103],"logprob":-3.796875}]},{"token":" 
conversation","bytes":[32,99,111,110,118,101,114,115,97,116,105,111,110],"logprob":0.0,"top_logprobs":[{"token":" conversation","bytes":[32,99,111,110,118,101,114,115,97,116,105,111,110],"logprob":0.0},{"token":" chat","bytes":[32,99,104,97,116],"logprob":-8.875},{"token":" discussion","bytes":[32,100,105,115,99,117,115,115,105,111,110],"logprob":-9.875}]},{"token":".","bytes":[46],"logprob":-0.08984375,"top_logprobs":[{"token":".","bytes":[46],"logprob":-0.08984375},{"token":".\n\n","bytes":[46,10,10],"logprob":-2.59375},{"token":".\n","bytes":[46,10],"logprob":-5.21875}]},{"token":" Maybe","bytes":[32,77,97,121,98,101],"logprob":-0.0155029296875,"top_logprobs":[{"token":" Maybe","bytes":[32,77,97,121,98,101],"logprob":-0.0155029296875},{"token":" I","bytes":[32,73],"logprob":-4.25},{"token":" Let","bytes":[32,76,101,116],"logprob":-5.5}]},{"token":" I","bytes":[32,73],"logprob":-0.0458984375,"top_logprobs":[{"token":" I","bytes":[32,73],"logprob":-0.0458984375},{"token":" something","bytes":[32,115,111,109,101,116,104,105,110,103],"logprob":-4.03125},{"token":" say","bytes":[32,115,97,121],"logprob":-4.40625}]},{"token":" can","bytes":[32,99,97,110],"logprob":-0.2041015625,"top_logprobs":[{"token":" can","bytes":[32,99,97,110],"logprob":-0.2041015625},{"token":"'ll","bytes":[39,108,108],"logprob":-1.703125},{"token":" should","bytes":[32,115,104,111,117,108,100],"logprob":-6.84375}]},{"token":" greet","bytes":[32,103,114,101,101,116],"logprob":-0.81640625,"top_logprobs":[{"token":" greet","bytes":[32,103,114,101,101,116],"logprob":-0.81640625},{"token":" ask","bytes":[32,97,115,107],"logprob":-1.0703125},{"token":" offer","bytes":[32,111,102,102,101,114],"logprob":-2.0625}]},{"token":" them","bytes":[32,116,104,101,109],"logprob":0.0,"top_logprobs":[{"token":" them","bytes":[32,116,104,101,109],"logprob":0.0},{"token":" back","bytes":[32,98,97,99,107],"logprob":-17.25},{"token":" him","bytes":[32,104,105,109],"logprob":-19.0}]},{"token":" 
back","bytes":[32,98,97,99,107],"logprob":0.0,"top_logprobs":[{"token":" back","bytes":[32,98,97,99,107],"logprob":0.0},{"token":" and","bytes":[32,97,110,100],"logprob":-6.5},{"token":" again","bytes":[32,97,103,97,105,110],"logprob":-7.25}]},{"token":" and","bytes":[32,97,110,100],"logprob":0.0,"top_logprobs":[{"token":" and","bytes":[32,97,110,100],"logprob":0.0},{"token":" in","bytes":[32,105,110],"logprob":-7.0},{"token":" with","bytes":[32,119,105,116,104],"logprob":-8.25}]},{"token":" offer","bytes":[32,111,102,102,101,114],"logprob":-0.44140625,"top_logprobs":[{"token":" offer","bytes":[32,111,102,102,101,114],"logprob":-0.44140625},{"token":" ask","bytes":[32,97,115,107],"logprob":-1.1875},{"token":" let","bytes":[32,108,101,116],"logprob":-2.9375}]},{"token":" my","bytes":[32,109,121],"logprob":-1.1796875,"top_logprobs":[{"token":" assistance","bytes":[32,97,115,115,105,115,116,97,110,99,101],"logprob":-0.9296875},{"token":" my","bytes":[32,109,121],"logprob":-1.1796875},{"token":" help","bytes":[32,104,101,108,112],"logprob":-1.5546875}]},{"token":" help","bytes":[32,104,101,108,112],"logprob":-0.22265625,"top_logprobs":[{"token":" help","bytes":[32,104,101,108,112],"logprob":-0.22265625},{"token":" assistance","bytes":[32,97,115,115,105,115,116,97,110,99,101],"logprob":-1.6015625},{"token":" services","bytes":[32,115,101,114,118,105,99,101,115],"logprob":-11.25}]},{"token":".","bytes":[46],"logprob":-0.03076171875,"top_logprobs":[{"token":".","bytes":[46],"logprob":-0.03076171875},{"token":" with","bytes":[32,119,105,116,104],"logprob":-4.15625},{"token":".\n","bytes":[46,10],"logprob":-5.15625}]},{"token":" Something","bytes":[32,83,111,109,101,116,104,105,110,103],"logprob":-1.28125,"top_logprobs":[{"token":" Let","bytes":[32,76,101,116],"logprob":-1.15625},{"token":" Keeping","bytes":[32,75,101,101,112,105,110,103],"logprob":-1.15625},{"token":" Something","bytes":[32,83,111,109,101,116,104,105,110,103],"logprob":-1.28125}]},{"token":" 
like","bytes":[32,108,105,107,101],"logprob":-0.007781982421875,"top_logprobs":[{"token":" like","bytes":[32,108,105,107,101],"logprob":-0.007781982421875},{"token":" simple","bytes":[32,115,105,109,112,108,101],"logprob":-5.5},{"token":" along","bytes":[32,97,108,111,110,103],"logprob":-7.5}]},{"token":",","bytes":[44],"logprob":-0.038330078125,"top_logprobs":[{"token":",","bytes":[44],"logprob":-0.038330078125},{"token":" that","bytes":[32,116,104,97,116],"logprob":-3.53125},{"token":" \"","bytes":[32,34],"logprob":-6.03125}]},{"token":" \"","bytes":[32,34],"logprob":0.0,"top_logprobs":[{"token":" \"","bytes":[32,34],"logprob":0.0},{"token":" “","bytes":[32,226,128,156],"logprob":-14.25},{"token":" hey","bytes":[32,104,101,121],"logprob":-14.5}]},{"token":"Hello","bytes":[72,101,108,108,111],"logprob":0.0,"top_logprobs":[{"token":"Hello","bytes":[72,101,108,108,111],"logprob":0.0},{"token":"Hi","bytes":[72,105],"logprob":-9.0},{"token":"Hey","bytes":[72,101,121],"logprob":-9.0}]},{"token":"!","bytes":[33],"logprob":0.0,"top_logprobs":[{"token":"!","bytes":[33],"logprob":0.0},{"token":" again","bytes":[32,97,103,97,105,110],"logprob":-8.875},{"token":",","bytes":[44],"logprob":-9.75}]},{"token":" How","bytes":[32,72,111,119],"logprob":0.0,"top_logprobs":[{"token":" How","bytes":[32,72,111,119],"logprob":0.0},{"token":" I","bytes":[32,73],"logprob":-9.875},{"token":" Welcome","bytes":[32,87,101,108,99,111,109,101],"logprob":-10.125}]},{"token":" can","bytes":[32,99,97,110],"logprob":0.0,"top_logprobs":[{"token":" can","bytes":[32,99,97,110],"logprob":0.0},{"token":" could","bytes":[32,99,111,117,108,100],"logprob":-17.25},{"token":"'s","bytes":[39,115],"logprob":-17.625}]},{"token":" I","bytes":[32,73],"logprob":0.0,"top_logprobs":[{"token":" I","bytes":[32,73],"logprob":0.0},{"token":"I","bytes":[73],"logprob":-18.5},{"token":" me","bytes":[32,109,101],"logprob":-20.375}]},{"token":" 
assist","bytes":[32,97,115,115,105,115,116],"logprob":0.0,"top_logprobs":[{"token":" assist","bytes":[32,97,115,115,105,115,116],"logprob":0.0},{"token":" help","bytes":[32,104,101,108,112],"logprob":-10.25},{"token":" effectively","bytes":[32,101,102,102,101,99,116,105,118,101,108,121],"logprob":-14.125}]},{"token":" you","bytes":[32,121,111,117],"logprob":0.0,"top_logprobs":[{"token":" you","bytes":[32,121,111,117],"logprob":0.0},{"token":"你","bytes":[228,189,160],"logprob":-16.625},{"token":" You","bytes":[32,89,111,117],"logprob":-17.875}]},{"token":" today","bytes":[32,116,111,100,97,121],"logprob":0.0,"top_logprobs":[{"token":" today","bytes":[32,116,111,100,97,121],"logprob":0.0},{"token":"?\"","bytes":[63,34],"logprob":-14.5},{"token":" tonight","bytes":[32,116,111,110,105,103,104,116],"logprob":-15.75}]},{"token":"?\"","bytes":[63,34],"logprob":-0.0155029296875,"top_logprobs":[{"token":"?\"","bytes":[63,34],"logprob":-0.0155029296875},{"token":"?\"\n\n","bytes":[63,34,10,10],"logprob":-4.75},{"token":"?","bytes":[63],"logprob":-5.75}]},{"token":" That","bytes":[32,84,104,97,116],"logprob":-0.1982421875,"top_logprobs":[{"token":" That","bytes":[32,84,104,97,116],"logprob":-0.1982421875},{"token":" that","bytes":[32,116,104,97,116],"logprob":-2.328125},{"token":" should","bytes":[32,115,104,111,117,108,100],"logprob":-3.078125}]},{"token":" should","bytes":[32,115,104,111,117,108,100],"logprob":-0.023193359375,"top_logprobs":[{"token":" should","bytes":[32,115,104,111,117,108,100],"logprob":-0.023193359375},{"token":" sounds","bytes":[32,115,111,117,110,100,115],"logprob":-4.03125},{"token":" seems","bytes":[32,115,101,101,109,115],"logprob":-5.28125}]},{"token":" cover","bytes":[32,99,111,118,101,114],"logprob":-0.294921875,"top_logprobs":[{"token":" cover","bytes":[32,99,111,118,101,114],"logprob":-0.294921875},{"token":" do","bytes":[32,100,111],"logprob":-1.921875},{"token":" keep","bytes":[32,107,101,101,112],"logprob":-2.671875}]},{"token":" 
it","bytes":[32,105,116],"logprob":0.0,"top_logprobs":[{"token":" it","bytes":[32,105,116],"logprob":0.0},{"token":" the","bytes":[32,116,104,101],"logprob":-8.5},{"token":" both","bytes":[32,98,111,116,104],"logprob":-10.125}]},{"token":" nicely","bytes":[32,110,105,99,101,108,121],"logprob":-5.25,"top_logprobs":[{"token":" and","bytes":[32,97,110,100],"logprob":-0.2353515625},{"token":".\n","bytes":[46,10],"logprob":-1.734375},{"token":" without","bytes":[32,119,105,116,104,111,117,116],"logprob":-4.125}]},{"token":".\n","bytes":[46,10],"logprob":-0.05322265625,"top_logprobs":[{"token":".\n","bytes":[46,10],"logprob":-0.05322265625},{"token":" and","bytes":[32,97,110,100],"logprob":-3.296875},{"token":".","bytes":[46],"logprob":-4.5625}]},{"token":"</think>","bytes":[60,47,116,104,105,110,107,62],"logprob":0.0,"top_logprobs":[{"token":"</think>","bytes":[60,47,116,104,105,110,107,62],"logprob":0.0},{"token":"</","bytes":[60,47],"logprob":-21.125},{"token":"******\n","bytes":[42,42,42,42,42,42,10],"logprob":-26.375}]},{"token":"\n\n","bytes":[10,10],"logprob":0.0,"top_logprobs":[{"token":"\n\n","bytes":[10,10],"logprob":0.0},{"token":"<|end▁of▁sentence|>","bytes":[60,239,189,156,101,110,100,226,150,129,111,102,226,150,129,115,101,110,116,101,110,99,101,239,189,156,62],"logprob":-18.5},{"token":" \n\n","bytes":[32,10,10],"logprob":-24.375}]},{"token":"Hello","bytes":[72,101,108,108,111],"logprob":0.0,"top_logprobs":[{"token":"Hello","bytes":[72,101,108,108,111],"logprob":0.0},{"token":"Hi","bytes":[72,105],"logprob":-11.625},{"token":" hello","bytes":[32,104,101,108,108,111],"logprob":-14.25}]},{"token":"!","bytes":[33],"logprob":0.0,"top_logprobs":[{"token":"!","bytes":[33],"logprob":0.0},{"token":"!","bytes":[239,188,129],"logprob":-15.375},{"token":"!\n\n","bytes":[33,10,10],"logprob":-16.25}]},{"token":" How","bytes":[32,72,111,119],"logprob":0.0,"top_logprobs":[{"token":" How","bytes":[32,72,111,119],"logprob":0.0},{"token":" 
Welcome","bytes":[32,87,101,108,99,111,109,101],"logprob":-10.0},{"token":" �","bytes":[32,239,191,189],"logprob":-10.875}]},{"token":" can","bytes":[32,99,97,110],"logprob":0.0,"top_logprobs":[{"token":" can","bytes":[32,99,97,110],"logprob":0.0},{"token":" Can","bytes":[32,67,97,110],"logprob":-17.625},{"token":" are","bytes":[32,97,114,101],"logprob":-19.75}]},{"token":" I","bytes":[32,73],"logprob":0.0,"top_logprobs":[{"token":" I","bytes":[32,73],"logprob":0.0},{"token":" we","bytes":[32,119,101],"logprob":-19.375},{"token":" me","bytes":[32,109,101],"logprob":-19.75}]},{"token":" assist","bytes":[32,97,115,115,105,115,116],"logprob":0.0,"top_logprobs":[{"token":" assist","bytes":[32,97,115,115,105,115,116],"logprob":0.0},{"token":" help","bytes":[32,104,101,108,112],"logprob":-10.625},{"token":" assistance","bytes":[32,97,115,115,105,115,116,97,110,99,101],"logprob":-14.0}]},{"token":" you","bytes":[32,121,111,117],"logprob":0.0,"top_logprobs":[{"token":" you","bytes":[32,121,111,117],"logprob":0.0},{"token":"你","bytes":[228,189,160],"logprob":-19.0},{"token":" You","bytes":[32,89,111,117],"logprob":-21.25}]},{"token":" today","bytes":[32,116,111,100,97,121],"logprob":0.0,"top_logprobs":[{"token":" today","bytes":[32,116,111,100,97,121],"logprob":0.0},{"token":"today","bytes":[116,111,100,97,121],"logprob":-19.625},{"token":"今天","bytes":[228,187,138,229,164,169],"logprob":-20.625}]},{"token":"?","bytes":[63],"logprob":0.0,"top_logprobs":[{"token":"?","bytes":[63],"logprob":0.0},{"token":"?\n\n","bytes":[63,10,10],"logprob":-17.25},{"token":"?\"","bytes":[63,34],"logprob":-19.0}]},{"token":"<|end▁of▁sentence|>","bytes":[60,239,189,156,101,110,100,226,150,129,111,102,226,150,129,115,101,110,116,101,110,99,101,239,189,156,62],"logprob":0.0,"top_logprobs":[{"token":"<|end▁of▁sentence|>","bytes":[60,239,189,156,101,110,100,226,150,129,111,102,226,150,129,115,101,110,116,101,110,99,101,239,189,156,62],"logprob":0.0},{"token":" 
�","bytes":[32,239,191,189],"logprob":-9.5},{"token":" If","bytes":[32,73,102],"logprob":-10.0}]}]},"finish_reason":"stop","matched_stop":151643,"hidden_states":null}],"usage":{"prompt_tokens":7,"total_tokens":83,"completion_tokens":76,"prompt_tokens_details":null}}
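
To make the response above easier to check by eye, here is a minimal sketch that reads the logprobs.content list of a chat completion and reports the summed logprob, the perplexity, and the tokens that were sampled without being the top candidate. The summarize_logprobs helper is hypothetical (it is not part of sgl_jax), and the sample data is a two-token excerpt of the response above with top_logprobs trimmed to two entries each.

```python
import json
import math

# Two entries excerpted from the response above (top_logprobs trimmed to two each).
response_text = """
{"choices": [{"logprobs": {"content": [
  {"token": "Alright", "logprob": -0.22265625,
   "top_logprobs": [{"token": "Alright", "logprob": -0.22265625},
                    {"token": "Okay", "logprob": -1.7265625}]},
  {"token": " so", "logprob": -2.78125,
   "top_logprobs": [{"token": " and", "logprob": -0.53515625},
                    {"token": " So", "logprob": -2.28125}]}
]}}]}
"""


def summarize_logprobs(response):
    """Hypothetical helper: (total logprob, perplexity, non-top sampled tokens)."""
    entries = response["choices"][0]["logprobs"]["content"]
    total = sum(e["logprob"] for e in entries)
    perplexity = math.exp(-total / len(entries))
    # A token is "non-top" when the first top_logprobs candidate is a different token.
    non_top = [
        e["token"]
        for e in entries
        if e["top_logprobs"] and e["top_logprobs"][0]["token"] != e["token"]
    ]
    return total, perplexity, non_top


total, perplexity, non_top = summarize_logprobs(json.loads(response_text))
print(total, perplexity, non_top)
```

With temperature 0.7 the sampled token is not always the most likely one, which is why " so" shows up in the non-top list here.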

engine mode

  • code
from sgl_jax.srt.entrypoints.engine import Engine

if __name__ == '__main__':
    engine = Engine(
        model_path='deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B', trust_remote_code=True,
        dist_init_addr='0.0.0.0:10011', nnodes=1, tp_size=1, device='tpu', random_seed=3,
        node_rank=0, mem_fraction_static=0.4, chunked_prefill_size=8192, download_dir='/tmp',
        dtype='bfloat16', precompile_bs_paddings=[64], max_running_requests=64,
        skip_server_warmup=True, attention_backend='fa', precompile_token_paddings=[8192],
        page_size=64, log_requests=True, log_requests_level=3,
    )
    output = engine.generate(
        prompt=["please introduce yourself<think>"], sampling_params={"n": 1, "temperature": 0.7},
        return_logprob=True, top_logprobs_num=2, logprob_start_len=1, token_ids_logprob=[10],
    )
    print(len(list(output)), output)
  • output
{'text': "\nI'm DeepSeek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. I'll do my best to help you.\n</think>\n\nI'm DeepSeek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. I'll do my best to help you.", 'output_ids': [198, 40, 2776, 18183, 39350, 10911, 16, 11, 458, 15235, 17847, 3465, 23242, 553, 279, 8453, 8188, 18183, 39350, 13, 358, 3278, 653, 847, 1850, 311, 1492, 498, 624, 151649, 271, 40, 2776, 18183, 39350, 10911, 16, 11, 458, 15235, 17847, 3465, 23242, 553, 279, 8453, 8188, 18183, 39350, 13, 358, 3278, 653, 847, 1850, 311, 1492, 498, 13], 'meta_info': {'id': '73013ffafdfe48f6817a13d608f25265', 'finish_reason': {'type': 'stop', 'matched': 151643}, 'prompt_tokens': 5, 'input_token_logprobs': [(None, 30021, 'please'), (-8.5625, 19131, ' introduce'), (-4.84375, 6133, ' yourself'), (-14.5, 151648, '<think>')], 'output_token_logprobs': [(-0.017578125, 198, '\n'), (-0.248046875, 40, 'I'), (-4.5299530029296875e-05, 2776, "'m"), (0.00032806396484375, 18183, ' Deep'), (0.0, 39350, 'Seek'), (0.0, 10911, '-R'), (-4.5299530029296875e-05, 16, '1'), (0.00032806396484375, 11, ','), (0.0033416748046875, 458, ' an'), (-0.00799560546875, 15235, ' AI'), (0.0033416748046875, 17847, ' assistant'), (0.0, 3465, ' created'), (-0.01202392578125, 23242, ' exclusively'), (0.00347900390625, 553, ' by'), (0.00186920166015625, 279, ' the'), (-0.0026092529296875, 8453, ' Chinese'), (7.581710815429688e-05, 8188, ' Company'), (7.581710815429688e-05, 18183, ' Deep'), (7.581710815429688e-05, 39350, 'Seek'), (7.581710815429688e-05, 13, '.'), (-0.00531005859375, 358, ' I'), (-0.2158203125, 3278, "'ll"), (7.581710815429688e-05, 653, ' do'), (0.0, 847, ' my'), (-0.0026092529296875, 1850, ' best'), (7.581710815429688e-05, 311, ' to'), (7.581710815429688e-05, 1492, ' help'), (0.00347900390625, 498, ' you'), (0.0, 624, '.\n'), (0.0033416748046875, 151649, '</think>'), (0.0014801025390625, 271, '\n\n'), (0.0033416748046875, 40, 'I'), 
(-0.0026092529296875, 2776, "'m"), (0.0, 18183, ' Deep'), (0.0, 39350, 'Seek'), (0.0033416748046875, 10911, '-R'), (0.0033416748046875, 16, '1'), (0.0033416748046875, 11, ','), (-4.5299530029296875e-05, 458, ' an'), (-0.0026092529296875, 15235, ' AI'), (0.00347900390625, 17847, ' assistant'), (7.581710815429688e-05, 3465, ' created'), (-0.0026092529296875, 23242, ' exclusively'), (0.0033416748046875, 553, ' by'), (0.0014801025390625, 279, ' the'), (0.0, 8453, ' Chinese'), (7.581710815429688e-05, 8188, ' Company'), (-0.0026092529296875, 18183, ' Deep'), (0.00347900390625, 39350, 'Seek'), (-0.0026092529296875, 13, '.'), (-4.5299530029296875e-05, 358, ' I'), (0.00347900390625, 3278, "'ll"), (0.00347900390625, 653, ' do'), (0.0, 847, ' my'), (-4.5299530029296875e-05, 1850, ' best'), (0.00347900390625, 311, ' to'), (0.0014801025390625, 1492, ' help'), (0.0, 498, ' you'), (0.00347900390625, 13, '.'), (-4.5299530029296875e-05, 151643, '<|end▁of▁sentence|>')], 'input_top_logprobs': [None, [(-1.0703125, 10339, ' explain'), (-1.3203125, 1492, ' help')], [(-0.6484375, 279, ' the'), (-2.28125, 264, ' a')], [(-0.70703125, 438, ' as'), (-1.578125, 323, ' and')]], 'output_top_logprobs': [[(-0.017578125, 198, '\n'), (-4.46875, 151648, '<think>')], [(-0.248046875, 40, 'I'), (-2.390625, 91786, 'Greetings')], [(-4.5299530029296875e-05, 2776, "'m"), (-6.78125, 4249, '’m')], [(0.00032806396484375, 18183, ' Deep'), (-8.5, 458, ' an')], [(0.0, 39350, 'Seek'), (-16.375, 585, 'ak')], [(0.0, 10911, '-R'), (-11.625, 431, ' R')], [(-4.5299530029296875e-05, 16, '1'), (-14.5625, 17, '2')], [(0.00032806396484375, 11, ','), (-6.0625, 3465, ' created')], [(0.0033416748046875, 458, ' an'), (-9.4375, 264, ' a')], [(-0.00799560546875, 15235, ' AI'), (-4.65625, 20443, ' artificial')], [(0.0033416748046875, 17847, ' assistant'), (-9.8125, 21388, ' Assistant')], [(0.0, 3465, ' created'), (-6.25, 28135, ' independently')], [(-0.01202392578125, 23242, ' exclusively'), (-4.46875, 553, ' by')], 
[(0.00347900390625, 553, ' by'), (-23.0, 1694, 'by')], [(0.00186920166015625, 279, ' the'), (-14.375, 18183, ' Deep')], [(-0.0026092529296875, 8453, ' Chinese'), (-13.125, 15819, ' Meta')], [(7.581710815429688e-05, 8188, ' Company'), (-6.59375, 2813, ' company')], [(7.581710815429688e-05, 18183, ' Deep'), (-11.875, 33464, 'Deep')], [(7.581710815429688e-05, 39350, 'Seek'), (-14.625, 11056, 'Speed')], [(7.581710815429688e-05, 13, '.'), (-13.3125, 1212, ' under')], [(-0.00531005859375, 358, ' I'), (-6.25, 31733, ' Feel')], [(-0.2158203125, 3278, "'ll"), (-2.1875, 2776, "'m")], [(7.581710815429688e-05, 653, ' do'), (-12.3125, 3730, ' doing')], [(0.0, 847, ' my'), (-14.4375, 97611, '我的')], [(-0.0026092529296875, 1850, ' best'), (-10.1875, 14470, 'Best')], [(7.581710815429688e-05, 311, ' to'), (-20.625, 369, ' for')], [(7.581710815429688e-05, 1492, ' help'), (-6.78125, 7789, ' assist')], [(0.00347900390625, 498, ' you'), (-11.0625, 56568, '你')], [(0.0, 624, '.\n'), (-12.875, 382, '.\n\n')], [(0.0033416748046875, 151649, '</think>'), (-18.875, 522, '</')], [(0.0014801025390625, 271, '\n\n'), (-20.375, 151643, '<|end▁of▁sentence|>')], [(0.0033416748046875, 40, 'I'), (-10.6875, 33464, 'Deep')], [(-0.0026092529296875, 2776, "'m"), (-15.875, 4249, '’m')], [(0.0, 18183, ' Deep'), (-11.875, 33464, 'Deep')], [(0.0, 39350, 'Seek'), (-13.0625, 2859, 'Query')], [(0.0033416748046875, 10911, '-R'), (-24.875, 2568, '_R')], [(0.0033416748046875, 16, '1'), (-23.25, 17, '2')], [(0.0033416748046875, 11, ','), (-13.375, 3837, ',')], [(-4.5299530029296875e-05, 458, ' an'), (-18.875, 697, ' your')], [(-0.0026092529296875, 15235, ' AI'), (-16.75, 15469, 'AI')], [(0.00347900390625, 17847, ' assistant'), (-13.5625, 21388, ' Assistant')], [(7.581710815429688e-05, 3465, ' created'), (-18.25, 4290, ' Created')], [(-0.0026092529296875, 23242, ' exclusively'), (-15.875, 11689, ' specifically')], [(0.0033416748046875, 553, ' by'), (-26.375, 1694, 'by')], [(0.0014801025390625, 279, ' the'), (-19.125, 
18183, ' Deep')], [(0.0, 8453, ' Chinese'), (-13.5625, 44923, 'Chinese')], [(7.581710815429688e-05, 8188, ' Company'), (-14.125, 14491, 'Company')], [(-0.0026092529296875, 18183, ' Deep'), (-12.6875, 33464, 'Deep')], [(0.00347900390625, 39350, 'Seek'), (-14.4375, 10887, ' seeking')], [(-0.0026092529296875, 13, '.'), (-14.125, 1773, '。')], [(-4.5299530029296875e-05, 358, ' I'), (-15.5625, 151643, '<|end▁of▁sentence|>')], [(0.00347900390625, 3278, "'ll"), (-11.25, 4700, '’ll')], [(0.00347900390625, 653, ' do'), (-19.25, 3730, ' doing')], [(0.0, 847, ' my'), (-20.125, 2408, 'my')], [(-4.5299530029296875e-05, 1850, ' best'), (-17.875, 15862, 'best')], [(0.00347900390625, 311, ' to'), (-21.375, 1492, ' help')], [(0.0014801025390625, 1492, ' help'), (-14.8125, 7789, ' assist')], [(0.0, 498, ' you'), (-14.125, 56568, '你')], [(0.00347900390625, 13, '.'), (-18.5, 382, '.\n\n')], [(-4.5299530029296875e-05, 151643, '<|end▁of▁sentence|>'), (-18.0, 31733, ' Feel')]], 'input_token_ids_logprobs': [None, [(-19.125, 10, '+')], [(-15.25, 10, '+')], [(-16.375, 10, '+')]], 'output_token_ids_logprobs': [[(-36.75, 10, '+')], [(-29.625, 10, '+')], [(-30.375, 10, '+')], [(-29.125, 10, '+')], [(-35.5, 10, '+')], [(-33.25, 10, '+')], [(-29.0, 10, '+')], [(-23.25, 10, '+')], [(-42.25, 10, '+')], [(-34.5, 10, '+')], [(-32.5, 10, '+')], [(-34.5, 10, '+')], [(-29.875, 10, '+')], [(-48.5, 10, '+')], [(-36.75, 10, '+')], [(-34.0, 10, '+')], [(-28.5, 10, '+')], [(-33.25, 10, '+')], [(-33.5, 10, '+')], [(-34.25, 10, '+')], [(-36.5, 10, '+')], [(-29.875, 10, '+')], [(-41.25, 10, '+')], [(-53.0, 10, '+')], [(-40.75, 10, '+')], [(-46.25, 10, '+')], [(-45.25, 10, '+')], [(-39.5, 10, '+')], [(-34.25, 10, '+')], [(-36.25, 10, '+')], [(-50.0, 10, '+')], [(-34.0, 10, '+')], [(-42.5, 10, '+')], [(-39.75, 10, '+')], [(-35.0, 10, '+')], [(-40.0, 10, '+')], [(-46.5, 10, '+')], [(-37.0, 10, '+')], [(-50.25, 10, '+')], [(-40.25, 10, '+')], [(-35.25, 10, '+')], [(-40.5, 10, '+')], [(-39.0, 10, '+')], [(-47.25, 
10, '+')], [(-45.25, 10, '+')], [(-39.5, 10, '+')], [(-41.75, 10, '+')], [(-43.25, 10, '+')], [(-38.25, 10, '+')], [(-39.25, 10, '+')], [(-51.25, 10, '+')], [(-44.25, 10, '+')], [(-47.0, 10, '+')], [(-53.0, 10, '+')], [(-43.0, 10, '+')], [(-44.0, 10, '+')], [(-48.5, 10, '+')], [(-46.5, 10, '+')], [(-36.75, 10, '+')], [(-43.5, 10, '+')]], 'completion_tokens': 60, 'cached_tokens': 0, 'cache_miss_count': 0, 'e2e_latency': 61.070454597473145}}]
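
For readers consuming the engine output above, a small sketch of how the meta_info lists can be unpacked. It assumes only what the output shows: each entry is a (logprob, token_id, token_text) tuple, and the first reported prompt position carries None. The helper names sequence_logprob and decoded_text are hypothetical, and the data is an excerpt of the output above.

```python
# Excerpt of meta_info from the output above.
# Each entry is (logprob, token_id, token_text).
meta_info = {
    "input_token_logprobs": [
        (None, 30021, "please"),
        (-8.5625, 19131, " introduce"),
        (-4.84375, 6133, " yourself"),
        (-14.5, 151648, "<think>"),
    ],
    "output_token_logprobs": [
        (-0.017578125, 198, "\n"),
        (-0.248046875, 40, "I"),
        (-4.5299530029296875e-05, 2776, "'m"),
    ],
}


def sequence_logprob(entries):
    """Sum the logprobs, skipping positions reported as None."""
    return sum(lp for lp, _tok_id, _text in entries if lp is not None)


def decoded_text(entries):
    """Concatenate the token texts in order."""
    return "".join(text for _lp, _tok_id, text in entries)


prompt_lp = sequence_logprob(meta_info["input_token_logprobs"])
output_lp = sequence_logprob(meta_info["output_token_logprobs"])
print(prompt_lp, output_lp)
print(repr(decoded_text(meta_info["input_token_logprobs"])))
```

The same two helpers apply unchanged to the inner lists of input_top_logprobs and output_top_logprobs, since those use the same tuple layout.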

performance & accuracy

environment

v6e-4

performance & accuracy

  • launch server
    JAX_COMPILATION_CACHE_DIR=/tmp/jit_cache python3 -u -m sgl_jax.launch_server --model-path Qwen/Qwen3-8B --trust-remote-code --dist-init-addr=0.0.0.0:10011 --nnodes=1 --tp-size=1 --device=tpu --random-seed=3 --node-rank=0 --mem-fraction-static=0.8 --chunked-prefill-size=8192 --download-dir=/tmp --dtype=bfloat16 --precompile-bs-paddings 1 64 --max-running-requests 64 --max-total-tokens 257536 --skip-server-warmup --attention-backend=fa --precompile-token-paddings 8192 --page-size=64

performance

  • command(input=1k,output=1k,batch_size=8)
    python3 -m sgl_jax.bench_serving --backend sgl-jax --dataset-name random --num-prompts 24 --random-input 1024 --random-output 1024 --random-range-ratio 1 --max-concurrency 8 --warmup-requests 0
  • result (current branch vs main)
    image
    image

accuracy

  • command(evalscope==0.17.1)
    evalscope eval --model Qwen/Qwen3-8B --api-url http://127.0.0.1:30000/v1 --api-key EMPTY --eval-type server --datasets gsm8k --eval-batch-size 64
  • result (current branch vs main)
    image
    image


pathfinder-pf force-pushed the feat/return_logprobs branch 2 times, most recently from b0f1fc4 to 99f1a85, November 4, 2025 05:56
@JamesBrianD
Collaborator

Benchmark information is required. Thanks a lot!

@aolemila
Collaborator

aolemila commented Nov 4, 2025

The following items are required for this PR. @pathfinder-pf

Cache Miss

Please add tests in test_features.py. Following def test_cache_miss_prefill(self) and def test_cache_miss_decode(self), add separate parameter sets that test whether return_logprob causes a cache miss.

Note: Set logprob_start_len to -1 or 1, top_logprobs_num to a value greater than 1, and token_ids_logprob to a non-None value.
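
A rough sketch of what such a parameterized check could look like. This is not the actual structure of test_features.py. fake_generate is a hypothetical stand-in for a real engine call, and the only grounded detail is that the engine output in the PR description carries a cache_miss_count field in meta_info. The parameter values follow the note above (logprob_start_len of -1 or 1, top_logprobs_num greater than 1, token_ids_logprob not None).

```python
import unittest


def fake_generate(prompt, **kwargs):
    """Hypothetical stand-in for an engine call; returns a canned result."""
    return {"text": "ok", "meta_info": {"cache_miss_count": 0}}


# Parameter sets taken from the note above.
LOGPROB_PARAM_SETS = [
    {"return_logprob": True, "logprob_start_len": -1,
     "top_logprobs_num": 2, "token_ids_logprob": [10]},
    {"return_logprob": True, "logprob_start_len": 1,
     "top_logprobs_num": 2, "token_ids_logprob": [10]},
]


class TestLogprobCacheMiss(unittest.TestCase):
    # Swap in a real generate callable when running against a live engine.
    generate = staticmethod(fake_generate)

    def test_no_cache_miss_with_logprobs(self):
        for params in LOGPROB_PARAM_SETS:
            with self.subTest(params=params):
                out = self.generate("Hello!", **params)
                self.assertEqual(out["meta_info"]["cache_miss_count"], 0)
```

Using subTest keeps the two logprob_start_len settings as separately reported cases, so a miss in only one of them is visible in the test output.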

Feature

Please add the following tests to CI to ensure the feature works.

Note: Test the following cases with both batch_size = 1 and batch_size > 1.

  • case1: return_logprobs = True && logprob_start_len = -1; ensure logprobs are returned only for output_ids.
  • case2: return_logprobs = True && logprob_start_len = 1 && top_logprobs_num = 3 && token_ids_logprob = [xxx, xxx, xxx]; ensure the output matches expectations.
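
The two cases could be checked with shape assertions along these lines. This is a sketch under assumptions: the meta_info layout is taken from the engine output in the PR description, the helper names are hypothetical, and how input_token_logprobs looks when logprob_start_len = -1 (absent, None, or empty) is assumed rather than confirmed. The sample data uses top_logprobs_num = 2 and token_ids_logprob = [10], matching that output.

```python
def check_output_only(meta_info):
    """case1: only output tokens should carry logprobs."""
    assert len(meta_info["output_token_logprobs"]) == meta_info["completion_tokens"]
    # Assumption: prompt logprobs are absent, None, or empty in this mode.
    assert not meta_info.get("input_token_logprobs")


def check_top_and_token_ids(meta_info, top_k, token_ids):
    """case2: top_k candidates per position, one entry per requested token id."""
    for top in meta_info["output_top_logprobs"]:
        assert len(top) == top_k
    for entries in meta_info["output_token_ids_logprobs"]:
        returned_ids = sorted(tok_id for _lp, tok_id, _text in entries)
        assert returned_ids == sorted(token_ids)


# Sample data shaped like the engine output in the PR description.
case1 = {
    "completion_tokens": 2,
    "output_token_logprobs": [(-0.017578125, 198, "\n"), (-0.248046875, 40, "I")],
    "input_token_logprobs": None,
}
case2 = {
    "output_top_logprobs": [
        [(-0.017578125, 198, "\n"), (-4.46875, 151648, "<think>")],
        [(-0.248046875, 40, "I"), (-2.390625, 91786, "Greetings")],
    ],
    "output_token_ids_logprobs": [[(-36.75, 10, "+")], [(-29.625, 10, "+")]],
}

check_output_only(case1)
check_top_and_token_ids(case2, top_k=2, token_ids=[10])
```

For batch_size > 1 the same checks would run once per returned item.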

Accuracy

How do we verify that the returned logprobs are correct? This needs discussion.
We can refer to https://github.com/sgl-project/sglang/pull/10230/files#diff-74d422b689aa0e95e1425839e9d1f6dede379dd885199d718a4d16eff21ff714.
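
One possible direction, offered only as a starting point for that discussion and not as what the referenced sglang PR does: compute reference logprobs with a numerically stable log-softmax and compare them to the returned values within a tolerance. The tolerance value and the rounded stand-in for server output are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def within_tolerance(returned, reference, atol):
    """True when every returned logprob is within atol of its reference value."""
    return all(abs(a - b) <= atol for a, b in zip(returned, reference))


logits = [2.0, 1.0, 0.1]
reference = log_softmax(logits)
# Stand-in for values returned by a server at reduced precision.
returned = [round(x, 3) for x in reference]

print(within_tolerance(returned, reference, atol=1e-3))
```

The tolerance would need to reflect the serving dtype; the values in the PR description look like bfloat16-rounded numbers, so a tight float32 tolerance would likely be too strict.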

Profile and Benchmark

Baseline

Please add baselines for three scenarios in the blog.
Note: this needs discussion.

pathfinder-pf force-pushed the feat/return_logprobs branch 8 times, most recently from 1991b5a to 6e256a2, November 13, 2025 07:36
pathfinder-pf force-pushed the feat/return_logprobs branch 6 times, most recently from cf1042b to c76c7b7, November 18, 2025 06:35
pathfinder-pf merged commit eced7f4 into sgl-project:main Nov 18, 2025
16 checks passed
lzlfwow pushed a commit that referenced this pull request Dec 4, 2025
Co-authored-by: pathfinder-fp <slackexplorer@gmail.com>