
fix prompt_logprob crash when delayed sampling is on#1421

Merged
michalkuligowski merged 3 commits into HabanaAI:habana_main from ccrhx4:fix_prompt_logprobs_crash on Jul 11, 2025

Conversation


@ccrhx4 ccrhx4 commented Jun 13, 2025

vllm-fork crashes when logprobs and prompt_logprobs are both requested. This PR fixes that issue. https://jira.habana-labs.com/browse/SW-231158

I have verified the fix with a variety of lm_eval tasks, both ones that request prompt_logprobs and logprobs and ones that do not.

VLLM_SKIP_WARMUP=true vllm serve meta-llama/Meta-Llama-3-8B

no_proxy=0.0.0.0 HF_ALLOW_CODE_EVAL=1 lm_eval --model local-completions --tasks lambada_openai,hellaswag,winogrande,piqa,mmlu,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge --model_args model=facebook/opt-125m,base_url=http://0.0.0.0:8000/v1/completions,trust_remote_code=True,max_gen_toks=1024 --confirm_run_unsafe_code
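For reference, the same logprobs/prompt_logprobs combination can also be exercised directly through vLLM's offline Python API. This is only a minimal repro sketch (prompt and token counts are arbitrary, and the model name is taken from the serve command above), not part of the PR itself:

```python
# Minimal sketch: request generation logprobs and prompt logprobs together,
# the combination that previously crashed when delayed sampling was enabled.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B")

params = SamplingParams(
    max_tokens=16,
    logprobs=1,         # logprobs for generated tokens
    prompt_logprobs=1,  # logprobs for the prompt tokens
)

for out in llm.generate(["The capital of France is"], params):
    print(out.prompt_logprobs)       # per-prompt-token logprobs
    print(out.outputs[0].logprobs)   # per-generated-token logprobs
```

lm_eval's local-completions backend triggers roughly the same combination server-side by sending completions requests with echo and logprobs enabled.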

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.5000 | ± | 0.0146 |
|  |  | none | 0 | acc_norm | ↑ | 0.5358 | ± | 0.0146 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.8005 | ± | 0.0082 |
|  |  | none | 0 | acc_norm | ↑ | 0.7765 | ± | 0.0085 |
| boolq | 2 | none | 0 | acc | ↑ | 0.8107 | ± | 0.0069 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.6013 | ± | 0.0049 |
|  |  | none | 0 | acc_norm | ↑ | 0.7908 | ± | 0.0041 |
| lambada_openai | 1 | none | 0 | acc | ↑ | 0.7677 | ± | 0.0059 |
|  |  | none | 0 | perplexity | ↓ | 3.0996 | ± | 0.0571 |
| mmlu | 2 | none |  | acc | ↑ | 0.6211 | ± | 0.0038 |
| - humanities | 2 | none |  | acc | ↑ | 0.5492 | ± | 0.0066 |
| - formal_logic | 1 | none | 0 | acc | ↑ | 0.3968 | ± | 0.0438 |
| - high_school_european_history | 1 | none | 0 | acc | ↑ | 0.7515 | ± | 0.0337 |
| - high_school_us_history | 1 | none | 0 | acc | ↑ | 0.8039 | ± | 0.0279 |
| - high_school_world_history | 1 | none | 0 | acc | ↑ | 0.8186 | ± | 0.0251 |
| - international_law | 1 | none | 0 | acc | ↑ | 0.7851 | ± | 0.0375 |
| - jurisprudence | 1 | none | 0 | acc | ↑ | 0.7500 | ± | 0.0419 |
| - logical_fallacies | 1 | none | 0 | acc | ↑ | 0.7239 | ± | 0.0351 |
| - moral_disputes | 1 | none | 0 | acc | ↑ | 0.6994 | ± | 0.0247 |
| - moral_scenarios | 1 | none | 0 | acc | ↑ | 0.2380 | ± | 0.0142 |
| - philosophy | 1 | none | 0 | acc | ↑ | 0.7267 | ± | 0.0253 |
| - prehistory | 1 | none | 0 | acc | ↑ | 0.7315 | ± | 0.0247 |
| - professional_law | 1 | none | 0 | acc | ↑ | 0.4570 | ± | 0.0127 |
| - world_religions | 1 | none | 0 | acc | ↑ | 0.8129 | ± | 0.0299 |
| - other | 2 | none |  | acc | ↑ | 0.7029 | ± | 0.0078 |
| - business_ethics | 1 | none | 0 | acc | ↑ | 0.6000 | ± | 0.0492 |
| - clinical_knowledge | 1 | none | 0 | acc | ↑ | 0.7358 | ± | 0.0271 |
| - college_medicine | 1 | none | 0 | acc | ↑ | 0.6243 | ± | 0.0369 |
| - global_facts | 1 | none | 0 | acc | ↑ | 0.2800 | ± | 0.0451 |
| - human_aging | 1 | none | 0 | acc | ↑ | 0.6592 | ± | 0.0318 |
| - management | 1 | none | 0 | acc | ↑ | 0.8544 | ± | 0.0349 |
| - marketing | 1 | none | 0 | acc | ↑ | 0.8632 | ± | 0.0225 |
| - medical_genetics | 1 | none | 0 | acc | ↑ | 0.8400 | ± | 0.0368 |
| - miscellaneous | 1 | none | 0 | acc | ↑ | 0.8148 | ± | 0.0139 |
| - nutrition | 1 | none | 0 | acc | ↑ | 0.7288 | ± | 0.0255 |
| - professional_accounting | 1 | none | 0 | acc | ↑ | 0.4504 | ± | 0.0297 |
| - professional_medicine | 1 | none | 0 | acc | ↑ | 0.7132 | ± | 0.0275 |
| - virology | 1 | none | 0 | acc | ↑ | 0.5422 | ± | 0.0388 |
| - social sciences | 2 | none |  | acc | ↑ | 0.7335 | ± | 0.0078 |
| - econometrics | 1 | none | 0 | acc | ↑ | 0.4211 | ± | 0.0464 |
| - high_school_geography | 1 | none | 0 | acc | ↑ | 0.7727 | ± | 0.0299 |
| - high_school_government_and_politics | 1 | none | 0 | acc | ↑ | 0.8549 | ± | 0.0254 |
| - high_school_macroeconomics | 1 | none | 0 | acc | ↑ | 0.6282 | ± | 0.0245 |
| - high_school_microeconomics | 1 | none | 0 | acc | ↑ | 0.6807 | ± | 0.0303 |
| - high_school_psychology | 1 | none | 0 | acc | ↑ | 0.8220 | ± | 0.0164 |
| - human_sexuality | 1 | none | 0 | acc | ↑ | 0.7634 | ± | 0.0373 |
| - professional_psychology | 1 | none | 0 | acc | ↑ | 0.6928 | ± | 0.0187 |
| - public_relations | 1 | none | 0 | acc | ↑ | 0.6909 | ± | 0.0443 |
| - security_studies | 1 | none | 0 | acc | ↑ | 0.7429 | ± | 0.0280 |
| - sociology | 1 | none | 0 | acc | ↑ | 0.8308 | ± | 0.0265 |
| - us_foreign_policy | 1 | none | 0 | acc | ↑ | 0.8700 | ± | 0.0338 |
| - stem | 2 | none |  | acc | ↑ | 0.5379 | ± | 0.0086 |
| - abstract_algebra | 1 | none | 0 | acc | ↑ | 0.3000 | ± | 0.0461 |
| - anatomy | 1 | none | 0 | acc | ↑ | 0.6963 | ± | 0.0397 |
| - astronomy | 1 | none | 0 | acc | ↑ | 0.6974 | ± | 0.0374 |
| - college_biology | 1 | none | 0 | acc | ↑ | 0.7639 | ± | 0.0355 |
| - college_chemistry | 1 | none | 0 | acc | ↑ | 0.4400 | ± | 0.0499 |
| - college_computer_science | 1 | none | 0 | acc | ↑ | 0.4900 | ± | 0.0502 |
| - college_mathematics | 1 | none | 0 | acc | ↑ | 0.3900 | ± | 0.0490 |
| - college_physics | 1 | none | 0 | acc | ↑ | 0.3922 | ± | 0.0486 |
| - computer_security | 1 | none | 0 | acc | ↑ | 0.7800 | ± | 0.0416 |
| - conceptual_physics | 1 | none | 0 | acc | ↑ | 0.5447 | ± | 0.0326 |
| - electrical_engineering | 1 | none | 0 | acc | ↑ | 0.6207 | ± | 0.0404 |
| - elementary_mathematics | 1 | none | 0 | acc | ↑ | 0.4286 | ± | 0.0255 |
| - high_school_biology | 1 | none | 0 | acc | ↑ | 0.7323 | ± | 0.0252 |
| - high_school_chemistry | 1 | none | 0 | acc | ↑ | 0.5123 | ± | 0.0352 |
| - high_school_computer_science | 1 | none | 0 | acc | ↑ | 0.6500 | ± | 0.0479 |
| - high_school_mathematics | 1 | none | 0 | acc | ↑ | 0.4000 | ± | 0.0299 |
| - high_school_physics | 1 | none | 0 | acc | ↑ | 0.3775 | ± | 0.0396 |
| - high_school_statistics | 1 | none | 0 | acc | ↑ | 0.5324 | ± | 0.0340 |
| - machine_learning | 1 | none | 0 | acc | ↑ | 0.4464 | ± | 0.0472 |
| openbookqa | 1 | none | 0 | acc | ↑ | 0.3480 | ± | 0.0213 |
|  |  | none | 0 | acc_norm | ↑ | 0.4440 | ± | 0.0222 |
| piqa | 1 | none | 0 | acc | ↑ | 0.7943 | ± | 0.0094 |
|  |  | none | 0 | acc_norm | ↑ | 0.8107 | ± | 0.0091 |
| truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.2717 | ± | 0.0156 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.7301 | ± | 0.0125 |

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---:|---|---|---|---|---:|---|---:|
| mmlu | 2 | none |  | acc | ↑ | 0.6211 | ± | 0.0038 |
| - humanities | 2 | none |  | acc | ↑ | 0.5492 | ± | 0.0066 |
| - other | 2 | none |  | acc | ↑ | 0.7029 | ± | 0.0078 |
| - social sciences | 2 | none |  | acc | ↑ | 0.7335 | ± | 0.0078 |
| - stem | 2 | none |  | acc | ↑ | 0.5379 | ± | 0.0086 |

Signed-off-by: huanxing <huanxing.shen@intel.com>
@michalkuligowski

/run-gaudi-tests


ccrhx4 commented Jun 27, 2025

@michalkuligowski @kzawora-intel @madamczyk-intel Please review the PR and provide feedback. Without this fix, lm-eval crashes on some tasks, which breaks a lot of downstream testing.

@michalkuligowski michalkuligowski merged commit c61532e into HabanaAI:habana_main Jul 11, 2025
47 checks passed
ccrhx4 added a commit to ccrhx4/huanxing.vllm-fork that referenced this pull request Jul 16, 2025
MohitIntel pushed a commit that referenced this pull request Jul 24, 2025