
fix prompt_logprob crash when delayed sampling is on#1421

Merged
michalkuligowski merged 3 commits into HabanaAI:habana_main from ccrhx4:fix_prompt_logprobs_crash on Jul 11, 2025

Conversation


@ccrhx4 ccrhx4 commented Jun 13, 2025

vllm-fork crashes when logprobs and prompt_logprobs are both requested. This PR fixes that issue. https://jira.habana-labs.com/browse/SW-231158

I have verified the fix with a variety of lm_eval tasks, both ones that request prompt_logprobs and logprobs and ones that do not.

VLLM_SKIP_WARMUP=true vllm serve meta-llama/Meta-Llama-3-8B

no_proxy=0.0.0.0 HF_ALLOW_CODE_EVAL=1 lm_eval --model local-completions --tasks lambada_openai,hellaswag,winogrande,piqa,mmlu,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge --model_args model=facebook/opt-125m,base_url=http://0.0.0.0:8000/v1/completions,trust_remote_code=True,max_gen_toks=1024 --confirm_run_unsafe_code
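For reference, the same logprobs/prompt_logprobs combination can also be exercised directly through vLLM's offline Python API. This is only a minimal repro sketch (prompt and token counts are arbitrary, and the model name is taken from the serve command above), not part of the PR itself:

```python
# Minimal sketch: request generation logprobs and prompt logprobs together,
# the combination that previously crashed when delayed sampling was enabled.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B")

params = SamplingParams(
    max_tokens=16,
    logprobs=1,         # logprobs for generated tokens
    prompt_logprobs=1,  # logprobs for the prompt tokens
)

for out in llm.generate(["The capital of France is"], params):
    print(out.prompt_logprobs)       # per-prompt-token logprobs
    print(out.outputs[0].logprobs)   # per-generated-token logprobs
```

lm_eval's local-completions backend triggers roughly the same combination server-side by sending completions requests with echo and logprobs enabled.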

| Tasks | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.5000 | ± | 0.0146 |
|  |  | none | 0 | acc_norm | ↑ | 0.5358 | ± | 0.0146 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.8005 | ± | 0.0082 |
|  |  | none | 0 | acc_norm | ↑ | 0.7765 | ± | 0.0085 |
| boolq | 2 | none | 0 | acc | ↑ | 0.8107 | ± | 0.0069 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.6013 | ± | 0.0049 |
|  |  | none | 0 | acc_norm | ↑ | 0.7908 | ± | 0.0041 |
| lambada_openai | 1 | none | 0 | acc | ↑ | 0.7677 | ± | 0.0059 |
|  |  | none | 0 | perplexity | ↓ | 3.0996 | ± | 0.0571 |
| mmlu | 2 | none |  | acc | ↑ | 0.6211 | ± | 0.0038 |
| - humanities | 2 | none |  | acc | ↑ | 0.5492 | ± | 0.0066 |
| - formal_logic | 1 | none | 0 | acc | ↑ | 0.3968 | ± | 0.0438 |
| - high_school_european_history | 1 | none | 0 | acc | ↑ | 0.7515 | ± | 0.0337 |
| - high_school_us_history | 1 | none | 0 | acc | ↑ | 0.8039 | ± | 0.0279 |
| - high_school_world_history | 1 | none | 0 | acc | ↑ | 0.8186 | ± | 0.0251 |
| - international_law | 1 | none | 0 | acc | ↑ | 0.7851 | ± | 0.0375 |
| - jurisprudence | 1 | none | 0 | acc | ↑ | 0.7500 | ± | 0.0419 |
| - logical_fallacies | 1 | none | 0 | acc | ↑ | 0.7239 | ± | 0.0351 |
| - moral_disputes | 1 | none | 0 | acc | ↑ | 0.6994 | ± | 0.0247 |
| - moral_scenarios | 1 | none | 0 | acc | ↑ | 0.2380 | ± | 0.0142 |
| - philosophy | 1 | none | 0 | acc | ↑ | 0.7267 | ± | 0.0253 |
| - prehistory | 1 | none | 0 | acc | ↑ | 0.7315 | ± | 0.0247 |
| - professional_law | 1 | none | 0 | acc | ↑ | 0.4570 | ± | 0.0127 |
| - world_religions | 1 | none | 0 | acc | ↑ | 0.8129 | ± | 0.0299 |
| - other | 2 | none |  | acc | ↑ | 0.7029 | ± | 0.0078 |
| - business_ethics | 1 | none | 0 | acc | ↑ | 0.6000 | ± | 0.0492 |
| - clinical_knowledge | 1 | none | 0 | acc | ↑ | 0.7358 | ± | 0.0271 |
| - college_medicine | 1 | none | 0 | acc | ↑ | 0.6243 | ± | 0.0369 |
| - global_facts | 1 | none | 0 | acc | ↑ | 0.2800 | ± | 0.0451 |
| - human_aging | 1 | none | 0 | acc | ↑ | 0.6592 | ± | 0.0318 |
| - management | 1 | none | 0 | acc | ↑ | 0.8544 | ± | 0.0349 |
| - marketing | 1 | none | 0 | acc | ↑ | 0.8632 | ± | 0.0225 |
| - medical_genetics | 1 | none | 0 | acc | ↑ | 0.8400 | ± | 0.0368 |
| - miscellaneous | 1 | none | 0 | acc | ↑ | 0.8148 | ± | 0.0139 |
| - nutrition | 1 | none | 0 | acc | ↑ | 0.7288 | ± | 0.0255 |
| - professional_accounting | 1 | none | 0 | acc | ↑ | 0.4504 | ± | 0.0297 |
| - professional_medicine | 1 | none | 0 | acc | ↑ | 0.7132 | ± | 0.0275 |
| - virology | 1 | none | 0 | acc | ↑ | 0.5422 | ± | 0.0388 |
| - social sciences | 2 | none |  | acc | ↑ | 0.7335 | ± | 0.0078 |
| - econometrics | 1 | none | 0 | acc | ↑ | 0.4211 | ± | 0.0464 |
| - high_school_geography | 1 | none | 0 | acc | ↑ | 0.7727 | ± | 0.0299 |
| - high_school_government_and_politics | 1 | none | 0 | acc | ↑ | 0.8549 | ± | 0.0254 |
| - high_school_macroeconomics | 1 | none | 0 | acc | ↑ | 0.6282 | ± | 0.0245 |
| - high_school_microeconomics | 1 | none | 0 | acc | ↑ | 0.6807 | ± | 0.0303 |
| - high_school_psychology | 1 | none | 0 | acc | ↑ | 0.8220 | ± | 0.0164 |
| - human_sexuality | 1 | none | 0 | acc | ↑ | 0.7634 | ± | 0.0373 |
| - professional_psychology | 1 | none | 0 | acc | ↑ | 0.6928 | ± | 0.0187 |
| - public_relations | 1 | none | 0 | acc | ↑ | 0.6909 | ± | 0.0443 |
| - security_studies | 1 | none | 0 | acc | ↑ | 0.7429 | ± | 0.0280 |
| - sociology | 1 | none | 0 | acc | ↑ | 0.8308 | ± | 0.0265 |
| - us_foreign_policy | 1 | none | 0 | acc | ↑ | 0.8700 | ± | 0.0338 |
| - stem | 2 | none |  | acc | ↑ | 0.5379 | ± | 0.0086 |
| - abstract_algebra | 1 | none | 0 | acc | ↑ | 0.3000 | ± | 0.0461 |
| - anatomy | 1 | none | 0 | acc | ↑ | 0.6963 | ± | 0.0397 |
| - astronomy | 1 | none | 0 | acc | ↑ | 0.6974 | ± | 0.0374 |
| - college_biology | 1 | none | 0 | acc | ↑ | 0.7639 | ± | 0.0355 |
| - college_chemistry | 1 | none | 0 | acc | ↑ | 0.4400 | ± | 0.0499 |
| - college_computer_science | 1 | none | 0 | acc | ↑ | 0.4900 | ± | 0.0502 |
| - college_mathematics | 1 | none | 0 | acc | ↑ | 0.3900 | ± | 0.0490 |
| - college_physics | 1 | none | 0 | acc | ↑ | 0.3922 | ± | 0.0486 |
| - computer_security | 1 | none | 0 | acc | ↑ | 0.7800 | ± | 0.0416 |
| - conceptual_physics | 1 | none | 0 | acc | ↑ | 0.5447 | ± | 0.0326 |
| - electrical_engineering | 1 | none | 0 | acc | ↑ | 0.6207 | ± | 0.0404 |
| - elementary_mathematics | 1 | none | 0 | acc | ↑ | 0.4286 | ± | 0.0255 |
| - high_school_biology | 1 | none | 0 | acc | ↑ | 0.7323 | ± | 0.0252 |
| - high_school_chemistry | 1 | none | 0 | acc | ↑ | 0.5123 | ± | 0.0352 |
| - high_school_computer_science | 1 | none | 0 | acc | ↑ | 0.6500 | ± | 0.0479 |
| - high_school_mathematics | 1 | none | 0 | acc | ↑ | 0.4000 | ± | 0.0299 |
| - high_school_physics | 1 | none | 0 | acc | ↑ | 0.3775 | ± | 0.0396 |
| - high_school_statistics | 1 | none | 0 | acc | ↑ | 0.5324 | ± | 0.0340 |
| - machine_learning | 1 | none | 0 | acc | ↑ | 0.4464 | ± | 0.0472 |
| openbookqa | 1 | none | 0 | acc | ↑ | 0.3480 | ± | 0.0213 |
|  |  | none | 0 | acc_norm | ↑ | 0.4440 | ± | 0.0222 |
| piqa | 1 | none | 0 | acc | ↑ | 0.7943 | ± | 0.0094 |
|  |  | none | 0 | acc_norm | ↑ | 0.8107 | ± | 0.0091 |
| truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.2717 | ± | 0.0156 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.7301 | ± | 0.0125 |

| Groups | Version | Filter | n-shot | Metric |   | Value |   | Stderr |
|---|---:|---|---|---|---|---:|---|---:|
| mmlu | 2 | none |  | acc | ↑ | 0.6211 | ± | 0.0038 |
| - humanities | 2 | none |  | acc | ↑ | 0.5492 | ± | 0.0066 |
| - other | 2 | none |  | acc | ↑ | 0.7029 | ± | 0.0078 |
| - social sciences | 2 | none |  | acc | ↑ | 0.7335 | ± | 0.0078 |
| - stem | 2 | none |  | acc | ↑ | 0.5379 | ± | 0.0086 |

Signed-off-by: huanxing <huanxing.shen@intel.com>
@michalkuligowski

/run-gaudi-tests


ccrhx4 commented Jun 27, 2025

@michalkuligowski @kzawora-intel @madamczyk-intel Please review the PR and provide feedback. Without this fix, lm-eval crashes on some tasks, which breaks a lot of downstream testing.

@michalkuligowski michalkuligowski merged commit c61532e into HabanaAI:habana_main Jul 11, 2025
47 checks passed
ccrhx4 added a commit to ccrhx4/huanxing.vllm-fork that referenced this pull request Jul 16, 2025
MohitIntel pushed a commit that referenced this pull request Jul 24, 2025