
[bugfix] handle large bucket minimums correctly #235

Merged
kzawora-intel merged 5 commits into habana_main from private/kzawora/bucket_min
Sep 4, 2024
Conversation

@kzawora-intel

This bugfix addresses incorrect lower-boundary handling in bucket generation.

Previous behavior:

INFO 09-03 19:36:28 habana_model_runner.py:564] Prompt bucket config (min, step, max_warmup) bs:[64, 32, 64], seq:[768, 128, 768]
INFO 09-03 19:36:28 habana_model_runner.py:577] Generated 12 prompt buckets: [(32, 128), (32, 256), (32, 384), (32, 512), (32, 640), (32, 768), (64, 128), (64, 256), (64, 384), (64, 512), (64, 640), (64, 768)]
INFO 09-03 19:36:28 habana_model_runner.py:582] Omitted 0 prompt buckets due to exceeded token budget (max_num_batched_tokens=131072)
INFO 09-03 19:36:28 habana_model_runner.py:590] Decode bucket config (min, step, max_warmup) bs:[64, 128, 64], seq:[768, 128, 1024]
INFO 09-03 19:36:28 habana_model_runner.py:601] Generated 8 decode buckets: [(64, 128), (64, 256), (64, 384), (64, 512), (64, 640), (64, 768), (64, 896), (64, 1024)]
INFO 09-03 19:36:28 habana_model_runner.py:606] Omitted 0 decode buckets due to exceeded token budget (max_num_batched_tokens=131072)

The minimum of the seq dimension is set to 768, yet buckets with seq_len from 128 through 768 are generated.

Current behavior:

INFO 09-03 19:45:42 habana_model_runner.py:563] Prompt bucket config (min, step, max_warmup) bs:[64, 32, 64], seq:[768, 128, 768]
INFO 09-03 19:45:42 habana_model_runner.py:576] Generated 1 prompt buckets: [(64, 768)]
INFO 09-03 19:45:42 habana_model_runner.py:581] Omitted 0 prompt buckets due to exceeded token budget (max_num_batched_tokens=131072)
INFO 09-03 19:45:42 habana_model_runner.py:589] Decode bucket config (min, step, max_warmup) bs:[64, 128, 64], seq:[768, 128, 1024]
INFO 09-03 19:45:42 habana_model_runner.py:600] Generated 3 decode buckets: [(64, 768), (64, 896), (64, 1024)]
INFO 09-03 19:45:42 habana_model_runner.py:605] Omitted 0 decode buckets due to exceeded token budget (max_num_batched_tokens=131072)

No buckets with seq_len < 768 are generated; warmup starts at the configured minimum.
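The fixed behavior can be sketched as follows. This is a minimal illustration, not the actual habana_model_runner.py code; `warmup_range` and `generate_buckets` are hypothetical names, and the sketch assumes buckets are step-aligned values up to the max, with everything below the configured minimum clamped out (the minimum itself is always kept):

```python
import itertools


def warmup_range(bmin: int, bstep: int, bmax: int) -> list[int]:
    """Bucket sizes for one dimension: step-aligned values up to bmax,
    with everything below bmin dropped; bmin itself is always kept."""
    candidates = range(bstep, bmax + 1, bstep)
    return sorted({bmin, *(c for c in candidates if c >= bmin)})


def generate_buckets(bs_cfg, seq_cfg):
    """Cartesian product of the batch-size and seq-len dimensions."""
    return list(itertools.product(warmup_range(*bs_cfg),
                                  warmup_range(*seq_cfg)))


# With the configs from the logs above:
# prompt: bs [64, 32, 64], seq [768, 128, 768]  -> [(64, 768)]
# decode: bs [64, 128, 64], seq [768, 128, 1024]
#         -> [(64, 768), (64, 896), (64, 1024)]
```

Under this reading, the previous behavior amounted to ignoring `bmin` when enumerating `candidates`, which is why sub-minimum buckets such as (32, 128) appeared in the old logs.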


@madamczyk-intel madamczyk-intel left a comment


LGTM!

@kzawora-intel kzawora-intel merged commit a4e1d52 into habana_main Sep 4, 2024
@kzawora-intel kzawora-intel added the habana Issues or PRs submitted by Habana Labs label Sep 5, 2024
@kzawora-intel kzawora-intel deleted the private/kzawora/bucket_min branch October 7, 2024 12:53
