feat: add thinking_budget (version 2) #6208
thyecust wants to merge 15 commits into sgl-project:main from
Conversation
@zhyncs Could you provide more information about #6181 (comment)? I cannot reproduce that.
There are only two references of it.
@minleminzui Could you help review this? thx
@CatherineSue, @ispobock, @sleepcoo please help review this PR, and please ensure it doesn't affect your internal services.
CatherineSue
left a comment
It seems we allow users to pass this parameter with a non-thinking model? I think it is better to isolate this from non-thinking models to prevent potential side effects in the future.
```python
if any(hasattr(r.tokenizer, "think_end_id") for r in reqs):
    think_end_ids = torch.tensor(
        [getattr(r.tokenizer, "think_end_id", -1) for r in reqs],
        dtype=torch.int64,
    ).to(device, non_blocking=True)
    num_thinking_tokens = torch.tensor([0 for _ in reqs], dtype=torch.int64).to(
        device, non_blocking=True
    )
```

For a non-thinking model, r.tokenizer has no think_end_id attribute, so nothing will happen. Better isolation introduces more code. Is silently ignoring the parameter acceptable? @CatherineSue
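The fallback behavior under discussion can be illustrated in plain Python (a standalone sketch; `DummyTokenizer`, `ThinkingTokenizer`, and `Req` are hypothetical stand-ins, and the token id is made up — the real code builds torch tensors from these values):

```python
class DummyTokenizer:
    """Stand-in for a non-thinking model's tokenizer: no think_end_id attribute."""
    pass

class ThinkingTokenizer:
    """Stand-in for a thinking model's tokenizer (token id is illustrative)."""
    think_end_id = 151668

class Req:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

reqs = [Req(ThinkingTokenizer()), Req(DummyTokenizer())]

# Same pattern as the PR: a request whose tokenizer lacks think_end_id
# gets the -1 sentinel, so the budget logic silently skips it.
think_end_ids = [getattr(r.tokenizer, "think_end_id", -1) for r in reqs]
print(think_end_ids)  # [151668, -1]
```

This is why nothing happens for non-thinking models without any explicit model-architecture check.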
```python
    [r.sampling_params.min_p for r in reqs], dtype=torch.float
).to(device, non_blocking=True)

if any(hasattr(r.tokenizer, "think_end_id") for r in reqs):
```
For a non-reasoning model, can we skip this check? Do we need to identify whether the model is a reasoning model from the architecture?

I am thinking about it. Right now I don't know how to decide whether a model is a reasoning model from a request.

As far as I can tell, there seems to be no better way. The information that can be obtained here is very limited: this code is just doing sampling and does not know the specific model architecture.
```python
"temperature": request.temperature,
"max_new_tokens": request.max_tokens or request.max_completion_tokens,
"min_new_tokens": request.min_tokens,
"thinking_budget": request.thinking_budget,
```
It seems this is only valid when enable_thinking = True. We need to add validation.
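A minimal sketch of the requested validation (the function name and error message are assumptions, not the actual sglang code):

```python
def validate_thinking_budget(thinking_budget, enable_thinking):
    # Hypothetical check: reject a thinking_budget when thinking is disabled,
    # instead of silently accepting a parameter that has no effect.
    if thinking_budget is not None and not enable_thinking:
        raise ValueError(
            "thinking_budget is only valid when enable_thinking is True"
        )
    return thinking_budget
```

Rejecting the combination up front surfaces the misuse at request time rather than letting the parameter be silently ignored.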
```python
top_k: int = -1
min_p: float = 0.0
min_tokens: int = 0
thinking_budget: Optional[int] = None
```
Will it be used in a non-chat scenario? Is there any usage example/reference for that?

I am not familiar with CompletionRequest. Removed thinking_budget from it in 186b1aa.
```python
def apply_thinking_budgets(self, next_token_logits: torch.Tensor):
    if self.thinking_budgets is None:
        return
    has_budget = self.thinking_budgets > 0
```
Doesn't this fail to support the thinking_budget=0 scenario?

Hi, we now support thinking_budget=0 as of 30aa15f.
```python
next_token_logits[batch_indices, end_token_indices] = 0.0

def update_thinking_budgets(self, next_token_ids: torch.Tensor):
    if self.thinking_budgets is None or not torch.any(self.thinking_budgets > 0):
```
Doesn't this fail to support the thinking_budget=0 scenario?

Hi, we now support thinking_budget=0 as of 30aa15f.
```python
    device, non_blocking=True
)
thinking_budgets = torch.tensor(
    [r.sampling_params.thinking_budget or -1 for r in reqs],
```
When thinking_budget=0, this expression assigns -1 because 0 is falsy. It is recommended to modify this, otherwise thinking_budget=0 will not be fully supported.
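The underlying pitfall is Python truthiness: `0 or -1` evaluates to -1, so an explicit budget of 0 collapses into the "unset" sentinel. A small sketch of the bug and one possible fix (values are illustrative):

```python
budgets = [512, 0, None]  # None = user did not set a budget

# Buggy: `or` treats 0 as falsy, so an explicit budget of 0
# becomes the -1 "unset" sentinel.
buggy = [b or -1 for b in budgets]
print(buggy)  # [512, -1, -1]

# Fixed: only None maps to the sentinel; an explicit 0 survives.
fixed = [b if b is not None else -1 for b in budgets]
print(fixed)  # [512, 0, -1]
```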
Motivation
See #6089
Modifications
Removed incorrect diff in protocol.py. See commit (version 2).
Checklist