
[Bug] Language model still thinks when "thinking" is disabled in TextGenerate #13641

@fappaz

Description


Expected Behavior

If I disable "thinking" in the TextGenerate node, I expect the language models not to think before answering.

Actual Behavior

The language model still thinks even when thinking is disabled.

Steps to Reproduce

  • Import this workflow:
{"id":"5af8db4d-44eb-49f0-b205-76b0aa497cff","revision":0,"last_node_id":3,"last_link_id":2,"nodes":[{"id":1,"type":"TextGenerate","pos":[743.3188073596824,-1303.468489826901],"size":[400,372],"flags":{},"order":1,"mode":0,"inputs":[{"name":"clip","type":"CLIP","link":1},{"name":"image","shape":7,"type":"IMAGE","link":null}],"outputs":[{"name":"generated_text","type":"STRING","links":[2]}],"properties":{"Node name for S&R":"TextGenerate"},"widgets_values":["what's your name?",256,"on",0.7,64,0.95,0.05,1.05,0,0,false,true]},{"id":2,"type":"CLIPLoader","pos":[320,-1300],"size":[360,120],"flags":{},"order":0,"mode":0,"inputs":[],"outputs":[{"name":"CLIP","type":"CLIP","links":[1]}],"properties":{"Node name for S&R":"CLIPLoader"},"widgets_values":["qwen_3_4b_fp8_mixed.safetensors","stable_diffusion","default"]},{"id":3,"type":"PreviewAny","pos":[1220,-1300],"size":[540,600],"flags":{},"order":2,"mode":0,"inputs":[{"name":"source","type":"*","link":2}],"outputs":[{"name":"STRING","type":"STRING","links":null}],"properties":{"Node name for S&R":"PreviewAny"},"widgets_values":[null,null,false]}],"links":[[1,2,0,1,0,"CLIP"],[2,1,0,3,0,"STRING"]],"groups":[],"config":{},"extra":{"ds":{"scale":0.9849732675807851,"offset":[-153.98631927138626,1543.676768024058]},"frontendVersion":"1.42.11"},"version":0.4}
  • In the CLIPLoader node, select a language model that supports thinking mode, e.g. Qwen 3 4B
  • Run the workflow

Notice how the output still shows the model thinking:

<think>
Okay, the user is asking for my name. I need to respond appropriately.

First, I should confirm that my name is Qwen. That's correct.

I should mention that I am developed by Alibaba Cloud and part of the Qwen series. That gives context about my origin.

Also, I should highlight my capabilities, like answering questions, creating content, and helping with tasks. This shows the user what I can do.

I should keep the response friendly and welcoming. Maybe add an emoji to make it more approachable.

Make sure the answer is clear and concise. Avoid technical jargon so it's easy to understand.

Check for any typos or errors. Ensure the information is accurate.

Alright, time to put it all together in a natural way.
</think>

Hello! My name is Qwen. I am developed by Alibaba Cloud and belong to the Qwen series. I can help you with answering questions, creating content, and assisting with various tasks. How can I help you today? 😊

Adding /no_think to the user prompt helps, but the output still contains empty <think></think> tags, so my workaround is to add a RegexReplace node to strip the stubborn prefix:

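For reference, the substitution that RegexReplace performs can be sketched in plain Python. The pattern below is my own reconstruction, not copied from the workflow; it strips a leading <think>…</think> block (empty or multi-line) plus any trailing whitespace:

```python
import re

# Matches a leading <think>...</think> block (possibly empty, possibly
# spanning multiple lines thanks to re.DOTALL) plus surrounding whitespace.
# The exact pattern is an assumption; adjust to taste in the node's widget.
THINK_PREFIX = re.compile(r"^\s*<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove the leading <think>...</think> block, if present."""
    return THINK_PREFIX.sub("", text, count=1)
```

With count=1 only the prefix is removed, so any literal <think> text later in the answer is left alone.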

Debug Logs

got prompt
Found quantization metadata version 1
Using MixedPrecisionOps for text encoder
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load ZImageTEModel_
loaded completely; 5688.80 MB usable, 4207.26 MB loaded, full load: True
Generating tokens:  80%|████████  | 205/256 [00:29<00:07,  6.95it/s]
Prompt executed in 35.07 seconds

Other

ComfyUI: v0.20.1
ComfyUI_frontend: v1.42.15
OS: win32
Python Version: 3.12.9 (main, Feb 12 2025, 14:52:31) [MSC v.1942 64 bit (AMD64)]
Embedded Python: false
Pytorch Version: 2.6.0+cu126
Device Name: cuda:0 NVIDIA GeForce RTX 3080 Laptop GPU : cudaMallocAsync


    Labels

    Potential Bug: User is reporting a bug. This should be tested.
