Infinite tool call loop: `HuggingFaceModel` and `text-generation-inference`

## Description
Hello. Needless to say, amazing library.  Please let me know if you'd like me to try something or if you need more info.

I've been going through various local model providers trying to find one that works well, when I cam across a rather shocking bug when running against Huggingface's TGI model host.

The problem appears whether using the OpenAI "compatible" endpoints or the `HuggingfaceModel` with custom `AsyncInferenceClient` and `HuggingFaceProvider`. The latter probably being the official approach, the code included here will be using that.

## System Info
`curl 127.0.0.1:8080/info | jq`:
```json
{
  "model_id": "/models/meta-llama/Meta-Llama-3-8B-Instruct",
  "model_sha": null,
  "model_pipeline_tag": null,
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_tokens": 8191,
  "max_total_tokens": 8192,
  "validation_workers": 2,
  "max_client_batch_size": 4,
  "router": "text-generation-router",
  "version": "3.3.4-dev0",
  "sha": "9f38d9305168f4b47c8c46b573f5b2c07881281d",
  "docker_label": "sha-9f38d93"
}
```

`nvidia-smi`:
```shell
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.05              Driver Version: 575.64.05      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:01:00.0  On |                  Off |
| 40%   54C    P2             61W /  450W |   21499MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:48:00.0 Off |                  Off |
| 30%   43C    P2             52W /  450W |   21394MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```

### Information

- [x] Docker
- [ ] The CLI directly

### Tasks

- [x] An officially supported command
- [ ] My own modifications

## Reproduction

### Setup

Here's the `docker-compose.yaml` I'm using to start TGI:
```yaml
services:
  text-generation-inference:
    image: ghcr.io/huggingface/text-generation-inference:latest
    container_name: tgi
    ports:
      - "8081:80"
    volumes:
      - ../../../models:/models:ro
      - tgi-data:/data
    environment:
      - RUST_LOG=info
    # I have also tested with 3.1-8B and 3.2-3B with the same end results
    command: >
      --model-id /models/meta-llama/Meta-Llama-3-8B-Instruct
      --hostname 0.0.0.0
      --port 80
      --trust-remote-code
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
    shm_size: "64g"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:80/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

volumes:
  tgi-data:
    driver: local
```

### Code

All code is running in a Jupyter notebook.

Here's the common setup cell:
```python
from huggingface_hub import AsyncInferenceClient
from pydantic_ai.models.huggingface import HuggingFaceModel
from pydantic_ai.providers.huggingface import HuggingFaceProvider
from pydantic_ai.providers.openai import OpenAIProvider

provider = OpenAIProvider(base_url="http://localhost:8081/v1") # Just used to get the model slug
models = await provider.client.models.list()

client = AsyncInferenceClient(base_url="http://localhost:8081/")

print(f"Connected to TGI. Available models: {len(models.data)}")
for model in models.data:
    print(f"  - {model.id}")

# Create the model instance
agent_model = HuggingFaceModel(
    models.data[0].id,
    provider=HuggingFaceProvider(hf_client=client, api_key="None"),
    # Annoyingly, despite this being basically the default profile, Llama 3's tool calls often fall through to the response without this
    profile=ModelProfile(
        supports_tools=True,
        json_schema_transformer=InlineDefsJsonSchemaTransformer
    )
)
```

### Working: Basic requests and history

1. Create the basic agent
```python
from pydantic_ai import Agent

simple_agent = Agent(model=agent_model)
```

2. Make a simple request
```python
simple_result = await simple_agent.run("Tell me a joke.")

simple_result.output # "Why couldn't the bicycle stand up by itself?\n\nBecause it was two-tired!"
```

3. Test including previous messages in another simple request
```python
simple_result_2 = await simple_agent.run( message_history=simple_result.all_messages()

simple_result_2.output # 'Why did the scarecrow win an award?\n\nBecause he was outstanding in his field! (get it?)'
```

### Not working (or sometimes "working" with like 20 tool calls)

1. Create the agent and a basic function
```python
from pydantic_ai import Tool
from pydantic_ai.toolsets import FunctionToolset
from datetime import datetime

# Create a simple tool
@Tool
async def get_current_date() -> str:
    """Get the current date.

    Returns:
        str: The current date in YYYY-MM-DD format.
    """
    return datetime.now().strftime("%Y-%m-%d")

# Create an agent with the simple tool
tool_agent = Agent(model=agent_model, tools=[get_current_date])
```

2. Make a simple request that should use the tool call
```python
tool_result = await tool_agent.run("What is the current date?")

tool_result.output # 'I apologize for the repetition! According to my system clock, the current date is indeed August 31st, 2025.'
```

3. Hmm. 8 seconds for that request? Let's inspect the messages
```python
for message in tool_result.all_messages():
    print(message)
```

Which yields something like:
```
ModelRequest(parts=[UserPromptPart(content='What is the current date?', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 70324, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=175, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 288467, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=219, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 505643, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=262, output_tokens=12), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 674762, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=305, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 851700, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=348, output_tokens=15), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 65279, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=391, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 286718, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=434, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 480682, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=477, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 696462, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=520, output_tokens=15), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 907846, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=563, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 152962, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=606, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 337485, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=649, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 528383, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=692, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 760306, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=735, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 995073, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=778, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 186872, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=821, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 426914, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=864, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 653267, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=907, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 877281, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=950, output_tokens=17), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 124358, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=993, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 319587, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1036, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 517817, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1079, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 709416, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1122, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 946267, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1165, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 183936, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1208, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 389117, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1251, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 621889, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1294, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 847334, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1337, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 39434, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1380, output_tokens=19), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 300561, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1423, output_tokens=17), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 536096, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1466, output_tokens=15), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 752334, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1509, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 941799, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1552, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, 174612, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1595, output_tokens=14), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, 387760, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1638, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, 587324, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[TextPart(content='I apologize for the repetition! According to my system clock, the current date is indeed August 31st, 2025.')], usage=RequestUsage(input_tokens=1521, output_tokens=12), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
```

**35 tool calls!**

Here's a log from TGI from one of the calls
```
INFO chat_completions{parameters="GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, frequency_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: true, max_new_tokens: None, return_full_text: None, stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None, grammar: Some(Json(Object {\"$functions\": Object {\"get_current_date\": Object {\"description\": String(\"<summary>Get the current date.</summary>\\n<returns>\\n<type>str</type>\\n<description>The current date in YYYY-MM-DD format.</description>\\n</returns>\"), \"additionalProperties\": Bool(false), \"properties\": Object {\"_name\": Object {\"type\": String(\"string\"), \"const\": String(\"get_current_date\")}}, \"required\": Array [String(\"_name\")]}, \"no_tool\": Object {\"description\": String(\"Open ended response with no specific tool selected\"), \"additionalProperties\": Bool(false), \"properties\": Object {\"_name\": Object {\"type\": String(\"string\"), \"const\": String(\"no_tool\")}}, \"required\": Array [String(\"_name\")]}}, \"properties\": Object {\"function\": Object {\"anyOf\": Array [Object {\"$ref\": String(\"#/$functions/get_current_date\")}, Object {\"$ref\": String(\"#/$functions/no_tool\")}]}}})), adapter_id: Some(\"/models/meta-llama/Meta-Llama-3-8B-Instruct\") }" total_time="180.268942ms" validation_time="1.161794ms" queue_time="46.08µs" inference_time="179.061248ms" time_per_token="14.92177ms" seed="Some(6476155871046790452)" total_time="349.189932ms" validation_time="948.707µs" queue_time="38.419µs" inference_time="348.202936ms" time_per_token="12.896405ms" seed="Some(5246360728990037330)"}: text_generation_router::server: router/src/server.rs:432: Success
```

## Expected behavior

I'd understand if it failed to call the tool, but getting the current date 35 times is a bit much! Ideally, the `HuggingfaceModel` would work with TGI and tool calls.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Infinite tool call loop: `HuggingFaceModel` and `text-generation-inference` #3318

Description

System Info

Information

Tasks

Reproduction

Setup

Code

Working: Basic requests and history

Not working (or sometimes "working" with like 20 tool calls)

Expected behavior

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Infinite tool call loop: HuggingFaceModel and text-generation-inference #3318

Description

Description

System Info

Information

Tasks

Reproduction

Setup

Code

Working: Basic requests and history

Not working (or sometimes "working" with like 20 tool calls)

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Infinite tool call loop: `HuggingFaceModel` and `text-generation-inference` #3318