Hello. Needless to say, amazing library. Please let me know if you'd like me to try something or if you need more info.
While going through various local model providers to find one that works well, I came across a rather shocking bug when running against Hugging Face's TGI model host.
All code is running in a Jupyter notebook.
ModelRequest(parts=[UserPromptPart(content='What is the current date?', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 70324, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=175, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 288467, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=219, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 505643, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=262, output_tokens=12), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 674762, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=305, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 16, 851700, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=348, output_tokens=15), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 65279, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=391, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 286718, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=434, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 480682, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=477, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 696462, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=520, output_tokens=15), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 17, 907846, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=563, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 152962, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=606, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 337485, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=649, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 528383, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=692, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 760306, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=735, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 18, 995073, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=778, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 186872, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=821, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 426914, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=864, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 653267, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=907, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 19, 877281, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=950, output_tokens=17), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 124358, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=993, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 319587, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1036, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 517817, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1079, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 709416, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1122, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 20, 946267, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1165, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 183936, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1208, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 389117, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1251, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 621889, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1294, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 21, 847334, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1337, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 39434, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1380, output_tokens=19), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 300561, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1423, output_tokens=17), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 536096, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1466, output_tokens=15), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 752334, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1509, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 22, 941799, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1552, output_tokens=16), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, 174612, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1595, output_tokens=14), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, 387760, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='get_current_date', args='{}', tool_call_id='0')], usage=RequestUsage(input_tokens=1638, output_tokens=13), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
ModelRequest(parts=[ToolReturnPart(tool_name='get_current_date', content='2025-08-31', tool_call_id='0', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, 587324, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[TextPart(content='I apologize for the repetition! According to my system clock, the current date is indeed August 31st, 2025.')], usage=RequestUsage(input_tokens=1521, output_tokens=12), model_name='/models/meta-llama/Meta-Llama-3-8B-Instruct', timestamp=datetime.datetime(2025, 8, 31, 8, 14, 23, tzinfo=datetime.timezone.utc), provider_name='huggingface', provider_request_id='')
Description
The problem appears whether using the OpenAI "compatible" endpoints or the HuggingFaceModel with a custom AsyncInferenceClient and HuggingFaceProvider. Since the latter is probably the official approach, the code included here uses it.
System Info
curl 127.0.0.1:8080/info | jq:
{
  "model_id": "/models/meta-llama/Meta-Llama-3-8B-Instruct",
  "model_sha": null,
  "model_pipeline_tag": null,
  "max_concurrent_requests": 128,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_tokens": 8191,
  "max_total_tokens": 8192,
  "validation_workers": 2,
  "max_client_batch_size": 4,
  "router": "text-generation-router",
  "version": "3.3.4-dev0",
  "sha": "9f38d9305168f4b47c8c46b573f5b2c07881281d",
  "docker_label": "sha-9f38d93"
}
nvidia-smi:
Information
Tasks
Reproduction
Setup
Here's the docker-compose.yaml I'm using to start TGI:
Code
Here's the common setup cell:
Working: Basic requests and history
Not working (or sometimes "working" with like 20 tool calls)
Which yields something like:
35 tool calls!
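To put a number on the loop, the tool calls in a run's message history can be tallied. The following is only a minimal sketch: the dataclasses are stand-ins that mirror the fields visible in the logged output (tool_name, args, tool_call_id) and are not the library's own classes, and summarize_tool_calls is a hypothetical helper name. With the real library, the same loop would presumably run over the actual message history of the run.

```python
from collections import Counter
from dataclasses import dataclass, field


# Stand-in types that mirror only the fields visible in the logged output.
# They are NOT the library's classes; they exist so the sketch is runnable.
@dataclass
class ToolCallPart:
    tool_name: str
    args: str
    tool_call_id: str


@dataclass
class TextPart:
    content: str


@dataclass
class ModelResponse:
    parts: list = field(default_factory=list)


def summarize_tool_calls(messages):
    """Count tool calls per (tool_name, args) and collect the call IDs seen."""
    calls = Counter()
    call_ids = set()
    for message in messages:
        for part in getattr(message, "parts", []):
            if isinstance(part, ToolCallPart):
                calls[(part.tool_name, part.args)] += 1
                call_ids.add(part.tool_call_id)
    return calls, call_ids


# Rebuild the shape of the reported run: 35 identical calls, then a text answer.
history = [
    ModelResponse(parts=[ToolCallPart("get_current_date", "{}", "0")])
    for _ in range(35)
]
history.append(ModelResponse(parts=[TextPart("The current date is ...")]))

calls, call_ids = summarize_tool_calls(history)
print(calls[("get_current_date", "{}")])
print(call_ids)
```

In the logged run, every call carries the same tool_call_id of '0' and an empty provider_request_id. Whether that is related to the loop is not established here, but it may be worth checking when debugging.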
Here's a log from TGI from one of the calls
Expected behavior
I'd understand if it failed to call the tool, but getting the current date 35 times is a bit much! Ideally, the HuggingFaceModel would work with TGI and tool calls.