feat: Add token counting utility + Add support for it in Compression #5593
Conversation
manuhortet
left a comment
Nice! Would be great to see tests for the model-specific counting functions too.
libs/agno/agno/models/aws/bedrock.py
    return response.get("inputTokens", 0)
except Exception as e:
    log_warning(f"Failed to count tokens via Bedrock API: {e}")
    return super().count_tokens(messages, tools)
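The hunk above is an API-first count with a local fallback. A minimal, self-contained sketch of that pattern (the helper name and the ~4-characters-per-token heuristic are illustrative assumptions, not the PR's actual code):

```python
from typing import Callable, List, Optional


def count_tokens_with_fallback(
    messages: List[str],
    api_counter: Optional[Callable[[List[str]], int]] = None,
) -> int:
    """Prefer a provider-side token count; fall back to a rough local estimate."""
    if api_counter is not None:
        try:
            return api_counter(messages)
        except Exception:
            # e.g. network error or unsupported model: fall through to the estimate
            pass
    # Crude fallback: ~4 characters per token is a common rule of thumb
    return sum(len(m) for m in messages) // 4
```

The fallback keeps compression working even when the provider call fails, at the cost of accuracy.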
Can't we just use this? It's probably the same counting mechanism, since it should just depend on the model encoding?
The token counting logic won't work for our Claude models
Why can't we use the count_tokens fn of the base Claude class: libs/agno/agno/models/anthropic/claude.py?
Bedrock supports non-Anthropic models as well, so the count_tokens fn of the base Claude class won't work. Also, I'm not sure the token counting is the same here, because Claude has intelligent caching.
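Since Bedrock hosts several model families, the counting path has to be chosen per model id. A hedged sketch of such a dispatch (the function and strategy names are hypothetical, not the PR's actual code):

```python
def pick_counter(model_id: str) -> str:
    """Route to a family-specific counting strategy based on the Bedrock model id."""
    mid = model_id.lower()
    if "anthropic" in mid or "claude" in mid:
        # Could delegate to the Claude count_tokens path
        return "anthropic_counter"
    if "amazon" in mid or "titan" in mid:
        return "titan_counter"
    # Heuristic fallback for every other hosted family
    return "generic_estimate"
```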
cookbook/agents/context_compression/token_based_tool_call_compression.py
self,
messages: List[Message],
tools: Optional[List] = None,
main_model: Optional[Model] = None,
target_model?
Better to just call both model.
# Add a function call for each successful execution
function_call_count += len(function_call_results)

all_messages = messages + function_call_results
Why are we changing this? I think you're probably right, but there was a reason we did it here.
Before, we were limited by tool call count, but now we can estimate the token count for messages before API calls, so I moved this up (before the first API call).
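The reply describes estimating cost before the request instead of reacting to call counts. A minimal sketch of that idea, with the counting and compression strategies injected (all names here are illustrative assumptions):

```python
from typing import Callable, List


def maybe_compress(
    messages: List[str],
    tool_results: List[str],
    token_limit: int,
    count_tokens: Callable[[List[str]], int],
    compress: Callable[[List[str]], List[str]],
) -> List[str]:
    """Estimate the token cost of messages + tool results before the API call,
    and compress the tool results only when the estimate exceeds the budget."""
    all_messages = messages + tool_results
    if count_tokens(all_messages) > token_limit:
        all_messages = messages + compress(tool_results)
    return all_messages
```

For example, with a word-count tokenizer and a compressor that keeps only the first word of each tool result, a history that fits the budget passes through untouched while one over budget is shrunk before the request is sent.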
messages: List[Message],
tools: Optional[List[Union[Function, Dict[str, Any]]]] = None,
) -> int:
    if not self.vertexai:
Not true? Their client supports it in general? And it works with multimodal input, which is nice.
For Google AI Studio, system_instruction and tools are ignored by count_tokens.
For Vertex AI it works correctly.
tool_names.append(result.tool_name)
message_metrics += result.metrics

tool_name = ", ".join(tool_names) if tool_names else None
What does this do? It looks like you append all the tool names?
Gemini combines multiple tool results into a single message (unlike OpenAI/Claude), but we can still record the tool names as a comma-separated list like "search, calculator". Useful for logging (message.log()) and debugging.
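The joining logic itself is small. A self-contained sketch (the dict shape is assumed for illustration; the PR works on result objects):

```python
from typing import List, Optional


def combined_tool_name(results: List[dict]) -> Optional[str]:
    """Gemini returns several tool results in one message; keep a readable,
    comma-separated list of the tool names for logging and debugging."""
    names = [r["tool_name"] for r in results if r.get("tool_name")]
    return ", ".join(names) if names else None
```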
# Tool token counting

def _format_function_definitions(tools: List[Dict[str, Any]]) -> str:
What format does this create? Is this how it is done for OpenAI?
Added comments in the code.
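For context on the question: community token counters for OpenAI function calling typically render the tool schemas as a TypeScript-like namespace block and count the tokens of that text. A hedged sketch of that approach (this is the community convention, not necessarily the exact format this PR emits):

```python
from typing import Any, Dict, List


def format_function_definitions(tools: List[Dict[str, Any]]) -> str:
    """Render tool schemas roughly the way community counters do: a
    TypeScript-like namespace, whose text is then tokenized and counted."""
    lines = ["namespace functions {", ""]
    for tool in tools:
        fn = tool.get("function", tool)  # accept both wrapped and bare schemas
        if fn.get("description"):
            lines.append(f"// {fn['description']}")
        params = fn.get("parameters", {}).get("properties", {})
        args = ", ".join(
            f"{name}: {spec.get('type', 'any')}" for name, spec in params.items()
        )
        lines.append(f"type {fn['name']} = ({args}) => any;")
        lines.append("")
    lines.append("} // namespace functions")
    return "\n".join(lines)
```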
libs/agno/agno/utils/tokens.py
for msg in messages:
    total += _count_message_tokens(msg, model_id, tokens_per_message, tokens_per_name)

# Add 3 tokens for reply priming
What's the rationale here?
Added comments in the code.
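The "+3 for reply priming" comes from the OpenAI cookbook recipe: each message carries a fixed framing overhead, and every reply is primed with `<|start|>assistant<|message|>`, which costs a flat 3 tokens. A minimal sketch of that accounting, with the tokenizer injected as a stand-in (so this is the shape of the recipe, not a tiktoken-accurate count):

```python
from typing import Callable, List, Sequence, Tuple


def count_chat_tokens(
    messages: List[Tuple[str, str]],      # (role, content) pairs
    encode: Callable[[str], Sequence],    # stand-in tokenizer
    tokens_per_message: int = 3,
) -> int:
    """Cookbook-style accounting: fixed per-message framing overhead, plus a
    flat 3 tokens at the end for the assistant reply priming."""
    total = 0
    for _role, content in messages:
        total += tokens_per_message + len(encode(content))
    total += 3  # every reply is primed with <|start|>assistant<|message|>
    return total
```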
# Count tool tokens
if tools:
    includes_system = any(msg.role == "system" for msg in messages)
Can you please share the rationale here as well? Also, some comments here would be good.
Added comments in the code.
What is meant by more efficiently here?
Updated the comment
) -> int:
    tokens = tokens_per_message

    if message.role:
Can you please look into whether a model counts the "role" as part of the input tokens? More often than not, the role takes up a separate param.
Yup! Updated the algo.
libs/agno/agno/utils/tokens.py
Total token count for the text.
# gpt-4o models use the newer o200k_base encoding with 200k vocabulary
if "gpt-4o" in model_id.lower():
    return tiktoken.get_encoding("o200k_base")
What about GPT-5? Does that not use the newer o200k_base encoding?
We can use the tiktoken method that handles all models.
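tiktoken's `encoding_for_model` resolves the encoding from the model name and raises `KeyError` for unknown models, so code like this usually pairs it with a fallback. To keep the sketch runnable without tiktoken, here is the same resolution pattern in pure Python (the mapping below is an illustrative subset, not tiktoken's real table):

```python
# Illustrative subset of model-to-encoding mappings; newest entries first so
# that prefix matching prefers the more specific family (gpt-4o before gpt-4).
KNOWN_ENCODINGS = {
    "gpt-4o": "o200k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
}


def resolve_encoding(model_id: str) -> str:
    """Exact lookup first, then prefix match, then a safe modern default."""
    if model_id in KNOWN_ENCODINGS:
        return KNOWN_ENCODINGS[model_id]
    for prefix, enc in KNOWN_ENCODINGS.items():
        if model_id.startswith(prefix):
            return enc
    return "o200k_base"  # default guess for unrecognized / future models
```

With real tiktoken, the equivalent structure is `try: tiktoken.encoding_for_model(model_id) except KeyError: tiktoken.get_encoding("o200k_base")`.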
Co-authored-by: Yash Pratap Solanky <[email protected]>
Summary
Adds a comprehensive token counting utility that works across multiple model providers (OpenAI, Anthropic, AWS Bedrock, Google Gemini, LiteLLM) wherever supported.
Also integrates token-based compression into the existing CompressionManager.
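Conceptually, the integration means the manager decides to compress based on a token estimate rather than a call count. A minimal sketch of that trigger (the class and method names here are assumptions for illustration, not the PR's actual API):

```python
from typing import Callable, List


class TokenBasedCompression:
    """Trigger compression when the running token estimate crosses a budget."""

    def __init__(self, token_limit: int, count: Callable[[List[str]], int]):
        self.token_limit = token_limit
        self.count = count  # pluggable counter, e.g. the new utility

    def should_compress(self, messages: List[str]) -> bool:
        return self.count(messages) > self.token_limit
```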
Type of change
Checklist
- Ran ./scripts/format.sh and ./scripts/validate.sh