Conversation

@Mustafa-Esoofally (Contributor) commented Dec 3, 2025

Summary

Adds a comprehensive token counting utility that works across multiple model providers (OpenAI, Anthropic, AWS Bedrock, Google Gemini, LiteLLM) wherever supported.

It also integrates token-based compression into the existing CompressionManager.
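
Roughly, each model class gains a count_tokens entry point. A hypothetical usage sketch, inferred from the diff excerpts quoted in the review below (import paths and the exact signature are assumptions):

    from agno.models.message import Message  # path assumed
    from agno.models.openai import OpenAIChat

    model = OpenAIChat(id="gpt-4o")
    messages = [Message(role="user", content="Hello!")]

    # count_tokens(messages, tools) is the signature suggested by the
    # review snippets below.
    n_tokens = model.count_tokens(messages=messages)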

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Improvement
  • Model update
  • Other:

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Examples and guides: Relevant cookbook examples have been included or updated (if applicable)
  • Tested in clean environment
  • Tests added/updated (if applicable)

@Mustafa-Esoofally requested a review from a team as a code owner on December 3, 2025 at 15:37
@Mustafa-Esoofally changed the title from "Feat: Improve token counting Logic" to "feat: Add token counting utility + Add support for it in Compression" on Dec 4, 2025
@manuhortet (Contributor) left a comment:

Nice! Would be great to see tests for the model-specific counting functions too.

        return response.get("inputTokens", 0)
    except Exception as e:
        log_warning(f"Failed to count tokens via Bedrock API: {e}")
        return super().count_tokens(messages, tools)
Contributor:

Can't we just use this? It's probably the same counting mechanism, as it should just depend on the model encoding?

Contributor Author:

The token counting logic won't work for our Claude models.

Contributor:

Why can't we use the count_tokens fn of the base Claude class (libs/agno/agno/models/anthropic/claude.py)?

@Mustafa-Esoofally (Contributor Author) commented Dec 9, 2025:

Bedrock supports non-Anthropic models as well, so the count_tokens fn of the base Claude class won't work. Also, I'm not sure token counting is the same here, because Claude has intelligent caching.
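
For thread context, a rough sketch of the Bedrock-side counting this relies on; the exact boto3 request shape and the log_warning import path are assumptions, only the inputTokens response key comes from the diff above:

    from typing import Optional

    import boto3

    from agno.utils.log import log_warning  # import path assumed

    client = boto3.client("bedrock-runtime")

    def count_input_tokens(model_id: str, request: dict) -> Optional[int]:
        # Assumption: the Bedrock runtime exposes a count_tokens operation
        # returning {"inputTokens": int}; return None on failure so the
        # caller can fall back to a local tokenizer-based estimate.
        try:
            response = client.count_tokens(modelId=model_id, input=request)
            return response.get("inputTokens", 0)
        except Exception as e:
            log_warning(f"Failed to count tokens via Bedrock API: {e}")
            return None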

    self,
    messages: List[Message],
    tools: Optional[List] = None,
    main_model: Optional[Model] = None,
Contributor:

target_model?

Contributor Author:

Better to just use model for both.

# Add a function call for each successful execution
function_call_count += len(function_call_results)

all_messages = messages + function_call_results
Contributor:

Why are we changing this? You're probably right, but there was a reason we did it here.

Contributor Author:

Before, we were limited to a tool-call count, but now we can estimate the token count for messages before API calls, so I moved it up (before the first API call).
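
i.e., roughly the following (a sketch; the helper and attribute names here are illustrative, not the PR's exact API):

    from typing import List, Optional

    def maybe_compress(model, messages: List, tools: Optional[List], manager) -> List:
        # Estimate tokens before the first API call and compress when over
        # the manager's token limit, instead of waiting on a tool-call count.
        estimated = model.count_tokens(messages, tools)
        if manager.token_limit is not None and estimated > manager.token_limit:
            return manager.compress(messages)
        return messages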

    messages: List[Message],
    tools: Optional[List[Union[Function, Dict[str, Any]]]] = None,
) -> int:
    if not self.vertexai:
Contributor:

Not true? Their client supports it in general, and it works with multimodal input, which is nice.

@Mustafa-Esoofally (Contributor Author) commented Dec 8, 2025:

For Google AI Studio, system_instruction and tools are ignored by count_tokens.

For Vertex AI it works correctly.
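
For anyone following along, a sketch with the google-genai client; the project/location/model id are placeholders, and the CountTokensConfig usage is my assumption about the client API:

    from google import genai
    from google.genai import types

    # Vertex AI honors system_instruction/tools when counting; AI Studio
    # (API-key mode) ignores them, per the comment above.
    client = genai.Client(vertexai=True, project="my-project", location="us-central1")

    response = client.models.count_tokens(
        model="gemini-2.0-flash",
        contents="What is the weather in Paris?",
        config=types.CountTokensConfig(system_instruction="Answer tersely."),
    )
    print(response.total_tokens)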

    tool_names.append(result.tool_name)
    message_metrics += result.metrics

tool_name = ", ".join(tool_names) if tool_names else None
Contributor:

What does this do? It looks like you append all the tool names?

Contributor Author:

Gemini combines multiple tool results into a single message (unlike OpenAI/Claude), so we keep the tool names as a comma-separated list like "search, calculator". Useful for logging (message.log()) and debugging.

# Tool token counting


def _format_function_definitions(tools: List[Dict[str, Any]]) -> str:
Contributor:

What format does this create? Is this how it is done for OpenAI?

Contributor Author:

Added comments in the code.
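
For anyone wondering about the format question above: a widely used community approximation for OpenAI tool tokens (e.g. hmarr/openai-chat-tokens) serializes the tool schemas as a TypeScript-like namespace before encoding. A simplified sketch of that idea, not necessarily the exact format this PR emits:

    from typing import Any, Dict, List

    def format_function_definitions(tools: List[Dict[str, Any]]) -> str:
        # Render each tool as a TypeScript-style type inside "namespace
        # functions", mirroring how OpenAI is believed to present tools.
        lines = ["namespace functions {", ""]
        for tool in tools:
            fn = tool.get("function", tool)  # accept wrapped and bare schemas
            if fn.get("description"):
                lines.append(f"// {fn['description']}")
            props = (fn.get("parameters") or {}).get("properties", {})
            if props:
                lines.append(f"type {fn['name']} = (_: {{")
                for name, schema in props.items():
                    lines.append(f"{name}: {schema.get('type', 'any')},")
                lines.append("}) => any;")
            else:
                lines.append(f"type {fn['name']} = () => any;")
            lines.append("")
        lines.append("} // namespace functions")
        return "\n".join(lines)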

for msg in messages:
    total += _count_message_tokens(msg, model_id, tokens_per_message, tokens_per_name)

# Add 3 tokens for reply priming
Contributor:

What's the rationale here?

Contributor Author:

Added comments in the code.
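
For readers of the thread: the +3 convention comes from OpenAI's cookbook estimator, where every reply is primed with <|start|>assistant<|message|> and each message carries a fixed framing overhead. A condensed, cookbook-style version (details may differ from this PR's implementation):

    from typing import Dict, List

    import tiktoken

    def estimate_chat_tokens(messages: List[Dict[str, str]]) -> int:
        enc = tiktoken.get_encoding("o200k_base")  # encoding selection discussed further down
        total = 0
        for msg in messages:
            total += 3  # per-message framing: <|start|>{role}<|message|>...<|end|>
            for key, value in msg.items():
                total += len(enc.encode(value))
                if key == "name":
                    total += 1  # the optional name field costs one extra token
        total += 3  # reply priming: <|start|>assistant<|message|>
        return total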


# Count tool tokens
if tools:
    includes_system = any(msg.role == "system" for msg in messages)
Contributor:

Can you please share the rationale here as well? Also, some comments here would be good.

Contributor Author:

Added comments in the code.

Contributor:

What is meant by "more efficiently" here?

@Mustafa-Esoofally (Contributor Author) commented Dec 9, 2025:

Updated the comment.

) -> int:
    tokens = tokens_per_message

    if message.role:
Contributor:

Can you please look into whether a model counts the "role" as part of the input tokens? More often than not, role takes up a separate param.

Contributor Author:

Yup! Updated the algo.

Total token count for the text.
# gpt-4o models use the newer o200k_base encoding with 200k vocabulary
if "gpt-4o" in model_id.lower():
return tiktoken.get_encoding("o200k_base")
Contributor:

What about GPT-5? Does that not use the newer o200k_base encoding?

Contributor Author:

We can use tiktoken's encoding_for_model, which handles this for all models.
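
A sketch of that lookup; the o200k_base fallback for unknown model ids is an assumption, not necessarily the PR's choice:

    import tiktoken

    def get_encoding(model_id: str) -> tiktoken.Encoding:
        try:
            # tiktoken ships a model-name -> encoding mapping that covers
            # gpt-4o, gpt-4, gpt-3.5, etc., so families aren't hard-coded.
            return tiktoken.encoding_for_model(model_id)
        except KeyError:
            # Unknown or very new model ids: fall back to o200k_base.
            return tiktoken.get_encoding("o200k_base")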
