feat(beta): add ToolPolicyMiddleware for tool access control by amabito · Pull Request #2533 · ag2ai/ag2

amabito · 2026-03-31T02:18:38Z

Why are these changes needed?

Pulls the tool allow/block list logic out of GovernanceMiddleware (#2501,
closed) into its own middleware. Sits on on_tool_execution, checks the
tool name against blocked and allowed lists, returns ToolErrorEvent if denied.

blocked_tools overrides allowed_tools. allowed_tools=None means no
restriction; allowed_tools=[] means deny all. Config is frozen after
construction.

Related issue number

Replaces the tool policy portion of #2501 (closed).
Related: #2531 (BudgetMiddleware), #2532 (CircuitBreakerMiddleware).

Checks

I've included any doc changes needed for https://docs.ag2.ai/. See https://docs.ag2.ai/latest/docs/contributor-guide/documentation/ to build and test documentation locally.
I've added tests (if relevant) corresponding to the changes introduced in this PR.
I've made sure all auto checks have passed.

Lancetnik · 2026-04-08T16:00:47Z

@amabito hi! Thank you for the middlewares contribution, but I don't see a sense of the current one

If you don't wont to let an Agent execute a tool, you can just do not pass it to context. Provide models with information about tool and forbid its execution looks strange to me
Global counters _total_tool_calls and _total_blocked also could slowly lead to memory leaks because we have no options to drop them. Also, I don't see a reason to have such counters

amabito · 2026-04-09T05:17:16Z

Hey @Lancetnik, thanks for the pushback -- both points are worth addressing directly.

"Just don't pass the tool to context"

Static exclusion at registration time works when the policy is fixed for the session, and for that case you're right that a middleware adds nothing. Where it stops working:

Audit of denied attempts. If a tool was advertised to the model earlier in the session (cached context, multi-turn history, a prior planning step) and policy changes mid-session, the model may still try to call it. Static exclusion at registration doesn't intercept that call or leave any record. The middleware catches the attempt, returns a structured ToolErrorEvent via _make_tool_error (tool_policy.py L37-L47), and surfaces it on the event stream where observability hooks can pick it up. "The model tried to call delete_all and was blocked" is a signal you want to see, not silence.
Dynamic policy updates. Policies driven by external state -- rate limiting, user role, per-tenant rules, kill-switches triggered by an ops dashboard -- can't be expressed by omitting the tool at construction time, because the agent is already built. The middleware holds _policy as a swappable reference, so a supervisor can update it without rebuilding the agent. I haven't exposed a setter yet, but the shape is there.
Deny-by-default sandboxes. An allowlist-based agent (allowed_tools=["search", "calc"]) is a cleaner primitive than "register every tool except these", especially when the tool set is assembled dynamically from plugins or MCP servers. The check in _ToolPolicy.check (L59-L68) enforces deny-beats-allow precedence, which is the usual policy semantics.

If even the audit case feels out of scope for AG2's middleware layer and you'd rather see this as a dedicated capability or event hook rather than a middleware, I'm open to that -- say the word and I'll repackage it. But I don't think "don't pass the tool" covers these cases.

Counters and the leak concern

The counters at L90-L91 are instance attributes on ToolPolicyMiddleware, not module-level globals, so they don't persist across unrelated factories. What they do is grow unbounded over the lifetime of a long-lived factory with no reset path, which is a real API design smell even if it's not strictly a leak.

I'd rather just drop the counters. They were a convenience for testing and a half-hearted observability hook, and neither justifies stateful middleware. Concrete change:

Remove _total_tool_calls, _total_blocked, _lock, and the two properties (L90-L104).
Add an optional on_blocked: Callable[[ToolCallEvent, str], None] | None parameter. Users who want metrics wire it to their own Prometheus counter / log / whatever. The middleware stays stateless.
Tests that check counter values get rewritten against a mock callback.

Landing it in the next revision alongside the constructor simplification from #2531 / #2532.

@Lancetnik

Address review feedback from @Lancetnik on ag2ai#2533: - Flatten constructor API: pass blocked_tools/allowed_tools/on_blocked directly instead of requiring callers to build a ToolPolicyConfig - Drop stateful counters (_total_tool_calls, _total_blocked, _lock) and their accessor properties -- the middleware is now stateless - Add optional on_blocked callback so callers can wire their own observability (metrics, audit log) without the library retaining any per-factory state The ToolPolicyConfig dataclass is kept as an internal struct for immutability (tuple freezing in __post_init__).

Lancetnik · 2026-04-09T08:48:35Z

Dynamic policy updates. Policies driven by external state -- rate limiting, user role, per-tenant rules, kill-switches triggered by an ops dashboard -- can't be expressed by omitting the tool at construction time, because the agent is already built. The middleware holds _policy as a swappable reference, so a supervisor can update it without rebuilding the agent. I haven't exposed a setter yet, but the shape is there.

Fair, it seems the commonest case for such feature - but I don't see a public API covers it. Can we introduce a some?

Lancetnik · 2026-04-09T08:55:20Z

I prefer end-to-end test here using public API (or at least do not use non-public objects). Such tests are not so implementation-tied and not so fragile. Also, they cover real usecases and could be used as a some kind of documentation

Please, try to implement each test using TestClient / TracingClient. Reference: https://github.com/ag2ai/ag2/blob/main/test/beta/middleware/test_tool_execution.py

@Lancetnik

Address review feedback from @Lancetnik on ag2ai#2533: - Flatten constructor API: pass blocked_tools/allowed_tools/on_blocked directly instead of requiring callers to build a ToolPolicyConfig - Drop stateful counters (_total_tool_calls, _total_blocked, _lock) and their accessor properties -- the middleware is now stateless - Add optional on_blocked callback so callers can wire their own observability (metrics, audit log) without the library retaining any per-factory state The ToolPolicyConfig dataclass is kept as an internal struct for immutability (tuple freezing in __post_init__).

@Lancetnik

Address @Lancetnik follow-up on ag2ai#2533: the dynamic-policy justification needs a public API to be useful. Adds: - ToolPolicyMiddleware.update_policy(config: ToolPolicyConfig) Atomically replaces the active policy. Subsequent tool calls see the new config; per-invocation instances already inside on_tool_execution keep their captured snapshot, so in-flight executions are never torn across a config change. - ToolPolicyMiddleware.config property Read-only access to the active ToolPolicyConfig. Intended hooks: rate-limit kill switches, per-tenant rule updates from an ops dashboard, role changes during a long-running session -- all without rebuilding the agent. Thread-safety: _policy is assigned last so readers in __call__ never observe a state where _config is new but _policy is stale (Python attribute assignment is atomic for single references). Tests: 19/19 pass, covering update-then-permit, in-flight-snapshot isolation, restriction clearing, and the config property.

amabito · 2026-04-11T08:58:55Z

@Lancetnik Thanks -- fair point, "it's in the shape" is not a public API. Added one in 5984625.

`ToolPolicyMiddleware.update_policy(config)`

mw = ToolPolicyMiddleware(allowed_tools=["search", "calc"])
agent = MyAgent(middleware=[mw])

# Later, from an ops callback, tenant-context hook, rate-limit watchdog, etc.:
mw.update_policy(ToolPolicyConfig(
    blocked_tools=["delete_all", "shell_exec"],
    allowed_tools=["search", "calc"],
))

The swap is atomic with a specific contract:

Subsequent calls to __call__ (i.e. new events arriving at the agent) pick up the new policy.
Per-invocation _ToolPolicyInstance objects that are already inside on_tool_execution keep the policy snapshot they captured at construction time. In-flight tool executions are never torn across a config change -- you won't see a call half-evaluated against the old rules and half against the new.
_policy is assigned last (after _config) so a concurrent reader in __call__ never observes a state where _config is new but _policy is stale. Single-reference attribute assignment is atomic in CPython, so no lock is needed for the happy path.

Also added mw.config as a read-only property for supervisors that want to log or diff the active configuration.

Tests cover:

test_update_policy_takes_effect_on_next_call -- deny shell_exec, swap policy, next call permits it.
test_update_policy_in_flight_instance_uses_snapshot -- construct instance, swap to "no restrictions", confirm the in-flight instance still enforces the old allowlist.
test_update_policy_clears_restrictions -- passing a fresh ToolPolicyConfig() drops all rules.
test_config_property_returns_active_config.

Branch is also rebased onto current main -- the CONFLICTING state is resolved.

Other changes carried from the earlier reply (already in the branch via 1d96b4a): _total_tool_calls / _total_blocked / _lock and their properties are gone; observation goes through the on_blocked callback.

19/19 tests pass, pre-commit clean.

amabito · 2026-04-11T12:11:42Z

Sorry for the rapid-fire -- ran the middleware through another review pass and wanted to send the consolidated state.

Since the last update:

update_policy() contract. Docstring now states clearly that self._policy is the sole authority for enforcement in __call__; self._config is updated for the .config property. Under CPython the two STORE_ATTR are not a single transaction, so a .config reader may briefly see the new config while __call__ uses the previous policy for one final instantiation. test_concurrent_update_policy_consistent_decisions exercises 20 parallel on_tool_execution calls while a background writer alternates policies and confirms every result is a valid decision against some snapshot.
Callback isolation. on_blocked is now wrapped in contextlib.suppress(Exception), so a failing audit callback does not prevent the ToolErrorEvent from being returned.
Export. ToolPolicyMiddleware and ToolPolicyConfig are re-exported from both autogen.beta.middleware and autogen.beta.middleware.builtin. Regression test test_tool_policy_importable_from_middleware_top_level covers the top-level path. Note: denials currently produce ToolErrorEvent(PermissionError(reason)), not a dedicated ToolDeniedError -- happy to add a named exception in a follow-up if that matters for callers.
Edge-case tests added. test_empty_tool_name_denied_by_allowlist and test_empty_allowlist_blocks_all_via_public_interface cover corner cases through the public middleware interface.

25 tests, all passing. Pre-commit clean.

Single-responsibility middleware that blocks disallowed tool calls based on configurable allow/block lists. Extracted from GovernanceMiddleware to align with beta v2 composable middleware design. - ToolPolicyConfig: blocked_tools, allowed_tools - blocked_tools overrides allowed_tools - allowed_tools=None means no restriction, [] means deny all - Immutable config (lists frozen to tuples)

@Lancetnik

Address review feedback from @Lancetnik on ag2ai#2533: - Flatten constructor API: pass blocked_tools/allowed_tools/on_blocked directly instead of requiring callers to build a ToolPolicyConfig - Drop stateful counters (_total_tool_calls, _total_blocked, _lock) and their accessor properties -- the middleware is now stateless - Add optional on_blocked callback so callers can wire their own observability (metrics, audit log) without the library retaining any per-factory state The ToolPolicyConfig dataclass is kept as an internal struct for immutability (tuple freezing in __post_init__).

@Lancetnik

Address @Lancetnik follow-up on ag2ai#2533: the dynamic-policy justification needs a public API to be useful. Adds: - ToolPolicyMiddleware.update_policy(config: ToolPolicyConfig) Atomically replaces the active policy. Subsequent tool calls see the new config; per-invocation instances already inside on_tool_execution keep their captured snapshot, so in-flight executions are never torn across a config change. - ToolPolicyMiddleware.config property Read-only access to the active ToolPolicyConfig. Intended hooks: rate-limit kill switches, per-tenant rule updates from an ops dashboard, role changes during a long-running session -- all without rebuilding the agent. Thread-safety: _policy is assigned last so readers in __call__ never observe a state where _config is new but _policy is stale (Python attribute assignment is atomic for single references). Tests: 19/19 pass, covering update-then-permit, in-flight-snapshot isolation, restriction clearing, and the config property.

…eware at top level TP-1 (update_policy atomicity): The previous docstring said "Assign _policy last so readers in __call__ never observe a state where _config is new but _policy is stale", but the assignment order was _config then _policy -- which is exactly right. The comment was misleading. Clarify: _config is assigned first so the config property never lags behind the active policy; under CPython's GIL each assignment is atomic at the bytecode level, so __call__ readers cannot observe a mismatched pair. TP-2 (export gap): autogen.beta.middleware top-level __init__ did not re-export ToolPolicyMiddleware or ToolPolicyConfig. Add imports and __all__ entries so callers can reach them from the canonical import path. Add test_tool_policy_importable_from_middleware_top_level regression test.

…ersarial tests - Wrap on_blocked callback in contextlib.suppress(Exception) so a failing audit observer does not prevent the ToolErrorEvent from being returned to the caller. - Add test_on_blocked_callback_exception_does_not_prevent_denial. - Add test_concurrent_update_policy_consistent_decisions: 20 parallel coroutines interleaved with a background policy-writer; asserts every result is a valid ToolErrorEvent or ToolResultEvent. - Add test_empty_tool_name_denied_by_allowlist. - Add test_empty_allowlist_blocks_all_via_public_interface covering ToolPolicyMiddleware(allowed_tools=[]) through the public interface.

codecov · 2026-04-19T21:33:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines	Coverage Δ
autogen/beta/middleware/__init__.py	`100.00% <ø> (ø)`
autogen/beta/middleware/builtin/__init__.py	`93.75% <100.00%> (+0.41%)`	⬆️
autogen/beta/middleware/builtin/tool_policy.py	`100.00% <100.00%> (ø)`

... and 48 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

amabito requested a review from Lancetnik as a code owner March 31, 2026 02:18

github-actions Bot added the beta label Mar 31, 2026

amabito mentioned this pull request Mar 31, 2026

feat(beta): add governance middleware (budget, circuit breaker, tool policy) #2501

Closed

Lancetnik self-assigned this Apr 7, 2026

Lancetnik reviewed Apr 9, 2026

View reviewed changes

amabito force-pushed the feat/tool-policy-middleware branch from 3253b29 to 5984625 Compare April 11, 2026 08:58

amabito added 5 commits April 14, 2026 14:55

amabito force-pushed the feat/tool-policy-middleware branch from 5043ecc to 2cd9dd9 Compare April 14, 2026 05:55

amabito added 3 commits April 16, 2026 19:08

Merge branch 'main' into feat/tool-policy-middleware

578d24e

Merge branch 'main' into feat/tool-policy-middleware

50f83fc

Merge branch 'main' into feat/tool-policy-middleware

6ec02dc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(beta): add ToolPolicyMiddleware for tool access control#2533

feat(beta): add ToolPolicyMiddleware for tool access control#2533
amabito wants to merge 8 commits intoag2ai:mainfrom
amabito:feat/tool-policy-middleware

amabito commented Mar 31, 2026

Uh oh!

Lancetnik commented Apr 8, 2026

Uh oh!

amabito commented Apr 9, 2026

Uh oh!

Lancetnik commented Apr 9, 2026

Uh oh!

Lancetnik Apr 9, 2026

Uh oh!

amabito commented Apr 11, 2026

Uh oh!

amabito commented Apr 11, 2026

Uh oh!

codecov Bot commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

amabito commented Mar 31, 2026

Why are these changes needed?

Related issue number

Checks

Uh oh!

Lancetnik commented Apr 8, 2026

Uh oh!

amabito commented Apr 9, 2026

"Just don't pass the tool to context"

Counters and the leak concern

Uh oh!

Lancetnik commented Apr 9, 2026

Uh oh!

Lancetnik Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

amabito commented Apr 11, 2026

ToolPolicyMiddleware.update_policy(config)

Uh oh!

amabito commented Apr 11, 2026

Uh oh!

codecov Bot commented Apr 19, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`ToolPolicyMiddleware.update_policy(config)`