✨ feat: comprehensive native function calling support and CI stabilization#15
Merged
✨ feat: comprehensive native function calling support and CI stabilization#15
Conversation
Introduce native tool calling by leveraging AI Studio's built-in UI, providing superior reliability and full OpenAI API compatibility compared to emulation. Key Changes: - Add `FunctionCallingOrchestrator` for dual-mode (native/emulated) coordination - Implement `SchemaConverter` for OpenAI to AI Studio tool declaration mapping - Implement `ResponseFormatter` for standardized tool_calls output - Add browser automation mixin for tool configuration via AI Studio's UI - Integrate native calling into request processor, response generators, and streams - Support both streaming and non-streaming tool calls with finish_reason: tool_calls - Add graceful fallback mechanism to emulated mode on UI automation failures - Comprehensive documentation including ADR-001, sequence diagrams, and guides - Full unit tests for response parsing and schema conversion This enables standard OpenAI tools usage while maintaining backward compatibility via the legacy emulation mode. Refs: ADR-001
…setup Google Search grounding interferes with native function calling in AI Studio. This change ensures that if Google Search grounding is enabled, it is automatically disabled before attempting to set function declarations. Changes: - Add Step 0 to `set_function_declarations` to check and disable Google Search toggle - Improve reliability of function calling setup workflow
Ensure reliability of native function calling implementation with comprehensive tests. Changes: - Add `tests/api_utils/utils_ext/test_function_calling_core.py`: Unit tests for function declaration parsing and conversion logic - Add `tests/verify_native_fc_e2e.py`: End-to-end script for verifying function calling via the proxy API
…/url context Automatically disables Google Search and URL Context when Native Function Calling is active, as these features are mutually exclusive in the AI Studio interface. Changes: - Implement `_adjust_url_context` in `ParameterController` to support both enabling and disabling - Force disable URL Context in `PageController` and `ParameterController` when FC is active - Update `FunctionCallingController` to proactively disable Google Search and URL Context during setup - Add unit tests to verify conflict resolution logic
…bility Refactor SchemaConverter to use whitelist approach for Gemini-compatible schema fields. The previous blacklist approach missed unsupported JSON Schema properties (minimum, maximum, pattern, etc.) which caused silent failures in AI Studio. Key changes: - SchemaConverter now uses ALLOWED_SCHEMA_FIELDS whitelist - Handle anyOf/oneOf/allOf by extracting first non-null option - Convert const to enum, normalize type arrays with nullable - Support flat tool format (e.g., from opencode) alongside standard format - Add non-streaming response formatter for tool calls - Fix streaming delta structure for tool_calls - Add parallel_tool_calls field to ChatCompletionRequest - Improve debug logging with UI: prefix for browser operations - Fix file handler log level configuration Includes Gemini API research documentation for FunctionDeclaration and FunctionCalling schemas. Refs: ADR-001
…support Implement parsing for AI Studio's built-in native function calling response format while maintaining full backward compatibility. Changes: - Add selectors for ms-function-call-chunk elements (name, args, code blocks) - Implement Strategy 1: native chunk parsing in FunctionCallResponseParser - Add _parse_native_function_calls() for multi-chunk handling - Add _parse_single_native_chunk() for element extraction - Add _extract_function_name_from_header() for header text cleanup - Update legacy selectors with native paths as fallbacks - Reorder parsing strategies: native first, then legacy widget/code block Native format extracts function name from mat-panel-title and arguments from pre > code JSON blocks.
… UI ops Implement digest-based caching for function calling configuration to avoid expensive browser UI operations when the same tools are used in subsequent requests. This significantly reduces latency for agentic workflows. Key changes: - Add FunctionCallingCache singleton with SHA256-based tool digest - Integrate cache with FunctionCallingOrchestrator for request preparation - Add instance-level toggle state caching in FunctionCallingController - Invalidate cache on model switch, new chat, or explicit clear - Add comprehensive logging with [FC:Cache], [FC:UI], [FC:Perf] prefixes - Add performance timing metrics for UI operations Configuration: - FUNCTION_CALLING_CACHE_ENABLED (default: true) - FUNCTION_CALLING_CACHE_TTL (default: 0 = no TTL) Refs: ADR-001-native-function-calling
… parsing
AI Studio's wire format uses variable nesting levels for function call
arguments. The previous implementation assumed a fixed depth, causing
"filePath undefined" validation errors when the nesting varied.
Changes:
- Add `_unwrap_to_param_list()` helper that dynamically unwraps nested
lists until it finds the actual parameter tuples
- Add `_parse_array_items()` for proper array type handling
- Update tests to expect graceful degradation (return `{}`) instead of
exceptions for malformed input
- Extend function calling core with related improvements
… calls
Some models output function calls as plain text in the format:
"Request function call: <name>\nParameters:\n{...}" instead of using
AI Studio's native function calling UI elements. This caused tool_calls
to be returned as plain text content, breaking client integrations.
Changes:
- Add EMULATED_FUNCTION_CALL_PATTERN and EMULATED_PARAMS_PATTERN regexes
- Add _parse_emulated_function_calls() method for text-based format
- Add _extract_emulated_params() for robust JSON extraction with fallbacks
- Add _clean_json_string() to handle control characters in JSON
- Update _parse_text_function_calls() to try emulated parsing first
- Add 10 comprehensive tests covering edge cases
Closes: Kilo Code "You did not use a tool" errors
Fix critical parsing failures discovered during native function calling testing
that caused malformed or missing tool calls in client responses.
Parsing fixes:
- No-param calls: Regex now allows end-of-string after function name
- Inline params: Fixed detection when `{` follows extracted name
- `<ctrl46>` format: New `_parse_inline_params()` with 5 strategies
- `default_api:` prefix: Strip common model prefixes before returning
Streaming fixes:
- Strip "Request function call:..." text from body before streaming
- Suppress synthetic "*Model finished thinking...*" when function exists
Tests: Added 4 new cases covering prefix stripping, ctrl46 parsing, and
no-param calls.
Closes: #native-fc-parsing
Fix two critical race conditions causing false negatives and text leakage: 1. DOM detection timing: Added retry loop (10×0.3s) when detecting function calls from DOM elements, as native UI may not render immediately after done=True is received from the stream. 2. Text stripping unconditional: Always strip "Request function call:" text from response body regardless of whether function detection succeeded. This prevents raw emulated FC text from leaking to clients. 3. Recovery parsing: Added static parser function to extract function calls from emulated text format as fallback when DOM detection fails entirely. Closes: Function call detection reliability issues in native mode
…s and control chars Address parsing failures observed in native function calling: - Add _clean_body_text() to strip <ctrl##> artifacts leaking into responses - Reset has_seen_functions when wire format returns empty arguments to trigger DOM fallback parsing for argument recovery - Add warning logs when wire format parsing yields empty params for debugging potential parse failures These fixes improve resilience when AI Studio's wire format produces incomplete or malformed function call data.
…ing edge cases Fix two critical edge cases in native function calling: 1. AUTO mode fallback bug: When FUNCTION_CALLING_MODE=auto and native FC failed, fallback to emulated mode silently failed because the tool catalog wasn't injected. Fixed by passing fc_state through the call chain so should_skip_tool_injection() uses dynamic state instead of static config. 2. Client switching edge case: When switching from an FC client (with tools parameter) to an XML-only client (no tools), the FC toggle remained enabled in the browser UI. Added _ensure_fc_disabled_when_no_tools() to auto-disable the toggle when no tools are provided. Changes: - Pass fc_state through request_processor → prepare_combined_prompt - Update should_skip_tool_injection() to check dynamic fc_state first - Add _ensure_fc_disabled_when_no_tools() cleanup method to orchestrator - Update .env.example to recommend "auto" mode as default - Add 11 tests for fallback and edge case scenarios
…nt desync When the FC cache reports a HIT with toggle_enabled=True, the system was skipping native FC setup entirely. However, after new_chat clears the conversation (e.g., soft context limit reached), the UI toggle resets to disabled while the cache still reports enabled. This caused native FC to silently fail, falling back to emulated text parsing. The fix now verifies the actual UI toggle state (bypassing instance cache with use_cache=False) on cache HIT and re-enables the toggle if needed before trusting the cached state. Refs: DEBUG_REPORT_fc-cache-toggle-desync-2025-12-25.md
Introduce a comprehensive debug logging infrastructure for function calling to aid troubleshooting across diverse coding tools (Copilot, Kilo Code, Roo Code, Cline, OpenCode CLI, Codex CLI, Claude Code). Key capabilities: - 7 modular loggers: ORCHESTRATOR, UI, CACHE, WIRE, DOM, SCHEMA, RESPONSE - Per-module enable/disable via FC_DEBUG_* environment variables - Separate log files per module in logs/fc_debug/ - Configurable log levels per module - Smart payload truncation for large tool definitions - Request ID correlation for cross-module tracing Integrations: - function_calling_orchestrator.py: decision logging - function_calling_cache.py: cache hit/miss tracking - function_calling.py: wire format parsing - page_controller function_calling.py: DOM interactions - stream/interceptors.py: response extraction Includes 62 comprehensive tests and full documentation.
…ire format
Wire format parser incorrectly wrapped object values in arrays when objects
appeared directly inside array parameters (e.g., `{"id": ["1"]}` instead of
`{"id": "1"}`). Root cause: `_parse_single_array_item()` checked length-based
type encoding BEFORE checking if an item was a param list (object).
Changes:
- Add `_looks_like_param_list()` helper to detect object structures
- Prioritize param list detection BEFORE length-based type decoding
- Add recursive unwrapping for variable nesting depths
- Enhance debug logging in argument parsing flow
- Add 17 comprehensive test cases including edge cases
Verified working with OpenCode CLI tools: todowrite, gh_grep, tavily, write,
bash, prune, etc.
Closes: Wire format array parsing issues with object parameters
…G flag Add conditional guards to all function calling (FC) related logging across the codebase. When `FUNCTION_CALLING_DEBUG=false` (default), all FC console logs and fc_logger file logging are completely disabled, eliminating string formatting overhead and disk I/O in production. Changes: - Add FUNCTION_CALLING_DEBUG import and guards to fc_logger calls - Wrap all [FC:] prefixed console logs in conditional checks - Consolidate FC_DEBUG_ENABLED as alias for FUNCTION_CALLING_DEBUG - Clean up duplicate documentation in .env.example Modules updated: - api_utils/utils_ext: function_calling.py, function_calling_cache.py, function_calling_orchestrator.py - browser_utils/page_controller_modules: function_calling.py - stream: interceptors.py - logging_utils/fc_debug: config.py
…r model hallucination recovery Integrate tool name storage in the function calling cache to enable validation and correction of malformed function names from model output. When Gemini outputs text-format function calls with truncated names (e.g. "gh_grep_searchGitH" instead of "gh_grep_searchGitHub"), the new fuzzy matching can recover the intended function. Key changes: - Add `tools` parameter pipeline through orchestrator → controller → cache - Store registered tool names in FunctionCallingCacheEntry - Add `_extract_tool_names()` supporting OpenAI and flat formats - Add `validate_function_name()` with prefix-based fuzzy matching (70% threshold) - Integrate `_validate_function_names()` into emulated FC parsing path - Improve regex pattern to capture full function names with special chars - Demote "Recovered function calls" log from INFO to DEBUG Tests: - Add 21 new tests for FunctionCallingCache (extraction, validation, fuzzy matching) - Add 8 new tests for _validate_function_names integration Refs: docs/reports/debug/DEBUG_REPORT_fc-model-hallucination-2025-12-26.md
…te artifacts - Remove obsolete directories: docs/archive/, docs/reports/, docs/research/, and docs/spikes/ - Consolidate redundant function calling documentation into a new comprehensive guide - Add docs/guides/native-function-calling.md as the primary reference for native FC - Update README.md to highlight Native Function Calling features and link to new guide - Update docs/guides/env-variables-reference.md with new modular FC debug flags - Transition ADR-001-native-function-calling.md from Proposed to Implemented
…l parsing - Update `_prepare_and_validate_request` return signature in tests to match 3-tuple output. - Enhance emulated function call regex to support colon (`:`) in tool names (e.g., `default_api:tool`). - Fix `ValueError` unpacking failures in `test_request_processor.py`. - Ensure compatibility with namespaced tool definitions in `test_emulated_fc_parsing.py`. Refs: docs/reports/build/IMPLEMENTATION_REPORT_fix-test-failures-2025-12-26.md
Address technical debt by fixing widespread linting issues and stabilizing the test suite. Changes: - 🎨 fix(style): resolve Ruff linting errors (unused imports, sorting, formatting) across 18+ files - 🧪 fix(tests): rewrite `test_response_generators` with robust async iteration and minimal mocking - ⚙️ fix(function-calling): update orchestrator tool injection logic to handle dynamic state and fallbacks - 🔧 fix(logging): default `fc_debug` modules to disabled for proper granular control - 🚦 fix(tests): synchronize launcher defaults and mock setups across the test suite - ♻️ refactor: cleanup unused imports and organize exports in `api_utils/utils_ext` This commit ensures a clean CI pass and more reliable test results for function calling and streaming.
AI Studio's wire format sends duplicate function call data across multiple stream chunks. This caused MCP tools to receive the same function call multiple times, leading to redundant executions. Changes: - Add function call accumulation dict in HttpInterceptor to track unique calls using (func_name, params_hash) as dedup key - Add reset_for_new_request() to clear state between requests - Return accumulated function calls only when stream completes - Fix pre-existing test_proxy_connector test that required event loop Verified: 2022 tests pass, 17 unique function calls parsed correctly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR introduces full native function calling support for AI Studio, enabling high-performance, structured tool invocation that is 100% compatible with the OpenAI
toolsandtool_callsAPI. It features a robust multi-strategy parsing engine, a performance-optimized configuration cache, and a sophisticated modular diagnostics system.Key Technical Features
1. Robust Multi-Strategy Parsing
2. Performance & Efficiency
FUNCTION_CALLING_DEBUGflag.3. Sophisticated Diagnostics
ORCH,UI,CACHE,WIRE,DOM,SCHEMA,RESP) with per-module enablement and log level control.4. Smart Conflict Resolution
CI & Code Quality
Documentation & Maintenance
docs/guides/native-function-calling.mdcovering architecture, configuration, and troubleshooting.archive/,reports/,research/,spikes/) and consolidated redundant artifacts.ADR-001to 'Implemented'.Testing & Reliability
OpenCode CLI,Copilot,Roo Code,Cline, and standard OpenAI SDKs.Note: This PR consolidates all development efforts for native function calling into a single, production-ready release including CI/CD stabilization.