feat: Improve Mistral and Qwen25 function call parsing #6597

Merged
zhyncs merged 7 commits into main from chang/tool-call-mistral-qwen25
May 26, 2025


@CatherineSue CatherineSue commented May 25, 2025

Motivation

This PR fixes parallel tool call parsing in MistralDetector and Qwen25Detector.
See Multiple Tool Call Support for MistralDetector and Qwen25Detector for more details.

TL;DR:

  • Qwen25Detector's bot_token and eot_token wrap a single function call, but EBNFComposer was designed to use bot_token and eot_token to wrap the entire function call sequence.
  • MistralDetector doesn't support parsing parallel tool calls yet.

Modifications

refactor EBNF composer API for multiple tool call support

  • ebnf_composer.py: Refactor build_ebnf() method to use clearer
    parameter names:
    • Replace bot_token/eot_token with
      sequence_start_token/sequence_end_token for sequence-level wrapping
    • Add individual_call_start_token/individual_call_end_token for
      individual call wrapping
    • Rename TOOL_CALLS_MAP to TOOL_CALL_MAP and update logic to handle
      both sequence and individual call patterns
  • qwen25_detector.py: Update to use new individual_call_* parameters
    for per-call token wrapping
  • mistral_detector.py: Update parameter names and improve regex comment
    to clarify multiple tool call support
  • deepseekv3_detector.py: Update to use new sequence_* parameters
    for sequence-level token wrapping
  • pythonic_detector.py: Update to use new sequence_* parameters

improve streaming function call parsing and token handling in Qwen25Detector and MistralDetector

  • base_format_detector.py: Add ends_with_partial_token() method to detect
    partial bot tokens during streaming, improving buffer management and preventing
    premature buffer clearing
  • qwen25_detector.py: Implement custom streaming parser with buffering to
    handle partial end tokens (</tool_call>) that are streamed character-by-character,
    preventing them from appearing in normal text output
  • mistral_detector.py:
    • Refactor JSON parsing to properly handle both single
      objects and arrays, improve error handling with logging, and remove deprecated
      _clean_text() method
    • Improve JSON parsing when the arguments contain nested brackets, such
      as {"name":"make_next_step_decision", "arguments":{"decision":"","content":"\nTOOL: Access a weather API or service\nOBSERVATION: Retrieve the current weather data for the top 5 populated cities in the US\nANSWER: The weather in the top 5 populated cities in the US is as follows: [City Name] - [Weather Conditions] - [Temperature]\n"}}
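
The partial-token buffering described above can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the code in base_format_detector.py or qwen25_detector.py: the integer return value, the free-function form, and the `stream_normal_text` helper are hypothetical. The idea is to hold back any trailing text that could still grow into the token, so a `</tool_call>` streamed character by character never reaches normal text output.

```python
from typing import Iterable, Iterator


def ends_with_partial_token(buffer: str, token: str) -> int:
    """Return the length of the longest proper prefix of `token` that ends `buffer`.

    Returns 0 when the buffer does not end with any partial token.
    """
    for length in range(min(len(buffer), len(token) - 1), 0, -1):
        if buffer.endswith(token[:length]):
            return length
    return 0


def stream_normal_text(
    chunks: Iterable[str], end_token: str = "</tool_call>"
) -> Iterator[str]:
    """Yield text that is safe to emit, withholding possible starts of `end_token`."""
    held = ""
    for chunk in chunks:
        held += chunk
        # Drop end tokens that have fully arrived.
        held = held.replace(end_token, "")
        # Keep back a trailing fragment that may still become an end token.
        keep = ends_with_partial_token(held, end_token)
        emit, held = held[: len(held) - keep], held[len(held) - keep :]
        if emit:
            yield emit
    if held:
        # The stream ended, so the withheld fragment was ordinary text.
        yield held
```

For example, feeding the chunks `"Done"`, `"<"`, `"/tool"`, `"_call>"`, `" bye"` emits only `"Done"` and `" bye"`; the end token is swallowed even though it arrived in three pieces.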

Checklist

This refactoring provides a clearer API distinction between tokens that wrap
the entire sequence of tool calls versus tokens that wrap individual calls,
enabling better support for multiple tool call formats across different model types.
These changes enhance the robustness of streaming function call detection across
different model formats, particularly addressing issues where partial tokens were incorrectly processed or leaked into normal text output.

@CatherineSue (Collaborator, Author) commented

Manual test for chang/tests/examples/test_tool_choice.py

Mistral
mistralai/Mistral-7B-Instruct-v0.3 seems flaky when making multiple tool calls.
Screenshot 2025-05-25 at 11 32 32 AM

Qwen25
Screenshot 2025-05-25 at 10 31 29 AM

The MistralDetector was failing to parse tool calls when the JSON content
contained nested brackets (e.g., "[City Name]" within string values).

**Context**
- Regex pattern `r"\[TOOL_CALLS\] (\[.*?\])"` used non-greedy matching
- Would stop at first ']' encountered, even if inside a JSON string
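
The failure is easy to reproduce with the standard `re` and `json` modules. The snippet below is a minimal reproduction using the pattern quoted above; the sample model output is made up for illustration.

```python
import json
import re

pattern = re.compile(r"\[TOOL_CALLS\] (\[.*?\])")

# Hypothetical model output whose string value contains brackets.
text = (
    '[TOOL_CALLS] [{"name": "f", "arguments": '
    '{"content": "[City Name] - [Temperature]"}}]'
)

match = pattern.search(text)
captured = match.group(1)
# The non-greedy group ends at the first ']', which sits inside the string
# value, so the capture is cut off right after "[City Name]".
print(captured)

try:
    json.loads(captured)
    parsed_ok = True
except json.JSONDecodeError:
    # The truncated capture is not valid JSON.
    parsed_ok = False
print(parsed_ok)
```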

**Changes**
- Replaced regex-based extraction with bracket counting algorithm
- New `_extract_json_array()` method properly handles:
  - Nested brackets within JSON strings
  - Escaped characters and quotes
  - Proper string boundary detection
- Add unit tests for MistralDetector
- Use the same function call unit, `function_call`, for both the pythonic and JSON formats
- Remove `TOOL_CALL_MAP`, since the pythonic and JSON formats now share the same unit
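
A bracket counting extractor along these lines might look like the sketch below. It is an independent illustration of the technique, not the `_extract_json_array()` method from this PR: the function name, the `marker` parameter, and the return convention (`None` when no complete array is found) are assumptions.

```python
import json
from typing import Optional


def extract_json_array(text: str, marker: str = "[TOOL_CALLS]") -> Optional[str]:
    """Return the first balanced JSON array after `marker`, or None if incomplete."""
    marker_pos = text.find(marker)
    if marker_pos == -1:
        return None
    start = text.find("[", marker_pos + len(marker))
    if start == -1:
        return None

    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a JSON string, brackets are ordinary characters.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    # The array never closed, e.g. because the output is still streaming.
    return None
```

Unlike the non-greedy regex, this scan only counts brackets outside of string literals, so a value such as `"[City Name] - [Temperature]"` does not end the array early, and the result can be passed straight to `json.loads`.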
@zhyncs zhyncs merged commit 16f69b1 into main May 26, 2025
0 of 36 checks passed
@zhyncs zhyncs deleted the chang/tool-call-mistral-qwen25 branch May 26, 2025 06:07
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025
@MooMoo-Yang commented

Does tool call support image return or multi-modal return?
