
feat(core): OpenAI Responses API (/v1/responses) native support#2588

Closed
netbrah wants to merge 18 commits into QwenLM:main from netbrah:feat/openai-responses-api

Conversation

Contributor

@netbrah netbrah commented Mar 22, 2026

Summary

Full Codex-parity implementation of the OpenAI Responses API as a new provider type openai-responses, parallel to the existing openai (Chat Completions) provider. Users select it via authType: "openai-responses" in their model providers config.
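A providers config entry selecting the new type might look like this. Only authType: "openai-responses" comes from this PR; the surrounding field names are illustrative placeholders, not the project's actual settings schema:

```json
{
  "modelProviders": [
    {
      "authType": "openai-responses",
      "baseUrl": "https://example.com/v1",
      "model": "placeholder-model"
    }
  ]
}
```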

What's new

  • New openai-responses auth type — routes to /v1/responses instead of /v1/chat/completions
  • SSE event converter — parses all Responses API streaming events into Gemini-format responses
  • previous_response_id + incremental items — server-side history, only new items sent per turn
  • prompt_cache_key — conversation-scoped server caching for faster TTFT
  • Native reasoning controls — effort, summary, include: ["reasoning.encrypted_content"] with full round-trip
  • truncation: {type: "auto"} — prevents hard 400 on context overflow
  • parallel_tool_calls, tool_choice: "auto" — Codex-parity tool handling
  • Dual-path compaction — remote (/v1/responses/compact) for openai-responses, inline for all others
  • Pipeline state reset after compaction to prevent stale previous_response_id
  • Pre-work — resolveModel(), tokenEstimationScaleFactor(), Turn.getResponseText(), cleanToolSchema(), tool response dedup, abort error handling
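Taken together, the knobs above combine into a single request body. A hedged TypeScript sketch (the interface, function name, and model string are illustrative assumptions, not the PR's actual code):

```typescript
// Illustrative only: shows how the listed Responses API parameters
// might be assembled into one request body.
interface ResponsesRequest {
  model: string;
  input: unknown[]; // only the new items since the last turn
  previous_response_id?: string; // server-side history anchor
  prompt_cache_key?: string; // conversation-scoped cache key for faster TTFT
  reasoning?: { effort: string; summary: string };
  include?: string[]; // e.g. ["reasoning.encrypted_content"]
  truncation?: { type: "auto" }; // avoids a hard 400 on context overflow
  parallel_tool_calls?: boolean;
  tool_choice?: "auto";
}

function buildRequest(
  conversationId: string,
  prevId: string | undefined,
  newItems: unknown[],
): ResponsesRequest {
  return {
    model: "placeholder-model", // placeholder, not a real model name
    input: newItems,
    previous_response_id: prevId,
    prompt_cache_key: conversationId,
    reasoning: { effort: "medium", summary: "auto" },
    include: ["reasoning.encrypted_content"],
    truncation: { type: "auto" },
    parallel_tool_calls: true,
    tool_choice: "auto",
  };
}
```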

Test plan

  • 76 new unit tests across converter, pipeline, and compaction client
  • 4905 total tests passing (up from 4829)
  • Full build green (tsc + eslint + vscode companion)
  • Zero regressions to existing Chat Completions, Anthropic, or Gemini paths
  • Manual integration test against an LLM proxy with a Codex model
  • Dogfooding with qwen --auth-type openai-responses
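For context on what the converter tests cover: the converter maps Responses API SSE events into Gemini-format output. A minimal sketch under assumed event and part shapes (the real converter handles many more event types than this):

```typescript
// Assumed shapes, for illustration only.
type SseEvent = { type: string; delta?: string };
type GeminiPart = { text: string };

// Map a Responses API text-delta SSE event onto a Gemini-style part so
// downstream consumers see a uniform stream regardless of provider.
function convertEvent(event: SseEvent): GeminiPart | null {
  if (event.type === "response.output_text.delta" && event.delta !== undefined) {
    return { text: event.delta };
  }
  return null; // other event types are handled elsewhere in the full converter
}
```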

Made with Cursor

netbrah added 17 commits March 21, 2026 16:09
Adds Dockerfile.sea and supporting scripts to produce a single
self-contained Node.js SEA binary for headless Linux deployment.

- sea/sea-launch.cjs: CJS entry point that extracts embedded ESM
  bundle to a versioned temp dir and dynamically imports it
- scripts/build_binary.js: generates SEA config, blob, and injects
  into a copy of the Node binary via postject
- Dockerfile.sea: multi-stage build targeting linux/amd64 via buildx
- .gitignore: exclude .bin/ build output and local scratch files

Made-with: Cursor
Allows running multiple qwen-code instances with isolated configs.
When QWEN_CODE_HOME is set, the global config dir uses that path
instead of ~/.qwen/. Enables hermetic distribution (ontap-apex) to
use ~/.apex/.qwen/ without conflicting with personal ~/.qwen/ config.

Centralises global dir resolution through Storage.getGlobalQwenDir()
in todoWrite, memoryDiscovery, and settings.ts instead of ad-hoc
path.join(homeDir, QWEN_DIR) construction.

Made-with: Cursor
Keep .bin/ directory structure in repo but ignore build artifacts
(ontap-apex launcher, SEA binaries, etc.) that live there locally.

Made-with: Cursor
…file

TypeScript strict mode requires process.env['KEY'] not process.env.KEY
for index signature access. Also set HUSKY=0 in Dockerfile.sea to
skip git hooks during container builds without --ignore-scripts.

Made-with: Cursor
When QWEN_CODE_BRAND is set (e.g. "APEX"), the header banner and
window title use it instead of "Qwen Code" / "Qwen". Allows the
hermetic distribution to show its own branding without source changes.

Made-with: Cursor
SEA binaries can't accept V8 flags (--max-old-space-size) as argv —
they get parsed by yargs as unknown CLI arguments. Detect SEA mode
via node:sea module and skip the memory tuning relaunch entirely.

Made-with: Cursor
relaunchAppInChildProcess passes process.argv[1] as the script path,
but SEA binaries have no script — argv[1] is undefined, which gets
stringified and interpreted by yargs as a one-shot prompt, preventing
interactive mode from launching.

Made-with: Cursor
When QWEN_CODE_BRAND=APEX, the splash screen shows an APEX ASCII
art logo instead of the default QWEN logo. Extensible via brandLogos
map for future brand variants.

Made-with: Cursor
Made-with: Cursor
Add context budget trimming that truncates older conversation turns before
hitting the provider API, preventing context-window overflows. Includes
test expectations updated for the thinking-budget headroom logic that bumps
max_tokens by 8000 when thinking is enabled.
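The trimming step described above could be sketched as follows (a hypothetical shape, not the PR's implementation): drop the oldest turns until the estimated total fits the budget, always keeping the most recent turn.

```typescript
// Hypothetical sketch of context budget trimming: remove oldest turns
// until the estimated token total is within budget, never dropping the
// most recent turn.
function trimToBudget<T>(
  turns: T[],
  estimate: (t: T) => number,
  budget: number,
): T[] {
  const kept = [...turns];
  let total = kept.reduce((sum, t) => sum + estimate(t), 0);
  while (kept.length > 1 && total > budget) {
    total -= estimate(kept.shift()!); // shift() removes the oldest turn
  }
  return kept;
}
```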

Made-with: Cursor
Mark getHistory() return as readonly Content[] to prevent accidental
mutation. Update stripStartupContext to accept readonly arrays and spread
at the arena boundary where mutable Content[] is required.

Made-with: Cursor
Increase default maxAttempts from 7 to 10 and add detection for transient
SSL/TLS errors (EPROTO, DEPTH_ZERO_SELF_SIGNED_CERT, etc.) and network
errors (ECONNRESET, ETIMEDOUT) to trigger automatic retries with backoff.
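The detection described above amounts to matching Node.js error codes. A minimal sketch (function and set names are assumptions; the commit's actual code list may be longer):

```typescript
// Transient network/TLS error codes that should trigger a retry with backoff.
const TRANSIENT_CODES = new Set([
  "EPROTO",
  "DEPTH_ZERO_SELF_SIGNED_CERT",
  "ECONNRESET",
  "ETIMEDOUT",
]);

// Node.js network errors carry a string `code` property; classify by it.
function isTransient(err: { code?: string }): boolean {
  return err.code !== undefined && TRANSIENT_CODES.has(err.code);
}
```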

Made-with: Cursor
When a subagent attempts to use a tool not in its allowed list, return a
structured error with the tool name and available alternatives instead of
a generic "not found" message. Updated test expectation accordingly.

Made-with: Cursor
Detects when the model generates lazy placeholder text like
"(rest of methods ...)" or "// unchanged code ..." instead of actual
code in edit and write_file tool calls. Prevents accidental code
deletion by blocking these at validation time.

- edit.ts: Compares old_string vs new_string placeholders, blocks
  new placeholders not present in original
- write-file.ts: Blocks content with any omission placeholders
- 6 unit tests: standalone, case-insensitive, multi-line, false
  positive avoidance, inline comments, unrelated ellipsis
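The check could be as simple as a pattern match on the proposed content. An illustrative sketch (the actual patterns in edit.ts and write-file.ts may differ):

```typescript
// Matches lazy omission placeholders like "(rest of methods ...)" or
// "// unchanged code ...". Illustrative regex; the real tool uses its
// own pattern list.
const PLACEHOLDER_RE =
  /(\(\s*rest of [^)]*\.\.\.\s*\)|\/\/\s*unchanged code\s*\.\.\.)/i;

function hasOmissionPlaceholder(content: string): boolean {
  return PLACEHOLDER_RE.test(content);
}
```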

Made-with: Cursor
Add readOnlyTools boolean to MCPServerConfig. When true, all
discovered tools default to readOnlyHint: true unless the server
explicitly overrides with readOnlyHint: false in its annotations.

This enables MCP tools from read-only servers to benefit from
contiguous read-only tool parallelization.
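The precedence rule above (explicit annotation beats the server-level default) can be sketched like this. Names here are assumptions apart from readOnlyTools and readOnlyHint, which the commit introduces:

```typescript
// Minimal config shape for illustration; the real MCPServerConfig has
// more fields.
interface MCPServerConfig {
  command: string;
  readOnlyTools?: boolean;
}

// A tool's explicit readOnlyHint annotation wins; otherwise fall back to
// the server-wide readOnlyTools default.
function effectiveReadOnlyHint(
  server: MCPServerConfig,
  annotation?: boolean,
): boolean {
  return annotation ?? server.readOnlyTools ?? false;
}
```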

Fixes QwenLM#2564

Made-with: Cursor
…tree utils

Ported from gemini-cli:
- read_many_files tool with glob patterns
- JIT context discovery (auto-inject AGENTS.md from subdirectories)
- Git worktree utilities
- Tool error type extensions

Made-with: Cursor
- Read-only tool parallelization (Kind.Read, Kind.Search, Kind.Fetch)
- Tool output masking service with telemetry
- Dynamic tool output truncation based on context pressure
- Token estimation utilities (tokenCalculation.ts)
- Settings schema updates for tool output masking
- Sync script for portable settings
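The read-only parallelization above relies on grouping contiguous read-only tool calls into batches. A hedged sketch (not the actual scheduler): read-only calls extend the current batch, while any mutating call starts a new one.

```typescript
// Batch contiguous read-only tool calls so each batch can run in parallel;
// a mutating call ends the current batch. Illustrative only.
function batchContiguousReadOnly<T>(
  calls: T[],
  isReadOnly: (c: T) => boolean,
): T[][] {
  const batches: T[][] = [];
  for (const call of calls) {
    const last = batches[batches.length - 1];
    if (isReadOnly(call) && last && last.every((c) => isReadOnly(c))) {
      last.push(call); // extend the current read-only batch
    } else {
      batches.push([call]); // mutating call (or first call) opens a new batch
    }
  }
  return batches;
}
```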

Made-with: Cursor
Full Codex-parity implementation of the OpenAI Responses API as a new
provider type `openai-responses`, parallel to `openai` (Chat Completions).

New provider: AuthType.USE_OPENAI_RESPONSES
- SSE event converter (text, function calls, reasoning summaries)
- HTTP streaming pipeline with previous_response_id + incremental items
- prompt_cache_key, reasoning, verbosity, service_tier, tool_choice
- truncation: auto, parallel_tool_calls, extra_body merge
- encrypted_content round-trip for reasoning continuity

Dual-path compaction:
- Remote via POST /v1/responses/compact (openai-responses)
- Inline via existing ChatCompressionService (all other providers)
- Pipeline state reset after compaction
- Auto-compact threshold 70% -> 90% (Codex default)
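The routing logic above can be sketched in a few lines (function and parameter names are assumptions; only the endpoint and the state reset come from this PR):

```typescript
// Route compaction remotely for the Responses provider, inline for all
// others, then reset pipeline state so the next turn does not reuse a
// stale previous_response_id. Illustrative sketch.
function compact(
  provider: string,
  remote: () => void, // POST /v1/responses/compact path (per the PR)
  inline: () => void, // existing ChatCompressionService path
  resetState: () => void,
): void {
  if (provider === "openai-responses") {
    remote();
  } else {
    inline();
  }
  resetState(); // always clear pipeline state after compaction
}
```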

Pre-work: resolveModel, tokenEstimationScaleFactor, Turn.getResponseText,
cleanToolSchema, tool response dedup, abort error handling in stream consumer.

76 new tests, 4905 total passing, zero regressions.

Made-with: Cursor
netbrah force-pushed the feat/openai-responses-api branch from bd23459 to f493c6a on March 22, 2026 03:19
Contributor Author

netbrah commented Mar 22, 2026

Opened against upstream by mistake — this is a fork-internal review PR.

