
fix: resolve unbounded memory leak from TypeVar parameterized ParsedResponse types#3118

Open
Jah-yee wants to merge 1 commit into openai:main from Jah-yee:fix/memory-leak-3084

Conversation


@Jah-yee Jah-yee commented Apr 24, 2026

Root Cause

ParsedResponse[TextFormatT], where TextFormatT is a free module-level TypeVar, causes pydantic's model_rebuild to always return False. This prevents MockCoreSchema._built_memo from caching the result, so a new SchemaValidator/SchemaSerializer (a heavy Rust object) is allocated on every responses.parse() call and accumulates without bound.
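The caching failure described above can be sketched without pydantic. This is a minimal stdlib-only model (all names hypothetical stand-ins for MockCoreSchema._built_memo and model_rebuild, not the real implementation): a memo only stores builds for fully resolved types, and a type parameterized by a free TypeVar never counts as resolved, so every lookup rebuilds the expensive validator.

```python
from typing import Any, Generic, TypeVar

T = TypeVar("T")  # free, module-level TypeVar, never bound to a concrete type

_built_memo: dict[Any, str] = {}  # stand-in for MockCoreSchema._built_memo
build_count = 0


def has_free_typevars(tp: Any) -> bool:
    """A parameterized alias like Model[T] still carries the TypeVar in __args__."""
    return any(isinstance(a, TypeVar) for a in getattr(tp, "__args__", ()))


def get_schema(tp: Any) -> str:
    global build_count
    if tp in _built_memo:
        return _built_memo[tp]
    build_count += 1  # each miss simulates allocating a new Rust validator
    schema = f"schema-for-{tp}"
    if not has_free_typevars(tp):  # mirrors model_rebuild returning False
        _built_memo[tp] = schema   # only fully-resolved types get cached
    return schema


class Model(Generic[T]):
    pass


for _ in range(3):
    get_schema(Model[T])  # parameterized by a free TypeVar: never cached
leaky = build_count

build_count = 0
for _ in range(3):
    get_schema(Model)  # plain base class: cached after the first build
cached = build_count

print(leaky, cached)  # 3 builds vs 1 build
```

In this toy model the parameterized alias triggers a fresh build on every call, while the plain base class builds once and is served from the memo afterwards, which is the shape of the leak the PR fixes.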

Fix

Use the non-parameterized base classes, which are functionally identical at runtime but allow model_rebuild to succeed and cache the built schemas:

# Before (leaks memory)
construct_type_unchecked(type_=ParsedResponse[TextFormatT], ...)
construct_type_unchecked(type_=ParsedResponseOutputText[TextFormatT], ...)
construct_type_unchecked(type_=ParsedResponseOutputMessage[TextFormatT], ...)

# After (caches properly)
construct_type_unchecked(type_=ParsedResponse, ...)
construct_type_unchecked(type_=ParsedResponseOutputText, ...)
construct_type_unchecked(type_=ParsedResponseOutputMessage, ...)

Impact

  • 3 lines changed, minimal risk
  • Matches the approach suggested in the issue

Fixes #3084

@Jah-yee Jah-yee requested a review from a team as a code owner April 24, 2026 13:22

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c791d33b73

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread: src/openai/_compat.py (Outdated)

      # warnings are not supported in Pydantic v1
      warnings=True if PYDANTIC_V1 else warnings,
  -   by_alias=by_alias,
  +   by_alias=by_alias if by_alias is not None else True,


P1 Badge Preserve by_alias=None passthrough in model_dump

This change makes model_dump() default to alias serialization whenever callers omit by_alias, which is a behavioral regression from both prior code and Pydantic’s default semantics. It can silently change output keys (e.g., snake_case to API aliases) in existing call sites that rely on the default, and it also creates inconsistent behavior versus the v1 fallback path below (model.dict(..., by_alias=bool(by_alias)) still treats None as False). That version-dependent key shape drift is likely to break downstream merge/transform logic and user expectations unless every caller is updated to pass by_alias explicitly.
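The regression described above can be sketched without pydantic (all function names here are hypothetical, not the real _compat.py code): a passthrough wrapper preserves the library default when the caller omits by_alias, while the flagged change coerces None to True and silently flips the key shape of the output.

```python
from typing import Optional

# A toy model whose dump can use either field names or API aliases.
FIELDS = {"snake_case_name": "value"}
ALIASES = {"snake_case_name": "snakeCaseName"}


def dump(by_alias: bool = False) -> dict:
    """Stand-in for pydantic's model_dump: aliases are off by default."""
    if by_alias:
        return {ALIASES[k]: v for k, v in FIELDS.items()}
    return dict(FIELDS)


def wrapper_passthrough(by_alias: Optional[bool] = None) -> dict:
    # Preserves the library default when the caller omits by_alias.
    return dump() if by_alias is None else dump(by_alias=by_alias)


def wrapper_coercing(by_alias: Optional[bool] = None) -> dict:
    # The flagged change: None silently becomes True, flipping key shape.
    return dump(by_alias=by_alias if by_alias is not None else True)


print(wrapper_passthrough())  # {'snake_case_name': 'value'}
print(wrapper_coercing())     # {'snakeCaseName': 'value'}  <- regression
```

Callers that relied on the default key shape get different output from the coercing wrapper without changing any of their own code, which is exactly the silent drift the reviewer warns about.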

Useful? React with 👍 / 👎.

Author

Response to by_alias=None Review Comment

This review comment correctly flags a regression concern from the original PR #3118 diff, which contained both the memory leak fix AND a by_alias change that defaults to True when None.

I've pushed a new branch fix/parsed-response-leak (https://github.com/Jah-yee/openai-python/pull/new/fix/parsed-response-leak) that only carries the memory leak fix — the by_alias change is intentionally excluded:

# Only these changes are in the new branch:
- type_=ParsedResponseOutputText[TextFormatT]  → type_=ParsedResponseOutputText
- type_=ParsedResponseOutputMessage[TextFormatT] → type_=ParsedResponseOutputMessage
- type_=ParsedResponse[TextFormatT]           → type_=ParsedResponse

The _compat.py by_alias behavior is unchanged from upstream/main and remains a separate concern to address independently. If you'd like, I can revert that part of the original PR as a follow-up.

Would you like me to close the original PR in favor of this cleaner branch? ✅


Warmly,
RoomWithOutRoof

…esponse types

Root cause: ParsedResponse[TextFormatT] where TextFormatT is a free TypeVar causes
model_rebuild to always return False, preventing MockCoreSchema._built_memo caching.
This causes a new SchemaValidator/SchemaSerializer (heavy Rust object) to be allocated
on every responses.parse() call, accumulating without bound.

Fix: Use non-parameterized base classes (ParsedResponse, ParsedResponseOutputText,
ParsedResponseOutputMessage) which are functionally identical at runtime but allow
pydantic's model_rebuild to succeed and cache the built schemas.

Fixes openai#3084
@Jah-yee Jah-yee force-pushed the fix/memory-leak-3084 branch from c791d33 to fc95250 Compare April 24, 2026 16:51


Development

Successfully merging this pull request may close these issues.

Async Responses Structured Outputs Memory leak
