Description
Hi,
I noticed that when using the standard OpenAI Chat Completions API, prompt caching works effectively: if I modify only the final fragment of the user prompt while keeping the full system prompt and the initial part of the user prompt identical across requests, most tokens are served from the cache.
However, when using AX in a similar pattern (changing only one input field in the signature, typically the last one, plus a slightly longer signature description), the cache is rarely hit. Is this due to how AX constructs the prompt, the ordering of inputs, or something else? Can it be addressed?
I believe this is a significant disadvantage of AX that might be relatively easy to fix.
Context on GPT Prompt Caching:
Token caching in GPT models such as 4o-mini automatically caches the prefix of repeated input prompts (once the shared prefix reaches 1024 tokens, extending in 128-token increments), reducing latency by up to 80% and input costs by 50-90% on cache hits.
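To make the implication concrete, here is a minimal sketch of the caching arithmetic described above (1024-token minimum, 128-token blocks). The helper name and token-list representation are illustrative, not any provider's API; the point is that a cache hit depends entirely on how long the *shared prefix* is between consecutive requests — which is why prompt structure and input ordering matter:

```python
def estimated_cached_tokens(prev_tokens, curr_tokens, min_prefix=1024, block=128):
    """Estimate how many tokens of curr_tokens would be served from cache,
    assuming the prefix-caching rules described above (illustrative only)."""
    # Length of the token prefix shared with the previous request.
    shared = 0
    for a, b in zip(prev_tokens, curr_tokens):
        if a != b:
            break
        shared += 1
    # Caching only kicks in once the shared prefix reaches the minimum,
    # and hits are counted in whole blocks.
    if shared < min_prefix:
        return 0
    return (shared // block) * block

# If only the tail of the prompt changes, most tokens stay cacheable:
prev = list(range(2000))
curr = list(range(1500)) + [-1] * 500   # first 1500 tokens identical
print(estimated_cached_tokens(prev, curr))  # 1408 (11 full 128-token blocks)

# But if a field early in the prompt changes, the shared prefix collapses
# and nothing is cached — which would explain the behavior I see with AX:
curr_early_change = [-1] + list(range(1, 2000))
print(estimated_cached_tokens(prev, curr_early_change))  # 0
```

Under this model, placing the varying input field anywhere before the stable content destroys the cacheable prefix, so ordering stable content first (system prompt, signature description, fixed inputs) and the changing field last should restore cache hits.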
What do you think?
Thanks