Gemini context caching #1716
Replies: 3 comments
I am just starting to explore this as well. Perhaps the approach outlined in #940 (reply in thread) may help us.
Gemini context caching with Instructor is powerful for reducing costs on long contexts.

Setup:

```python
import instructor
import google.generativeai as genai
from pydantic import BaseModel

# Configure Gemini
genai.configure(api_key="...")

# Create cached context
cache = genai.caching.CachedContent.create(
    model="gemini-1.5-pro-001",
    display_name="my-docs",
    contents=[{
        "role": "user",
        "parts": [large_document_content]
    }],
    ttl="3600s"  # 1 hour cache
)

# Use with Instructor
model = genai.GenerativeModel.from_cached_content(cache)
client = instructor.from_gemini(model)

class Analysis(BaseModel):
    summary: str
    key_points: list[str]

result = client.chat.completions.create(
    response_model=Analysis,
    messages=[{"role": "user", "content": "Analyze the document"}]
)
```
We've saved significant costs using Gemini caching at RevolutionAI for document analysis pipelines. What's your context size and query pattern?
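Context size matters because cached content has a documented minimum token count (32,768 tokens for gemini-1.5-pro at the time of writing; verify against the current docs). A quick local sanity check before creating a cache is a rough characters-per-token heuristic. This is only a sketch: the ~4 chars/token figure is an English-text approximation, and the authoritative count comes from `GenerativeModel.count_tokens()`.

```python
# Rough pre-check before creating a CachedContent (heuristic only).
# Assumes ~4 characters per token; use GenerativeModel.count_tokens()
# for the real number before relying on this.
MIN_CACHE_TOKENS = 32_768  # documented minimum for gemini-1.5-pro caching (verify in docs)

def estimated_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return len(text) // 4

def worth_caching(text: str) -> bool:
    """True if the text likely clears the minimum cacheable size."""
    return estimated_tokens(text) >= MIN_CACHE_TOKENS

small_doc = "short prompt"
large_doc = "x" * 200_000  # roughly 50k estimated tokens

print(worth_caching(small_doc))  # False
print(worth_caching(large_doc))  # True
```

If the estimate is close to the threshold, count for real with the model before paying for a cache that the API may reject.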
Gemini context caching is huge for cost savings. At RevolutionAI (https://revolutionai.io) we use this. Setup:

```python
from datetime import timedelta

import instructor
import google.generativeai as genai

# Create cached context
cache = genai.caching.CachedContent.create(
    model="gemini-1.5-pro",
    contents=[large_document],
    ttl=timedelta(hours=1)
)

# Use with Instructor
client = instructor.from_gemini(
    genai.GenerativeModel(
        model_name="gemini-1.5-pro",
        cached_content=cache
    )
)

# Queries reuse the cached context
result = client.chat.completions.create(
    response_model=MyModel,
    messages=[{"role": "user", "content": "Summarize section 3"}]
)
```
Perfect for RAG with large docs!
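To put rough numbers on the cost savings: cached input tokens are billed at a reduced rate compared with regular input tokens, plus a storage fee per cached token per hour. The prices below are illustrative placeholders, not current Gemini rates (check the pricing page before relying on them), but the break-even arithmetic is the same regardless of the exact figures:

```python
# Illustrative break-even calculation for context caching.
# All three prices are made-up placeholders; substitute current Gemini pricing.
INPUT_PRICE = 3.50 / 1_000_000             # $ per regular input token (placeholder)
CACHED_PRICE = 0.875 / 1_000_000           # $ per cached input token (placeholder)
STORAGE_PRICE_PER_HOUR = 1.00 / 1_000_000  # $ per cached token per hour (placeholder)

context_tokens = 100_000   # size of the shared document context
queries_per_hour = 20      # how often that context is re-sent

# Without caching, the full context is billed as input on every query.
without_cache = queries_per_hour * context_tokens * INPUT_PRICE

# With caching, queries pay the cached rate, plus one hour of storage.
with_cache = (queries_per_hour * context_tokens * CACHED_PRICE
              + context_tokens * STORAGE_PRICE_PER_HOUR)

print(f"without cache: ${without_cache:.2f}/hour")  # $7.00
print(f"with cache:    ${with_cache:.2f}/hour")     # $1.85
```

The pattern that pays off is exactly the RAG case above: a large, stable context queried many times within the cache's TTL. For one-off queries, the storage fee can outweigh the discount.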
Hi all,
I am trying to use Gemini's context caching (https://ai.google.dev/gemini-api/docs/caching?lang=python) with my calls to Gemini via Instructor, but I can't seem to make it work. Does Instructor support Gemini's context caching? If not, are there plans to support it in the future?
The following is a call to Gemini via the Gemini SDK that successfully uses context caching:
The following is my attempt to use context caching via Instructor:
The above doesn't work, however.