
feat(proxy): add /v1/memory CRUD endpoints#26218

Merged
krrish-berri-2 merged 21 commits into litellm_internal_staging from litellm_memory_endpoints
Apr 25, 2026

Conversation

Contributor

@krrish-berri-2 krrish-berri-2 commented Apr 22, 2026


Summary

  • New LiteLLM_MemoryTable stores memory entries with a globally unique key plus optional JSON metadata. value is a String (LLM-readable text) and metadata is an optional Json? envelope — the Letta + mem0 hybrid — so future structured fields can be added without a schema migration. Callers namespace their own keys (e.g. user:123:notes) if per-user isolation is needed.
  • Adds CRUD endpoints under /v1/memory:
    • POST /v1/memory — create (409 on any duplicate key)
    • GET /v1/memory — list with optional key_prefix (Redis-style namespace scan); caller-scoped, admins see all
    • GET /v1/memory/{key} — fetch one (slashes in keys supported via {key:path})
    • PUT /v1/memory/{key} — upsert (idempotent under concurrent writes; explicit metadata: null clears the column)
    • DELETE /v1/memory/{key} — delete
  • Non-admin callers cannot set user_id / team_id other than their own; a caller with neither user_id nor team_id is rejected (400) to avoid orphan rows. PROXY_ADMIN can scope to any user/team.
  • UI: new Memory page in the AI GATEWAY sidebar. Full CRUD with server-paginated table, prefix search, detail drawer, and a create/edit modal (optional JSON metadata). Writes go through useMutation + queryClient.invalidateQueries so every cached page refetches on change.
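Since keys are globally unique, per-tenant isolation is the caller's job. A minimal sketch of the caller-side namespacing convention and the `key_prefix` scan semantics described above (the helper names here are illustrative, not part of the PR):

```python
# Sketch: caller-side key namespacing for the /v1/memory API.
# Keys are globally unique, so callers carve out their own namespace
# with a prefix, then list entries via the key_prefix filter.

def memory_key(user_id: str, name: str) -> str:
    """Build a caller-namespaced key, e.g. 'user:123:notes'."""
    return f"user:{user_id}:{name}"

def filter_by_prefix(keys: list[str], key_prefix: str) -> list[str]:
    """Client-side mirror of the server's Redis-style key_prefix scan."""
    return [k for k in keys if k.startswith(key_prefix)]

keys = [memory_key("123", "notes"), memory_key("123", "role"), memory_key("456", "notes")]
print(filter_by_prefix(keys, "user:123:"))  # only user 123's entries
```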

Test plan

  • Unit tests in tests/test_litellm/proxy/memory/test_memory_endpoints.py (20 tests, all passing):
    • Create defaults scope to caller
    • Create duplicate key → 409
    • Non-admin setting foreign user_id → 403
    • Admin can set any scope
    • Identity-less caller → 400 (prevents orphan rows)
    • List is caller-scoped; admin sees all
    • key_prefix filter works and never leaks across scopes
    • GET by key respects visibility (404 if row belongs to someone else)
    • PUT creates when missing, updates when present
    • PUT admin can bootstrap foreign scope
    • PUT unique-violation race → falls through to update
    • PUT explicit metadata: null clears the column
    • PUT omitted metadata preserves the column
    • PUT with empty body → 400
    • DELETE removes row; 404 when row not visible
  • Prisma migration applied on a fresh DB (tested locally against Neon PG16).
  • Smoke test the endpoints against a running proxy (verified POST/GET/PUT/DELETE via curl + via the new UI).

New LiteLLM_MemoryTable stores user/team-scoped key/value entries with
optional JSON metadata. Value is a String (LLM-readable text) and metadata
is an optional Json? envelope, matching the Letta + mem0 hybrid model so
future structured fields can be added without a schema migration.

Endpoints:
  POST   /v1/memory         - create
  GET    /v1/memory         - list (caller-scoped; admins see all)
  GET    /v1/memory/{key}   - fetch one
  PUT    /v1/memory/{key}   - upsert
  DELETE /v1/memory/{key}   - delete

Non-admin callers cannot set a user_id/team_id other than their own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
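A sketch of what the model might look like. Only `key String @unique`, a `String` value, `metadata Json?`, optional user/team scope columns, and the `@@index([team_id])` mentioned later in this PR are confirmed; the id field, timestamps, and other attributes are assumptions:

```prisma
// Sketch only — fields beyond key/value/metadata/user_id/team_id are assumed.
model LiteLLM_MemoryTable {
  memory_id  String   @id @default(uuid())
  key        String   @unique // globally unique; callers namespace their own keys
  value      String           // LLM-readable text
  metadata   Json?            // optional structured envelope (Letta + mem0 hybrid)
  user_id    String?
  team_id    String?
  created_at DateTime @default(now())
  updated_at DateTime @updatedAt

  @@index([user_id])
  @@index([team_id])
}
```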

CLAassistant commented Apr 22, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
0 out of 2 committers have signed the CLA.

❌ krrish-berri-2
❌ mateo-berri
You have signed the CLA already but the status is still pending? Let us recheck it.

Contributor

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR adds a /v1/memory CRUD API (POST/GET/PUT/DELETE) backed by a new LiteLLM_MemoryTable with a globally-unique key, Prisma schema and migration, 20 mock-only unit tests, and a full React UI page in the AI Gateway sidebar.

Most issues surfaced in earlier review rounds have been resolved: the encodeMemoryKeyForPath helper preserves literal slashes for {key:path} routes, the upsert race condition is caught and retried rather than surfacing a 500, model_fields_set correctly distinguishes explicit-null metadata from omitted metadata, and the visibility filter now uses a top-level AND to avoid dict-key collisions.

  • P1: _is_team_admin_for calls prisma_client.db.litellm_teamtable.find_unique() directly instead of the get_team_object cache helper, violating the repo's rule against raw DB queries on endpoint paths. Every PUT/DELETE on a team-scoped row triggers a synchronous DB round-trip that bypasses Redis/in-process caching.

Confidence Score: 4/5

Safe to merge after addressing the direct DB query in _is_team_admin_for; all other previously-raised P1/P0 issues have been resolved.

One remaining P1 — the raw team-table DB query in _is_team_admin_for bypasses the caching layer that every other team-management endpoint relies on. All other issues flagged in prior review rounds have been addressed: metadata sentinel pattern, upsert race condition, visibility filter merge, encodeMemoryKeyForPath, orphan row rejection, and unique-violation detection.

litellm/proxy/memory/memory_endpoints.py (_is_team_admin_for direct DB call); litellm/proxy/schema.prisma (missing @@Map)

Important Files Changed

Filename Overview
litellm/proxy/memory/memory_endpoints.py New CRUD endpoints for /v1/memory. Previous P1/P0 issues (race condition, metadata sentinel, visibility filter merge, orphan rows) have all been addressed; the direct-DB query in _is_team_admin_for remains.
litellm/types/memory_management.py Pydantic models for memory endpoints; correctly uses Optional[Any] so model_fields_set can distinguish omitted vs explicit-null metadata.
ui/litellm-dashboard/src/components/networking.tsx Adds memory CRUD API functions; encodeMemoryKeyForPath correctly preserves literal slashes for the {key:path} backend route while encoding all other unsafe characters.
ui/litellm-dashboard/src/components/MemoryView/MemoryView.tsx Memory UI with server-side pagination, prefix search, and CRUD. The metadata clear path correctly sends explicit null on edit so the backend's model_fields_set registers the intent.
litellm/proxy/schema.prisma Adds LiteLLM_MemoryTable model with key String @unique (globally unique); missing @@Map annotation that all sibling models use for explicitness.
tests/test_litellm/proxy/memory/test_memory_endpoints.py 20 mock-only unit tests using FastAPI TestClient with in-memory fakes; no real network calls, compliant with repo's test isolation policy.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming /v1/memory request] --> B[user_api_key_auth]
    B --> C{Endpoint?}
    C -->|POST /v1/memory| D[_resolve_scope\nenforce user_id/team_id\nreject orphan rows]
    D --> E[DB create\nunique key or 409]
    C -->|GET /v1/memory| F[_visibility_filter\nNone=admin sees all\nOR clause for user/team]
    F --> G[DB find_many\npaginated + key prefix]
    C -->|GET /v1/memory/key| H[_find_memory_for_caller\nAND key + vis filter]
    H --> I{Found?}
    I -->|Yes| J[Return row]
    I -->|No| K[404]
    C -->|PUT /v1/memory/key| L[_find_memory_for_caller]
    L --> M{Exists?}
    M -->|Yes| N[_assert_write_access\nowner or team admin]
    N --> O[DB update]
    M -->|No| P[_resolve_scope]
    P --> Q[DB create\nrace catch + re-update]
    C -->|DELETE /v1/memory/key| R[_find_memory_for_caller]
    R --> S[_assert_write_access]
    S --> T[DB delete]

Reviews (18): Last reviewed commit: "fix(schema): close LiteLLM_MemoryTable m..."

Comment thread litellm/proxy/memory/memory_endpoints.py
Comment thread litellm/proxy/memory/memory_endpoints.py Outdated
Prisma's Python client rejects `metadata=None` on a `Json?` field with
"A value is required but not set" — the field must be omitted from the
`data` dict entirely to store SQL NULL. Build the create payload
conditionally in both `create_memory` and the PUT-create branch of
`upsert_memory`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
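The conditional-payload pattern described above can be sketched as a plain function (simplified; the real code builds this inside `create_memory` and the PUT-create branch of `upsert_memory`):

```python
from typing import Any, Optional

def build_create_data(key: str, value: str, metadata: Optional[Any] = None) -> dict:
    """Build the Prisma `data` payload for a memory create.

    Prisma's Python client rejects None for a Json? column with
    "A value is required but not set", so the field must be omitted
    entirely to store SQL NULL.
    """
    data: dict = {"key": key, "value": value}
    if metadata is not None:
        data["metadata"] = metadata
    return data
```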

veria-ai Bot commented Apr 23, 2026

Low: New memory CRUD endpoints with standard auth

This PR adds /v1/memory CRUD endpoints scoped by user/team identity. Authentication uses the standard user_api_key_auth middleware, database access goes through Prisma ORM (parameterized queries), and scope enforcement prevents non-admins from writing to other users' scopes. The routes are not currently registered in any of the LiteLLMRoutes allowed-route lists, so only PROXY_ADMIN users can reach them today. No significant security issues found.


Status: 0 open
Risk: 2/10

Posted by Veria AI · 2026-04-23T14:31:12.762Z

Adds a new "Memory" sidebar item under Tools so users can see what their
agents have stored. Lists all memories visible to the caller (scoped by
the backend), with a key-search filter, preview column, scope tags, and
view/edit/delete actions. Create modal accepts optional JSON metadata.

- networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory
  wired to the /v1/memory CRUD endpoints.
- MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md:
  use antd for new UI, not tremor).
- page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread litellm/proxy/memory/memory_endpoints.py
Comment thread litellm/proxy/memory/memory_endpoints.py Outdated

veria-ai Bot commented Apr 23, 2026

Low: Cross-tenant key existence oracle via globally unique constraint

This PR adds CRUD endpoints for user/team-scoped memory entries. Authentication is properly enforced on all routes via user_api_key_auth. Write-authorization checks prevent cross-user mutation through team visibility. Internal error details are no longer leaked. The remaining design concern is that the globally unique key column combined with 409 responses allows authenticated users to probe for key existence across tenant boundaries, but exploitability is limited since keys are opaque strings chosen by callers and the information disclosed is minimal (existence only, not content).


Status: 6 open
Risk: 3/10

Posted by Veria AI · 2026-04-25T00:55:15.669Z

Comment thread litellm/proxy/memory/memory_endpoints.py Outdated
Backend:
- GET /v1/memory now accepts `key_prefix` for Redis-style namespace
  scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix`
  are passed, `key_prefix` wins.
- Prefix filter sits under the visibility filter in the Prisma where
  clause, so it can never leak rows across user/team scopes.
- New tests: prefix match, and cross-scope isolation (another user's
  `user:*` rows must not appear in the caller's results).

UI:
- Memory moved from a Tools submenu to a top-level AI GATEWAY item
  (alongside Agents, MCP Servers, Skills) — it's an API primitive,
  not a tool-management surface.
- Search box now drives prefix search, matching the Redis mental
  model ("type the namespace, see everything under it").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable
silently allowed duplicates when user_id or team_id was NULL, because
Postgres treats every NULL as distinct by default (ANSI semantics). A
caller with no team_id could POST the same key three times and get
three rows.

Migration:
1. Dedupe existing rows, keeping the most recent per (key, user_id,
   team_id), using `IS NOT DISTINCT FROM` so NULL == NULL.
2. Drop the old unique index.
3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+).

No code change: POST already returns 409 on unique-violation error
messages — it just wasn't firing before because the constraint didn't
catch the NULL-team case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
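The three migration steps above could be expressed roughly as follows (a sketch; the index name and `updated_at` column are assumptions, and the next commit removed this migration entirely in favor of a globally unique key):

```sql
-- 1. Dedupe, keeping the most recent row per (key, user_id, team_id);
--    IS NOT DISTINCT FROM makes NULL compare equal to NULL.
DELETE FROM "LiteLLM_MemoryTable" a
USING "LiteLLM_MemoryTable" b
WHERE a.key     IS NOT DISTINCT FROM b.key
  AND a.user_id IS NOT DISTINCT FROM b.user_id
  AND a.team_id IS NOT DISTINCT FROM b.team_id
  AND a.updated_at < b.updated_at;

-- 2. Drop the old unique index (name assumed).
DROP INDEX IF EXISTS "LiteLLM_MemoryTable_key_user_id_team_id_key";

-- 3. Recreate it so NULLs are no longer treated as distinct (Postgres 15+).
CREATE UNIQUE INDEX "LiteLLM_MemoryTable_key_user_id_team_id_key"
  ON "LiteLLM_MemoryTable" (key, user_id, team_id) NULLS NOT DISTINCT;
```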
Switches from the compound unique `(key, user_id, team_id)` to a simple
`key @unique`. The compound form silently allowed duplicates when
user_id or team_id was NULL (Postgres treats each NULL as distinct), so
callers could POST the same key repeatedly. Globally-unique key means
one row per key, period — any duplicate create → 409.

- schema.prisma (×3): `key String @unique`, drop `@@unique(...)`.
- initial add_memory_table migration: unique index on (key) only.
- Remove the now-unused follow-up NULLS NOT DISTINCT migration.
- Endpoint error message simplified ("already exists" — no "for this scope").
- Test fake's create() now enforces global key uniqueness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two Veria findings:

**High — cross-user memory tampering via team membership.** The
visibility filter uses an OR (`user_id == caller OR team_id == caller`)
so team members can SEE each other's team-scoped rows. That's
intentional for list/get. But because PUT/DELETE used the same filter
to find the target row, any team member could overwrite or delete a
teammate's *personal* row whenever both `user_id` and `team_id` were
stamped on it — broader visibility was being silently treated as
broader authority.

New `_assert_write_access(row, caller)` enforces ownership for
mutations. Non-admin rules:

- The row's `user_id` must match the caller (personal ownership), OR
- The row has no `user_id` and its `team_id` matches the caller's
  team (a "pure team row" intended for shared writes).

Admins bypass the check. The same gate runs in PUT (both regular
and post-race-recovery branches) and DELETE.

**Medium — DB internals leaked through 500 detail.** Every `except`
block was raising `HTTPException(500, detail=str(e))`, which surfaces
Prisma error strings (table/column names, host:port, error class
names) to API callers. New `_internal_error()` helper logs the real
exception server-side and returns a generic, caller-safe `detail`.
Applied to create, list, upsert (general fallthrough), and delete.

Also tightened the race-recovery 409 message to drop the "in a
different scope" wording — the caller never needs to know whose
scope it lives in.

Tests (+5):
- teammate cannot overwrite personal row → 403
- teammate cannot delete personal row → 403
- teammate CAN modify pure team row (no user_id stamped) → 200
- admin bypasses write-auth → 200
- 500 response never echoes Prisma internals (table/host/class names)

25/25 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
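The non-admin write rules above can be sketched as a pure predicate (simplified: the real `_assert_write_access` raises a 403 instead of returning False, and a later commit in this thread tightens the pure-team-row branch to team/org admins only):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemoryRow:
    user_id: Optional[str]
    team_id: Optional[str]

def can_write(row: MemoryRow, caller_user_id: Optional[str],
              caller_team_id: Optional[str], is_admin: bool) -> bool:
    """Sketch of the write-access rule: visibility is NOT authority."""
    if is_admin:
        return True  # admins bypass the check
    # Personal ownership: the row's user_id matches the caller.
    if row.user_id is not None and row.user_id == caller_user_id:
        return True
    # "Pure team row": no user_id stamped, team_id matches the caller's team.
    if row.user_id is None and row.team_id is not None and row.team_id == caller_team_id:
        return True
    return False
```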
@krrish-berri-2 (Contributor Author):

On the cross-tenant key-existence oracle (medium): acknowledged but intentional.

The global-unique-key design is the chosen model (matches Redis SET / mem0 / Letta block semantics — one key, one value, callers namespace their own keys). A 409 inherently signals that some row with that key exists; the alternative is a per-scope unique constraint, which we explicitly decided against (NULL-in-compound-unique hazard, ambiguous routing for GET /v1/memory/{key}, and the silent-duplicate bug that originally motivated the switch — see #26218 commit history).

What we did mitigate: the 409 detail string is now "Memory with key 'X' already exists." with no scope information, and the post-race-recovery 409 dropped its old "in a different scope" wording in this push. The information disclosure is "this key is taken globally" — same as Redis SET … NX would give you, and the same as any registry of globally-unique slugs.

If a deployment wants per-tenant key isolation, callers prefix their keys (team:acme:notes, user:123:role). That's documented in the schema comment. Resolving as intentional.

Comment thread litellm/proxy/memory/memory_endpoints.py
Tightens the write-authorization rule for "pure team rows" (rows with
no user_id stamped, only team_id) to match the pattern used by
team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`):

- Plain team members can READ team rows via the OR visibility filter
  (intentional, unchanged).
- Only PROXY_ADMIN, team admins of the row's team_id, or org admins
  for the team's organization may MODIFY them. Plain members get 403.

`_assert_write_access` is now async and takes the prisma_client so it
can fetch the team and run the existing `_is_user_team_admin` /
`_is_user_org_admin_for_team` helpers from
`litellm.proxy.management_endpoints.common_utils`. The org-admin path
is best-effort: it calls `get_user_object`, which depends on the
proxy_server module being initialized, so any exception there is
treated as "not an org admin" rather than crashing the request.

Tests:
- team admin can modify pure team row → 200
- plain team member cannot modify pure team row → 403
- plain team member cannot delete pure team row → 403

Updates the test fake to add a tiny `litellm_teamtable.find_unique`
implementation and a `_make_team(team_id, admin_user_ids=[...])`
helper.

27/27 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +159 to +167
try:
    team_obj = await prisma_client.db.litellm_teamtable.find_unique(
        where={"team_id": team_id}
    )
except Exception as e:
    verbose_proxy_logger.exception(
        "Error loading team for write-auth check (team_id=%s): %s", team_id, e
    )
    return False
Contributor


P1 Direct DB query bypasses the team caching layer

_is_team_admin_for calls prisma_client.db.litellm_teamtable.find_unique() directly instead of the get_team_object helper (which reads from the Redis/in-process cache). Every PUT or DELETE on a team-scoped memory entry triggers a synchronous DB round-trip. All other team-management endpoints use the shared helper to benefit from caching and avoid DB regression under load — this path should do the same.

Rule Used: What: In critical path of request, there should be... (source)

Two CI failures:

1. mypy: `_find_memory_for_caller` had `key_filter` inferred as
   `dict[str, str]` (literal type) and the conditional `{"AND": [key_filter, vis]}`
   returned `dict[str, list[...]]`, so the join site failed
   `dict-item` typing. Annotate both intermediates as `dict` so mypy
   widens the value type.

2. UI test (`page_utils.test.ts > should have descriptions for all
   pages`): every leftnav entry must have a description in
   `page_metadata.ts`, and `memory` was missing. Added a one-line
   description, matching the style of neighboring entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
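The mypy fix in item 1 can be illustrated with a small standalone function (names simplified from `_find_memory_for_caller`): without an explicit `Dict[str, Any]` annotation, mypy infers the literal type `dict[str, str]` for the key filter and rejects the `{"AND": [...]}` shape at the join site.

```python
from typing import Any, Dict, Optional

def build_where(key: str, vis: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Annotating both intermediates as Dict[str, Any] widens the value
    # type, so the str-valued and list-valued branches unify cleanly.
    key_filter: Dict[str, Any] = {"key": key}
    if vis is None:
        return key_filter  # admin: no visibility constraint
    combined: Dict[str, Any] = {"AND": [key_filter, vis]}
    return combined
```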

gitguardian Bot commented Apr 25, 2026

⚠️ GitGuardian has uncovered 2 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

| GitGuardian id | Status    | Secret           | Commit  | Filename                                       |
|----------------|-----------|------------------|---------|------------------------------------------------|
| 31539530       | Triggered | Generic Password | f775fa1 | .github/workflows/_test-unit-services-base.yml |
| 29203053       | Triggered | Generic Password | f775fa1 | .circleci/config.yml                           |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


mateo-berri and others added 2 commits April 24, 2026 17:50
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro

Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:

- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
  input/output/cached input
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
  per 1M input/output/cached input

Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.

No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.

Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields

* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants

gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.

Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.

Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.
The rebase against `litellm_internal_staging` (which added
`LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left
the closing brace of `LiteLLM_MemoryTable` missing in all three
schema copies — the next model declaration ended up parsed as a field
of the memory table, surfacing as the CI prisma error:

    error: This line is not a valid field or attribute definition.
      -->  schema.prisma:1250
       |
    1249 | // Per-(router, request_type, model) Beta posterior for the adaptive router.
    1250 | model LiteLLM_AdaptiveRouterState {

Add the missing `}` (and the standard blank line) after the memory
table's `@@index([team_id])` in `schema.prisma`,
`litellm/proxy/schema.prisma`, and
`litellm-proxy-extras/litellm_proxy_extras/schema.prisma`.

`prisma generate --schema litellm/proxy/schema.prisma` now runs clean;
27/27 memory unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines +156 to +175
    _is_user_team_admin,
)

try:
    team_obj = await prisma_client.db.litellm_teamtable.find_unique(
        where={"team_id": team_id}
    )
except Exception as e:
    verbose_proxy_logger.exception(
        "Error loading team for write-auth check (team_id=%s): %s", team_id, e
    )
    return False
if team_obj is None:
    return False

if _is_user_team_admin(user_api_key_dict=user_api_key_dict, team_obj=team_obj):
    return True

# Org-admin path is best-effort: it pulls from the user cache via
# `get_user_object` which depends on the proxy_server module being
Contributor


P1 Direct DB query bypasses the team caching layer (rule violation)

_is_team_admin_for calls prisma_client.db.litellm_teamtable.find_unique() directly on every PUT/DELETE that touches a team-scoped row. Per the repository's performance rule, DB queries in endpoint handlers must go through the shared helper functions (get_team_object etc.) so they benefit from the Redis / in-process cache. Each team-write without that cache hits the DB synchronously, which regresses under load in the same way the rule was introduced to prevent.

Rule Used: What: In critical path of request, there should be... (source)


codecov Bot commented Apr 25, 2026

Codecov Report

❌ Patch coverage is 27.35849% with 154 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
litellm/proxy/memory/memory_endpoints.py 14.91% 154 Missing ⚠️


@krrish-berri-2 krrish-berri-2 merged commit 70492ce into litellm_internal_staging Apr 25, 2026
118 of 119 checks passed