feat(proxy): add /v1/memory CRUD endpoints #26218
krrish-berri-2 merged 21 commits into litellm_internal_staging from
Conversation
New LiteLLM_MemoryTable stores user/team-scoped key/value entries with
optional JSON metadata. Value is a String (LLM-readable text) and metadata
is an optional Json? envelope, matching the Letta + mem0 hybrid model so
future structured fields can be added without a schema migration.
Endpoints:
POST /v1/memory - create
GET /v1/memory - list (caller-scoped; admins see all)
GET /v1/memory/{key} - fetch one
PUT /v1/memory/{key} - upsert
DELETE /v1/memory/{key} - delete
Non-admin callers cannot set a user_id/team_id other than their own.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
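For orientation, a minimal usage sketch of these routes (the base URL, key, and request-body field names are assumptions inferred from the description above; the authoritative shapes are the Pydantic models in `litellm/types/memory_management.py`):

```python
import requests

BASE = "http://localhost:4000"                # hypothetical proxy URL
HEADERS = {"Authorization": "Bearer sk-..."}  # caller's virtual key

# Create: value is plain LLM-readable text, metadata is an optional JSON envelope
requests.post(
    f"{BASE}/v1/memory",
    headers=HEADERS,
    json={"key": "user:123:notes", "value": "prefers dark mode", "metadata": {"source": "chat"}},
)

# List everything visible to the caller (admins see all rows)
print(requests.get(f"{BASE}/v1/memory", headers=HEADERS).json())

# Upsert, fetch, and delete a single key
requests.put(f"{BASE}/v1/memory/user:123:notes", headers=HEADERS, json={"value": "prefers light mode"})
print(requests.get(f"{BASE}/v1/memory/user:123:notes", headers=HEADERS).json())
requests.delete(f"{BASE}/v1/memory/user:123:notes", headers=HEADERS)
```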
Greptile Summary
This PR adds a … Most issues surfaced in earlier review rounds have been resolved: the …
Confidence Score: 4/5
Safe to merge after addressing the direct DB query in _is_team_admin_for; all other previously-raised P1/P0 issues have been resolved. One remaining P1 — the raw team-table DB query in _is_team_admin_for bypasses the caching layer that every other team-management endpoint relies on. All other issues flagged in prior review rounds have been addressed: metadata sentinel pattern, upsert race condition, visibility filter merge, encodeMemoryKeyForPath, orphan row rejection, and unique-violation detection.
Important files: litellm/proxy/memory/memory_endpoints.py (_is_team_admin_for direct DB call); litellm/proxy/schema.prisma (missing @@map)
| Filename | Overview |
|---|---|
| litellm/proxy/memory/memory_endpoints.py | New CRUD endpoints for /v1/memory. Previous P1/P0 issues (race condition, metadata sentinel, visibility filter merge, orphan rows) have all been addressed; the direct-DB query in _is_team_admin_for remains. |
| litellm/types/memory_management.py | Pydantic models for memory endpoints; correctly uses Optional[Any] so model_fields_set can distinguish omitted vs explicit-null metadata. |
| ui/litellm-dashboard/src/components/networking.tsx | Adds memory CRUD API functions; encodeMemoryKeyForPath correctly preserves literal slashes for the {key:path} backend route while encoding all other unsafe characters. |
| ui/litellm-dashboard/src/components/MemoryView/MemoryView.tsx | Memory UI with server-side pagination, prefix search, and CRUD. The metadata clear path correctly sends explicit null on edit so the backend's model_fields_set registers the intent. |
| litellm/proxy/schema.prisma | Adds LiteLLM_MemoryTable model with key String @unique (globally unique); missing the @@map annotation that all sibling models use for explicitness. |
| tests/test_litellm/proxy/memory/test_memory_endpoints.py | 20 mock-only unit tests using FastAPI TestClient with in-memory fakes; no real network calls, compliant with repo's test isolation policy. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Incoming /v1/memory request] --> B[user_api_key_auth]
B --> C{Endpoint?}
C -->|POST /v1/memory| D[_resolve_scope\nenforce user_id/team_id\nreject orphan rows]
D --> E[DB create\nunique key or 409]
C -->|GET /v1/memory| F[_visibility_filter\nNone=admin sees all\nOR clause for user/team]
F --> G[DB find_many\npaginated + key prefix]
C -->|GET /v1/memory/key| H[_find_memory_for_caller\nAND key + vis filter]
H --> I{Found?}
I -->|Yes| J[Return row]
I -->|No| K[404]
C -->|PUT /v1/memory/key| L[_find_memory_for_caller]
L --> M{Exists?}
M -->|Yes| N[_assert_write_access\nowner or team admin]
N --> O[DB update]
M -->|No| P[_resolve_scope]
P --> Q[DB create\nrace catch + re-update]
C -->|DELETE /v1/memory/key| R[_find_memory_for_caller]
R --> S[_assert_write_access]
S --> T[DB delete]
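The PUT branch above (create on miss, catch the unique violation, then re-update) roughly follows this pattern. A sketch only; the real handler lives in `litellm/proxy/memory/memory_endpoints.py` and layers the visibility and write-access checks on top. The `litellm_memorytable` accessor name and the error-string check are assumptions for illustration:

```python
async def upsert_memory_sketch(db, key: str, value: str):
    """Illustrative find-or-create with a race catch, mirroring the PUT path in the flowchart."""
    row = await db.litellm_memorytable.find_first(where={"key": key})
    if row is not None:
        return await db.litellm_memorytable.update(where={"key": key}, data={"value": value})
    try:
        return await db.litellm_memorytable.create(data={"key": key, "value": value})
    except Exception as e:
        # A concurrent writer created the same key between find and create:
        # treat the unique violation as "row exists now" and fall back to update.
        if "Unique constraint" not in str(e):  # error text is an assumption; match however the endpoint does
            raise
        return await db.litellm_memorytable.update(where={"key": key}, data={"value": value})
```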
Reviews (18): Last reviewed commit: "fix(schema): close LiteLLM_MemoryTable m..."
Prisma's Python client rejects `metadata=None` on a `Json?` field with "A value is required but not set" — the field must be omitted from the `data` dict entirely to store SQL NULL. Build the create payload conditionally in both `create_memory` and the PUT-create branch of `upsert_memory`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
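A sketch of that conditional payload construction (the function name and serialization choice are illustrative; the real code lives in the endpoints module):

```python
import json
from typing import Any, Optional


def build_create_data(
    key: str,
    value: str,
    user_id: Optional[str],
    team_id: Optional[str],
    metadata: Optional[Any],
) -> dict:
    """Build the Prisma `create` payload, omitting `metadata` when it was not provided."""
    data: dict = {"key": key, "value": value, "user_id": user_id, "team_id": team_id}
    if metadata is not None:
        # Only add the key when there is a value: prisma-client-py rejects
        # metadata=None on a Json? column, while an omitted key stores SQL NULL.
        # (json.dumps here is illustrative; use whatever the endpoint actually does.)
        data["metadata"] = json.dumps(metadata)
    return data
```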
Low: New memory CRUD endpoints with standard auth
This PR adds … Status: 0 open · Posted by Veria AI · 2026-04-23T14:31:12.762Z
Adds a new "Memory" sidebar item under Tools so users can see what their agents have stored. Lists all memories visible to the caller (scoped by the backend), with a key-search filter, preview column, scope tags, and view/edit/delete actions. Create modal accepts optional JSON metadata. - networking.tsx: fetchMemoryList / createMemory / updateMemory / deleteMemory wired to the /v1/memory CRUD endpoints. - MemoryView + MemoryEditModal: new antd-based components (per CLAUDE.md: use antd for new UI, not tremor). - page.tsx + leftnav.tsx: wire the "memory" route + sidebar entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Low: Cross-tenant key existence oracle via globally unique constraint
This PR adds CRUD endpoints for user/team-scoped memory entries. Authentication is properly enforced on all routes via … Status: 6 open · Posted by Veria AI · 2026-04-25T00:55:15.669Z
Backend:
- GET /v1/memory now accepts `key_prefix` for Redis-style namespace
scans (e.g. `?key_prefix=user:`). When both `key` and `key_prefix`
are passed, `key_prefix` wins.
- Prefix filter sits under the visibility filter in the Prisma where
clause, so it can never leak rows across user/team scopes (see the
sketch after this commit message).
- New tests: prefix match, and cross-scope isolation (another user's
`user:*` rows must not appear in the caller's results).
UI:
- Memory moved from a Tools submenu to a top-level AI GATEWAY item
(alongside Agents, MCP Servers, Skills) — it's an API primitive,
not a tool-management surface.
- Search box now drives prefix search, matching the Redis mental
model ("type the namespace, see everything under it").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
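A rough sketch of that where-clause nesting (helper name and shapes are illustrative; the real filter builders live in `memory_endpoints.py`):

```python
from typing import Optional


def build_list_where(
    visibility_filter: Optional[dict],
    key: Optional[str] = None,
    key_prefix: Optional[str] = None,
) -> dict:
    """Key/prefix filters are AND-ed under the caller's visibility filter,
    so they can only narrow results, never widen them across scopes."""
    key_filter: dict = {}
    if key_prefix is not None:
        # key_prefix wins when both are passed
        key_filter = {"key": {"startswith": key_prefix}}  # operator name per prisma-client-py; verify against the client in use
    elif key is not None:
        key_filter = {"key": key}

    if visibility_filter is None:  # admin caller: no scoping, sees all rows
        return key_filter
    if not key_filter:
        return visibility_filter
    return {"AND": [visibility_filter, key_filter]}
```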
The unique constraint `(key, user_id, team_id)` on LiteLLM_MemoryTable silently allowed duplicates when user_id or team_id was NULL, because Postgres treats every NULL as distinct by default (ANSI semantics). A caller with no team_id could POST the same key three times and get three rows. Migration: 1. Dedupe existing rows, keeping the most recent per (key, user_id, team_id), using `IS NOT DISTINCT FROM` so NULL == NULL. 2. Drop the old unique index. 3. Recreate it with `NULLS NOT DISTINCT` (Postgres 15+). No code change: POST already returns 409 on unique-violation error messages — it just wasn't firing before because the constraint didn't catch the NULL-team case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches from the compound unique `(key, user_id, team_id)` to a simple `key @unique`. The compound form silently allowed duplicates when user_id or team_id was NULL (Postgres treats each NULL as distinct), so callers could POST the same key repeatedly. Globally-unique key means one row per key, period — any duplicate create → 409. - schema.prisma (×3): `key String @unique`, drop `@@unique(...)`. - initial add_memory_table migration: unique index on (key) only. - Remove the now-unused follow-up NULLS NOT DISTINCT migration. - Endpoint error message simplified ("already exists" — no "for this scope"). - Test fake's create() now enforces global key uniqueness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
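The test-fake change mentioned in the last bullet can be as small as this (a sketch, not the repo's actual fake; the error text is only an example of a unique-violation message):

```python
class FakeMemoryTable:
    """In-memory stand-in for LiteLLM_MemoryTable used by unit tests (illustrative)."""

    def __init__(self) -> None:
        self._rows: dict = {}

    async def create(self, data: dict) -> dict:
        # Enforce the new global key uniqueness so the fake raises the same kind of
        # unique-violation error the endpoint's 409 handling looks for.
        if data["key"] in self._rows:
            raise Exception("Unique constraint failed on the fields: (`key`)")
        self._rows[data["key"]] = dict(data)
        return dict(data)
```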
Addresses two Veria findings: **High — cross-user memory tampering via team membership.** The visibility filter uses an OR (`user_id == caller OR team_id == caller`) so team members can SEE each other's team-scoped rows. That's intentional for list/get. But because PUT/DELETE used the same filter to find the target row, any team member could overwrite or delete a teammate's *personal* row whenever both `user_id` and `team_id` were stamped on it — broader visibility was being silently treated as broader authority. New `_assert_write_access(row, caller)` enforces ownership for mutations. Non-admin rules: - The row's `user_id` must match the caller (personal ownership), OR - The row has no `user_id` and its `team_id` matches the caller's team (a "pure team row" intended for shared writes). Admins bypass the check. The same gate runs in PUT (both regular and post-race-recovery branches) and DELETE. **Medium — DB internals leaked through 500 detail.** Every `except` block was raising `HTTPException(500, detail=str(e))`, which surfaces Prisma error strings (table/column names, host:port, error class names) to API callers. New `_internal_error()` helper logs the real exception server-side and returns a generic, caller-safe `detail`. Applied to create, list, upsert (general fallthrough), and delete. Also tightened the race-recovery 409 message to drop the "in a different scope" wording — the caller never needs to know whose scope it lives in. Tests (+5): - teammate cannot overwrite personal row → 403 - teammate cannot delete personal row → 403 - teammate CAN modify pure team row (no user_id stamped) → 200 - admin bypasses write-auth → 200 - 500 response never echoes Prisma internals (table/host/class names) 25/25 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
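A condensed sketch of the ownership rule described above (names and the admin-role check are illustrative; the real `_assert_write_access` also grows the team-admin path added in a later commit):

```python
from fastapi import HTTPException


def assert_write_access_sketch(row, caller) -> None:
    """Raise 403 unless the caller may mutate this row (illustrative)."""
    if getattr(caller, "user_role", None) == "proxy_admin":  # stand-in for the PROXY_ADMIN check
        return
    if row.user_id is not None:
        if row.user_id == caller.user_id:
            return  # personal ownership
        raise HTTPException(status_code=403, detail="Not authorized to modify this memory entry")
    if row.team_id is not None and row.team_id == caller.team_id:
        return  # pure team row (no user_id stamped): shared team writes allowed
    raise HTTPException(status_code=403, detail="Not authorized to modify this memory entry")
```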
On the cross-tenant key-existence oracle (medium): acknowledged but intentional. The global-unique-key design is the chosen model (matches Redis …). What we did mitigate: the 409 detail string is now … If a deployment wants per-tenant key isolation, callers prefix their keys (…).
Tightens the write-authorization rule for "pure team rows" (rows with no user_id stamped, only team_id) to match the pattern used by team-management endpoints (`_is_user_team_admin` + `_is_user_org_admin_for_team`): - Plain team members can READ team rows via the OR visibility filter (intentional, unchanged). - Only PROXY_ADMIN, team admins of the row's team_id, or org admins for the team's organization may MODIFY them. Plain members get 403. `_assert_write_access` is now async and takes the prisma_client so it can fetch the team and run the existing `_is_user_team_admin` / `_is_user_org_admin_for_team` helpers from `litellm.proxy.management_endpoints.common_utils`. The org-admin path is best-effort: it calls `get_user_object`, which depends on the proxy_server module being initialized, so any exception there is treated as "not an org admin" rather than crashing the request. Tests: - team admin can modify pure team row → 200 - plain team member cannot modify pure team row → 403 - plain team member cannot delete pure team row → 403 Updates the test fake to add a tiny `litellm_teamtable.find_unique` implementation and a `_make_team(team_id, admin_user_ids=[...])` helper. 27/27 unit tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
try:
    team_obj = await prisma_client.db.litellm_teamtable.find_unique(
        where={"team_id": team_id}
    )
except Exception as e:
    verbose_proxy_logger.exception(
        "Error loading team for write-auth check (team_id=%s): %s", team_id, e
    )
    return False
Direct DB query bypasses the team caching layer
_is_team_admin_for calls prisma_client.db.litellm_teamtable.find_unique() directly instead of the get_team_object helper (which reads from the Redis/in-process cache). Every PUT or DELETE on a team-scoped memory entry triggers a synchronous DB round-trip. All other team-management endpoints use the shared helper to benefit from caching and avoid DB regression under load — this path should do the same.
Rule Used: What: In critical path of request, there should be... (source)
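A rough sketch of what this comment is asking for, assuming `get_team_object` (the cached team loader in `litellm.proxy.auth.auth_checks`) takes roughly `(team_id, prisma_client, user_api_key_cache)` like other management endpoints; verify the actual signature before adopting this:

```python
from litellm.proxy.auth.auth_checks import get_team_object


async def load_team_cached(team_id: str, prisma_client, user_api_key_cache):
    """Route the team lookup through the shared cached helper instead of querying
    litellm_teamtable directly, so memory PUT/DELETE hit the Redis/in-process cache."""
    try:
        return await get_team_object(
            team_id=team_id,
            prisma_client=prisma_client,
            user_api_key_cache=user_api_key_cache,
        )
    except Exception:
        # Mirror the existing behaviour: failure to load the team means "not a team admin".
        return None
```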
Two CI failures:
1. mypy: `_find_memory_for_caller` had `key_filter` inferred as
`dict[str, str]` (a literal type) while the conditional `{"AND": [key_filter, vis]}`
returned `dict[str, list[...]]`, so the join site failed
`dict-item` typing. Annotate both intermediates as `dict` so mypy
widens the value type (see the sketch after this commit message).
2. UI test (`page_utils.test.ts > should have descriptions for all
pages`): every leftnav entry must have a description in
`page_metadata.ts`, and `memory` was missing. Added a one-line
description, matching the style of neighboring entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
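A reduced illustration of the mypy fix from item 1 (the real function has more branches; only the annotation trick matters here):

```python
from typing import Optional


def build_key_where(key: str, vis: Optional[dict]) -> dict:
    # Without the explicit `dict` annotation mypy infers dict[str, str] for
    # key_filter, and the {"AND": [...]} branch then fails with a dict-item error.
    key_filter: dict = {"key": key}
    if vis is None:
        return key_filter
    combined: dict = {"AND": [key_filter, vis]}
    return combined
```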
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 31539530 | Triggered | Generic Password | f775fa1 | .github/workflows/_test-unit-services-base.yml | View secret |
| 29203053 | Triggered | Generic Password | f775fa1 | .circleci/config.yml | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn here the best practices.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
* feat(openai): day-0 support for GPT-5.5 and GPT-5.5 Pro
Add pricing + capability entries for the new GPT-5.5 family launched by
OpenAI on 2026-04-24:
- gpt-5.5 / gpt-5.5-2026-04-23 (chat): $5/$30/$0.50 per 1M
input/output/cached input (worked cost example after this commit message)
- gpt-5.5-pro / gpt-5.5-pro-2026-04-23 (responses-only): $60/$360/$6
per 1M input/output/cached input
Other fees (long-context >272k, flex, batches, priority, cache
discounts) follow the same ratios as GPT-5.4, with context window
retained at 1.05M input / 128K output.
No transformation / classifier code changes are required:
OpenAIGPT5Config.is_model_gpt_5_4_plus_model() already matches 5.5+ via
numeric version parsing, and model registration is driven from the
JSON. The existing responses-API bridge for tools + reasoning_effort
(litellm/main.py:970) already covers gpt-5.5-pro.
Tests:
- GPT5_MODELS regression list now covers gpt-5.5-pro and dated variants
- New test_generic_cost_per_token_gpt55_pro cost-calc test
- Updated test_generic_cost_per_token_gpt55 for long-context fields
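For a sense of scale, a small cost calculation from the gpt-5.5 list prices above (the helper below is illustrative, not LiteLLM's cost function; it assumes cached input tokens are billed at the cached rate instead of the full input rate):

```python
# gpt-5.5 list prices per 1M tokens, as stated above
INPUT_PER_M, OUTPUT_PER_M, CACHED_INPUT_PER_M = 5.00, 30.00, 0.50


def request_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    billable_input = input_tokens - cached_input_tokens
    return (
        billable_input * INPUT_PER_M / 1_000_000
        + cached_input_tokens * CACHED_INPUT_PER_M / 1_000_000
        + output_tokens * OUTPUT_PER_M / 1_000_000
    )


# 50k prompt tokens (10k of them cache hits) + 2k completion tokens:
# 40_000 * $5/1M + 10_000 * $0.50/1M + 2_000 * $30/1M = 0.20 + 0.005 + 0.06 = $0.265
print(round(request_cost(50_000, 2_000, cached_input_tokens=10_000), 4))
```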
* fix(openai): mirror reasoning_effort flags onto gpt-5.5 dated variants
gpt-5.5-2026-04-23 and gpt-5.5-pro-2026-04-23 were missing the
supports_none_reasoning_effort, supports_xhigh_reasoning_effort, and
supports_minimal_reasoning_effort flags that their non-dated
counterparts define. Reasoning-effort routing in OpenAIGPT5Config is
fully capability-driven from these JSON flags — since an absent flag
is treated as False for opt-in levels (xhigh), users pinning to a
dated snapshot would silently lose xhigh support and diverge from the
base alias on logprobs + flexible temperature handling.
Copy the flags onto both dated variants so every dated snapshot
inherits the base model's reasoning-effort capability profile.
Adds a parametrized regression test that asserts
supports_{none,minimal,xhigh}_reasoning_effort parity between each
dated variant and its non-dated counterpart, preventing future drift
when new snapshots are added.
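A sketch of what such a parity check can look like (the `model_cost_map` fixture and test name here are illustrative, not the actual test added by the commit):

```python
import pytest

# dated snapshot -> base alias it must stay in sync with
SNAPSHOT_PAIRS = [
    ("gpt-5.5-2026-04-23", "gpt-5.5"),
    ("gpt-5.5-pro-2026-04-23", "gpt-5.5-pro"),
]
FLAGS = [
    "supports_none_reasoning_effort",
    "supports_minimal_reasoning_effort",
    "supports_xhigh_reasoning_effort",
]


@pytest.mark.parametrize("dated,base", SNAPSHOT_PAIRS)
@pytest.mark.parametrize("flag", FLAGS)
def test_reasoning_effort_flag_parity(model_cost_map, dated, base, flag):
    # An absent flag defaults to False for opt-in levels, so a dated snapshot
    # missing it would silently diverge from its non-dated counterpart.
    assert model_cost_map[dated].get(flag, False) == model_cost_map[base].get(flag, False)
```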
The rebase against `litellm_internal_staging` (which added
`LiteLLM_AdaptiveRouterState` / `LiteLLM_AdaptiveRouterSession`) left
the closing brace of `LiteLLM_MemoryTable` missing in all three
schema copies — the next model declaration ended up parsed as a field
of the memory table, surfacing as the CI prisma error:
error: This line is not a valid field or attribute definition.
--> schema.prisma:1250
|
1249 | // Per-(router, request_type, model) Beta posterior for the adaptive router.
1250 | model LiteLLM_AdaptiveRouterState {
Add the missing `}` (and the standard blank line) after the memory
table's `@@index([team_id])` in `schema.prisma`,
`litellm/proxy/schema.prisma`, and
`litellm-proxy-extras/litellm_proxy_extras/schema.prisma`.
`prisma generate --schema litellm/proxy/schema.prisma` now runs clean;
27/27 memory unit tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
    _is_user_team_admin,
)

try:
    team_obj = await prisma_client.db.litellm_teamtable.find_unique(
        where={"team_id": team_id}
    )
except Exception as e:
    verbose_proxy_logger.exception(
        "Error loading team for write-auth check (team_id=%s): %s", team_id, e
    )
    return False
if team_obj is None:
    return False

if _is_user_team_admin(user_api_key_dict=user_api_key_dict, team_obj=team_obj):
    return True

# Org-admin path is best-effort: it pulls from the user cache via
# `get_user_object` which depends on the proxy_server module being
Direct DB query bypasses the team caching layer (rule violation)
_is_team_admin_for calls prisma_client.db.litellm_teamtable.find_unique() directly on every PUT/DELETE that touches a team-scoped row. Per the repository's performance rule, DB queries in endpoint handlers must go through the shared helper functions (get_team_object etc.) so they benefit from the Redis / in-process cache. Each team-write without that cache hits the DB synchronously, which regresses under load in the same way the rule was introduced to prevent.
Rule Used: What: In critical path of request, there should be... (source)
Codecov Report: ❌ Patch coverage is …
Merged 70492ce into litellm_internal_staging
Summary
- `LiteLLM_MemoryTable` stores memory entries with a globally unique `key` plus optional JSON metadata. `value` is a `String` (LLM-readable text) and `metadata` is an optional `Json?` envelope — the Letta + mem0 hybrid — so future structured fields can be added without a schema migration. Callers namespace their own keys (e.g. `user:123:notes`) if per-user isolation is needed.
- `/v1/memory` endpoints:
  - `POST /v1/memory` — create (409 on any duplicate key)
  - `GET /v1/memory` — list with optional `key_prefix` (Redis-style namespace scan); caller-scoped, admins see all
  - `GET /v1/memory/{key}` — fetch one (slashes in keys supported via `{key:path}`)
  - `PUT /v1/memory/{key}` — upsert (idempotent under concurrent writes; explicit `metadata: null` clears the column)
  - `DELETE /v1/memory/{key}` — delete
- Non-admin callers cannot set a `user_id`/`team_id` other than their own; a caller with neither user_id nor team_id is rejected (400) to avoid orphan rows. `PROXY_ADMIN` can scope to any user/team.
- The dashboard UI uses `useMutation` + `queryClient.invalidateQueries` so every cached page refetches on change.

Test plan
- `tests/test_litellm/proxy/memory/test_memory_endpoints.py` (20 tests, all passing):
  - non-admin attempting another `user_id` → 403
  - `key_prefix` filter works and never leaks across scopes
  - `metadata: null` clears the column