feat: first-class workspace entities with membership and cross-workspace sharing#1734
feat: first-class workspace entities with membership and cross-workspace sharing#1734serrrfirat wants to merge 30 commits intostagingfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive database-backed user management system, enabling multi-tenant deployments with admin APIs for user CRUD, secret provisioning, and self-service token management. It also enhances workspace isolation, LLM usage tracking, and Docker build efficiency. Feedback focuses on preventing potential panics during UTF-8 string slicing, resolving inconsistencies between documentation and database migrations, and improving type safety by replacing hardcoded role strings with enums. Additionally, replacing browser-native prompts with custom UI components is recommended for better security and user experience.
| let prefix = if auth_token.len() >= 8 { | ||
| &auth_token[..8] | ||
| } else { | ||
| auth_token |
There was a problem hiding this comment.
The logic for prefix calculation can panic if auth_token[..8] slices in the middle of a multi-byte UTF-8 character or if the string is shorter than 8 bytes. Use character-aware methods like char_indices to safely truncate the string to a maximum of 8 characters.
let prefix = match auth_token.char_indices().nth(8) {
Some((idx, _)) => &auth_token[..idx],
None => &auth_token,
};References
- When truncating a UTF-8 string, use character-aware methods (e.g., char_indices() or iterating over chars()) to avoid panics caused by slicing in the middle of a multi-byte character.
| | Column | Type (PG / libSQL) | Notes | | ||
| |--------|--------------------|-------| | ||
| | `id` | `UUID` / `TEXT` | Primary key | | ||
| | `user_id` | `TEXT NOT NULL` | FK to `users.id` (PG cascades; libSQL explicit cleanup) | |
| email TEXT UNIQUE, -- nullable for token-only users | ||
| display_name TEXT NOT NULL, | ||
| status TEXT NOT NULL DEFAULT 'active', -- active | suspended | deactivated | ||
| role TEXT NOT NULL DEFAULT 'member', -- admin | member |
|
|
||
| CREATE TABLE api_tokens ( | ||
| id UUID PRIMARY KEY, | ||
| user_id TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE, |
There was a problem hiding this comment.
| if role != "admin" && role != "member" { | ||
| return Err(( | ||
| StatusCode::BAD_REQUEST, | ||
| "role must be 'admin' or 'member'".to_string(), | ||
| )); | ||
| } |
There was a problem hiding this comment.
Hardcoding role strings like "admin" and "member" can lead to errors if role names change or new roles are introduced. Consider using an enum or constants for roles to improve type safety and maintainability.
let role = body
.get("role")
.and_then(|v| v.as_str())
.unwrap_or(DEFAULT_ROLE)
.to_string();
if !is_valid_role(&role) {
return Err((
StatusCode::BAD_REQUEST,
"role must be 'admin' or 'member'".to_string(),
));
}| if role != "admin" && role != "member" { | ||
| return Err(( | ||
| StatusCode::BAD_REQUEST, | ||
| "role must be 'admin' or 'member'".to_string(), | ||
| )); | ||
| } |
There was a problem hiding this comment.
Hardcoding role strings like "admin" and "member" can lead to errors if role names change or new roles are introduced. Consider using an enum or constants for roles to improve type safety and maintainability.
| if role != "admin" && role != "member" { | |
| return Err(( | |
| StatusCode::BAD_REQUEST, | |
| "role must be 'admin' or 'member'".to_string(), | |
| )); | |
| } | |
| if let Some(role) = body.get("role").and_then(|v| v.as_str()) { | |
| if !is_valid_role(role) { | |
| return Err(( | |
| StatusCode::BAD_REQUEST, | |
| "role must be 'admin' or 'member'".to_string(), | |
| )); | |
| } |
| .catch(function(e) { alert(I18n.t('users.failedRoleChange') + ': ' + e.message); }); | ||
| } | ||
|
|
||
| function createTokenForUser(userId, displayName) { |
There was a problem hiding this comment.
…tion, heartbeat cycling Finishes the remaining isolation work from phases 2–4 of #59: Phase 2 (DB scoping): Fix /status and /list commands to use _for_user DB variants instead of global queries that leaked cross-user job data. Phase 3 (Runtime isolation): Per-user workspace in routine engine's spawn_fire so lightweight routines run in the correct user context. Per-user daily cost tracking in CostGuard with configurable budget via MAX_COST_PER_USER_PER_DAY_CENTS. Multi-user heartbeat that cycles through all users with routines, auto-detected from GATEWAY_USER_TOKENS. Phase 4 (Provider/tools): Per-user model selection via preferred_model setting — looked up from SettingsStore on first iteration, threaded through ReasoningContext.model_override to CompletionRequest. Works with providers that support per-request model overrides (NearAI). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…verride Three follow-up fixes for multi-tenant isolation: 1. Multi-user heartbeat now runs memory hygiene per user before each heartbeat check, matching single-user heartbeat behavior. 2. /model command in multi-tenant mode only persists to per-user settings (selected_model) without calling set_model() on the shared LlmProvider. The per-request model_override in the dispatcher reads from the same setting. Added multi_tenant flag to AgentConfig (auto-detected from GATEWAY_USER_TOKENS). 3. RigAdapter now supports per-request model overrides by injecting the model name into rig-core's additional_params. OpenAI/Anthropic/Ollama API servers use last-key-wins for duplicate JSON keys, so the override takes effect via serde's flatten serialization order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y, pruning Fixes from review comments on #1614: - Cost tracking now uses the override model name (not active_model_name) when a per-user model override is active, for accurate attribution. - Multi-user heartbeat runs per-user checks concurrently via JoinSet instead of sequentially, preventing one slow user from blocking others. - Per-user failure counts tracked independently; users exceeding max_failures are skipped (matching single-user semantics). - per_user_daily_cost HashMap pruned on day rollover to prevent unbounded growth in long-lived deployments. - Doc comment fixed: says "routines" not "active routines". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses second round of PR review on #1614: - /status <job_id> DB path now validates job.user_id == requesting user before returning data (was missing ownership check, security fix). - persist_selected_model takes user_id param instead of owner_id, and skips .env/TOML writes in multi-tenant mode (these are shared global files). handle_system_command now receives user_id from caller. - JoinSet collection handles Err(JoinError) explicitly instead of silently dropping panicked tasks. - Notification forwarder extracts owner_id from response metadata in multi-tenant mode for per-user routing instead of broadcasting to the agent owner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round 3 review fixes: - Cost tracking passes None for cost_per_token when model override is active, letting CostGuard look up pricing by model name instead of using the default provider's rates (serrrfirat). - fire_manual() now uses per-user workspace, matching spawn_fire() pattern (serrrfirat). - Removed MULTI_TENANT env var — multi-tenant mode is auto-detected solely from GATEWAY_USER_TOKENS presence (serrrfirat + Copilot). - Multi-user heartbeat capped at 8 concurrent tasks to avoid flooding the LLM provider (serrrfirat + Copilot). - Fixed inject_model_override doc comment accuracy (Copilot). - Added comment explaining multi-tenant notification routing priority (Copilot). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds POST /api/webhooks/u/{user_id}/{path} — a user-scoped webhook
endpoint that filters the routine lookup by user_id, preventing
cross-user webhook triggering when paths collide.
The existing /api/webhooks/{path} endpoint remains unchanged for
backward compatibility in single-user deployments.
Changes:
- get_webhook_routine_by_path gains user_id: Option<&str> param
- Both postgres and libsql implementations add AND user_id = ? filter
when user_id is provided
- New webhook_trigger_user_scoped_handler extracts (user_id, path)
from URL and passes to shared fire_webhook_inner logic
- Route registered on public router (webhooks are called by external
services that can't send bearer tokens)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the web gateway layer for DB-backed user management (#1605): Auth refactor: - CombinedAuthState wraps env-var tokens (MultiAuthState) + optional DbAuthenticator for DB-backed token lookup with LRU cache (60s TTL, 1024 max entries) - auth_middleware tries env-var tokens first, then DB fallback - From<MultiAuthState> impl for backward compatibility - main.rs wires with_db_auth when database is available API handlers (12 new endpoints): - /api/admin/users — CRUD: create, list, detail, update, suspend, activate - /api/tokens — create (returns plaintext once), list, revoke - /api/invitations — create, list, accept (creates user + first token) Token creation: 32 random bytes → hex plaintext, SHA-256 hash stored. Invitation accept: validates hash + pending + not expired, creates user record and first API token atomically. All test files updated for CombinedAuthState type change. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes the DB-backed user management feature (#1605): - Startup migration: when GATEWAY_USER_TOKENS is set and the users table is empty, inserts env-var users + hashed tokens into DB. Logs deprecation notice when DB already has users. - hash_token made pub for reuse in migration code. - 10 integration tests for UserStore (libsql file-backed): - has_any_users bootstrap detection - create/get/get_by_email/list/update user lifecycle - token create → authenticate → revoke → reject cycle - suspended user tokens rejected - wrong-user token revoke returns false - invitation create → accept → user created - record_login and record_token_usage timestamps - libSQL migration: removed FK constraints from V14 (incompatible with execute_batch inside transactions). Tables in both base SCHEMA and incremental migration for fresh and existing databases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GATEWAY_USER_TOKENS never went to production — replaced entirely by DB-backed user management via /api/admin/users and /api/tokens. Removed: - UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing - user_tokens field from GatewayConfig - GatewayChannel::new_multi_auth() constructor - Env-var user migration block in main.rs (~90 lines) - multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime via db.has_any_users() in app.rs) Review fixes (zmanian): - User ID generation: UUID instead of display-name derivation (#1) - Invitation accept moved to public router (no auth needed) (#3) - libSQL get_invitation_by_hash aligned with postgres: filters status='pending' AND expires_at > now (#4) - UUID parse: returns DatabaseError::Serialization instead of unwrap_or_default (#7) - PostgreSQL SELECT * replaced with explicit column lists (#8) - Sort order aligned (both backends use DESC) (#6) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a `role` field (admin|member) to user management: Schema: - `role TEXT NOT NULL DEFAULT 'member'` added to users table in both PostgreSQL V14 migration and libSQL schema/incremental migration - UserRecord gains `role: String` field - UserIdentity gains `role: String` field, populated from DB in DbAuthenticator and defaulting to "admin" for single-user mode Access control: - AdminUser extractor: returns 403 Forbidden if role != "admin" - /api/admin/users/* handlers: require AdminUser (create, list, detail, update, suspend, activate) - POST /api/invitations: requires AdminUser (only admins can invite) - User creation accepts optional "role" param (defaults to "member") - Invitation acceptance creates users with "member" role Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a Users tab to the web gateway UI for managing users, tokens, and roles without needing direct API calls. Features: - User list table with ID, name, email, role, status, created date - Create user form with display name, email, role selector - Suspend/activate actions per user - Create API token for any user (shows plaintext once with copy button) - Role badges (admin highlighted, member muted) - Non-admin users see "Admin access required" message - Keyboard shortcut: Cmd/Ctrl+5 switches to Users tab CSS: - Reuses routines-table styles for the user list - Badge, token-display, btn-small, btn-danger, btn-primary components Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical auth bug: token creation hashed the raw 32 bytes (hasher.update(token_bytes)) but authentication hashed the hex-encoded string (hash_token(candidate) where candidate is the hex string the user sends). This meant newly created tokens could never authenticate. Fixed all 4 token creation sites (users, tokens, invitations create, invitations accept) to use hash_token(&plaintext_token) which hashes the hex string consistently with the auth lookup path. Removed now-unused sha2::Digest imports from handlers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The invitation flow is redundant — admin create user already generates a token and shows a login link. Invitations add complexity without value until email integration exists. Removed: - InvitationRecord struct and 4 UserStore trait methods - invitations table from V14 migration (postgres + both libsql schemas) - PostgreSQL Store methods (create/get/accept/list invitations) - libSQL UserStore invitation methods + row_to_invitation helper - invitations.rs handler file (212 lines) - /api/invitations routes (create, list, accept) - test_invitation_lifecycle test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements zmanian's architectural proposal from #1614 review: two-tier scoped database access (TenantScope/AdminScope) so handler code cannot accidentally bypass tenant scoping. TenantScope (default): wraps user_id + Arc<dyn Database>, auto-binds user_id on every operation. ID-based lookups return None for cross- tenant resources. No escape hatch — forgetting to scope is a compile error. AdminScope (explicit opt-in): cross-tenant access for system-level components (heartbeat, routine engine, self-repair, scheduler, worker). TenantCtx bundles TenantScope + workspace + cost guard + per-user rate limiting. Constructed once per request in handle_message, threaded through all command handlers and ChatDelegate. Key changes: - New src/tenant.rs (~920 lines): TenantScope, AdminScope, TenantCtx, TenantRateState, TenantRateRegistry - All command handlers: user_id: &str → ctx: &TenantCtx - ChatDelegate: cost check/record/settings via self.tenant - System components: store field changed to AdminScope - Config: TENANT_MAX_LLM_CONCURRENT, TENANT_MAX_JOBS_CONCURRENT env vars - Fixes bug: /status <job_id> cross-tenant leak (now auto-filtered) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Temporary diagnostic endpoint that tests DB INSERT to users table with full error logging. No auth required. Will be removed after debugging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ilblackdragon
left a comment
There was a problem hiding this comment.
Code Review
Security Concern — Webhook Guard Removal
The diff removes the multi-tenant isolation guard from webhook_trigger_handler (src/channels/web/handlers/webhooks.rs):
// REMOVED:
if state.workspace_pool.is_some() {
return Err((StatusCode::GONE, "Unscoped webhooks disabled..."));
}This guard prevents cross-user routine triggering in multi-tenant deployments. Removing it weakens tenant isolation — any attacker with a webhook secret can trigger routines across all users via the unscoped /api/webhooks/{path} endpoint. The doc comment is updated from "Disabled in multi-tenant mode" to advisory guidance, but nothing enforces that recommendation.
Recommendation: Keep the guard, or make unscoped webhooks in multi-tenant mode opt-in via config.
Good Changes
token_prefix()helper — properly useschar_indices().nth(8)to avoid panicking on multi-byte UTF-8. Applied consistently acrosstokens.rs,users.rs, andmain.rs.- Role constants (
ROLE_ADMIN/ROLE_MEMBER+is_valid_role()) — reduces stringly-typed bugs. created_byvalidation — checks user existence before recording. However, the_ => Nonearm silently swallows DB errors — consider logging onErr(e).
Issues
- No tests —
token_prefix()is a pure function that's trivially testable. Thecreated_byvalidation change also warrants a test. Per project rules, new code should include tests. - PR scope mismatch — title says "first-class workspace entities with membership and cross-workspace sharing" but the diff is incremental cleanups. The PR notes mention uncommitted changes — consider scoping the title to what's actually here.
- Whitespace-only changes in
heartbeat.rsandcli/mod.rsare cosmetic noise. - Migration comment removes "invitation flow" from the header — may be premature if that's planned for a follow-up.
…e-entities-db-users # Conflicts: # src/agent/scheduler.rs # src/agent/thread_ops.rs # src/channels/web/auth.rs # src/channels/web/handlers/chat.rs # src/channels/web/server.rs # src/db/libsql/conversations.rs # src/db/libsql_migrations.rs # src/db/postgres.rs # src/history/store.rs # src/tenant.rs
zmanian
left a comment
There was a problem hiding this comment.
Review -- REQUEST_CHANGES
Well-structured transition from implicit per-user isolation to explicit DB-backed workspace entities. The backward compatibility story (personal workspace = workspace_id IS NULL) is clean. Authorization model consistently applied across all handler types. Good integration test coverage.
Must fix
1. No role validation on member upsert -- body.role is passed directly to DB with no validation. Any string accepted ("superadmin", "", etc.). Validate against {owner, admin, member, viewer}.
2. Slug validation missing -- workspaces_create_handler accepts any slug including special characters, spaces, path traversal. Slugs appear in URL paths. Validate with pattern like ^[a-z0-9][a-z0-9-]{1,62}[a-z0-9]$.
3. Last-owner protection -- No guard preventing the sole owner from removing themselves, orphaning the workspace.
Medium
4. Admin can promote themselves to owner via the upsert endpoint's ON CONFLICT DO UPDATE. Is this intentional?
5. Workspace-scoped settings bypass key extraction -- API keys in workspace LLM settings stored unencrypted (secrets store extraction skipped when scope.is_some()).
6. Duplicate resolve_chat_workspace_id -- identical function in both chat.rs and server.rs. Extract to shared location.
7. fire_manual(routine_id, None) loses caller tracking -- previously passed Some(&user.user_id) for audit trail.
Low
list_workspaces_for_userincludes archived workspaces (intentional?)row_to_workspaceuses.parse().unwrap_or_default()-- silently produces nil UUID on parse failure- PostgreSQL migration drops/recreates unique indexes without data guards
ilblackdragon
left a comment
There was a problem hiding this comment.
Code Review: First-class Workspace Entities
Overview
This PR adds DB-backed workspace entities with membership, cross-workspace sharing, and scoped access across the entire stack — from schema through DB traits, API handlers, SSE event delivery, WebSocket connections, the Responses API, and runtime propagation. It's a large (~4.5k additions), ambitious, and well-structured change that touches nearly every subsystem.
Strengths
- Clean backward compatibility: Personal workspace (
workspace_id = NULL) continues to work transparently. Existing data requires no migration into stored workspace rows. - Consistent scope propagation:
workspace_idflows end-to-end throughIncomingMessage→JobContext→PendingApproval→ SSE events. This is done thoroughly. - Good handler refactoring: Routines handlers extracted
load_visible_routine()andresolve_routine_scope(), eliminating repeated fetch-then-check-ownership patterns. - SSE scope filtering: The
event_matches_scope()function and tests for workspace fan-out are well-designed — personal events stay personal, workspace events reach all workspace subscribers. - Both backends implemented: PostgreSQL and libSQL both implement all new trait methods, meeting the dual-backend requirement.
- New tests: Workspace integration tests exercise membership, archived gating, scoped chat, settings isolation, thread isolation, and Responses API ownership.
Issues
Critical
-
libSQL base schema FK ordering bug: In
libsql_migrations.rs, theconversations,agent_jobs,memory_documents,routines, andsettingstables all haveworkspace_id TEXT REFERENCES workspaces(id) ON DELETE CASCADE, but theworkspacestable is defined after them in the base schema. On a fresh libSQL database, this will fail because the referenced table doesn't exist when the FK is created. Theworkspacesandworkspace_memberstable definitions must be moved before any table that references them. -
PostgreSQL
set_settingrace condition: The PGset_settingwas rewritten fromON CONFLICT ... DO UPDATEtoINSERT ... ON CONFLICT DO NOTHING+UPDATE. Because the unique constraint was replaced with partial indexes,ON CONFLICTcan't reference the old constraint. However, the single-setting path does not wrap in a transaction, so two concurrent writes could both succeed atINSERT ON CONFLICT DO NOTHING(neither sees the other's uncommitted row) before theUPDATEruns. Either wrap in a transaction or useON CONFLICT ON CONSTRAINT <index_name> DO UPDATE. -
Missing incremental libSQL migration: The workspace schema changes appear only in the base schema, not as incremental migration steps. Existing libSQL databases won't get the new
workspaces/workspace_memberstables or theworkspace_idcolumns on existing tables. There needs to be a migration step for existing databases.
Security
-
Owner can be removed from workspace:
remove_workspace_memberhas no guard against removing the last owner. An admin could remove the sole owner, leaving an orphaned workspace with no owner. Add a check similar tois_last_adminin user management. -
No role validation on
add_workspace_member: Theworkspace_members_upsert_handleraccepts anybody.rolestring without validation. TheWorkspaceMemberWriteRequesthas justpub role: String— there's no check that it's one of the valid roles (owner/admin/member/viewer). Theis_valid_role()helper was added inusers.rsbut not used here. -
Unscoped webhooks re-enabled in multi-tenant mode: The guard in
webhook_trigger_handlerthat rejected unscoped webhooks in multi-tenant mode was removed entirely. The comment says "For multi-tenant isolation, use the user-scoped endpoint" but the old guard prevented cross-user routine triggering. This is a regression — the per-routine webhook secret provides weak protection compared to tenant isolation.
Correctness
-
conversation_belongs_to_userworkspace semantics: Whenworkspace_idisSome(id), the query matches correctly. Butparsed_workspace_id()inthread_ops.rssilently drops invalid workspace UUIDs (returnsNone). If a message has a malformedworkspace_idstring, it would be treated as a personal-scope check, potentially allowing access to personal conversations when workspace access was intended. -
fire_manuallost user verification: Inroutines_trigger_handler, the call changed fromfire_manual(routine_id, Some(&user.user_id))tofire_manual(routine_id, None). While the handler now doesload_visible_routinebefore firing, verify that this fully replaces the internal ownership check that the user_id parameter provided. -
Duplicate
resolve_chat_workspace_idfunction: This function is defined identically in bothsrc/channels/web/handlers/chat.rsandsrc/channels/web/server.rs. Extract to a shared location to avoid divergence.
Code Quality
-
#[allow(clippy::too_many_arguments)]onsend_notification: Addingworkspace_idpushed this to 8 parameters. Consider grouping into a struct instead of suppressing the lint. -
workspace_settings_user_idconvention: Both libSQL and PG settings backends useformat!("workspace:{workspace_id}")as a syntheticuser_idfor workspace settings. This convention-based coupling is fragile — if a real user_id ever starts withworkspace:, it would collide. Consider using a fixed sentinel or removing the user_id requirement for workspace-scoped settings. -
Repeated scope-resolution boilerplate: Nearly every handler has the same
resolve_*_scope→ branch on workspace/personal pattern. Consider an axum extractor that resolves workspace scope once and injects it.
Missing
-
No slug validation:
WorkspaceCreateRequest.slugis accepted as-is with no validation (length, allowed characters, reserved words). Slugs are used in URLs — validate them. -
Settings key extraction skipped for workspace scope:
settings_set_handlerskips API key extraction for workspace-scopedllm_builtin_overridesandllm_custom_providers(just usesbody.value.clone()). This means workspace settings could store plaintext API keys that personal settings would encrypt.
Summary
This is solid foundational work. The architecture is clean, the scope propagation is thorough, and backward compatibility is well-preserved. The critical items (libSQL schema ordering, PG settings race, missing incremental migration) need to be fixed before merge. The security items (owner removal, role validation, webhook guard) should be addressed.
|
Addressed the review items in Summary of fixes:
Verification run:
|
ilblackdragon
left a comment
There was a problem hiding this comment.
I found a few concrete regressions here, plus one larger architectural inconsistency in the workspace model.
-
High: workspace-scoped history authorization is bypassable for active in-memory threads.
Insrc/channels/web/handlers/chat.rs,chat_history_handler()correctly checksconversation_belongs_to_user(thread_id, user_id, workspace_id), but on failure it only returns404if the thread is not present insess.threads. It then falls through to the in-memory branch and returns that thread’s turns anyway. In practice, the same user can request a thread under the wrong?workspace=and still read it as long as that thread is active in session. That defeats the new workspace scoping check for active threads. -
High: workspace-scoped SSE/WS subscribers no longer receive user-scoped auth events.
Insrc/channels/web/sse.rs,event_matches_scope()rejects(Some(event_user), None)whenever the subscriber has a workspace scope, because it requiressubscriber_workspace_id.is_none(). But auth-required/auth-completed events are still emitted withbroadcast_for_user(...)from both the HTTP and WS chat flows. That means a user connected to/api/chat/events?workspace=...or the workspace-scoped WS path will miss extension auth events in that workspace tab. -
Medium: workspace-scoped settings dropped the active-provider safety invariant.
Insrc/channels/web/handlers/settings.rs, the guard that prevents removing the currently active custom provider only runs whenscope.is_none(). Oncellm_backend/llm_custom_providersare workspace-scoped, that invariant needs to hold there too. The helper itself is also still hard-wired to user-scopedget_setting(...), so the feature surface has expanded without the corresponding safety rule. -
Medium: the PR is halfway between a shared-workspace conversation model and a user-owned conversation model.
Live response/status events are now broadcast to the whole workspace insrc/channels/web/mod.rs, but persisted conversation listing and ownership checks in the DB layer still key conversations byuser_id + workspace_id. That leaves an unresolved product/security boundary: either workspace members can see live thread activity they cannot later list/open, or the storage/authorization model has not actually been generalized to match the new workspace-sharing behavior. I would want that model made explicit and enforced consistently before merging.
At a higher level, the implementation is extending workspace scope through routing and transport faster than through the underlying ownership model. That is exactly where subtle multi-tenant bugs show up. I’d recommend tightening the model first, then adding focused tests for:
chat/historywith the wrong workspace against an active thread- workspace-scoped SSE/WS auth-required/auth-completed flows
- workspace-scoped
llm_custom_providersupdates/deletes when the active backend is custom
|
Addressed the April 3 review in
Verification run on this follow-up:
Earlier targeted checks for the other three items are still green on this branch as well:
|
# Conflicts: # src/agent/agent_loop.rs # src/agent/routine_engine.rs # src/channels/web/handlers/chat.rs # src/channels/web/handlers/jobs.rs # src/channels/web/server.rs # src/db/libsql_migrations.rs # src/db/mod.rs # src/db/postgres.rs # src/tools/builtin/memory.rs
ilblackdragon
left a comment
There was a problem hiding this comment.
Automated review — NEEDS DISCUSSION (do not merge as-is)
TL;DR: Sound architecture and a good backward-compatibility story, but four critical safety issues plus three security/correctness items require explicit decisions before merge. The "delicate" flag is warranted — this touches IronClaw's most valuable data. Recommend additional reviewers (DB + security).
Critical issues (blockers)
| # | Severity | File:Line | Issue | Suggestion |
|---|---|---|---|---|
| 1 | CRITICAL | `src/db/libsql_migrations.rs` (base SCHEMA) | FK ordering bug. `conversations`/`agent_jobs`/`memory_documents`/`routines`/`settings` reference `workspaces(id)` before `workspaces` is created. Fresh libSQL databases will fail on migration. | Move `workspaces` + `workspace_members` CREATE TABLE to the top of SCHEMA, before `conversations`. Test on a fresh libSQL instance |
| 2 | CRITICAL | `src/db/libsql_migrations.rs` (`INCREMENTAL_MIGRATIONS`) | Missing libSQL incremental migration. The base SCHEMA has workspace tables but `INCREMENTAL_MIGRATIONS` has no entry — existing libSQL deployments won't get workspace tables or scoping columns | Add migration #18 with all workspace CREATE TABLEs + ALTER TABLEs for scoping columns. Per CLAUDE.md: "All new persistence features must support both backends" |
| 3 | CRITICAL | `src/db/postgres.rs::set_setting` + `handlers/settings.rs` | Settings upsert race. PG rewrite uses INSERT+UPDATE without a transaction wrapper. Two concurrent writes can both succeed at INSERT (neither sees the other's uncommitted row) before UPDATE runs | Wrap `set_setting` in a transaction, or use `ON CONFLICT ON CONSTRAINT` with the partial index name |
| 4 | CRITICAL | `handlers/chat.rs` + `channels/web/mod.rs` | Authorization model incomplete. `chat_history_handler` checks workspace scope but falls through to the in-memory active thread without re-checking scope. Workspace SSE fan-out broadcasts live events to all workspace subscribers, but conversation ownership remains user-keyed — members see live activity from threads they can't later list/query | Audit the conversation ownership model. Is it user-owned or workspace-shared? If shared, listing queries must be workspace-keyed. If user-owned, what should live SSE show? Make the model explicit in code + tests |
High-severity issues
| # | Severity | File:Line | Issue | Suggestion |
|---|---|---|---|---|
| 5 | High | `handlers/workspaces.rs` (`workspace_members_upsert_handler`) | No role validation — `body.role` accepted as-is; any string passes ("superadmin", ""). `is_valid_role()` helper already exists in `users.rs` | Validate against `["owner","admin","member","viewer"]`. Return 400 with an error. Add regression test |
| 6 | High | `handlers/workspaces.rs` | No slug validation. Slugs appear in URLs and accept any string (spaces, special chars, path-traversal attempts) | Validate: `^[a-z0-9][a-z0-9-]{1,62}[a-z0-9]$`. Return 400. Add regression test |
| 7 | High | `src/db/workspace.rs::remove_workspace_member` | No guard preventing removal of the last owner. An admin could delete the sole owner, orphaning the workspace | Count members with `role='owner'`; if count==1 and target is owner, return 409. Test: `workspace_cannot_remove_last_owner` |
| 8 | High | `handlers/webhooks.rs` | The `if state.workspace_pool.is_some() { return Err(GONE, ...) }` guard was removed entirely. Unscoped webhooks can now trigger routines across all users in multi-tenant mode. Doc was updated to "advisory guidance" but nothing enforces it | Either restore the guard or make unscoped webhooks opt-in via a config flag. Security regression from the perspective of multi-tenant isolation. Clarify intent with maintainers |
Medium / Low
| # | Severity | File:Line | Issue | Suggestion |
|---|---|---|---|---|
| 9 | Medium | `handlers/chat.rs` + `channels/web/server.rs` | `resolve_chat_workspace_id` defined identically in two files — will diverge | Extract to `src/channels/web/scope.rs` and import once |
| 10 | Medium | `handlers/settings.rs` | Settings key extraction (secret injection) is skipped for workspace-scoped settings. Workspace LLM API keys are stored plaintext while personal settings are encrypted | Apply the same secret extraction/encryption logic for workspace settings, or document explicitly why workspace members are trusted to see raw API keys |
| 11 | Medium | `src/db/libsql/workspace.rs` (`row_to_workspace`) | `.parse().unwrap_or_default()` on UUID silently produces nil UUID on malformed input | Use `?` error propagation; log on error |
| 12 | Medium | `handlers/workspaces.rs` | Admin can self-promote to owner via the upsert endpoint's `ON CONFLICT DO UPDATE` path | Clarify intent. If only owners should promote to owner: return 403 for non-owner attempts to set `role='owner'`. Test: `workspace_member_cannot_self_promote_to_owner` |
| 13 | Low | `src/db/libsql/workspace.rs` (`list_workspaces_for_user`) | Includes archived workspaces | Comment the rationale or filter with a test for both modes |
| 14 | Low | `routine_engine.rs` (`fire_manual(routine_id, None)`) | Lost user tracking (previously `Some(&user.user_id)`) | Verify `load_visible_routine` fully replaces internal checks and preserves the audit trail |
Dual-backend support assessment
| Component | PostgreSQL | libSQL | Coverage |
|---|---|---|---|
| Schema | V16__workspaces.sql ✓ | Base SCHEMA + INCREMENTAL_MIGRATIONS ? | PARTIAL |
| FK ordering | Correct | Broken | NO |
| Concurrency | `ON CONFLICT DO UPDATE` | INSERT+UPDATE, no txn wrapper | DIVERGENT |
| Role validation | Validated at DB level | Validated at DB level | ✓ |
| Slug validation | Missing in both | Missing in both | ✗ |
The libSQL backend is incompletely implemented. Fresh libSQL init will fail; existing libSQL deployments silently lack workspace functionality.
Test coverage — gaps
Present:
- `tests/workspaces_integration.rs` covers membership endpoints, archived 410, scoped chat forwarding, settings isolation, thread listing isolation, Responses API ownership, slug/role validation tests (`workspace_creation_rejects_invalid_slug`, `workspace_member_updates_validate_roles_and_owner_permissions`, `workspace_cannot_remove_or_demote_last_owner`)
Missing:
- SSE/WebSocket auth-event delivery test to workspace-scoped subscribers
- Active in-memory thread access-control bypass test
- Custom provider deletion guard on workspace-scoped settings
- libSQL backend not tested with migration — integration tests run on PostgreSQL; no freshness check for libSQL
- Concurrent workspace settings write race test
Risk assessment (production)
- Deployment blocker (libSQL): fresh init fails; existing libSQL silently lacks workspace functionality
- Multi-tenant visibility regression: SSE is workspace-wide but listing is user-keyed
- Settings race (libSQL): concurrent writes can create duplicates/overwrites (low probability)
- Webhook tenant isolation: removal of the multi-tenant guard allows unscoped webhooks across user boundaries
- Unencrypted workspace API keys: DB breach or log leak exposes them
- Authorization ambiguity: no clear story for "what does it mean to be a workspace member?"
Open questions for maintainer
- libSQL migration strategy: include the `INCREMENTAL_MIGRATIONS` entry in this PR, or separate? How do we prevent deploying the base schema without the incremental migrations?
- Webhook multi-tenant isolation: was removing the `workspace_pool` guard intentional?
- Conversation ownership: when a conversation is in a workspace, is it user-owned (only the creating user sees it) or workspace-shared (all members see it)? Affects listing, history, ownership checks. Currently user-owned for DB queries but workspace-wide for SSE — inconsistent.
- Workspace settings encryption: why skip secret extraction for workspace-scoped LLM settings? By design (members are trusted) or oversight?
- Self-promotion to owner: is it intentional that an admin can promote themselves to owner via the upsert endpoint?
Recommended additional reviewers
- DB expert — migration strategy, FK ordering, transaction safety, dual-backend equivalence
- Security reviewer — multi-tenant isolation (webhook guard removal, SSE scope filtering, settings encryption)
- Product/design — explicit conversation ownership model decision
Conclusion
The foundational work is solid. Backward compatibility is clean, scope propagation is thorough, and the integration tests are well-structured. But this PR cannot merge without:
- Fixing the four criticals
- Resolving the security questions (last-owner verification in impl, webhook guard intent, workspace settings encryption policy)
- Addressing the high-priority gaps (role validation, slug validation, duplicate function extraction)
The "delicate" flag from the maintainer is warranted — take the time to get it right. Consider a follow-up PR for non-critical improvements (e.g., the scope-resolution helper extraction).
zmanian
left a comment
There was a problem hiding this comment.
Review
+6848/-735, 70+ files — comprehensive implementation of DB-backed workspace entities, membership, cross-workspace sharing, scoped data access, and workspace-aware SSE fan-out. Both PG and libSQL migrations included. Well-structured for the scope.
Must Fix
-
Dead/duplicate handlers in
handlers/chat.rsvsserver.rs—handlers/chat.rsdefines publicchat_send_handler,chat_approval_handler, etc. that are never used in routing.server.rshas its own local versions that are actually wired up. This is a significant maintenance hazard — a reader won't know which handler is live. Should consolidate to one location. -
workspace_idtype inconsistency:StringvsUuid—workspace_idisOption<String>inIncomingMessage,JobContext,PendingAuth,PendingApproval, butOption<Uuid>inRoutine,MemoryDocument,SandboxJobRecord,AgentJobRecord, and the DB layer. This leads to ~10+ scatteredUuid::parse_str(id).ok()conversions where a malformed UUID silently becomesNone. Should parse once at the boundary and propagate asUuidthroughout. -
joined_atreset on member role update — the UPSERT in both PG and libSQL resetsjoined_attonow()when updating an existing member's role. This loses historical data. Should only updateroleandinvited_byon conflict.
Suggestions
-
Operator precedence in
validate_workspace_slug— the||/&&chain is correct by accident (De Morgan's). Add explicit parens for clarity:!(bytes[0].is_ascii_lowercase() || bytes[0].is_ascii_digit()). -
Dual sentinel schemes —
WORKSPACE_SETTINGS_SENTINEL_USER_ID(__workspace_settings__) vsworkspace_scope_user_id()(workspace:{uuid}) serve different purposes but could confuse future contributors. Add a clarifying comment. -
Webhook multi-tenant guard ordering — the check moved from
webhook_trigger_handlerintofire_webhook_inner, meaning rate limit now runs before the GONE rejection. Minor, but in a DDoS scenario the ordering matters. -
No Postgres test coverage — integration tests all use
LibSqlBackend::new_memory(). The PostgresWorkspaceMgmtStoreimplementation is untested. -
No way to list archived workspaces —
list_workspaces_for_userfilters them out with noinclude_archivedoption for audit/cleanup.
Cross-Cutting Observations
This PR and others in the current review batch share a dual-backend testing gap — multiple PRs only test libSQL, not Postgres, which could let PG-specific bugs slip through. Worth establishing a pattern for testing both backends on persistence-heavy PRs like this one.
Code Style
No .unwrap()/.expect() in production code, proper error mapping with context, thiserror types, strong types (WorkspaceRecord, WorkspaceMembership, ResolvedWorkspace), consistent crate:: imports. The RoutineNotification struct replacing #[allow(clippy::too_many_arguments)] is a nice refactor.
ilblackdragon
left a comment
There was a problem hiding this comment.
Verdict: Request changes (foundational PR — #1977 stacks on this)
src/channels/web/handlers/workspaces.rs:88-101validate_workspace_slug— operator-precedence + boolean logic does not faithfully implement the documented^[a-z0-9][a-z0-9-]{1,62}[a-z0-9]$regex. Rewrite as a clear.all()pass plus first/last anchors.- Routine viewer privilege:
routines_trigger_handleronly callsload_visible_routine(read check). Workspaceviewers can therefore execute routines via the gateway. Likely unintended. chat_approval_handlerTOCTOU: readspending.workspace_idfrom in-memory session but doesn't re-check membership at submission time. A user removed between request and approval can still complete it. Re-check membership againstpending.workspace_id.workspace_members_upsert_handlerTOCTOU:get_member_role/is_last_workspace_owner/add_workspace_memberis not transactional — two concurrent owner-removal upserts can both pass. Wrap in a single SQL transaction or conditional UPDATE.- PG query planner risk: hot lookups now use
workspace_id IS NOT DISTINCT FROM $N, which PG generally cannot use a btree index for.EXPLAIN ANALYZElist_conversations_all_channelson a populated DB; consider splitting personal vs workspace queries or rewriting as(workspace_id = $N OR (workspace_id IS NULL AND $N IS NULL)). WORKSPACE_SETTINGS_SENTINEL_USER_ID = "__workspace_settings__"— if a real user ever ends up with this id, they could read all workspace settings. Add a CHECK constraint or user-creation guard.migrations/V16__workspaces.sql:158-160dropssettings_pkeyleaving the table with NO primary key. Acceptable for partial-unique-index design but call out in release notes; logical replication tools may misbehave.- No down migration. Document the rollback plan.
src/db/postgres.rs:~2480set_setting— usesINSERT … ON CONFLICT DO NOTHINGthenUPDATEinstead of an inferred-target upsert. Functionally correct but fragile if a partial index is missing.- Strong dual-backend implementation throughout. Migration safety is mostly additive. Excellent integration test suite (
tests/workspaces_integration.rs, +715 lines) plus libSQL migration upgrade tests. - Per-message
Uuid::parse_str(workspace_id)happens in many hot paths — consider storing asUuidend-to-end in a follow-up.
Rebased squash of nearai#1734 onto current staging with all review feedback addressed. Core feature: DB-backed workspace entities (users, workspaces, workspace_members) with membership-based access control. Adds nullable workspace_id scoping to conversations, jobs, memory, routines, and settings. Personal workspace (workspace_id=NULL) remains the default. Review feedback fixes included in this squash: - Slug validation rewritten for clarity (operator precedence) - joined_at no longer reset on member role update (both backends) - Archive requires owner role, not just manager - TOCTOU: new update_member_role_checked/remove_workspace_member_checked with transactional last-owner guard (both PG and libsql) - Thread ownership check runs for all threads including session fallback - Settings export/import reject workspace-scoped requests (501) - Routine trigger rejects workspace viewers (403) - Job save hard-fails on malformed workspace_id - PG migration renumbered to V25 (staging is at V24) - libSQL incremental migration renumbered to 25 Original-PR: nearai#1734 Original-Author: serrrfirat Co-Authored-By: serrrfirat <firat@nearai.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
This PR implements
#1607: first-class workspace entities with membership and cross-workspace sharing.The branch now moves workspace scope from implicit per-
user_idisolation to explicit DB-backed workspace entities and membership, while preserving the existing personal workspace behavior as the backward-compatible default.What Changed
1. Database and schema
usersapi_tokensworkspacesworkspace_membersworkspace_idscoping to existing persisted entities:conversationsagent_jobsmemory_documentsroutinessettingsworkspace_id = NULL, so existing user-scoped data keeps working without migration into a stored “personal workspace” row.2. Workspace management store
owner.3. Web API surface
POST /api/workspacesGET /api/workspacesGET /api/workspaces/{slug}PUT /api/workspaces/{slug}POST /api/workspaces/{slug}/archiveGET /api/workspaces/{slug}/membersPUT /api/workspaces/{slug}/members/{user_id}DELETE /api/workspaces/{slug}/members/{user_id}?workspace=<slug>across existing gateway flows.410 Goneon scoped access paths instead of behaving like active workspaces.4. Workspace scoping across existing feature areas
?workspace=no longer falls through to personal settings5. Runtime propagation
workspace_idthrough the runtime instead of keeping it as a web-only concept:IncomingMessage6. Workspace event delivery
7. OpenAI-compatible Responses API
/v1/responsesworkspace-aware:workspace_id = NULL.8. Compatibility behavior
workspacequery paramworkspace_id = NULLworkspace_read_scopeswas not removed outright, but it is now treated as compatibility behavior for the personal workspace path rather than the source of truth for shared workspace authorization.Tests and Verification
Targeted verification completed on this branch:
cargo check --all-featurescargo test --test workspaces_integrationcargo clippy --all-features --all-targets -- -D warningsThe workspace integration coverage now exercises:
410 Gonebehaviorworkspace_idAdditional unit/integration coverage was also updated where API signatures changed.
Notable Implementation Notes
WorkspacePoolnow behaves correctly for shared workspace scopes and no longer treats legacy token scopes as the mechanism for workspace entity sharing.Closes #1607