Skip to content

feat: first-class workspace entities with membership and cross-workspace sharing#1734

Open
serrrfirat wants to merge 30 commits intostagingfrom
feat/1607-workspace-entities-db-users
Open

feat: first-class workspace entities with membership and cross-workspace sharing#1734
serrrfirat wants to merge 30 commits intostagingfrom
feat/1607-workspace-entities-db-users

Conversation

@serrrfirat
Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat commented Mar 29, 2026

Summary

This PR implements #1607: first-class workspace entities with membership and cross-workspace sharing.

The branch now moves workspace scope from implicit per-user_id isolation to explicit DB-backed workspace entities and membership, while preserving the existing personal workspace behavior as the backward-compatible default.

What Changed

1. Database and schema

  • Added DB-backed user/token primitives needed by workspace membership:
    • users
    • api_tokens
  • Added first-class workspace tables:
    • workspaces
    • workspace_members
  • Added nullable workspace_id scoping to existing persisted entities:
    • conversations
    • agent_jobs
    • memory_documents
    • routines
    • settings
  • Preserved personal scope as workspace_id = NULL, so existing user-scoped data keeps working without migration into a stored “personal workspace” row.
  • Implemented the schema changes for both backends:
    • PostgreSQL migrations / queries
    • libSQL base schema + incremental migrations

2. Workspace management store

  • Added a dedicated workspace-management trait in the DB abstraction layer.
  • Implemented workspace CRUD and membership operations for both PostgreSQL and libSQL:
    • create workspace
    • fetch by slug / ID
    • list workspaces for a user
    • update workspace metadata/settings
    • archive workspace
    • add / remove members
    • list members
    • get / update member role
    • membership checks
  • Workspace creator is automatically inserted as owner.

3. Web API surface

  • Added workspace management endpoints:
    • POST /api/workspaces
    • GET /api/workspaces
    • GET /api/workspaces/{slug}
    • PUT /api/workspaces/{slug}
    • POST /api/workspaces/{slug}/archive
    • GET /api/workspaces/{slug}/members
    • PUT /api/workspaces/{slug}/members/{user_id}
    • DELETE /api/workspaces/{slug}/members/{user_id}
  • Added workspace scope resolution from ?workspace=<slug> across existing gateway flows.
  • Added membership enforcement before workspace-scoped reads/writes.
  • Archived workspaces now return 410 Gone on scoped access paths instead of behaving like active workspaces.

4. Workspace scoping across existing feature areas

  • Jobs:
    • workspace-scoped listing / summaries / visibility checks
  • Routines:
    • workspace-aware lookup, listing, and ownership checks
  • Memory:
    • workspace-aware resolver path and shared workspace backing scope
  • Settings:
    • workspace and personal settings are now actually isolated
    • ?workspace= no longer falls through to personal settings
  • Conversations / thread listing:
    • workspace threads stay out of personal thread listings
    • conversation ownership checks now consider workspace scope

5. Runtime propagation

  • Propagated workspace_id through the runtime instead of keeping it as a web-only concept:
    • IncomingMessage
    • chat send path
    • websocket path
    • approval resubmission path
    • session/pending approval state
    • job/routine context
    • conversation persistence / ownership checks
  • Chat and approval flows now preserve workspace scope end-to-end.

6. Workspace event delivery

  • Added workspace-wide SSE fan-out semantics.
  • Workspace-scoped events are no longer only user-targeted events with an extra tag; they now broadcast to all subscribers scoped to the same workspace.
  • Updated gateway response/status broadcasting so workspace events go to the workspace stream rather than only the initiating user.

7. OpenAI-compatible Responses API

  • Made /v1/responses workspace-aware:
    • previous-response/thread resolution now verifies workspace ownership correctly
    • response fetches no longer assume personal scope
    • new Responses API requests can be created in a workspace scope
  • This closes a real parity gap where the main chat flows had workspace support but the Responses API path still behaved as if workspace_id = NULL.

8. Compatibility behavior

  • Personal workspaces still behave as before:
    • no workspace query param
    • workspace_id = NULL
    • user-scoped queries remain valid
  • Legacy workspace_read_scopes was not removed outright, but it is now treated as compatibility behavior for the personal workspace path rather than the source of truth for shared workspace authorization.
  • Shared workspace access is now membership-based.

Tests and Verification

Targeted verification completed on this branch:

  • cargo check --all-features
  • cargo test --test workspaces_integration
  • cargo clippy --all-features --all-targets -- -D warnings

The workspace integration coverage now exercises:

  • workspace membership endpoints and access control
  • archived workspace 410 Gone behavior
  • workspace-scoped chat forwarding carrying workspace_id
  • workspace vs personal settings isolation
  • workspace thread isolation from personal thread listing
  • Responses API workspace ownership behavior

Additional unit/integration coverage was also updated where API signatures changed.

Notable Implementation Notes

  • The existing memory “Workspace” type remains the document/memory system; this PR adds workspace entities on top of it rather than replacing it.
  • WorkspacePool now behaves correctly for shared workspace scopes and no longer treats legacy token scopes as the mechanism for workspace entity sharing.
  • libSQL test setup was adjusted so ephemeral test databases are reliable across the backend’s per-operation connection model.

Closes #1607

@github-actions github-actions bot added scope: agent Agent core (agent loop, router, scheduler) scope: channel/cli TUI / CLI channel scope: channel/web Web gateway channel scope: db Database trait / abstraction scope: db/postgres PostgreSQL backend scope: db/libsql libSQL / Turso backend scope: llm LLM integration scope: config Configuration scope: sandbox Docker sandbox scope: docs Documentation scope: dependencies Dependency updates size: XL 500+ changed lines risk: high Safety, secrets, auth, or critical infrastructure contributor: experienced 6-19 merged PRs labels Mar 29, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a comprehensive database-backed user management system, enabling multi-tenant deployments with admin APIs for user CRUD, secret provisioning, and self-service token management. It also enhances workspace isolation, LLM usage tracking, and Docker build efficiency. Feedback focuses on preventing potential panics during UTF-8 string slicing, resolving inconsistencies between documentation and database migrations, and improving type safety by replacing hardcoded role strings with enums. Additionally, replacing browser-native prompts with custom UI components is recommended for better security and user experience.

Comment thread src/main.rs Outdated
Comment on lines +663 to +666
let prefix = if auth_token.len() >= 8 {
&auth_token[..8]
} else {
auth_token
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic for prefix calculation can panic if auth_token[..8] slices in the middle of a multi-byte UTF-8 character or if the string is shorter than 8 bytes. Use character-aware methods like char_indices to safely truncate the string to a maximum of 8 characters.

let prefix = match auth_token.char_indices().nth(8) {
    Some((idx, _)) => &auth_token[..idx],
    None => &auth_token,
};
References
  1. When truncating a UTF-8 string, use character-aware methods (e.g., char_indices() or iterating over chars()) to avoid panics caused by slicing in the middle of a multi-byte character.

Comment thread docs/USER_MANAGEMENT_API.md Outdated
| Column | Type (PG / libSQL) | Notes |
|--------|--------------------|-------|
| `id` | `UUID` / `TEXT` | Primary key |
| `user_id` | `TEXT NOT NULL` | FK to `users.id` (PG cascades; libSQL explicit cleanup) |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The note for api_tokens.user_id states "libSQL explicit cleanup", but the migrations/V14__users.sql file for libSQL actually uses ON DELETE CASCADE for this foreign key. This creates a slight inconsistency between the documentation and the implementation.

Comment thread migrations/V14__users.sql
email TEXT UNIQUE, -- nullable for token-only users
display_name TEXT NOT NULL,
status TEXT NOT NULL DEFAULT 'active', -- active | suspended | deactivated
role TEXT NOT NULL DEFAULT 'member', -- admin | member
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment mentions deactivated as a possible status, but this status is not used in the UserRecord definition in src/db/mod.rs or in the API documentation. Consider removing deactivated from the comment or implementing its usage for consistency.

Comment thread migrations/V14__users.sql

CREATE TABLE api_tokens (
id UUID PRIMARY KEY,
user_id TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ON DELETE CASCADE for api_tokens.user_id in this libSQL migration contradicts the documentation in docs/USER_MANAGEMENT_API.md, which states "libSQL explicit cleanup" for this foreign key. Please update either the documentation or the migration to be consistent.

Comment thread src/channels/web/handlers/users.rs Outdated
Comment on lines +46 to +51
if role != "admin" && role != "member" {
return Err((
StatusCode::BAD_REQUEST,
"role must be 'admin' or 'member'".to_string(),
));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hardcoding role strings like "admin" and "member" can lead to errors if role names change or new roles are introduced. Consider using an enum or constants for roles to improve type safety and maintainability.

    let role = body
        .get("role")
        .and_then(|v| v.as_str())
        .unwrap_or(DEFAULT_ROLE)
        .to_string();
    if !is_valid_role(&role) {
        return Err((
            StatusCode::BAD_REQUEST,
            "role must be 'admin' or 'member'".to_string(),
        ));
    }

Comment thread src/channels/web/handlers/users.rs Outdated
Comment on lines +231 to +236
if role != "admin" && role != "member" {
return Err((
StatusCode::BAD_REQUEST,
"role must be 'admin' or 'member'".to_string(),
));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hardcoding role strings like "admin" and "member" can lead to errors if role names change or new roles are introduced. Consider using an enum or constants for roles to improve type safety and maintainability.

Suggested change
if role != "admin" && role != "member" {
return Err((
StatusCode::BAD_REQUEST,
"role must be 'admin' or 'member'".to_string(),
));
}
if let Some(role) = body.get("role").and_then(|v| v.as_str()) {
if !is_valid_role(role) {
return Err((
StatusCode::BAD_REQUEST,
"role must be 'admin' or 'member'".to_string(),
));
}

Comment thread src/channels/web/static/app.js Outdated
.catch(function(e) { alert(I18n.t('users.failedRoleChange') + ': ' + e.message); });
}

function createTokenForUser(userId, displayName) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using prompt() for user input is generally discouraged due to poor user experience and potential security implications (e.g., phishing if the browser's UI is spoofed). Consider replacing this with a custom modal or an inline input field for a more controlled and secure experience.

ilblackdragon and others added 15 commits March 29, 2026 13:15
…tion, heartbeat cycling

Finishes the remaining isolation work from phases 2–4 of #59:

Phase 2 (DB scoping): Fix /status and /list commands to use _for_user
DB variants instead of global queries that leaked cross-user job data.

Phase 3 (Runtime isolation): Per-user workspace in routine engine's
spawn_fire so lightweight routines run in the correct user context.
Per-user daily cost tracking in CostGuard with configurable budget via
MAX_COST_PER_USER_PER_DAY_CENTS. Multi-user heartbeat that cycles
through all users with routines, auto-detected from GATEWAY_USER_TOKENS.

Phase 4 (Provider/tools): Per-user model selection via preferred_model
setting — looked up from SettingsStore on first iteration, threaded
through ReasoningContext.model_override to CompletionRequest. Works
with providers that support per-request model overrides (NearAI).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…verride

Three follow-up fixes for multi-tenant isolation:

1. Multi-user heartbeat now runs memory hygiene per user before each
   heartbeat check, matching single-user heartbeat behavior.

2. /model command in multi-tenant mode only persists to per-user
   settings (selected_model) without calling set_model() on the shared
   LlmProvider. The per-request model_override in the dispatcher reads
   from the same setting. Added multi_tenant flag to AgentConfig
   (auto-detected from GATEWAY_USER_TOKENS).

3. RigAdapter now supports per-request model overrides by injecting the
   model name into rig-core's additional_params. OpenAI/Anthropic/Ollama
   API servers use last-key-wins for duplicate JSON keys, so the override
   takes effect via serde's flatten serialization order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…y, pruning

Fixes from review comments on #1614:

- Cost tracking now uses the override model name (not active_model_name)
  when a per-user model override is active, for accurate attribution.
- Multi-user heartbeat runs per-user checks concurrently via JoinSet
  instead of sequentially, preventing one slow user from blocking others.
- Per-user failure counts tracked independently; users exceeding
  max_failures are skipped (matching single-user semantics).
- per_user_daily_cost HashMap pruned on day rollover to prevent
  unbounded growth in long-lived deployments.
- Doc comment fixed: says "routines" not "active routines".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses second round of PR review on #1614:

- /status <job_id> DB path now validates job.user_id == requesting user
  before returning data (was missing ownership check, security fix).

- persist_selected_model takes user_id param instead of owner_id, and
  skips .env/TOML writes in multi-tenant mode (these are shared global
  files). handle_system_command now receives user_id from caller.

- JoinSet collection handles Err(JoinError) explicitly instead of
  silently dropping panicked tasks.

- Notification forwarder extracts owner_id from response metadata in
  multi-tenant mode for per-user routing instead of broadcasting to
  the agent owner.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Round 3 review fixes:

- Cost tracking passes None for cost_per_token when model override is
  active, letting CostGuard look up pricing by model name instead of
  using the default provider's rates (serrrfirat).

- fire_manual() now uses per-user workspace, matching spawn_fire()
  pattern (serrrfirat).

- Removed MULTI_TENANT env var — multi-tenant mode is auto-detected
  solely from GATEWAY_USER_TOKENS presence (serrrfirat + Copilot).

- Multi-user heartbeat capped at 8 concurrent tasks to avoid flooding
  the LLM provider (serrrfirat + Copilot).

- Fixed inject_model_override doc comment accuracy (Copilot).

- Added comment explaining multi-tenant notification routing priority
  (Copilot).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds POST /api/webhooks/u/{user_id}/{path} — a user-scoped webhook
endpoint that filters the routine lookup by user_id, preventing
cross-user webhook triggering when paths collide.

The existing /api/webhooks/{path} endpoint remains unchanged for
backward compatibility in single-user deployments.

Changes:
- get_webhook_routine_by_path gains user_id: Option<&str> param
- Both postgres and libsql implementations add AND user_id = ? filter
  when user_id is provided
- New webhook_trigger_user_scoped_handler extracts (user_id, path)
  from URL and passes to shared fire_webhook_inner logic
- Route registered on public router (webhooks are called by external
  services that can't send bearer tokens)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the web gateway layer for DB-backed user management (#1605):

Auth refactor:
- CombinedAuthState wraps env-var tokens (MultiAuthState) + optional
  DbAuthenticator for DB-backed token lookup with LRU cache (60s TTL,
  1024 max entries)
- auth_middleware tries env-var tokens first, then DB fallback
- From<MultiAuthState> impl for backward compatibility
- main.rs wires with_db_auth when database is available

API handlers (12 new endpoints):
- /api/admin/users — CRUD: create, list, detail, update, suspend, activate
- /api/tokens — create (returns plaintext once), list, revoke
- /api/invitations — create, list, accept (creates user + first token)

Token creation: 32 random bytes → hex plaintext, SHA-256 hash stored.
Invitation accept: validates hash + pending + not expired, creates
user record and first API token atomically.

All test files updated for CombinedAuthState type change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes the DB-backed user management feature (#1605):

- Startup migration: when GATEWAY_USER_TOKENS is set and the users
  table is empty, inserts env-var users + hashed tokens into DB.
  Logs deprecation notice when DB already has users.
- hash_token made pub for reuse in migration code.
- 10 integration tests for UserStore (libsql file-backed):
  - has_any_users bootstrap detection
  - create/get/get_by_email/list/update user lifecycle
  - token create → authenticate → revoke → reject cycle
  - suspended user tokens rejected
  - wrong-user token revoke returns false
  - invitation create → accept → user created
  - record_login and record_token_usage timestamps
- libSQL migration: removed FK constraints from V14 (incompatible
  with execute_batch inside transactions). Tables in both base SCHEMA
  and incremental migration for fresh and existing databases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GATEWAY_USER_TOKENS never went to production — replaced entirely by
DB-backed user management via /api/admin/users and /api/tokens.

Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime
  via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (#1)
- Invitation accept moved to public router (no auth needed) (#3)
- libSQL get_invitation_by_hash aligned with postgres: filters
  status='pending' AND expires_at > now (#4)
- UUID parse: returns DatabaseError::Serialization instead of
  unwrap_or_default (#7)
- PostgreSQL SELECT * replaced with explicit column lists (#8)
- Sort order aligned (both backends use DESC) (#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a `role` field (admin|member) to user management:

Schema:
- `role TEXT NOT NULL DEFAULT 'member'` added to users table in both
  PostgreSQL V14 migration and libSQL schema/incremental migration
- UserRecord gains `role: String` field
- UserIdentity gains `role: String` field, populated from DB in
  DbAuthenticator and defaulting to "admin" for single-user mode

Access control:
- AdminUser extractor: returns 403 Forbidden if role != "admin"
- /api/admin/users/* handlers: require AdminUser (create, list,
  detail, update, suspend, activate)
- POST /api/invitations: requires AdminUser (only admins can invite)
- User creation accepts optional "role" param (defaults to "member")
- Invitation acceptance creates users with "member" role

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a Users tab to the web gateway UI for managing users, tokens,
and roles without needing direct API calls.

Features:
- User list table with ID, name, email, role, status, created date
- Create user form with display name, email, role selector
- Suspend/activate actions per user
- Create API token for any user (shows plaintext once with copy button)
- Role badges (admin highlighted, member muted)
- Non-admin users see "Admin access required" message
- Keyboard shortcut: Cmd/Ctrl+5 switches to Users tab

CSS:
- Reuses routines-table styles for the user list
- Badge, token-display, btn-small, btn-danger, btn-primary components

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical auth bug: token creation hashed the raw 32 bytes
(hasher.update(token_bytes)) but authentication hashed the hex-encoded
string (hash_token(candidate) where candidate is the hex string the
user sends). This meant newly created tokens could never authenticate.

Fixed all 4 token creation sites (users, tokens, invitations create,
invitations accept) to use hash_token(&plaintext_token) which hashes
the hex string consistently with the auth lookup path.

Removed now-unused sha2::Digest imports from handlers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The invitation flow is redundant — admin create user already generates
a token and shows a login link. Invitations add complexity without
value until email integration exists.

Removed:
- InvitationRecord struct and 4 UserStore trait methods
- invitations table from V14 migration (postgres + both libsql schemas)
- PostgreSQL Store methods (create/get/accept/list invitations)
- libSQL UserStore invitation methods + row_to_invitation helper
- invitations.rs handler file (212 lines)
- /api/invitations routes (create, list, accept)
- test_invitation_lifecycle test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements zmanian's architectural proposal from #1614 review:
two-tier scoped database access (TenantScope/AdminScope) so handler
code cannot accidentally bypass tenant scoping.

TenantScope (default): wraps user_id + Arc<dyn Database>, auto-binds
user_id on every operation. ID-based lookups return None for cross-
tenant resources. No escape hatch — forgetting to scope is a compile
error.

AdminScope (explicit opt-in): cross-tenant access for system-level
components (heartbeat, routine engine, self-repair, scheduler, worker).

TenantCtx bundles TenantScope + workspace + cost guard + per-user
rate limiting. Constructed once per request in handle_message, threaded
through all command handlers and ChatDelegate.

Key changes:
- New src/tenant.rs (~920 lines): TenantScope, AdminScope, TenantCtx,
  TenantRateState, TenantRateRegistry
- All command handlers: user_id: &str → ctx: &TenantCtx
- ChatDelegate: cost check/record/settings via self.tenant
- System components: store field changed to AdminScope
- Config: TENANT_MAX_LLM_CONCURRENT, TENANT_MAX_JOBS_CONCURRENT env vars
- Fixes bug: /status <job_id> cross-tenant leak (now auto-filtered)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Temporary diagnostic endpoint that tests DB INSERT to users table
with full error logging. No auth required. Will be removed after
debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Security Concern — Webhook Guard Removal

The diff removes the multi-tenant isolation guard from webhook_trigger_handler (src/channels/web/handlers/webhooks.rs):

// REMOVED:
if state.workspace_pool.is_some() {
    return Err((StatusCode::GONE, "Unscoped webhooks disabled..."));
}

This guard prevents cross-user routine triggering in multi-tenant deployments. Removing it weakens tenant isolation — any attacker with a webhook secret can trigger routines across all users via the unscoped /api/webhooks/{path} endpoint. The doc comment is updated from "Disabled in multi-tenant mode" to advisory guidance, but nothing enforces that recommendation.

Recommendation: Keep the guard, or make unscoped webhooks in multi-tenant mode opt-in via config.

Good Changes

  • token_prefix() helper — properly uses char_indices().nth(8) to avoid panicking on multi-byte UTF-8. Applied consistently across tokens.rs, users.rs, and main.rs.
  • Role constants (ROLE_ADMIN/ROLE_MEMBER + is_valid_role()) — reduces stringly-typed bugs.
  • created_by validation — checks user existence before recording. However, the _ => None arm silently swallows DB errors — consider logging on Err(e).

Issues

  1. No teststoken_prefix() is a pure function that's trivially testable. The created_by validation change also warrants a test. Per project rules, new code should include tests.
  2. PR scope mismatch — title says "first-class workspace entities with membership and cross-workspace sharing" but the diff is incremental cleanups. The PR notes mention uncommitted changes — consider scoping the title to what's actually here.
  3. Whitespace-only changes in heartbeat.rs and cli/mod.rs are cosmetic noise.
  4. Migration comment removes "invitation flow" from the header — may be premature if that's planned for a follow-up.

@github-actions github-actions bot added scope: channel Channel infrastructure scope: tool/builtin Built-in tools scope: workspace Persistent memory / workspace size: XL 500+ changed lines and removed size: M 50-199 changed lines labels Mar 31, 2026
…e-entities-db-users

# Conflicts:
#	src/agent/scheduler.rs
#	src/agent/thread_ops.rs
#	src/channels/web/auth.rs
#	src/channels/web/handlers/chat.rs
#	src/channels/web/server.rs
#	src/db/libsql/conversations.rs
#	src/db/libsql_migrations.rs
#	src/db/postgres.rs
#	src/history/store.rs
#	src/tenant.rs
@serrrfirat serrrfirat marked this pull request as ready for review April 1, 2026 12:58
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review -- REQUEST_CHANGES

Well-structured transition from implicit per-user isolation to explicit DB-backed workspace entities. The backward compatibility story (personal workspace = workspace_id IS NULL) is clean. Authorization model consistently applied across all handler types. Good integration test coverage.

Must fix

1. No role validation on member upsert -- body.role is passed directly to DB with no validation. Any string accepted ("superadmin", "", etc.). Validate against {owner, admin, member, viewer}.

2. Slug validation missing -- workspaces_create_handler accepts any slug including special characters, spaces, path traversal. Slugs appear in URL paths. Validate with pattern like ^[a-z0-9][a-z0-9-]{1,62}[a-z0-9]$.

3. Last-owner protection -- No guard preventing the sole owner from removing themselves, orphaning the workspace.

Medium

4. Admin can promote themselves to owner via the upsert endpoint's ON CONFLICT DO UPDATE. Is this intentional?

5. Workspace-scoped settings bypass key extraction -- API keys in workspace LLM settings stored unencrypted (secrets store extraction skipped when scope.is_some()).

6. Duplicate resolve_chat_workspace_id -- identical function in both chat.rs and server.rs. Extract to shared location.

7. fire_manual(routine_id, None) loses caller tracking -- previously passed Some(&user.user_id) for audit trail.

Low

  • list_workspaces_for_user includes archived workspaces (intentional?)
  • row_to_workspace uses .parse().unwrap_or_default() -- silently produces nil UUID on parse failure
  • PostgreSQL migration drops/recreates unique indexes without data guards

Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: First-class Workspace Entities

Overview

This PR adds DB-backed workspace entities with membership, cross-workspace sharing, and scoped access across the entire stack — from schema through DB traits, API handlers, SSE event delivery, WebSocket connections, the Responses API, and runtime propagation. It's a large (~4.5k additions), ambitious, and well-structured change that touches nearly every subsystem.

Strengths

  • Clean backward compatibility: Personal workspace (workspace_id = NULL) continues to work transparently. Existing data requires no migration into stored workspace rows.
  • Consistent scope propagation: workspace_id flows end-to-end through IncomingMessageJobContextPendingApproval → SSE events. This is done thoroughly.
  • Good handler refactoring: Routines handlers extracted load_visible_routine() and resolve_routine_scope(), eliminating repeated fetch-then-check-ownership patterns.
  • SSE scope filtering: The event_matches_scope() function and tests for workspace fan-out are well-designed — personal events stay personal, workspace events reach all workspace subscribers.
  • Both backends implemented: PostgreSQL and libSQL both implement all new trait methods, meeting the dual-backend requirement.
  • New tests: Workspace integration tests exercise membership, archived gating, scoped chat, settings isolation, thread isolation, and Responses API ownership.

Issues

Critical

  1. libSQL base schema FK ordering bug: In libsql_migrations.rs, the conversations, agent_jobs, memory_documents, routines, and settings tables all have workspace_id TEXT REFERENCES workspaces(id) ON DELETE CASCADE, but the workspaces table is defined after them in the base schema. On a fresh libSQL database, this will fail because the referenced table doesn't exist when the FK is created. The workspaces and workspace_members table definitions must be moved before any table that references them.

  2. PostgreSQL set_setting race condition: The PG set_setting was rewritten from ON CONFLICT ... DO UPDATE to INSERT ... ON CONFLICT DO NOTHING + UPDATE. Because the unique constraint was replaced with partial indexes, ON CONFLICT can't reference the old constraint. However, the single-setting path does not wrap in a transaction, so two concurrent writes could both succeed at INSERT ON CONFLICT DO NOTHING (neither sees the other's uncommitted row) before the UPDATE runs. Either wrap in a transaction or use ON CONFLICT ON CONSTRAINT <index_name> DO UPDATE.

  3. Missing incremental libSQL migration: The workspace schema changes appear only in the base schema, not as incremental migration steps. Existing libSQL databases won't get the new workspaces/workspace_members tables or the workspace_id columns on existing tables. There needs to be a migration step for existing databases.

Security

  1. Owner can be removed from workspace: remove_workspace_member has no guard against removing the last owner. An admin could remove the sole owner, leaving an orphaned workspace with no owner. Add a check similar to is_last_admin in user management.

  2. No role validation on add_workspace_member: The workspace_members_upsert_handler accepts any body.role string without validation. The WorkspaceMemberWriteRequest has just pub role: String — there's no check that it's one of the valid roles (owner/admin/member/viewer). The is_valid_role() helper was added in users.rs but not used here.

  3. Unscoped webhooks re-enabled in multi-tenant mode: The guard in webhook_trigger_handler that rejected unscoped webhooks in multi-tenant mode was removed entirely. The comment says "For multi-tenant isolation, use the user-scoped endpoint" but the old guard prevented cross-user routine triggering. This is a regression — the per-routine webhook secret provides weak protection compared to tenant isolation.

Correctness

  1. conversation_belongs_to_user workspace semantics: When workspace_id is Some(id), the query matches correctly. But parsed_workspace_id() in thread_ops.rs silently drops invalid workspace UUIDs (returns None). If a message has a malformed workspace_id string, it would be treated as a personal-scope check, potentially allowing access to personal conversations when workspace access was intended.

  2. fire_manual lost user verification: In routines_trigger_handler, the call changed from fire_manual(routine_id, Some(&user.user_id)) to fire_manual(routine_id, None). While the handler now does load_visible_routine before firing, verify that this fully replaces the internal ownership check that the user_id parameter provided.

  3. Duplicate resolve_chat_workspace_id function: This function is defined identically in both src/channels/web/handlers/chat.rs and src/channels/web/server.rs. Extract to a shared location to avoid divergence.

Code Quality

  1. #[allow(clippy::too_many_arguments)] on send_notification: Adding workspace_id pushed this to 8 parameters. Consider grouping into a struct instead of suppressing the lint.

  2. workspace_settings_user_id convention: Both libSQL and PG settings backends use format!("workspace:{workspace_id}") as a synthetic user_id for workspace settings. This convention-based coupling is fragile — if a real user_id ever starts with workspace:, it would collide. Consider using a fixed sentinel or removing the user_id requirement for workspace-scoped settings.

  3. Repeated scope-resolution boilerplate: Nearly every handler has the same resolve_*_scope → branch on workspace/personal pattern. Consider an axum extractor that resolves workspace scope once and injects it.

Missing

  1. No slug validation: WorkspaceCreateRequest.slug is accepted as-is with no validation (length, allowed characters, reserved words). Slugs are used in URLs — validate them.

  2. Settings key extraction skipped for workspace scope: settings_set_handler skips API key extraction for workspace-scoped llm_builtin_overrides and llm_custom_providers (just uses body.value.clone()). This means workspace settings could store plaintext API keys that personal settings would encrypt.

Summary

This is solid foundational work. The architecture is clean, the scope propagation is thorough, and backward compatibility is well-preserved. The critical items (libSQL schema ordering, PG settings race, missing incremental migration) need to be fixed before merge. The security items (owner removal, role validation, webhook guard) should be addressed.

@serrrfirat
Copy link
Copy Markdown
Collaborator Author

Addressed the review items in 172729c1 and pushed them onto the PR branch feat/1607-workspace-entities-db-users.

Summary of fixes:

  • Added workspace role validation, slug validation, archived-workspace filtering, last-owner protection, and owner-only owner-role mutations.
  • Restored caller-aware authorization for manual routine triggers, restored the multi-tenant guard for unscoped webhooks, and fail-closed malformed workspace_id parsing in thread hydration.
  • Removed the duplicate chat workspace resolver by consolidating onto a shared helper.
  • Fixed workspace-scoped settings secret extraction so workspace LLM API keys are vaulted instead of stored in plaintext, and switched workspace settings storage away from the workspace:{id} user-id convention to a fixed sentinel.
  • Refactored routine notifications to use a struct instead of #[allow(clippy::too_many_arguments)].
  • Fixed libSQL schema ordering for workspace FKs, added an idempotent incremental libSQL migration path for existing databases, and added regression coverage for fresh and legacy libSQL DBs.
  • Wrapped the PostgreSQL single-setting write paths in transactions to close the insert/update race on the new partial-index scheme.
  • Added tests for token_prefix(), created_by validation, workspace membership/owner/archive behavior, workspace secret scoping, the multi-tenant webhook guard, and libSQL migration behavior.
  • Removed the whitespace-only diff noise that was called out.
  • Replaced the browser prompt() token-name flow with inline UI.

Verification run:

  • cargo test --test workspaces_integration -- --nocapture
  • cargo test --lib libsql_migrations::tests -- --nocapture
  • cargo test --lib token_prefix -- --nocapture
  • cargo test --lib workspace_scope -- --nocapture
  • cargo test --lib unscoped_webhooks_are_disabled_in_multi_tenant_mode -- --nocapture
  • cargo test --lib user_creation_ -- --nocapture
  • cargo fmt --all
  • git diff --check

Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few concrete regressions here, plus one larger architectural inconsistency in the workspace model.

  1. High: workspace-scoped history authorization is bypassable for active in-memory threads.
    In src/channels/web/handlers/chat.rs, chat_history_handler() correctly checks conversation_belongs_to_user(thread_id, user_id, workspace_id), but on failure it only returns 404 if the thread is not present in sess.threads. It then falls through to the in-memory branch and returns that thread’s turns anyway. In practice, the same user can request a thread under the wrong ?workspace= and still read it as long as that thread is active in session. That defeats the new workspace scoping check for active threads.

  2. High: workspace-scoped SSE/WS subscribers no longer receive user-scoped auth events.
    In src/channels/web/sse.rs, event_matches_scope() rejects (Some(event_user), None) whenever the subscriber has a workspace scope, because it requires subscriber_workspace_id.is_none(). But auth-required/auth-completed events are still emitted with broadcast_for_user(...) from both the HTTP and WS chat flows. That means a user connected to /api/chat/events?workspace=... or the workspace-scoped WS path will miss extension auth events in that workspace tab.

  3. Medium: workspace-scoped settings dropped the active-provider safety invariant.
    In src/channels/web/handlers/settings.rs, the guard that prevents removing the currently active custom provider only runs when scope.is_none(). Once llm_backend / llm_custom_providers are workspace-scoped, that invariant needs to hold there too. The helper itself is also still hard-wired to user-scoped get_setting(...), so the feature surface has expanded without the corresponding safety rule.

  4. Medium: the PR is halfway between a shared-workspace conversation model and a user-owned conversation model.
    Live response/status events are now broadcast to the whole workspace in src/channels/web/mod.rs, but persisted conversation listing and ownership checks in the DB layer still key conversations by user_id + workspace_id. That leaves an unresolved product/security boundary: either workspace members can see live thread activity they cannot later list/open, or the storage/authorization model has not actually been generalized to match the new workspace-sharing behavior. I would want that model made explicit and enforced consistently before merging.

At a higher level, the implementation is extending workspace scope through routing and transport faster than through the underlying ownership model. That is exactly where subtle multi-tenant bugs show up. I’d recommend tightening the model first, then adding focused tests for:

  • chat/history with the wrong workspace against an active thread
  • workspace-scoped SSE/WS auth-required/auth-completed flows
  • workspace-scoped llm_custom_providers updates/deletes when the active backend is custom

Copy link
Copy Markdown
Collaborator Author

Addressed the April 3 review in cfd72b7f and pushed it to feat/1607-workspace-entities-db-users.

  1. chat/history now fails closed on workspace mismatch before falling through to active in-memory session threads. This is fixed in the gateway handler path and covered by workspace_chat_history_rejects_active_thread_from_wrong_workspace.

  2. Workspace-scoped auth events now preserve workspace scope end-to-end. Pending auth stores the thread workspace, and the HTTP, WS, OAuth callback, and extension setup completion paths now emit auth_required / auth_completed with broadcast_for_user_in_workspace(...) so the correct workspace tab receives them.

  3. The active custom-provider safety guard now applies to workspace-scoped settings too. The guard reads llm_backend / llm_custom_providers from the resolved scope instead of assuming personal settings, and the workspace update/delete cases are covered by targeted tests.

  4. I made the ownership model explicit for this PR by choosing the user-owned conversation path rather than shared workspace conversations. Live chat response/status events are no longer workspace-wide fan-out when a user scope exists; they are now scoped to user_id + workspace_id, which matches the persisted conversation listing / ownership model. I also updated the /v1/responses subscriber path to use the same scoped subscription so workspace requests still receive their own events.

Verification run on this follow-up:

  • cargo test --lib routing_scope -- --nocapture
  • cargo test --lib no_silent_drop -- --nocapture
  • cargo test --lib response_id_round_trip -- --nocapture
  • cargo fmt --all
  • git diff --check

Earlier targeted checks for the other three items are still green on this branch as well:

  • cargo test --lib test_chat_auth_token_handler_broadcasts_workspace_scoped_auth_required -- --nocapture
  • cargo test --lib test_handle_client_auth_token_broadcasts_workspace_scoped_auth_completed -- --nocapture
  • cargo test --lib guard_active_provider_not_removed_for_workspace -- --nocapture
  • cargo test --lib test_pending_auth_ -- --nocapture
  • cargo test --test workspaces_integration workspace_chat_history_rejects_active_thread_from_wrong_workspace -- --nocapture

# Conflicts:
#	src/agent/agent_loop.rs
#	src/agent/routine_engine.rs
#	src/channels/web/handlers/chat.rs
#	src/channels/web/handlers/jobs.rs
#	src/channels/web/server.rs
#	src/db/libsql_migrations.rs
#	src/db/mod.rs
#	src/db/postgres.rs
#	src/tools/builtin/memory.rs
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review — NEEDS DISCUSSION (do not merge as-is)

TL;DR: Sound architecture and a good backward-compatibility story, but four critical safety issues plus three security/correctness items require explicit decisions before merge. The "delicate" flag is warranted — this touches IronClaw's most valuable data. Recommend additional reviewers (DB + security).


Critical issues (blockers)

# Severity File:Line Issue Suggestion
1 CRITICAL `src/db/libsql_migrations.rs` (base SCHEMA) FK ordering bug. `conversations`/`agent_jobs`/`memory_documents`/`routines`/`settings` reference `workspaces(id)` before `workspaces` is created. Fresh libSQL databases will fail on migration. Move `workspaces` + `workspace_members` CREATE TABLE to the top of SCHEMA, before `conversations`. Test on a fresh libSQL instance
2 CRITICAL `src/db/libsql_migrations.rs` (`INCREMENTAL_MIGRATIONS`) Missing libSQL incremental migration. The base SCHEMA has workspace tables but `INCREMENTAL_MIGRATIONS` has no entry — existing libSQL deployments won't get workspace tables or scoping columns Add migration #18 with all workspace CREATE TABLEs + ALTER TABLEs for scoping columns. Per CLAUDE.md: "All new persistence features must support both backends"
3 CRITICAL `src/db/postgres.rs::set_setting` + `handlers/settings.rs` Settings upsert race. PG rewrite uses INSERT+UPDATE without a transaction wrapper. Two concurrent writes can both succeed at INSERT (neither sees the other's uncommitted row) before UPDATE runs Wrap `set_setting` in a transaction, or use `ON CONFLICT ON CONSTRAINT` with the partial index name
4 CRITICAL `handlers/chat.rs` + `channels/web/mod.rs` Authorization model incomplete. `chat_history_handler` checks workspace scope but falls through to the in-memory active thread without re-checking scope. Workspace SSE fan-out broadcasts live events to all workspace subscribers, but conversation ownership remains user-keyed — members see live activity from threads they can't later list/query Audit the conversation ownership model. Is it user-owned or workspace-shared? If shared, listing queries must be workspace-keyed. If user-owned, what should live SSE show? Make the model explicit in code + tests

High-severity issues

# Severity File:Line Issue Suggestion
5 High `handlers/workspaces.rs` (`workspace_members_upsert_handler`) No role validation — `body.role` accepted as-is; any string passes ("superadmin", ""). `is_valid_role()` helper already exists in `users.rs` Validate against `["owner","admin","member","viewer"]`. Return 400 with an error. Add regression test
6 High `handlers/workspaces.rs` No slug validation. Slugs appear in URLs and accept any string (spaces, special chars, path-traversal attempts) Validate: `^[a-z0-9][a-z0-9-]{1,62}[a-z0-9]$`. Return 400. Add regression test
7 High `src/db/workspace.rs::remove_workspace_member` No guard preventing removal of the last owner. An admin could delete the sole owner, orphaning the workspace Count members with `role='owner'`; if count==1 and target is owner, return 409. Test: `workspace_cannot_remove_last_owner`
8 High `handlers/webhooks.rs` The `if state.workspace_pool.is_some() { return Err(GONE, ...) }` guard was removed entirely. Unscoped webhooks can now trigger routines across all users in multi-tenant mode. Doc was updated to "advisory guidance" but nothing enforces it Either restore the guard or make unscoped webhooks opt-in via a config flag. Security regression from the perspective of multi-tenant isolation. Clarify intent with maintainers

Medium / Low

# Severity File:Line Issue Suggestion
9 Medium `handlers/chat.rs` + `channels/web/server.rs` `resolve_chat_workspace_id` defined identically in two files — will diverge Extract to `src/channels/web/scope.rs` and import once
10 Medium `handlers/settings.rs` Settings key extraction (secret injection) is skipped for workspace-scoped settings. Workspace LLM API keys are stored plaintext while personal settings are encrypted Apply the same secret extraction/encryption logic for workspace settings, or document explicitly why workspace members are trusted to see raw API keys
11 Medium `src/db/libsql/workspace.rs` (`row_to_workspace`) `.parse().unwrap_or_default()` on UUID silently produces nil UUID on malformed input Use `?` error propagation; log on error
12 Medium `handlers/workspaces.rs` Admin can self-promote to owner via the upsert endpoint's `ON CONFLICT DO UPDATE` path Clarify intent. If only owners should promote to owner: return 403 for non-owner attempts to set `role='owner'`. Test: `workspace_member_cannot_self_promote_to_owner`
13 Low `src/db/libsql/workspace.rs` (`list_workspaces_for_user`) Includes archived workspaces Comment the rationale or filter with a test for both modes
14 Low `routine_engine.rs` (`fire_manual(routine_id, None)`) Lost user tracking (previously `Some(&user.user_id)`) Verify `load_visible_routine` fully replaces internal checks and preserves the audit trail

Dual-backend support assessment

Component PostgreSQL libSQL Coverage
Schema V16__workspaces.sql ✓ Base SCHEMA + INCREMENTAL_MIGRATIONS ? PARTIAL
FK ordering Correct Broken NO
Concurrency `ON CONFLICT DO UPDATE` INSERT+UPDATE, no txn wrapper DIVERGENT
Role validation Validated at DB level Validated at DB level
Slug validation Missing in both Missing in both

The libSQL backend is incompletely implemented. Fresh libSQL init will fail; existing libSQL deployments silently lack workspace functionality.

Test coverage — gaps

Present:

  • `tests/workspaces_integration.rs` covers membership endpoints, archived 410, scoped chat forwarding, settings isolation, thread listing isolation, Responses API ownership, slug/role validation tests (`workspace_creation_rejects_invalid_slug`, `workspace_member_updates_validate_roles_and_owner_permissions`, `workspace_cannot_remove_or_demote_last_owner`)

Missing:

  • SSE/WebSocket auth-event delivery test to workspace-scoped subscribers
  • Active in-memory thread access-control bypass test
  • Custom provider deletion guard on workspace-scoped settings
  • libSQL backend not tested with migration — integration tests run on PostgreSQL; no freshness check for libSQL
  • Concurrent workspace settings write race test

Risk assessment (production)

  1. Deployment blocker (libSQL): fresh init fails; existing libSQL silently lacks workspace functionality
  2. Multi-tenant visibility regression: SSE is workspace-wide but listing is user-keyed
  3. Settings race (libSQL): concurrent writes can create duplicates/overwrites (low probability)
  4. Webhook tenant isolation: removal of the multi-tenant guard allows unscoped webhooks across user boundaries
  5. Unencrypted workspace API keys: DB breach or log leak exposes them
  6. Authorization ambiguity: no clear story for "what does it mean to be a workspace member?"

Open questions for maintainer

  1. libSQL migration strategy: include the `INCREMENTAL_MIGRATIONS` entry in this PR, or separate? How do we prevent deploying the base schema without the incremental migrations?
  2. Webhook multi-tenant isolation: was removing the `workspace_pool` guard intentional?
  3. Conversation ownership: when a conversation is in a workspace, is it user-owned (only the creating user sees it) or workspace-shared (all members see it)? Affects listing, history, ownership checks. Currently user-owned for DB queries but workspace-wide for SSE — inconsistent.
  4. Workspace settings encryption: why skip secret extraction for workspace-scoped LLM settings? By design (members are trusted) or oversight?
  5. Self-promotion to owner: is it intentional that an admin can promote themselves to owner via the upsert endpoint?

Recommended additional reviewers

  • DB expert — migration strategy, FK ordering, transaction safety, dual-backend equivalence
  • Security reviewer — multi-tenant isolation (webhook guard removal, SSE scope filtering, settings encryption)
  • Product/design — explicit conversation ownership model decision

Conclusion

The foundational work is solid. Backward compatibility is clean, scope propagation is thorough, and the integration tests are well-structured. But this PR cannot merge without:

  1. Fixing the four criticals
  2. Resolving the security questions (last-owner verification in impl, webhook guard intent, workspace settings encryption policy)
  3. Addressing the high-priority gaps (role validation, slug validation, duplicate function extraction)

The "delicate" flag from the maintainer is warranted — take the time to get it right. Consider a follow-up PR for non-critical improvements (e.g., the scope-resolution helper extraction).

Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

+6848/-735, 70+ files — comprehensive implementation of DB-backed workspace entities, membership, cross-workspace sharing, scoped data access, and workspace-aware SSE fan-out. Both PG and libSQL migrations included. Well-structured for the scope.

Must Fix

  1. Dead/duplicate handlers in handlers/chat.rs vs server.rshandlers/chat.rs defines public chat_send_handler, chat_approval_handler, etc. that are never used in routing. server.rs has its own local versions that are actually wired up. This is a significant maintenance hazard — a reader won't know which handler is live. Should consolidate to one location.

  2. workspace_id type inconsistency: String vs Uuidworkspace_id is Option<String> in IncomingMessage, JobContext, PendingAuth, PendingApproval, but Option<Uuid> in Routine, MemoryDocument, SandboxJobRecord, AgentJobRecord, and the DB layer. This leads to ~10+ scattered Uuid::parse_str(id).ok() conversions where a malformed UUID silently becomes None. Should parse once at the boundary and propagate as Uuid throughout.

  3. joined_at reset on member role update — the UPSERT in both PG and libSQL resets joined_at to now() when updating an existing member's role. This loses historical data. Should only update role and invited_by on conflict.

Suggestions

  1. Operator precedence in validate_workspace_slug — the ||/&& chain is correct by accident (De Morgan's). Add explicit parens for clarity: !(bytes[0].is_ascii_lowercase() || bytes[0].is_ascii_digit()).

  2. Dual sentinel schemesWORKSPACE_SETTINGS_SENTINEL_USER_ID (__workspace_settings__) vs workspace_scope_user_id() (workspace:{uuid}) serve different purposes but could confuse future contributors. Add a clarifying comment.

  3. Webhook multi-tenant guard ordering — the check moved from webhook_trigger_handler into fire_webhook_inner, meaning rate limit now runs before the GONE rejection. Minor, but in a DDoS scenario the ordering matters.

  4. No Postgres test coverage — integration tests all use LibSqlBackend::new_memory(). The Postgres WorkspaceMgmtStore implementation is untested.

  5. No way to list archived workspaceslist_workspaces_for_user filters them out with no include_archived option for audit/cleanup.

Cross-Cutting Observations

This PR and others in the current review batch share a dual-backend testing gap — multiple PRs only test libSQL, not Postgres, which could let PG-specific bugs slip through. Worth establishing a pattern for testing both backends on persistence-heavy PRs like this one.

Code Style

No .unwrap()/.expect() in production code, proper error mapping with context, thiserror types, strong types (WorkspaceRecord, WorkspaceMembership, ResolvedWorkspace), consistent crate:: imports. The RoutineNotification struct replacing #[allow(clippy::too_many_arguments)] is a nice refactor.

Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: Request changes (foundational PR — #1977 stacks on this)

  • src/channels/web/handlers/workspaces.rs:88-101 validate_workspace_slug — operator-precedence + boolean logic does not faithfully implement the documented ^[a-z0-9][a-z0-9-]{1,62}[a-z0-9]$ regex. Rewrite as a clear .all() pass plus first/last anchors.
  • Routine viewer privilege: routines_trigger_handler only calls load_visible_routine (read check). Workspace viewers can therefore execute routines via the gateway. Likely unintended.
  • chat_approval_handler TOCTOU: reads pending.workspace_id from in-memory session but doesn't re-check membership at submission time. A user removed between request and approval can still complete it. Re-check membership against pending.workspace_id.
  • workspace_members_upsert_handler TOCTOU: get_member_role / is_last_workspace_owner / add_workspace_member is not transactional — two concurrent owner-removal upserts can both pass. Wrap in a single SQL transaction or conditional UPDATE.
  • PG query planner risk: hot lookups now use workspace_id IS NOT DISTINCT FROM $N, which PG generally cannot use a btree index for. EXPLAIN ANALYZE list_conversations_all_channels on a populated DB; consider splitting personal vs workspace queries or rewriting as (workspace_id = $N OR (workspace_id IS NULL AND $N IS NULL)).
  • WORKSPACE_SETTINGS_SENTINEL_USER_ID = "__workspace_settings__" — if a real user ever ends up with this id, they could read all workspace settings. Add a CHECK constraint or user-creation guard.
  • migrations/V16__workspaces.sql:158-160 drops settings_pkey leaving the table with NO primary key. Acceptable for partial-unique-index design but call out in release notes; logical replication tools may misbehave.
  • No down migration. Document the rollback plan.
  • src/db/postgres.rs:~2480 set_setting — uses INSERT … ON CONFLICT DO NOTHING then UPDATE instead of an inferred-target upsert. Functionally correct but fragile if a partial index is missing.
  • Strong dual-backend implementation throughout. Migration safety is mostly additive. Excellent integration test suite (tests/workspaces_integration.rs, +715 lines) plus libSQL migration upgrade tests.
  • Per-message Uuid::parse_str(workspace_id) happens in many hot paths — consider storing as Uuid end-to-end in a follow-up.

standardtoaster added a commit to standardtoaster/ironclaw that referenced this pull request Apr 16, 2026
Rebased squash of nearai#1734 onto current staging with all review feedback
addressed.

Core feature: DB-backed workspace entities (users, workspaces,
workspace_members) with membership-based access control. Adds nullable
workspace_id scoping to conversations, jobs, memory, routines, and
settings. Personal workspace (workspace_id=NULL) remains the default.

Review feedback fixes included in this squash:
- Slug validation rewritten for clarity (operator precedence)
- joined_at no longer reset on member role update (both backends)
- Archive requires owner role, not just manager
- TOCTOU: new update_member_role_checked/remove_workspace_member_checked
  with transactional last-owner guard (both PG and libsql)
- Thread ownership check runs for all threads including session fallback
- Settings export/import reject workspace-scoped requests (501)
- Routine trigger rejects workspace viewers (403)
- Job save hard-fails on malformed workspace_id
- PG migration renumbered to V25 (staging is at V24)
- libSQL incremental migration renumbered to 25

Original-PR: nearai#1734
Original-Author: serrrfirat

Co-Authored-By: serrrfirat <firat@nearai.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: experienced 6-19 merged PRs risk: high Safety, secrets, auth, or critical infrastructure scope: agent Agent core (agent loop, router, scheduler) scope: channel/cli TUI / CLI channel scope: channel/web Web gateway channel scope: channel Channel infrastructure scope: config Configuration scope: db/libsql libSQL / Turso backend scope: db/postgres PostgreSQL backend scope: db Database trait / abstraction scope: dependencies Dependency updates scope: docs Documentation scope: llm LLM integration scope: sandbox Docker sandbox scope: tool/builtin Built-in tools scope: workspace Persistent memory / workspace size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: first-class workspace entities with membership and cross-workspace sharing

3 participants