Skip to content

feat: admin management panel — web UI for users and usage monitoring#1963

Merged
ilblackdragon merged 18 commits intostagingfrom
firat/admin-management-panel-restage
Apr 14, 2026
Merged

feat: admin management panel — web UI for users and usage monitoring#1963
ilblackdragon merged 18 commits intostagingfrom
firat/admin-management-panel-restage

Conversation

@serrrfirat
Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat commented Apr 3, 2026

Summary

Supersedes #1916. This recreates the admin management panel on top of current staging instead of carrying forward the old merge train.

Ref #1609

This branch adds:

  • a standalone admin SPA served from /admin, /admin.css, and /admin.js
  • dashboard, users, user detail, and usage views for admin operations
  • GET /api/admin/usage/summary for dashboard aggregation

It also carries forward the small correctness fixes that were still outstanding on the original PR:

  • invalidate deleted-user auth cache immediately
  • include job_count, total_cost, and last_active_at in the user detail response
  • clear proxy-auth mode when switching back to manual token auth
  • preserve the one-time token banner after creating a user
  • document /api/admin/usage/summary in the web gateway route table

Testing

  • cargo test --lib admin_user_endpoints -- --nocapture
  • cargo check --workspace --all-targets

@github-actions github-actions bot added scope: channel/web Web gateway channel scope: docs Documentation size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: experienced 6-19 merged PRs labels Apr 3, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a system-wide admin dashboard, including a new /api/admin/usage/summary endpoint and a single-page application (SPA) for administration. Key backend changes include adding job counts and cost statistics to user details, and ensuring that deleted users are immediately evicted from the authentication cache. A critical performance issue was identified in the new usage summary handler, which currently aggregates all user and usage records in memory; this should be refactored to use targeted database queries to prevent potential performance bottlenecks and DoS vulnerabilities as the system scales.

Comment thread src/channels/web/handlers/users.rs Outdated
Comment on lines +581 to +610
let users = store
.list_users(None)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;

let total = users.len();
let active = users.iter().filter(|u| u.status == "active").count();
let suspended = users.iter().filter(|u| u.status == "suspended").count();
let admins = users.iter().filter(|u| u.role == "admin").count();

let summary_stats = store
.user_summary_stats(None)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;

let total_jobs: i64 = summary_stats.iter().map(|s| s.job_count).sum();
let total_cost: rust_decimal::Decimal = summary_stats.iter().map(|s| s.total_cost).sum();

let since_30d = chrono::Utc::now() - chrono::Duration::days(30);
let usage_stats = store
.user_usage_stats(None, since_30d)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;

let llm_calls: i64 = usage_stats.iter().map(|s| s.call_count).sum();
let input_tokens: i64 = usage_stats.iter().map(|s| s.input_tokens).sum();
let output_tokens: i64 = usage_stats.iter().map(|s| s.output_tokens).sum();
let usage_cost: rust_decimal::Decimal = usage_stats.iter().map(|s| s.total_cost).sum();

let uptime_seconds = state.startup_time.elapsed().as_secs();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation loads all user and usage records into memory to perform aggregations. This approach can lead to significant performance bottlenecks and potential DoS vulnerabilities as the dataset grows. In accordance with repository rules, please use targeted database queries to perform these calculations (e.g., using SQL COUNT and SUM) at the database level instead of fetching all records and filtering in the application code.

Suggested change
let users = store
.list_users(None)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
let total = users.len();
let active = users.iter().filter(|u| u.status == "active").count();
let suspended = users.iter().filter(|u| u.status == "suspended").count();
let admins = users.iter().filter(|u| u.role == "admin").count();
let summary_stats = store
.user_summary_stats(None)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
let total_jobs: i64 = summary_stats.iter().map(|s| s.job_count).sum();
let total_cost: rust_decimal::Decimal = summary_stats.iter().map(|s| s.total_cost).sum();
let since_30d = chrono::Utc::now() - chrono::Duration::days(30);
let usage_stats = store
.user_usage_stats(None, since_30d)
.await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
let llm_calls: i64 = usage_stats.iter().map(|s| s.call_count).sum();
let input_tokens: i64 = usage_stats.iter().map(|s| s.input_tokens).sum();
let output_tokens: i64 = usage_stats.iter().map(|s| s.output_tokens).sum();
let usage_cost: rust_decimal::Decimal = usage_stats.iter().map(|s| s.total_cost).sum();
let uptime_seconds = state.startup_time.elapsed().as_secs();
let since_30d = chrono::Utc::now() - chrono::Duration::days(30);
let summary = store.get_usage_summary(since_30d).await
.map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?;
let active = summary.active_count;
let suspended = summary.suspended_count;
let admins = summary.admin_count;
let total_jobs = summary.total_jobs;
let total_cost = summary.total_cost;
let llm_calls = summary.llm_calls;
let input_tokens = summary.input_tokens;
let output_tokens = summary.output_tokens;
let usage_cost = summary.usage_cost;
let uptime_seconds = state.startup_time.elapsed().as_secs();
References
  1. Use targeted database queries to fetch specific records instead of loading all records and filtering in the application, especially for public endpoints, to prevent performance bottlenecks and DoS vulnerabilities.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in fde14287.

The handler no longer aggregates by loading user and usage rows into memory. I added a DB-level admin_usage_summary(since) operation on the shared UserStore trait, implemented it for both PostgreSQL and libSQL with SQL-side aggregate subqueries, and switched usage_summary_handler to consume that summary directly.

Verification run on this branch:

  • cargo test test_admin_usage_summary_aggregates_in_db -- --nocapture
  • cargo test --lib admin_api_contracts -- --nocapture
  • cargo test --lib admin_user_endpoints -- --nocapture
  • cargo check --workspace --all-targets

Copy link
Copy Markdown
Collaborator Author

Addressed in this PR since the original review:

  • fixed the correctness issues called out earlier
  • extracted shared theme tokens into /theme.css so the admin surface no longer forks the main theme variables
  • moved the admin API responses in users.rs onto Rust-owned DTOs in src/channels/web/types.rs
  • added contract tests covering the admin create/list/detail/usage/summary response shapes so frontend/backend drift is caught in CI

I’m intentionally not widening this PR into the larger structural refactor. The remaining maintainability work is now tracked in #1968:

  • split src/channels/web/static/admin.js into smaller no-build modules/helpers
  • move admin route/static assembly out of src/channels/web/server.rs into a dedicated admin module

That keeps this feature branch mergeable while still capturing the follow-up refactor work explicitly.

@github-actions github-actions bot added scope: db Database trait / abstraction scope: db/postgres PostgreSQL backend labels Apr 3, 2026
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated review — LGTM with follow-ups

TL;DR: Solid foundation: admin SPA with user and usage monitoring endpoints. Auth is enforced correctly via the `AdminUser` extractor; the DB-aggregation performance fix was already applied in `fde14287`. Recommend approval with three follow-ups below.

Security assessment

# Severity File:Line Issue Suggestion
1 Medium `handlers/users.rs` (suspend/activate/delete endpoints) No audit logging on privileged state-changes Add structured tracing: `tracing::warn!(admin=%user.user_id, action="user_suspended", target_user=%id, ...)`. Needed for incident response and compliance. Consider a dedicated audit table as a follow-up
2 Low `admin.js:1461` Admin token is only sent if `!oidcProxyAuth`; the OIDC flow is unclear when `GATEWAY_OIDC_ENABLED=true` Clarify/document: how does the admin panel authenticate under OIDC proxy mode? Is the `/api/gateway/status` check sufficient? Add a test for the OIDC proxy path
3 Low `admin.js:1510` Token stored in `sessionStorage` (cleared on close) Good — not `localStorage`. For multi-user deployments consider HttpOnly cookies. CSP headers would further reduce XSS token-theft risk
4 Info POST endpoints CSRF protection via Bearer-header SPA auth (not form tokens) Works because every POST has `Authorization: Bearer`; XSS → token theft. Consider adding `Content-Security-Policy: default-src 'self'` headers
5 Info `handlers/users.rs:432` Adds `db_auth.invalidate_user()` on delete Good correctness fix — deleted users lose access immediately
6 Info `admin_usage_summary` Performance concern flagged by gemini-bot was resolved in `fde14287` via SQL subqueries

DB compatibility ✓

Dual-backend support is present:

  • `admin_usage_summary()` implemented in both `/db/libsql/users.rs` (~696-785) and the PostgreSQL backend
  • SQL-level `SELECT COUNT/SUM` subqueries → avoids in-memory aggregation
  • Test: `test_admin_usage_summary_aggregates_in_db` validates the libSQL path
  • New trait method: `UserStore::admin_usage_summary(since) -> AdminUsageSummary`

Test gaps to consider

  • Non-admin → permission denied on delete (does this exist? hard to tell from diff)
  • Cannot demote sole admin
  • OIDC proxy admin auth path

Follow-ups for a subsequent PR (not blockers)

  1. Audit logging for suspend/activate/delete (structured tracing first, dedicated audit table later)
  2. OIDC admin auth docs + test
  3. Input validation: email format (RFC 5322 minimum), display name length. The current endpoints accept `serde_json::Value` for body — add explicit validation
  4. Rate limiting on the summary endpoint if user-count growth is expected
  5. CSP headers for the admin SPA

@ilblackdragon
Copy link
Copy Markdown
Member

CI fails + need to merge origin staging

Resolve conflicts in users.rs (keep typed DTOs + add pairing_store eviction
from staging) and db/mod.rs (keep AdminUsageSummary + add PairingRequestRecord).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added contributor: core 20+ merged PRs and removed contributor: experienced 6-19 merged PRs labels Apr 6, 2026
serrrfirat and others added 2 commits April 6, 2026 14:53
Add structured tracing (warn-level) to suspend, activate, delete, and
update handlers so that privileged admin actions are recorded with the
acting admin's user_id, the action performed, and the target user.
Addresses security assessment item #1 from PR review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use PairingStore::new_noop() since the test doesn't need a real DB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the scope: channel/wasm WASM channel runtime label Apr 6, 2026
Resolve conflicts:
- server.rs: keep Korean i18n route from staging + admin panel routes from HEAD
- style.css: keep HEAD's theme.css extraction (tokens live in theme.css, not style.css)
- theme.css: add --info token from staging to both dark and light themes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: Request changes (minor)

  • total_jobs semantics bug in both src/db/libsql/users.rs:~3220 and src/history/store.rs:~3415SELECT COUNT(DISTINCT j.id) FROM llm_calls l LEFT JOIN agent_jobs j … counts only jobs that have at least one LLM call. Dashboard "Total Jobs" undercounts. Fix: query agent_jobs directly or rename to jobs_with_llm_calls.
  • admin.js:~2230 showConfirmModal injects message as raw HTML. Currently safe (only called with escapeHtml(name) literals) but unsafe-by-default — a future caller passing untrusted data trivially XSSes the admin panel. Rename to messageHtml or escape internally.
  • PG COALESCE(SUM(l.cost), 0) in src/history/store.rs:~3450 — the literal 0 may type-infer as integer; verify with explicit ::numeric cast.
  • Tests cover wire contracts (DTO deserialization) and libSQL aggregation. Missing PG parity test for admin_usage_summary (dual-backend rule).
  • Bearer-token auth (no cookies) — not CSRF-vulnerable. Dual-backend UserStore::admin_usage_summary properly implemented in both backends.
  • Verify llm_calls.created_at and llm_calls.job_id indexes exist; un-windowed SUM(cost) could be slow on large tables.
  • Consider info! instead of warn! for successful admin audit events (warn implies anomaly).

- Fix total_jobs semantics: query agent_jobs directly instead of
  counting via LEFT JOIN on llm_calls (which missed jobs without
  LLM calls). Fixed in both libSQL and PostgreSQL backends.
- Fix showConfirmModal XSS: escape message parameter internally
  instead of relying on callers to sanitize.
- Add explicit ::numeric cast to PG COALESCE(SUM(cost), 0) to
  prevent integer type inference.
- Use info! instead of warn! for successful admin audit events
  (update, suspend, activate, delete) — warn implies anomaly.
- Add missing index on llm_calls.created_at for both PG (V21
  migration) and libSQL (incremental migration 21).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the scope: db/libsql libSQL / Turso backend label Apr 8, 2026
@ilblackdragon
Copy link
Copy Markdown
Member

Review — Admin Management Panel

Scope: Standalone admin SPA at /admin, new GET /api/admin/usage/summary with backing admin_usage_summary query for both libSQL and Postgres, strongly-typed AdminUser*Response structs replacing ad-hoc serde_json::json!, extracted theme.css, and a migration adding idx_llm_calls_created_at.

Strengths

  • Strong typing migration is the headline win: ad-hoc serde_json::json! responses → named AdminUser*Response structs in types.rs. Enables real contract tests (admin_api_contracts module) that deserialize into the same types the handlers serialize from — exactly the right shape.
  • Audit logging added on suspend/activate/update/delete handlers (tracing::info!(admin = …, action = …, target_user = …)).
  • Auth-cache eviction on delete is wired correctly (db_auth.invalidate_user + ps.evict_user).
  • Tests cover both layers: admin role enforcement (member gets 403 for /api/admin/usage/summary) and a libSQL DB-level test for admin_usage_summary with realistic seeded data.
  • Theme extraction (theme.css) cleanly removes ~300 lines of duplication and is shared by both SPAs.
  • Dual-backend coverage: both src/db/libsql/users.rs and src/history/store.rs implement admin_usage_summary.

Issues & Suggestions

Correctness / DB

  • Postgres index is created without CONCURRENTLYmigrations/V21__llm_calls_created_at_index.sql. On a large llm_calls table this takes an ACCESS EXCLUSIVE-blocking lock for writes. Refinery typically wraps migrations in a transaction (which forbids CONCURRENTLY) — if so, this may need to be applied out-of-band or via the no-transaction escape hatch. Worth verifying with how migrations/ is applied here, since llm_calls is one of the hottest tables.
  • admin_usage_summary issues 4 separate scoped subqueries against llm_calls (count, input_tokens, output_tokens, cost) all filtered by created_at >= \$1, plus a 5th for total cost. Collapsing the 30-day window into one aggregation:
    SELECT COUNT(*), COALESCE(SUM(input_tokens),0), COALESCE(SUM(output_tokens),0), COALESCE(SUM(cost),0::numeric)
    FROM llm_calls WHERE created_at >= \$1
    cuts work ~4x.
  • total_cost (all-time) does a full table scan of llm_calls every dashboard load. Either cache it or precompute a rolling counter — this will get expensive fast.

API / Typing

  • AdminUserDetailResponse duplicates 12 fields from AdminUserInfo plus metadata (src/channels/web/types.rs). Use #[serde(flatten)] info: AdminUserInfo to keep the wire shape and DRY the struct. Otherwise the next field addition has to be made in two places.
  • AdminUsageSummaryUsers uses usize for counts, while sibling fields (total_jobs, llm_calls) are i64. The handler does fallible usize::try_from(i64) for each field. Just use i64 for symmetry and drop the four try_from blocks in users.rs.

Frontend (admin.js)

  • OIDC-proxy auto-detection is fragile: autoAuth() pings /api/gateway/status and treats any res.ok as 'we are behind OIDC proxy'. If that endpoint is publicly reachable, members and unauthenticated users get pushed into the proxy-auth path and bounced to access-denied with no manual-token fallback until they clear sessionStorage. Tighten the heuristic.
  • Token survives in browser history briefly: ?token=... is stripped via history.replaceState, but in-flight referrers and any error path that throws before the strip can leak it. Consider switching to #token= (never sent to server, never indexed) or POSTing it.
  • renderUsersPage rebuilds the entire users table on every keystroke in the search input. For installations with hundreds of users this will feel laggy and drop focus/selection state. Filter via DOM hide/show, or debounce.
  • apiFetch mutates the caller's options object — not strictly a bug since callers use throwaway literals, but a footgun.
  • Inline style=\"...\" everywhere (color, display, margin) bypasses the freshly extracted theme tokens and makes light-mode regressions likely.
  • alert() for every error path is rough UX — existing modal infra is right there.

Tests

  • Admin contract tests don't exercise created_at/updated_at parseability — they assert presence of T in since but never round-trip the user timestamps through chrono. A single DateTime::parse_from_rfc3339(&body.created_at).unwrap() would catch format drift.
  • No test for the is_last_admin guard interacting with role demotion via the admin panel flow — would be worth at least one negative test given the panel makes self-demotion one click away.

Minor

  • usage_stats_handler rebinds _user → _admin but doesn't add audit logging like the mutating handlers do. Read-only, so fine — just noting the inconsistency.
  • admin_html_handler is registered for /admin, /admin/, and /admin/{*path} — verify no future /api/admin/... route collisions (none today).

Risk Summary

Risk Severity
Postgres index migration could lock llm_calls Med-High — verify migration runner
Full-table SUM(cost) on every dashboard load Med — will degrade
OIDC-proxy detection false positives Med — UX trap
Token in URL during initial auth Low-Med
Struct duplication / usize vs i64 Low
Search-input full re-render Low

Overall this is a solid, well-tested restage with the right architectural moves (typed responses + contract tests). The DB index migration strategy and the all-time SUM(cost) are the two things I'd want addressed before merge; everything else is polish.

This was referenced Apr 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs DB MIGRATION PR adds or modifies PostgreSQL or libSQL migration definitions risk: medium Business logic, config, or moderate-risk modules scope: channel/wasm WASM channel runtime scope: channel/web Web gateway channel scope: db/libsql libSQL / Turso backend scope: db/postgres PostgreSQL backend scope: db Database trait / abstraction scope: docs Documentation size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants