feat: admin management panel — web UI for users and usage monitoring#1963
feat: admin management panel — web UI for users and usage monitoring#1963ilblackdragon merged 18 commits intostagingfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a system-wide admin dashboard, including a new /api/admin/usage/summary endpoint and a single-page application (SPA) for administration. Key backend changes include adding job counts and cost statistics to user details, and ensuring that deleted users are immediately evicted from the authentication cache. A critical performance issue was identified in the new usage summary handler, which currently aggregates all user and usage records in memory; this should be refactored to use targeted database queries to prevent potential performance bottlenecks and DoS vulnerabilities as the system scales.
| let users = store | ||
| .list_users(None) | ||
| .await | ||
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | ||
|
|
||
| let total = users.len(); | ||
| let active = users.iter().filter(|u| u.status == "active").count(); | ||
| let suspended = users.iter().filter(|u| u.status == "suspended").count(); | ||
| let admins = users.iter().filter(|u| u.role == "admin").count(); | ||
|
|
||
| let summary_stats = store | ||
| .user_summary_stats(None) | ||
| .await | ||
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | ||
|
|
||
| let total_jobs: i64 = summary_stats.iter().map(|s| s.job_count).sum(); | ||
| let total_cost: rust_decimal::Decimal = summary_stats.iter().map(|s| s.total_cost).sum(); | ||
|
|
||
| let since_30d = chrono::Utc::now() - chrono::Duration::days(30); | ||
| let usage_stats = store | ||
| .user_usage_stats(None, since_30d) | ||
| .await | ||
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | ||
|
|
||
| let llm_calls: i64 = usage_stats.iter().map(|s| s.call_count).sum(); | ||
| let input_tokens: i64 = usage_stats.iter().map(|s| s.input_tokens).sum(); | ||
| let output_tokens: i64 = usage_stats.iter().map(|s| s.output_tokens).sum(); | ||
| let usage_cost: rust_decimal::Decimal = usage_stats.iter().map(|s| s.total_cost).sum(); | ||
|
|
||
| let uptime_seconds = state.startup_time.elapsed().as_secs(); |
There was a problem hiding this comment.
The current implementation loads all user and usage records into memory to perform aggregations. This approach can lead to significant performance bottlenecks and potential DoS vulnerabilities as the dataset grows. In accordance with repository rules, please use targeted database queries to perform these calculations (e.g., using SQL COUNT and SUM) at the database level instead of fetching all records and filtering in the application code.
| let users = store | |
| .list_users(None) | |
| .await | |
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | |
| let total = users.len(); | |
| let active = users.iter().filter(|u| u.status == "active").count(); | |
| let suspended = users.iter().filter(|u| u.status == "suspended").count(); | |
| let admins = users.iter().filter(|u| u.role == "admin").count(); | |
| let summary_stats = store | |
| .user_summary_stats(None) | |
| .await | |
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | |
| let total_jobs: i64 = summary_stats.iter().map(|s| s.job_count).sum(); | |
| let total_cost: rust_decimal::Decimal = summary_stats.iter().map(|s| s.total_cost).sum(); | |
| let since_30d = chrono::Utc::now() - chrono::Duration::days(30); | |
| let usage_stats = store | |
| .user_usage_stats(None, since_30d) | |
| .await | |
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | |
| let llm_calls: i64 = usage_stats.iter().map(|s| s.call_count).sum(); | |
| let input_tokens: i64 = usage_stats.iter().map(|s| s.input_tokens).sum(); | |
| let output_tokens: i64 = usage_stats.iter().map(|s| s.output_tokens).sum(); | |
| let usage_cost: rust_decimal::Decimal = usage_stats.iter().map(|s| s.total_cost).sum(); | |
| let uptime_seconds = state.startup_time.elapsed().as_secs(); | |
| let since_30d = chrono::Utc::now() - chrono::Duration::days(30); | |
| let summary = store.get_usage_summary(since_30d).await | |
| .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e.to_string()))?; | |
| let active = summary.active_count; | |
| let suspended = summary.suspended_count; | |
| let admins = summary.admin_count; | |
| let total_jobs = summary.total_jobs; | |
| let total_cost = summary.total_cost; | |
| let llm_calls = summary.llm_calls; | |
| let input_tokens = summary.input_tokens; | |
| let output_tokens = summary.output_tokens; | |
| let usage_cost = summary.usage_cost; | |
| let uptime_seconds = state.startup_time.elapsed().as_secs(); |
References
- Use targeted database queries to fetch specific records instead of loading all records and filtering in the application, especially for public endpoints, to prevent performance bottlenecks and DoS vulnerabilities.
There was a problem hiding this comment.
Addressed in fde14287.
The handler no longer aggregates by loading user and usage rows into memory. I added a DB-level admin_usage_summary(since) operation on the shared UserStore trait, implemented it for both PostgreSQL and libSQL with SQL-side aggregate subqueries, and switched usage_summary_handler to consume that summary directly.
Verification run on this branch:
cargo test test_admin_usage_summary_aggregates_in_db -- --nocapturecargo test --lib admin_api_contracts -- --nocapturecargo test --lib admin_user_endpoints -- --nocapturecargo check --workspace --all-targets
|
Addressed in this PR since the original review:
I’m intentionally not widening this PR into the larger structural refactor. The remaining maintainability work is now tracked in #1968:
That keeps this feature branch mergeable while still capturing the follow-up refactor work explicitly. |
ilblackdragon
left a comment
There was a problem hiding this comment.
Automated review — LGTM with follow-ups
TL;DR: Solid foundation: admin SPA with user and usage monitoring endpoints. Auth is enforced correctly via the `AdminUser` extractor; the DB-aggregation performance fix was already applied in `fde14287`. Recommend approval with three follow-ups below.
Security assessment
| # | Severity | File:Line | Issue | Suggestion |
|---|---|---|---|---|
| 1 | Medium | `handlers/users.rs` (suspend/activate/delete endpoints) | No audit logging on privileged state-changes | Add structured tracing: `tracing::warn!(admin=%user.user_id, action="user_suspended", target_user=%id, ...)`. Needed for incident response and compliance. Consider a dedicated audit table as a follow-up |
| 2 | Low | `admin.js:1461` | Admin token is only sent if `!oidcProxyAuth`; the OIDC flow is unclear when `GATEWAY_OIDC_ENABLED=true` | Clarify/document: how does the admin panel authenticate under OIDC proxy mode? Is the `/api/gateway/status` check sufficient? Add a test for the OIDC proxy path |
| 3 | Low | `admin.js:1510` | Token stored in `sessionStorage` (cleared on close) | Good — not `localStorage`. For multi-user deployments consider HttpOnly cookies. CSP headers would further reduce XSS token-theft risk |
| 4 | Info | POST endpoints | CSRF protection via Bearer-header SPA auth (not form tokens) | Works because every POST has `Authorization: Bearer`; XSS → token theft. Consider adding `Content-Security-Policy: default-src 'self'` headers |
| 5 | Info | `handlers/users.rs:432` | Adds `db_auth.invalidate_user()` on delete | Good correctness fix — deleted users lose access immediately |
| 6 | Info | `admin_usage_summary` | Performance concern flagged by gemini-bot was resolved in `fde14287` via SQL subqueries | ✓ |
DB compatibility ✓
Dual-backend support is present:
- `admin_usage_summary()` implemented in both `/db/libsql/users.rs` (~696-785) and the PostgreSQL backend
- SQL-level `SELECT COUNT/SUM` subqueries → avoids in-memory aggregation
- Test: `test_admin_usage_summary_aggregates_in_db` validates the libSQL path
- New trait method: `UserStore::admin_usage_summary(since) -> AdminUsageSummary`
Test gaps to consider
- Non-admin → permission denied on delete (does this exist? hard to tell from diff)
- Cannot demote sole admin
- OIDC proxy admin auth path
Follow-ups for a subsequent PR (not blockers)
- Audit logging for suspend/activate/delete (structured tracing first, dedicated audit table later)
- OIDC admin auth docs + test
- Input validation: email format (RFC 5322 minimum), display name length. The current endpoints accept `serde_json::Value` for body — add explicit validation
- Rate limiting on the summary endpoint if user-count growth is expected
- CSP headers for the admin SPA
|
CI fails + need to merge origin staging |
Resolve conflicts in users.rs (keep typed DTOs + add pairing_store eviction from staging) and db/mod.rs (keep AdminUsageSummary + add PairingRequestRecord). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add structured tracing (warn-level) to suspend, activate, delete, and update handlers so that privileged admin actions are recorded with the acting admin's user_id, the action performed, and the target user. Addresses security assessment item #1 from PR review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use PairingStore::new_noop() since the test doesn't need a real DB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts: - server.rs: keep Korean i18n route from staging + admin panel routes from HEAD - style.css: keep HEAD's theme.css extraction (tokens live in theme.css, not style.css) - theme.css: add --info token from staging to both dark and light themes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ilblackdragon
left a comment
There was a problem hiding this comment.
Verdict: Request changes (minor)
total_jobssemantics bug in bothsrc/db/libsql/users.rs:~3220andsrc/history/store.rs:~3415—SELECT COUNT(DISTINCT j.id) FROM llm_calls l LEFT JOIN agent_jobs j …counts only jobs that have at least one LLM call. Dashboard "Total Jobs" undercounts. Fix: queryagent_jobsdirectly or rename tojobs_with_llm_calls.admin.js:~2230showConfirmModalinjectsmessageas raw HTML. Currently safe (only called withescapeHtml(name)literals) but unsafe-by-default — a future caller passing untrusted data trivially XSSes the admin panel. Rename tomessageHtmlor escape internally.- PG
COALESCE(SUM(l.cost), 0)insrc/history/store.rs:~3450— the literal0may type-infer as integer; verify with explicit::numericcast. - Tests cover wire contracts (DTO deserialization) and libSQL aggregation. Missing PG parity test for
admin_usage_summary(dual-backend rule). - Bearer-token auth (no cookies) — not CSRF-vulnerable. Dual-backend
UserStore::admin_usage_summaryproperly implemented in both backends. - Verify
llm_calls.created_atandllm_calls.job_idindexes exist; un-windowedSUM(cost)could be slow on large tables. - Consider
info!instead ofwarn!for successful admin audit events (warn implies anomaly).
- Fix total_jobs semantics: query agent_jobs directly instead of counting via LEFT JOIN on llm_calls (which missed jobs without LLM calls). Fixed in both libSQL and PostgreSQL backends. - Fix showConfirmModal XSS: escape message parameter internally instead of relying on callers to sanitize. - Add explicit ::numeric cast to PG COALESCE(SUM(cost), 0) to prevent integer type inference. - Use info! instead of warn! for successful admin audit events (update, suspend, activate, delete) — warn implies anomaly. - Add missing index on llm_calls.created_at for both PG (V21 migration) and libSQL (incremental migration 21). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review — Admin Management PanelScope: Standalone admin SPA at Strengths
Issues & SuggestionsCorrectness / DB
API / Typing
Frontend (admin.js)
Tests
Minor
Risk Summary
Overall this is a solid, well-tested restage with the right architectural moves (typed responses + contract tests). The DB index migration strategy and the all-time |
Summary
Supersedes #1916. This recreates the admin management panel on top of current
staginginstead of carrying forward the old merge train.Ref #1609
This branch adds:
/admin,/admin.css, and/admin.jsGET /api/admin/usage/summaryfor dashboard aggregationIt also carries forward the small correctness fixes that were still outstanding on the original PR:
job_count,total_cost, andlast_active_atin the user detail response/api/admin/usage/summaryin the web gateway route tableTesting
cargo test --lib admin_user_endpoints -- --nocapturecargo check --workspace --all-targets