scripts/cleanup-sessions.sh: hardcoded PROJECT_ROOT paths + no consistency validation #1825

@nhod

The new scripts/cleanup-sessions.sh has a few robustness issues I ran into while syncing a fork:

1. Hardcoded paths ignore NANOCLAW_*_DIR env vars

The script derives STORE_DB, SESSIONS_DIR, and GROUPS_DIR from $PROJECT_ROOT:

STORE_DB="$PROJECT_ROOT/store/messages.db"
SESSIONS_DIR="$PROJECT_ROOT/data/sessions"
GROUPS_DIR="$PROJECT_ROOT/groups"

But NanoClaw itself reads NANOCLAW_STORE_DIR / NANOCLAW_DATA_DIR / NANOCLAW_GROUPS_DIR at runtime (see src/config.ts). On any deploy where those are set to something other than $PROJECT_ROOT/... (e.g. a persistent-volume mount at /data/...), the cleanup script looks in the wrong place: the DB isn't found, and the script exits 1 without cleaning anything. Session artifacts then accumulate forever, and because src/session-cleanup.ts logs the error non-fatally, it's easy to miss.

Suggested fix — fall back to PROJECT_ROOT only when env isn't set:

STORE_DIR="${NANOCLAW_STORE_DIR:-$PROJECT_ROOT/store}"
DATA_DIR="${NANOCLAW_DATA_DIR:-$PROJECT_ROOT/data}"
GROUPS_DIR="${NANOCLAW_GROUPS_DIR:-$PROJECT_ROOT/groups}"
STORE_DB="$STORE_DIR/messages.db"
SESSIONS_DIR="$DATA_DIR/sessions"
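
To make the fallback behavior concrete, here is a small sketch that wraps the assignments above in a helper and runs it with and without the env vars set. The `resolve_paths` function, the `/opt/nanoclaw` root, and the `/data/store` override are hypothetical values chosen for illustration; only the variable names come from the script and from src/config.ts as described above.

```shell
# Hypothetical helper (not in the script): resolves paths the same way the
# suggested fix does. $1 stands in for PROJECT_ROOT.
resolve_paths() {
  STORE_DIR="${NANOCLAW_STORE_DIR:-$1/store}"
  DATA_DIR="${NANOCLAW_DATA_DIR:-$1/data}"
  GROUPS_DIR="${NANOCLAW_GROUPS_DIR:-$1/groups}"
  STORE_DB="$STORE_DIR/messages.db"
  SESSIONS_DIR="$DATA_DIR/sessions"
}

# Case 1: no env vars set, so everything stays under the project root.
unset NANOCLAW_STORE_DIR NANOCLAW_DATA_DIR NANOCLAW_GROUPS_DIR
resolve_paths /opt/nanoclaw
echo "$STORE_DB"      # /opt/nanoclaw/store/messages.db

# Case 2: two of the three vars set (illustrative values); the third falls back.
NANOCLAW_STORE_DIR=/data/store
NANOCLAW_DATA_DIR=/data
resolve_paths /opt/nanoclaw
echo "$STORE_DB"      # /data/store/messages.db
echo "$SESSIONS_DIR"  # /data/sessions
echo "$GROUPS_DIR"    # /opt/nanoclaw/groups
```

Note that `:-` also falls back when a variable is set but empty, which is probably the desired behavior here but is worth knowing if an empty value is ever meaningful.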

2. Silent-failure modes even with paths correct

Two related hazards:

a) If the sqlite3 CLI isn't available, ACTIVE_IDS=$(sqlite3 "$STORE_DB" "..." 2>/dev/null || true) returns an empty string. Every session JSONL past the 7-day window is then treated as inactive and deleted, including sessions that are still live according to the DB the script couldn't read. The || true hides the real failure.

b) If NANOCLAW_STORE_DIR and NANOCLAW_DATA_DIR are set to inconsistent trees (different deployments' data mixed), the script reads ACTIVE_IDS from one and deletes artifacts from the other — potentially wiping sessions still marked active in the other tree's DB.
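
The masking in (a) can be reproduced without sqlite3 at all. The sketch below uses a hypothetical `failing_query` function as a stand-in for a sqlite3 call that fails (127 is the conventional "command not found" status); it is not code from the script.

```shell
# Stand-in for a sqlite3 invocation that cannot run (hypothetical).
failing_query() { return 127; }

# Pattern used by the script: stderr is discarded and "|| true" forces success,
# so the caller sees an empty result and a zero exit status.
ACTIVE_IDS=$(failing_query 2>/dev/null || true)
masked_status=$?

# Same call without "|| true": the failure reaches the caller.
if ids=$(failing_query 2>/dev/null); then
  unmasked_status=0
else
  unmasked_status=$?
fi

echo "masked:   status=$masked_status ids='$ACTIVE_IDS'"
echo "unmasked: status=$unmasked_status"
```

With the masked form, "no active sessions" and "could not read the DB" are indistinguishable, which is what makes the subsequent deletion unsafe.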

Suggested fix:

  • Require sqlite3 (fail loudly if missing) rather than swallowing the error.
  • Validate that all three configured directories exist before any deletion, and fail with a clear error if any is missing.
  • Consider deriving all three from a single validated base (or adding an optional sanity check that the trees agree) before trusting ACTIVE_IDS.
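
A minimal sketch of the first two bullets, assuming STORE_DB, SESSIONS_DIR, and GROUPS_DIR are already resolved. The helper names (`die`, `require_cmd`, `require_file`, `require_dir`, `preflight`, `load_active_ids`) are hypothetical and not taken from the script, and the SQL query is passed in as an argument because the issue does not show it.

```shell
# Hypothetical helpers; each returns non-zero with a message instead of
# silently continuing.
die() { echo "cleanup-sessions.sh: $*" >&2; return 1; }

require_cmd()  { command -v "$1" >/dev/null 2>&1 || die "required command not found: $1"; }
require_file() { [ -f "$1" ] || die "file not found: $1"; }
require_dir()  { [ -d "$1" ] || die "directory not found: $1"; }

# Run every check before any deletion happens.
preflight() {
  require_cmd sqlite3 &&
  require_file "$STORE_DB" &&
  require_dir "$SESSIONS_DIR" &&
  require_dir "$GROUPS_DIR"
}

# Read active IDs without "|| true" so a failed query aborts the run.
# $1 = DB path, $2 = the script's existing query (not reproduced here).
load_active_ids() {
  ACTIVE_IDS=$(sqlite3 "$1" "$2") || die "could not read active sessions from $1"
}

# Intended use inside the script (left as comments so this sketch has no
# side effects when sourced):
#   preflight || exit 1
#   load_active_ids "$STORE_DB" "$QUERY" || exit 1
```

Returning from `die` rather than exiting keeps the helpers testable in isolation; the script itself would decide to exit at the call site.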

Context

Hit this syncing a fork (CoachClaw) that runs on a Fly.io volume at /data. Worked around both issues locally, but they'd bite any NanoClaw deploy that uses the NANOCLAW_*_DIR env vars or runs on a minimal container image without sqlite3 installed.
