fix: sync gateway tokens in init-config to prevent token mismatch after pod restart #23

Merged: thepagent merged 2 commits into main from fix/gateway-token-mismatch on Mar 28, 2026

@thepagent (Owner) commented Mar 27, 2026

Closes #22

The Problem (Issue #22)

On a fresh Helm install, the init container copies openclaw.json from the ConfigMap (which is rendered from .Values.config in values.yaml) onto the PVC. The gateway starts fine.

But after a pod restart, the init container skips the copy because the file already exists on the PVC, so gateway.remote.token stays stale or unset. Meanwhile, the gateway process always reads its auth token fresh from the OPENCLAW_GATEWAY_TOKEN env var, so the two values can diverge.

  Fresh install
  ┌──────────────────────────────────────────────────────┐
  │ init-config                                          │
  │  openclaw.json not on PVC                            │
  │  → copy from ConfigMap (rendered from values.yaml)   │
  └──────────────────────────────────────────────────────┘
           │                          │
           ▼                          ▼
  ┌─────────────────┐        ┌─────────────────┐
  │ gateway process │        │  openclaw CLI   │
  │ auth = PGdODrQd │  ✓ ok  │ remote = PGdODrQd│
  │ (env var)       │        │ (from PVC)      │
  └─────────────────┘        └─────────────────┘

  After pod restart
  ┌──────────────────────────────────────────────────────┐
  │ init-config                                          │
  │  openclaw.json already on PVC → SKIP                 │
  │  gateway.remote.token = (stale / missing)            │
  └──────────────────────────────────────────────────────┘
           │                          │
           ▼                          ▼
  ┌─────────────────┐        ┌─────────────────┐
  │ gateway process │        │  openclaw CLI   │
  │ auth = PGdODrQd │  ✗     │ remote = stale  │
  │ (env var)       │        │ (from PVC)      │
  └─────────────────┘        └─────────────────┘
           └──────── token mismatch ──┘
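
The copy-once behavior described above can be sketched in a few lines. This is a minimal Python model, not the chart's actual init script: the function names, the directory layout, and the `token-A` / `token-B` values are hypothetical, and it assumes openclaw.json is plain JSON.

```python
import json
import tempfile
from pathlib import Path


def init_config_copy_once(configmap_file: Path, pvc_file: Path) -> bool:
    """Copy the rendered config to the PVC unless a copy already exists.

    Returns True when the copy happened and False when it was skipped.
    """
    if pvc_file.exists():
        return False  # pod restart: the file on the PVC is left untouched
    pvc_file.write_text(configmap_file.read_text())
    return True


def remote_token(config_file: Path):
    """Read gateway.remote.token, or None if it is missing."""
    config = json.loads(config_file.read_text())
    return config.get("gateway", {}).get("remote", {}).get("token")


with tempfile.TemporaryDirectory() as tmp:
    configmap_file = Path(tmp) / "configmap" / "openclaw.json"
    pvc_file = Path(tmp) / "pvc" / "openclaw.json"
    configmap_file.parent.mkdir()
    pvc_file.parent.mkdir()

    # Fresh install: nothing on the PVC yet, so the rendered config is copied.
    configmap_file.write_text(json.dumps({"gateway": {"remote": {"token": "token-A"}}}))
    copied_on_install = init_config_copy_once(configmap_file, pvc_file)

    # Restart after the rendered config changed: the copy is skipped.
    configmap_file.write_text(json.dumps({"gateway": {"remote": {"token": "token-B"}}}))
    copied_on_restart = init_config_copy_once(configmap_file, pvc_file)
    token_after_restart = remote_token(pvc_file)

print(copied_on_install, copied_on_restart, token_after_restart)
```

In this model the file on the PVC still holds the first token after the restart, which is the stale state shown in the second diagram.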

Why Only remote.token Needs Syncing

The gateway resolves its own auth token with env-first precedence:

  OPENCLAW_GATEWAY_TOKEN (env)  ←── always wins
           │
           ▼
  gateway.auth.token (config)   ←── only used if env var absent

So gateway.auth.token in openclaw.json is irrelevant when OPENCLAW_GATEWAY_TOKEN is set. Only the CLI needs gateway.remote.token in config to know what token to present.
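
As an illustration of the precedence above, here is a small Python sketch. The function name is hypothetical and this is not the upstream implementation; treating an empty env var the same as an absent one is also an assumption.

```python
from typing import Any, Mapping, Optional


def resolve_gateway_auth_token(
    env: Mapping[str, str], config: Mapping[str, Any]
) -> Optional[str]:
    """Return the token the gateway would use under env-first precedence."""
    env_token = env.get("OPENCLAW_GATEWAY_TOKEN")
    if env_token:  # assumption: an empty string counts as "absent"
        return env_token
    # Fall back to gateway.auth.token from the config file, if present.
    return config.get("gateway", {}).get("auth", {}).get("token")


config = {"gateway": {"auth": {"token": "from-config"}}}
with_env = resolve_gateway_auth_token({"OPENCLAW_GATEWAY_TOKEN": "from-env"}, config)
without_env = resolve_gateway_auth_token({}, config)
print(with_env, without_env)
```

With the env var set, the config value is never consulted, which is why writing gateway.auth.token buys nothing in a Helm deployment.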

The Fix (This PR)

Run the sync on every pod start, not just first start:

  Every pod start
  ┌─────────────────────────────────────────────────────┐
  │ init-config                                         │
  │  1. copy openclaw.json if not on PVC (unchanged)    │
  │  2. if OPENCLAW_GATEWAY_TOKEN set:                  │
  │       gateway.remote.token = OPENCLAW_GATEWAY_TOKEN │
  └─────────────────────────────────────────────────────┘
           │                          │
           ▼                          ▼
  ┌─────────────────┐        ┌─────────────────┐
  │ gateway process │        │  openclaw CLI   │
  │ auth = PGdODrQd │  ✓ ok  │ remote = PGdODrQd│
  │ (env var)       │        │ (synced)        │
  └─────────────────┘        └─────────────────┘
           └──────────── match ✓ ─────┘

Non-Helm deployments are unaffected — the sync block is guarded by [ -n "$OPENCLAW_GATEWAY_TOKEN" ].
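
The sync step and its guard could look roughly like the following. This is a Python sketch of the logic only: the real init container runs a shell script, the function name `sync_remote_token` is hypothetical, and the sketch assumes openclaw.json is plain JSON with gateway.remote as an object.

```python
import json
from pathlib import Path
from typing import Mapping


def sync_remote_token(config_path: Path, env: Mapping[str, str]) -> bool:
    """Write OPENCLAW_GATEWAY_TOKEN into gateway.remote.token.

    Returns False and leaves the file untouched when the env var is unset
    or empty, mirroring the [ -n "$OPENCLAW_GATEWAY_TOKEN" ] guard.
    """
    token = env.get("OPENCLAW_GATEWAY_TOKEN", "")
    if not token:
        return False
    config = json.loads(config_path.read_text())
    # Create the intermediate objects if missing; keep every other key as-is.
    config.setdefault("gateway", {}).setdefault("remote", {})["token"] = token
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return True
```

Because the function overwrites only one key and runs on every start, calling it repeatedly with the same token is harmless, and a rotated token is picked up on the next pod restart.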

Impact

| Deployment | Impact |
| --- | --- |
| Helm with OPENCLAW_GATEWAY_TOKEN | remote.token synced on every pod start |
| Helm without OPENCLAW_GATEWAY_TOKEN | ✅ No change |
| Non-Helm (binary) | ✅ No change |

Note: This is a Helm-layer Workaround

The root cause is an asymmetry in the upstream codebase: the gateway reads its auth token from the OPENCLAW_GATEWAY_TOKEN env var, while the CLI reads the token it presents from gateway.remote.token in openclaw.json. The proper fix would be for the CLI to also prefer OPENCLAW_GATEWAY_TOKEN over gateway.remote.token in config (env-first on the client side), eliminating the need for any sync logic here.

This PR patches the problem at the Helm layer until upstream adopts that approach. The sync script can be removed from this chart once the upstream CLI supports env-first credential resolution.

…er pod restart

- Fresh install: set only gateway.remote.token (gateway uses OPENCLAW_GATEWAY_TOKEN env var directly)
- Existing install: sync both gateway.auth.token and gateway.remote.token for backward compatibility
- If OPENCLAW_GATEWAY_TOKEN is not set, skip token sync entirely (non-Helm deployments unaffected)

Closes #22
@masami-agent

Thanks—this looks like a solid fix. A few suggestions to make the behavior clearer and easier to operate long-term:

  1. Define a single source of truth
    For Helm deployments, it’d help to explicitly document that OPENCLAW_GATEWAY_TOKEN (K8s Secret) is the source of truth, and openclaw.json is derived/synced at startup to avoid config/secret drift.

  2. Token rotation expectations
    Since tokens are re-synced on every pod start, rotating the Secret + restarting the pod will apply the new token (great). It may be worth adding a short note in README/upgrade notes that clients/CLI must be updated accordingly because old tokens will stop working.

  3. Avoid persisting secrets to disk (optional/configurable)
    Fresh install behavior (not persisting auth.token) is great. For existing installs where gateway.auth.token is written for backwards compatibility, consider a value flag like gateway.persistTokenInConfig (default false) or at least document the security implication (ensure the config volume/backups/logs don’t leak secrets).

…logic

gateway uses env-first precedence (OPENCLAW_GATEWAY_TOKEN > gateway.auth.token),
so writing auth.token to config is redundant. Remove IS_FRESH distinction and
only sync gateway.remote.token on every pod start.