fix: preserve gateway token across helm upgrades#2
fix: preserve gateway token across helm upgrades#2aehrt55 wants to merge 2 commits intofeat/extensibility-fieldsfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the OpenClaw Helm chart to version 1.4.1, introducing support for standard Kubernetes scheduling fields (nodeSelector, affinity, and tolerations) and adding documentation for tool persistence. It also implements a mechanism to reuse existing gateway tokens during upgrades and syncs them into the configuration via an init container. However, a critical bug was identified in the deployment template where the secret key reference does not match the key defined in the secret manifest, which will prevent the token from being correctly applied.
| {{- else }} | ||
| name: {{ .Values.gateway.token.secretName }} | ||
| {{- end }} | ||
| key: token |
There was a problem hiding this comment.
The secretKeyRef.key is set to token, but the corresponding Secret generated by templates/secret.yaml uses the key OPENCLAW_GATEWAY_TOKEN. This mismatch will cause the OPENCLAW_GATEWAY_TOKEN environment variable in the init-config container to be empty, as optional: true is set. Consequently, the script to sync the token into openclaw.json will not execute, which undermines the goal of persisting the token correctly across upgrades.
key: OPENCLAW_GATEWAY_TOKEN
aehrt55
left a comment
There was a problem hiding this comment.
PR Review: fix: preserve gateway token across helm upgrades
🔴 Critical Issues
1. secretKeyRef key mismatch — init-config token sync is a silent no-op
deployment.yaml references key: token, but secret.yaml stores the value under key: OPENCLAW_GATEWAY_TOKEN.
# deployment.yaml (init-config container)
secretKeyRef:
key: token # ← does not exist in the Secret
optional: true # ← silently swallows the missing key
# secret.yaml
data:
OPENCLAW_GATEWAY_TOKEN: ... # ← actual key nameBecause optional: true is set, Kubernetes won't error — the env var is simply empty. The if [ -n "$OPENCLAW_GATEWAY_TOKEN" ] guard evaluates to false, and the entire token sync is silently skipped on every pod start. The fix this PR intends to deliver is completely non-functional.
Fix: Change key: token to key: OPENCLAW_GATEWAY_TOKEN in the init-config secretKeyRef.
2. PR branch is stale — needs rebase onto main
The branch predates PRs thepagent#19, thepagent#21, and thepagent#23, which are already merged. As a result:
- 5 of 6 changed files already exist identically on
main Chart.yamlwill conflict (1.3.14 → 1.4.1, butmainis already at1.4.1)- 17 commits in history, most already merged via other PRs
After rebasing, the only net-new change should be the lookup pattern in secret.yaml. Bump version to 1.4.2 or appropriate.
🟡 High Severity
3. Node.js token sync script has zero error handling
node -e "
const c = JSON.parse(fs.readFileSync(p));
...
fs.writeFileSync(p, JSON.stringify(c, null, 2));
"Multiple silent failure modes:
openclaw.jsondoesn't exist yet →ENOENT→ script exits non-zero but shell continues- Invalid JSON in file →
SyntaxError→ same - PVC full or read-only →
ENOSPC/EACCES→ same
No set -e, no try/catch, no error logging. The init container will succeed regardless, and the token mismatch persists silently.
Suggestion: Wrap in try/catch with process.exit(1) on failure, and add || { echo "ERROR: token sync failed"; exit 1; } after the node -e call.
4. optional: true hides missing Secret misconfiguration
If gateway.token.generate: false and the user forgets to create the external Secret, the pod starts without any gateway token — potentially leaving the gateway unauthenticated. Consider making optional conditional:
optional: {{ not .Values.gateway.token.generate }}5. lookup returns empty in GitOps/dry-run contexts
helm template, --dry-run, ArgoCD diff, and Flux render all return {} from lookup. This causes rendered output to always show a new random token, which may cause constant drift detection in GitOps workflows.
This is a known limitation of the standard Helm pattern and is acceptable, but should be documented in a template comment:
{{- /* lookup returns empty during helm template / --dry-run; token regenerated in output only */ -}}📝 Documentation Issues
6. persistent-tools.md minor inaccuracies
| Line | Issue | Fix |
|---|---|---|
| 5 | Uses ~/.openclaw without clarifying absolute path |
Add "(i.e., /home/node/.openclaw inside the container)" on first mention |
| 25 | "no apt-get" — apt-get exists but requires root |
→ "runs as non-root user and cannot install system packages" |
| 53 | # binary lands at ~/.openclaw/workspace/bin/<package> |
Actual binary name depends on package's bin field, not package name |
7. Missing template comments
secret.yaml: Thelookuppattern needs a comment explaining intent:# Reuse existing token if Secret already exists (prevents regeneration on upgrade)deployment.yaml: Thechecksum/configannotation needs:# Force pod restart when config values change
8. values.yaml scheduling comments are generic
# nodeSelector constrains the pod to nodes with matching labels just restates Kubernetes docs. Either remove or make chart-specific.
✅ What Works Well
- The core
lookuppattern insecret.yamlis correct and follows the standard bitnami/ingress-nginx approach - Defensive triple-check (
$existingSecret && .data && index .data "KEY") handles edge cases properly helm.sh/resource-policy: keepretained as belt-and-suspenders- The env-first precedence comment in init-config explains "why" well
- ASCII diagrams in persistent-tools.md effectively communicate complex concepts
Action Items Summary
| Priority | Item |
|---|---|
| Must | Fix key: token → key: OPENCLAW_GATEWAY_TOKEN in secretKeyRef |
| Must | Rebase onto main, resolve conflicts, bump to 1.4.2+ |
| Should | Add error handling to init-config Node.js script |
| Should | Add explanatory comments to lookup pattern and checksum annotation |
| Could | Make optional: true conditional on gateway.token.generate |
| Could | Fix persistent-tools.md minor inaccuracies |
🤖 PR ReviewBranch: 📊 Summary
Status: ✅ Ready to merge (all important issues reviewed and deferred to backlog) 🟡 Important Issues1. ⏭️ lookup silently returns empty on RBAC failureFile: Problem: Helm Decision: Deferred — this is a known Helm limitation shared by all charts using 2. ⏭️ optional: true silently skips token injection when Secret is missingFile: Problem: When Secret is expected but missing, Decision: Deferred — currently works correctly in normal operation. Conditional 3. ⏭️ node -e failures produce raw stack tracesFile: Problem: Decision: Deferred — 4. ⏭️ No validation when generate=false and secretName is emptyFile: Problem: Decision: Deferred — will be addressed together with 💡 SuggestionsView 7 suggestions (0 addressed)
✨ Strengths
📋 Type Design Ratings
🎯 Action PlanBefore Merge:
After Merge (Backlog):
Generated by pr-review-and-document skill | Round 1 | [View edit history](click edited) |
29f7bc6 to
3ca3480
Compare
3ca3480 to
be428df
Compare
5c9e4a8 to
3702451
Compare
Use Helm lookup to retain existing gateway token Secret instead of regenerating on every helm upgrade. The randAlphaNum call is only used for initial creation when no Secret exists yet. Without this fix, every helm upgrade generates a new random token, invalidating all paired Control UI sessions and requiring users to re-enter the gateway token. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The token sync in init-config was silently non-functional due to two compounding issues that happened to cancel out: 1. secretKeyRef used `key: token` but Secret stores `OPENCLAW_GATEWAY_TOKEN`, so the env var was always empty and sync was always skipped 2. Since config had no gateway.auth.token, the gateway fell back to the env var from the main container's secretRef — masking the issue This worked by accident, but breaks if auth.token is ever set in config. Fixes: - secretKeyRef key: `token` → `OPENCLAW_GATEWAY_TOKEN` to match Secret - Sync gateway.auth.token (not remote.token), because the gateway process uses config-first precedence when OPENCLAW_SERVICE_KIND=gateway - Add `set -e` so init-config fails loudly on JSON parse or write errors - Correct misleading comment that claimed env-first precedence Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d912e9b to
aee7a62
Compare
Summary
Use Helm
lookupto retain existing gateway token Secret instead of regenerating on everyhelm upgrade.Problem
randAlphaNum 32is re-evaluated on every template render, overwriting the Secret on each upgrade. Thehelm.sh/resource-policy: keepannotation only prevents deletion on uninstall, not overwrites on upgrade.This causes all paired Control UI sessions to lose their token after every
helm upgrade.Fix
Check if the Secret already exists via
lookup. If it does, reuse the existing token. Only generate a new random token on first install.This is the standard Helm pattern used by bitnami, ingress-nginx, and other charts for generated secrets.
🤖 Generated with Claude Code