This repository was archived by the owner on Feb 21, 2026. It is now read-only.

feat: Add Hetzner Cloud backend for ephemeral VM agents#12

Merged: Peyton-Spencer merged 4 commits into main from feat/hetzner-backend on Feb 14, 2026

@Peyton-Spencer Peyton-Spencer commented Feb 13, 2026

Summary

Add Hetzner Cloud backend to support ephemeral VM-based agent execution with S3-based I/O. This provides a cost-effective alternative to persistent cloud backends like Sprites/Railway by creating VMs on-demand and destroying them immediately after use.

New files

File | Purpose
src/backends/hetzner-api.ts | Hetzner Cloud API wrapper with server and SSH key lifecycle management
src/backends/hetzner-backend.ts | Hetzner backend implementation following Railway's S3 inbox/outbox pattern

Changes

  • Add 'hetzner' to BackendType union in src/backends/types.ts
  • Register HetznerBackend in src/backends/index.ts
  • Add Hetzner config vars to src/config.ts:
    • HETZNER_API_TOKEN - API token from Hetzner Cloud Console
    • HETZNER_LOCATION - Datacenter location (default: ash - Ashburn, US)
    • HETZNER_SERVER_TYPE - VM size (default: cpx11 - 2 vCPU, 2GB RAM, €4.15/mo)
    • HETZNER_IMAGE - OS image (default: ubuntu-22.04)

Architecture

Ephemeral VM Lifecycle

  1. Create VM: API call to Hetzner Cloud creates server with cloud-init
  2. Bootstrap: Cloud-init installs Docker and starts nanoclaw container
  3. Execute: Agent polls S3 inbox, executes task, writes to S3 outbox
  4. Destroy: VM is deleted immediately after agent completes (ephemeral!)

S3-based I/O (same pattern as Railway)

  • Host writes prompt to S3 inbox → Agent polls inbox
  • Agent writes results to S3 outbox → Host polls outbox
  • File sync via S3 for workspace files
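
The lifecycle and S3 I/O flow above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual implementation: `EphemeralDeps`, `runEphemeral`, and the option names are hypothetical, and the Hetzner and S3 calls are injected so the sketch stays self-contained.

```typescript
// Hypothetical dependency interface: the real backend talks to the Hetzner API
// and an S3-compatible bucket; here those calls are injected.
interface EphemeralDeps {
  writeInbox(agentId: string, prompt: string): Promise<void>;
  createServer(agentId: string): Promise<number>; // resolves to a server id
  readOutbox(agentId: string): Promise<string | null>; // null until a result exists
  deleteServer(serverId: number): Promise<void>;
  sleep(ms: number): Promise<void>;
}

async function runEphemeral(
  deps: EphemeralDeps,
  agentId: string,
  prompt: string,
  opts: { pollMs: number; maxPolls: number },
): Promise<{ status: 'success' | 'error'; result: string | null }> {
  // Host writes the prompt to the inbox before the VM exists.
  await deps.writeInbox(agentId, prompt);
  const serverId = await deps.createServer(agentId);
  try {
    // Host polls the outbox until the agent has written a result.
    for (let i = 0; i < opts.maxPolls; i++) {
      const result = await deps.readOutbox(agentId);
      if (result !== null) return { status: 'success', result };
      await deps.sleep(opts.pollMs);
    }
    // Polls exhausted: report an explicit error instead of a silent success.
    return { status: 'error', result: null };
  } finally {
    // The VM is destroyed on success, timeout, and thrown errors alike.
    await deps.deleteServer(serverId);
  }
}
```

Putting the delete in `finally` is what keeps the VM ephemeral on every exit path, including the timeout case.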

SSH Key Management

  • Generate RSA keypair on agent creation
  • Upload public key to Hetzner API
  • Include SSH key ID in server creation request
  • Delete SSH key when destroying server

Cloud-init Startup Script

#cloud-config
package_update: true
package_upgrade: true

packages:
  - docker.io
  - docker-compose

runcmd:
  - systemctl start docker
  - systemctl enable docker
  - docker pull <CONTAINER_IMAGE>
  - docker run -d --name nanoclaw-agent \
      -e NANOCLAW_S3_ENDPOINT=... \
      -e NANOCLAW_S3_BUCKET=... \
      -e NANOCLAW_AGENT_ID=... \
      <CONTAINER_IMAGE>

Benefits

Cost-Effective

  • Hourly billing: ~€0.006/hr for cpx11 (vs €4.15/mo for persistent)
  • Ephemeral: Only pay for actual usage time (minutes to hours)
  • Example: 2 hours of agent work = €0.012 total cost
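
The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, the prices are the ones quoted in this description (actual Hetzner prices may differ), and the calculation ignores any rounding to billing increments.

```typescript
// Figures quoted in the PR description; treat them as illustrative.
const CPX11_HOURLY_EUR = 0.006;
const CPX11_MONTHLY_EUR = 4.15;

// Cost of running one VM for `hours`, assuming pure hourly billing.
function ephemeralCostEur(hours: number, hourlyRate: number = CPX11_HOURLY_EUR): number {
  return hours * hourlyRate;
}

// Hours of use per month at which hourly billing matches the flat monthly price.
function breakEvenHours(hourlyRate: number, monthlyPrice: number): number {
  return monthlyPrice / hourlyRate;
}

console.log(ephemeralCostEur(2).toFixed(3)); // "0.012"
console.log(Math.round(breakEvenHours(CPX11_HOURLY_EUR, CPX11_MONTHLY_EUR))); // 692
```

At these prices, an agent would need to run for roughly 692 hours in a month before a persistent server cost the same.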

Simple Infrastructure

  • Standard VMs (no SaaS abstraction layer)
  • Proven, reliable infrastructure
  • Easy to debug (SSH access if needed)

US Datacenter Support

  • Ashburn (ash) and Hillsboro (hil) locations
  • 1TB bandwidth included per server

Scalable

  • Multiple server types available:
    • cpx11: 2 vCPU, 2GB RAM, €4.15/mo
    • cpx21: 3 vCPU, 4GB RAM, €8.05/mo
    • cpx31: 4 vCPU, 8GB RAM, €14.75/mo
  • Configure via HETZNER_SERVER_TYPE env var

Usage

Set agent backend to 'hetzner' and configure required env vars:

const agent = {
  id: 'agent-id',
  name: 'Agent Name',
  folder: 'agent-folder',
  backend: 'hetzner',
  // ...
};

export HETZNER_API_TOKEN="your-hetzner-api-token"
export B2_ENDPOINT="s3.us-west-000.backblazeb2.com"
export B2_BUCKET="your-bucket"
export B2_ACCESS_KEY_ID="..."
export B2_SECRET_ACCESS_KEY="..."

Test plan

  • Verify bun run build compiles cleanly
  • Test Hetzner API wrapper (create/delete server, SSH keys)
  • Test cloud-init bootstrap with Docker installation
  • Test full agent run with S3 inbox/outbox
  • Verify VM destruction after completion
  • Test concurrent agent runs (multiple VMs)
  • Test error handling (VM creation failure, timeout, etc.)

Migration Path

As discussed with @future Trees:

  • Current: All agents on local (Apple Container / Docker)
  • Next: Local → Hetzner migration for cloud agents
  • Why Hetzner: Simpler and more cost-effective than Sprites/Daytona/Railway SaaS sandbox providers

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added Hetzner Cloud as a supported backend infrastructure option.
    • Introduced Hetzner configuration settings for API token, location, server type, and image.
    • Extended backend filtering to include Hetzner.
    • Enables ephemeral Hetzner VM execution with S3-based inbox/outbox I/O for agent runs.


coderabbitai bot commented Feb 13, 2026

Walkthrough

Adds Hetzner Cloud support: new Hetzner API client and HetznerBackend for ephemeral VM provisioning with cloud-init and S3-based IPC, updates backend types/factory and configuration exports, and extends the agent-runner filter to accept 'hetzner'.

Changes

Cohort / File(s) | Summary
Backend types & integration (src/backends/types.ts, src/backends/index.ts) | Adds 'hetzner' to BackendType and integrates lazy-loaded Hetzner backend into the factory.
Configuration (src/config.ts) | Exports new Hetzner configuration env values: HETZNER_API_TOKEN, HETZNER_LOCATION, HETZNER_SERVER_TYPE, HETZNER_IMAGE.
Agent-runner CLI (container/agent-runner/src/ipc-mcp-stdio.ts) | Extends list_agents filter_backend enum to include 'hetzner'.
Hetzner API client (src/backends/hetzner-api.ts) | New HTTP wrapper with token guard, types (server, ssh key, action), CRUD operations for SSH keys/servers, action polling and wait helpers, and robust error handling/logging.
Hetzner backend implementation (src/backends/hetzner-backend.ts) | New exported HetznerBackend implementing agent run flow: S3-based inbox/outbox IPC, cloud-init VM provisioning, ephemeral server lifecycle, polling for results, process wrapper abstraction, and cleanup/shutdown logic.

Sequence Diagram

sequenceDiagram
    actor Client
    participant HB as HetznerBackend
    participant S3 as S3 Storage
    participant HAPI as Hetzner API
    participant HC as Hetzner Cloud / VM

    Client->>HB: runAgent(group, input)
    HB->>S3: upload workspace + push inbox message
    HB->>HAPI: createServer(cloud-init)
    HAPI->>HC: POST /servers
    HC-->>HAPI: server id + action id
    HAPI-->>HB: {server, action}
    HB->>HAPI: waitForServerRunning(server_id)
    loop polling
      HAPI->>HC: GET /servers/{id}
      HC-->>HAPI: status
    end
    HC->>S3: VM/agent reads inbox, runs container, writes outbox
    loop poll outbox
      HB->>S3: check outbox for results
      S3-->>HB: output files / signals
    end
    HB->>HAPI: deleteServer(server_id)
    HAPI->>HC: DELETE /servers/{id}
    HB-->>Client: return ContainerOutput

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through clouds of Hetzner light,

spun up a VM, kept IPC tight.
S3 crumbs left the path so neat,
ephemeral paws danced on bare-iron heat.
New backend blooms — a carrot-shaped byte.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat: Add Hetzner Cloud backend for ephemeral VM agents' clearly and accurately summarizes the main objective of the changeset, which introduces a complete Hetzner Cloud backend implementation with ephemeral VM support.
Docstring Coverage | ✅ Passed | Docstring coverage is 90.00%, which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection | ✅ Passed | No merge conflicts detected when merging into main

No actionable comments were generated in the recent review. 🎉


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Fix all issues with AI agents
In `@container/agent-runner/src/ipc-mcp-stdio.ts`:
- Around line 576-577: The filter_backend enum definition (the z.enum call for
filter_backend) is missing the new 'hetzner' option—add 'hetzner' to the enum
array so users can filter by the Hetzner backend; update the
z.enum(['apple-container','docker','sprites','daytona','railway']) to include
'hetzner' and run/typecheck any code that consumes filter_backend to ensure
compatibility (e.g., places that narrow or switch on filter_backend values).

In `@src/backends/hetzner-api.ts`:
- Around line 44-52: The code calls await resp.json() unconditionally which will
throw on empty bodies (e.g. 204 No Content); modify the block in
src/backends/hetzner-api.ts to first detect empty responses (check resp.status
=== 204 or resp.headers.get('content-length') === '0' or
resp.headers.get('content-type') missing) and only call await resp.json() when a
body exists; if empty, set json to null/undefined and handle error branching so
the existing resp.ok check still throws Hetzner errors when appropriate and
successful empty responses return a sensible value (e.g., undefined) castable to
T.

In `@src/backends/hetzner-backend.ts`:
- Around line 223-244: The cloud-init generated by generateCloudInit embeds
sensitive B2/S3 credentials (B2_ACCESS_KEY_ID, B2_SECRET_ACCESS_KEY, B2_BUCKET,
B2_ENDPOINT, B2_REGION) into the user-data which can be exposed; update
generateCloudInit to stop writing secrets into the cloud-init payload (keep only
non-sensitive values like agentId and CONTAINER_IMAGE) and instead either (a)
document this limitation clearly and require callers to supply scoped/temporary
credentials, or (b) change the bootstrap flow so the container fetches
credentials at runtime from a secure mechanism (instance metadata service, a
secrets endpoint, or an ephemeral token service) rather than baking them into
runcmd; reference generateCloudInit, CONTAINER_IMAGE and the B2_* symbols when
implementing the change.
- Around line 312-320: The initialize() method currently returns early when
HETZNER_API_TOKEN or B2_ENDPOINT are missing, leaving this.s3 uninitialized and
causing runAgent() to crash when it calls syncFilesToS3(this.s3,...); either
make initialize() fail fast by throwing a descriptive Error when required env
vars are missing (so callers cannot proceed without a valid Hetzner backend) or
add a defensive guard at the start of runAgent() to check that this.s3 is
initialized (log/throw a clear error and return) before calling syncFilesToS3;
update the code paths around initialize(), runAgent(), this.s3, and
syncFilesToS3 to ensure one of these fixes is applied consistently.
- Around line 247-257: The pemToOpenSSH method is producing invalid OpenSSH keys
by naively slicing the PEM; replace its logic to parse the SPKI PEM properly
using the sshpk library: import sshpk, call sshpk.parseKey(pemPublicKey, "pem")
and then serialize with key.toString("ssh") in pemToOpenSSH so Hetzner receives
a valid OpenSSH-formatted public key (add sshpk as a dependency).
- Around line 57-331: HetznerBackend is missing the shutdown(): Promise<void>
required by AgentBackend — add an async shutdown method on the HetznerBackend
class that cleanly tears down resources: call destroyEphemeralServer for any
tracked servers (use this.servers.keys() or iterate this.servers to remove
them), await deletion of any remaining SSH keys/servers via
destroyEphemeralServer or HetznerAPI, and gracefully close or flush the
NanoClawS3 client (this.s3) if it exposes a close/cleanup method; ensure
shutdown returns a Promise<void> and does not throw on missing initialization.
🧹 Nitpick comments (4)
src/backends/hetzner-backend.ts (3)

10-11: Minor: Redundant crypto imports.

crypto is imported as default and generateKeyPairSync is imported separately. Consider consolidating.

♻️ Suggested consolidation
-import crypto from 'crypto';
-import { generateKeyPairSync } from 'crypto';
+import crypto, { generateKeyPairSync } from 'crypto';

164-169: Unused privateKey variable.

The privateKey is generated but never used. If SSH access isn't needed for ephemeral VMs, consider documenting this explicitly or removing the unnecessary generation.

♻️ If private key isn't needed
-    const { publicKey, privateKey } = generateKeyPairSync('rsa', {
+    const { publicKey } = generateKeyPairSync('rsa', {
       modulusLength: 2048,
       publicKeyEncoding: { type: 'spki', format: 'pem' },
-      privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
     });

149-154: Consider returning error status on timeout.

When the agent times out, it returns lastOutput which may still have status: 'success' with result: null. This could be misleading to callers. Consider explicitly returning an error status for timeouts.

♻️ Explicit timeout error
       logger.warn({ group: groupName, timeout: configTimeout }, 'Hetzner agent timed out waiting for S3 outbox');
-      return lastOutput;
+      return { status: 'error', result: lastOutput.result, error: `Agent timed out after ${configTimeout}ms` };
src/backends/hetzner-api.ts (1)

11-13: Generic response type is confusing and imprecise.

HetznerResponse<T> with [key: string]: T doesn't accurately model Hetzner's actual response structure. Different endpoints return different shapes:

  • /ssh_keys → { ssh_key: {...} }
  • /servers → { server: {...}, action: {...} }
  • /actions → { action: {...} }

Consider using specific response types or a union to make the code more self-documenting.
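
One way to model these shapes is with one interface per endpoint plus a runtime guard. The sketch below is an assumption-laden illustration: the interface and function names are hypothetical, and the field lists are deliberately minimal (real Hetzner responses carry many more fields).

```typescript
// Hypothetical, minimal shapes for the three responses listed above.
interface HetznerAction { id: number; status: 'running' | 'success' | 'error'; }
interface HetznerServer { id: number; name: string; status: string; }
interface HetznerSshKey { id: number; name: string; public_key: string; }

interface SshKeyResponse { ssh_key: HetznerSshKey; }
interface CreateServerResponse { server: HetznerServer; action: HetznerAction; }
interface ActionResponse { action: HetznerAction; }

// Union of the endpoint-specific shapes, replacing a generic index signature.
type HetznerApiResponse = SshKeyResponse | CreateServerResponse | ActionResponse;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null;
}

// Runtime guard so a malformed body fails loudly instead of being cast blindly.
function isCreateServerResponse(json: unknown): json is CreateServerResponse {
  if (!isRecord(json)) return false;
  const { server, action } = json;
  return (
    isRecord(server) && typeof server.id === 'number' &&
    isRecord(action) && typeof action.id === 'number'
  );
}
```

With endpoint-specific types, a caller that destructures `{ server, action }` is checked by the compiler rather than relying on `[key: string]: T`.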

Comment thread container/agent-runner/src/ipc-mcp-stdio.ts Outdated
Comment thread src/backends/hetzner-api.ts Outdated
Comment thread src/backends/hetzner-backend.ts
Comment on lines +223 to +244
private generateCloudInit(agentId: string): string {
return `#cloud-config
package_update: true
package_upgrade: true

packages:
- docker.io
- docker-compose

runcmd:
- systemctl start docker
- systemctl enable docker
- docker pull ${CONTAINER_IMAGE}
- docker run -d --name nanoclaw-agent \\
-e NANOCLAW_S3_ENDPOINT=${B2_ENDPOINT} \\
-e NANOCLAW_S3_REGION=${B2_REGION} \\
-e NANOCLAW_S3_ACCESS_KEY_ID=${B2_ACCESS_KEY_ID} \\
-e NANOCLAW_S3_SECRET_ACCESS_KEY=${B2_SECRET_ACCESS_KEY} \\
-e NANOCLAW_S3_BUCKET=${B2_BUCKET} \\
-e NANOCLAW_AGENT_ID=${agentId} \\
${CONTAINER_IMAGE}
`;

⚠️ Potential issue | 🟡 Minor

Security consideration: Credentials in cloud-init user-data.

S3/B2 credentials (access key ID and secret) are embedded in the cloud-init script which may be visible in:

  • Hetzner Cloud console (server details)
  • VM logs (/var/log/cloud-init.log)
  • Instance metadata endpoints

For ephemeral short-lived VMs this risk may be acceptable, but consider documenting this limitation or exploring alternatives like:

  • Using scoped/temporary credentials with limited permissions
  • Hetzner Cloud's metadata service with encrypted user-data
🤖 Prompt for AI Agents
In `@src/backends/hetzner-backend.ts` around lines 223 - 244, The cloud-init
generated by generateCloudInit embeds sensitive B2/S3 credentials
(B2_ACCESS_KEY_ID, B2_SECRET_ACCESS_KEY, B2_BUCKET, B2_ENDPOINT, B2_REGION) into
the user-data which can be exposed; update generateCloudInit to stop writing
secrets into the cloud-init payload (keep only non-sensitive values like agentId
and CONTAINER_IMAGE) and instead either (a) document this limitation clearly and
require callers to supply scoped/temporary credentials, or (b) change the
bootstrap flow so the container fetches credentials at runtime from a secure
mechanism (instance metadata service, a secrets endpoint, or an ephemeral token
service) rather than baking them into runcmd; reference generateCloudInit,
CONTAINER_IMAGE and the B2_* symbols when implementing the change.

Comment thread src/backends/hetzner-backend.ts Outdated
Comment thread src/backends/hetzner-backend.ts
@Peyton-Spencer (Author) commented

Merge Conflicts Resolved ✅

Successfully merged main into feat/hetzner-backend. All conflicts are resolved.

The Hetzner backend implementation looks solid! Key strengths:

Architecture ✨

  • Ephemeral VMs: Cost-effective hourly billing (~€0.006/hr for cpx11)
  • S3-based I/O: Same proven pattern as Railway backend
  • Cloud-init bootstrap: Clean Docker container deployment
  • Full lifecycle management: SSH keys + server creation/deletion

Implementation Quality

  • Comprehensive error handling
  • Proper resource cleanup (SSH keys deleted after VM destruction)
  • Polling with timeouts for async operations
  • Extensive logging for debuggability

Next Steps

  • Test full agent run with real Hetzner account
  • Verify concurrent agent execution (multiple VMs)
  • Confirm cleanup on error paths
  • Document cost comparison vs. Sprites/Railway

This provides a great middle ground between local execution and full SaaS platforms. Ready for testing! 🚀


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/backends/hetzner-backend.ts`:
- Around line 163-189: createEphemeralServer can orphan a VM if
HetznerAPI.createServer succeeds but HetznerAPI.waitForAction or
HetznerAPI.waitForServerRunning throws; wrap the post-create waits in a
try/catch/finally and on any failure perform a best‑effort cleanup by calling
the Hetzner delete method (e.g., HetznerAPI.deleteServer(server.id)) and log the
cleanup attempt with logger.error/info including serverId and error details;
ensure the function rethrows the original error after cleanup so callers still
see the failure.
- Around line 164-245: The code hard-codes the project name "nanoclaw" in
server/container identifiers (serverName in the ephemeral server creation and
string literals inside generateCloudInit); make this configurable by introducing
an APP_NAME (or similar) config/env var with a neutral default (e.g., "app") and
replace occurrences of the literal "nanoclaw" used in serverName creation, the
ssh-key comment, container name, and any mounted paths or labels inside
generateCloudInit and related logging; ensure serverName =
`${appName}-${agentId}-${Date.now()}` and all template strings inside
generateCloudInit interpolate the appName variable instead of the hard-coded
value, preserving existing behavior when the env/config value is absent.

Comment thread src/backends/hetzner-backend.ts
Comment thread src/backends/hetzner-backend.ts Outdated
Comment on lines +164 to +245
const serverName = `nanoclaw-${agentId}-${Date.now()}`;

// No host-side SSH key needed — VMs are fully managed via cloud-init + S3.
// If the agent needs git SSH keys, cloud-init generates them on the VM
// and the agent can share the pubkey back via S3 outbox.
const userData = this.generateCloudInit(agentId);

const { server, action } = await HetznerAPI.createServer(
serverName,
HETZNER_SERVER_TYPE,
HETZNER_IMAGE,
HETZNER_LOCATION,
[], // No SSH keys — ephemeral VM, no SSH access needed
userData,
);

await HetznerAPI.waitForAction(action.id);
await HetznerAPI.waitForServerRunning(server.id);

logger.info(
{ serverId: server.id, serverName, ip: server.public_net.ipv4.ip },
'Hetzner ephemeral server ready',
);

return { serverId: server.id };
}

private async destroyEphemeralServer(agentId: string): Promise<void> {
const serverCtx = this.servers.get(agentId);
if (!serverCtx) {
logger.warn({ agentId }, 'No Hetzner server context found to destroy');
return;
}

try {
await HetznerAPI.deleteServer(serverCtx.serverId);
logger.info({ serverId: serverCtx.serverId }, 'Destroyed Hetzner ephemeral server');
} catch (err) {
logger.warn({ serverId: serverCtx.serverId, error: err }, 'Failed to destroy Hetzner server');
} finally {
this.servers.delete(agentId);
}
}

/**
* Generate cloud-init user-data for ephemeral Hetzner VMs.
*
* The VM generates its own SSH key for git operations via ssh-keygen.
* The agent can share its pubkey back to the user via S3 outbox.
*
* NOTE: B2/S3 credentials are embedded in the cloud-init script. This is acceptable
* for ephemeral VMs that are destroyed after each agent run, but be aware that:
* - Credentials may be visible in Hetzner Cloud console (server details)
* - Credentials persist in VM logs (/var/log/cloud-init.log) until VM destruction
* For higher-security deployments, consider using scoped/temporary B2 application keys
* with limited bucket permissions and short TTLs.
*/
private generateCloudInit(agentId: string): string {
return `#cloud-config
package_update: true
package_upgrade: true

packages:
- docker.io
- docker-compose

runcmd:
- systemctl start docker
- systemctl enable docker
- ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N "" -C "nanoclaw-${agentId}"
- ssh-keyscan github.com >> /root/.ssh/known_hosts 2>/dev/null
- docker pull ${CONTAINER_IMAGE}
- docker run -d --name nanoclaw-agent \\
-v /root/.ssh:/home/bun/.ssh:ro \\
-e NANOCLAW_S3_ENDPOINT=${B2_ENDPOINT} \\
-e NANOCLAW_S3_REGION=${B2_REGION} \\
-e NANOCLAW_S3_ACCESS_KEY_ID=${B2_ACCESS_KEY_ID} \\
-e NANOCLAW_S3_SECRET_ACCESS_KEY=${B2_SECRET_ACCESS_KEY} \\
-e NANOCLAW_S3_BUCKET=${B2_BUCKET} \\
-e NANOCLAW_AGENT_ID=${agentId} \\
${CONTAINER_IMAGE}
`;

⚠️ Potential issue | 🟡 Minor

Remove hard-coded project name from server/container identifiers.

Project-specific identifiers (nanoclaw-*) should be configurable rather than embedded in source. Consider using a generic env var (e.g., APP_NAME) with a neutral default.

🛠️ Suggested change (make app name configurable)
-    const serverName = `nanoclaw-${agentId}-${Date.now()}`;
+    const appName = process.env.APP_NAME || 'agent';
+    const serverName = `${appName}-${agentId}-${Date.now()}`;
-  - ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N "" -C "nanoclaw-${agentId}"
+  - ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519 -N "" -C "${appName}-${agentId}"
...
-  - docker run -d --name nanoclaw-agent \\
+  - docker run -d --name ${appName}-agent \\

As per coding guidelines, "NEVER edit checked-in templates or source files with user-specific content like project names, directory paths, personal preferences, or org-specific details".

🤖 Prompt for AI Agents
In `@src/backends/hetzner-backend.ts` around lines 164 - 245, The code hard-codes
the project name "nanoclaw" in server/container identifiers (serverName in the
ephemeral server creation and string literals inside generateCloudInit); make
this configurable by introducing an APP_NAME (or similar) config/env var with a
neutral default (e.g., "app") and replace occurrences of the literal "nanoclaw"
used in serverName creation, the ssh-key comment, container name, and any
mounted paths or labels inside generateCloudInit and related logging; ensure
serverName = `${appName}-${agentId}-${Date.now()}` and all template strings
inside generateCloudInit interpolate the appName variable instead of the
hard-coded value, preserving existing behavior when the env/config value is
absent.

nanoclaw and others added 3 commits February 13, 2026 22:16
Add Hetzner Cloud backend to support ephemeral VM-based agent execution with S3-based I/O.

**New files:**
- `src/backends/hetzner-api.ts` - Hetzner Cloud API wrapper with server and SSH key lifecycle management
- `src/backends/hetzner-backend.ts` - Hetzner backend implementation with S3 inbox/outbox pattern

**Changes:**
- Add 'hetzner' to BackendType union in `src/backends/types.ts`
- Register HetznerBackend in `src/backends/index.ts`
- Add Hetzner config vars to `src/config.ts`:
  - HETZNER_API_TOKEN
  - HETZNER_LOCATION (default: ash - Ashburn, US)
  - HETZNER_SERVER_TYPE (default: cpx11 - 2 vCPU, 2GB RAM)
  - HETZNER_IMAGE (default: ubuntu-22.04)

**Architecture:**
- Ephemeral VMs: Create on-demand, destroy after each agent run
- S3-based I/O: Host writes to inbox, agent writes to outbox (same pattern as Railway)
- Cloud-init bootstrap: Installs Docker and runs nanoclaw container on VM startup
- SSH key management: Generates keypair, uploads to Hetzner, includes in server creation
- Cost-effective: Hourly billing (~€0.006/hr for cpx11), VMs destroyed immediately after use

**Benefits:**
- Lower cost than persistent cloud backends (pay per hour used, not per month)
- Simple infrastructure (standard VMs, no SaaS abstraction)
- US datacenter support (Ashburn, Hillsboro)
- Scales easily with different server types (cpx11, cpx21, cpx31, etc.)

**Usage:**
Set agent.backend = 'hetzner' and configure HETZNER_API_TOKEN + B2 credentials.

**Test plan:**
- [ ] Verify TypeScript compilation
- [ ] Test server creation/deletion API calls
- [ ] Test SSH key lifecycle
- [ ] Test full agent run with S3 I/O
- [ ] Verify VM destruction after completion
- [ ] Test with multiple concurrent agents

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove host-side SSH key generation entirely — VMs are ephemeral, no SSH
  access needed. Cloud-init runs ssh-keygen on the VM for git operations,
  mounts keys into the container. Agent can share pubkey via S3 outbox.
- Handle HTTP 204 empty responses in Hetzner API (DELETE operations)
- Add runAgent() guard for uninitialized S3 client
- Add shutdown() method to clean up ephemeral servers
- Add 'hetzner' to filter_backend enum in agent MCP tools
- Return explicit timeout error instead of misleading success
- Replace generic HetznerResponse<T> with specific response types
- Document cloud-init credential exposure for ephemeral VMs
- Fix pre-existing async translateJid in WhatsApp reaction handler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Wrap waitForAction/waitForServerRunning in try/catch with best-effort
  server deletion to prevent orphaned VMs on startup failure
- Replace hard-coded "nanoclaw" in server/container identifiers with
  configurable ASSISTANT_NAME

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@src/backends/hetzner-api.ts`:
- Around line 1-4: The file-level JSDoc header includes the project-specific
name "NanoClaw"; remove that name and make the header generic by changing the
comment that currently reads "Hetzner Cloud API wrapper for NanoClaw. Provides
lifecycle management for Hetzner Cloud servers (VMs)." to a neutral form such as
"Hetzner Cloud API wrapper. Provides lifecycle management for Hetzner Cloud
servers (VMs)." — update the top-of-file header/JSDoc comment so it no longer
contains any project-specific identifiers.
- Around line 25-54: The hetznerApi function currently calls fetch without a
timeout; wrap the request in an AbortController with a configurable timeout
(e.g., constant or env var) so requests can be aborted on network hangs: create
an AbortController, set a timer to call controller.abort() after the timeout,
pass controller.signal into fetch (options.signal), and clear the timer after
fetch completes; handle the abort by catching the thrown error (e.g., checking
for DOMException/AbortError) and rethrowing or translating to a clear timeout
error. Ensure changes are applied inside hetznerApi (use the existing options
RequestInit, add signal) and clean up the timeout to avoid leaks.

In `@src/backends/hetzner-backend.ts`:
- Around line 314-330: The initialize() method currently only checks B2_ENDPOINT
before instantiating NanoClawS3, so missing B2 credentials cause later errors;
add explicit checks for B2_ACCESS_KEY_ID, B2_SECRET_ACCESS_KEY, and B2_BUCKET
before creating the NanoClawS3 client, log a clear warning (similar to the
existing B2_ENDPOINT warning) and return early if any are missing so NanoClawS3
is not constructed with undefined credentials.
- Around line 164-167: The server name built in createEphemeralServer uses
ASSISTANT_NAME.toLowerCase() directly which can include invalid characters for
Hetzner; sanitize ASSISTANT_NAME before composing serverName by normalizing to
lowercase, replacing any characters not allowed by RFC1123 (allow only a-z, 0-9,
hyphen, dot), collapsing multiple invalid chars to single hyphens, trimming
leading/trailing non-alphanumeric/hyphen/dot, and ensuring length limits
(truncate if needed) before creating serverName =
`${sanitizedAppName}-${agentId}-${Date.now()}`; update createEphemeralServer to
use the sanitizedAppName and add a small unit/validation helper (e.g.,
sanitizeServerName or normalizeAppName) to centralize the logic and reuse where
needed.
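
The name sanitization described in the last item could look roughly like this. `sanitizeServerName` and `buildServerName` are hypothetical helpers, and the sketch takes a stricter reading than the comment: it allows only lowercase letters, digits, and hyphens (dots are replaced too) and caps the result at 63 characters.

```typescript
// Hypothetical helper. Assumes names must look like RFC 1123 labels:
// lowercase letters, digits, and hyphens, at most 63 characters,
// starting and ending with an alphanumeric character.
function sanitizeServerName(raw: string, maxLength: number = 63): string {
  const cleaned = raw
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, '-') // replace each run of invalid characters with one hyphen
    .replace(/-{2,}/g, '-')       // collapse repeated hyphens
    .slice(0, maxLength)          // enforce the length limit
    .replace(/^-+|-+$/g, '');     // trim hyphens at either end, including after truncation
  return cleaned.length > 0 ? cleaned : 'agent'; // neutral fallback (an assumption)
}

function buildServerName(appName: string, agentId: string, now: number = Date.now()): string {
  return sanitizeServerName(`${appName}-${agentId}-${now}`);
}

console.log(buildServerName('My Assistant!', 'group-1', 1700000000000));
// prints "my-assistant-group-1-1700000000000"
```

Note that truncating the combined string can cut into the timestamp suffix for very long names; sanitizing and truncating the app-name part before appending the suffix would preserve uniqueness.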

Comment thread src/backends/hetzner-api.ts
Comment on lines +25 to +54
async function hetznerApi<T>(
method: string,
endpoint: string,
body?: unknown,
): Promise<T> {
if (!HETZNER_API_TOKEN) {
throw new Error('HETZNER_API_TOKEN not set');
}

const url = `${HETZNER_API_URL}${endpoint}`;
const options: RequestInit = {
method,
headers: {
'Authorization': `Bearer ${HETZNER_API_TOKEN}`,
'Content-Type': 'application/json',
},
};

if (body) {
options.body = JSON.stringify(body);
}

const resp = await fetch(url, options);

// Handle empty responses (204 No Content, e.g. DELETE operations)
let json: unknown = {};
const contentLength = resp.headers.get('content-length');
if (resp.status !== 204 && contentLength !== '0') {
json = await resp.json();
}

⚠️ Potential issue | 🔴 Critical

Add request timeouts to avoid hanging Hetzner API calls.
fetch without a timeout can stall indefinitely on network issues and block backend workflows.

🛠️ Suggested fix (AbortController timeout)
-  const resp = await fetch(url, options);
+  const controller = new AbortController();
+  const timeoutId = setTimeout(() => controller.abort(), 30000);
+  let resp: Response;
+  try {
+    resp = await fetch(url, { ...options, signal: controller.signal });
+  } finally {
+    clearTimeout(timeoutId);
+  }
🤖 Prompt for AI Agents
In `@src/backends/hetzner-api.ts` around lines 25 - 54, The hetznerApi function
currently calls fetch without a timeout; wrap the request in an AbortController
with a configurable timeout (e.g., constant or env var) so requests can be
aborted on network hangs: create an AbortController, set a timer to call
controller.abort() after the timeout, pass controller.signal into fetch
(options.signal), and clear the timer after fetch completes; handle the abort by
catching the thrown error (e.g., checking for DOMException/AbortError) and
rethrowing or translating to a clear timeout error. Ensure changes are applied
inside hetznerApi (use the existing options RequestInit, add signal) and clean
up the timeout to avoid leaks.
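
A self-contained variant of the suggested fix, which also translates the abort into a clear timeout error, might look like this. `fetchWithTimeout`, `FetchLike`, and `RequestTimeoutError` are hypothetical names; the fetch function is injected so the sketch can be exercised without network access, and in real code it would be the global `fetch`.

```typescript
// Hypothetical wrapper type: anything fetch-shaped that honors an AbortSignal.
type FetchLike<T> = (url: string, init: { signal: AbortSignal }) => Promise<T>;

class RequestTimeoutError extends Error {
  constructor(url: string, timeoutMs: number) {
    super(`Request to ${url} timed out after ${timeoutMs}ms`);
    this.name = 'RequestTimeoutError';
  }
}

async function fetchWithTimeout<T>(
  fetchFn: FetchLike<T>,
  url: string,
  timeoutMs: number = 30000,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchFn(url, { signal: controller.signal });
  } catch (err) {
    // If our timer fired, report a clear timeout instead of a generic abort error.
    if (controller.signal.aborted) throw new RequestTimeoutError(url, timeoutMs);
    throw err;
  } finally {
    clearTimeout(timer); // avoid leaking the timer on success or failure
  }
}
```

Checking `controller.signal.aborted` avoids depending on the exact error class the runtime throws on abort.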

Comment thread src/backends/hetzner-backend.ts
Comment thread src/backends/hetzner-backend.ts
… validation

- Remove project-specific name from hetzner-api.ts header
- Add 30s AbortController timeout to all Hetzner API fetch calls
- Sanitize server names to RFC 1123 (strip invalid chars, truncate to 63)
- Validate all B2 credentials (not just endpoint) before creating S3 client
- Remove non-null assertions now that credentials are validated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Peyton-Spencer Peyton-Spencer merged commit 0bb9124 into main Feb 14, 2026
1 of 3 checks passed
@Peyton-Spencer Peyton-Spencer deleted the feat/hetzner-backend branch February 14, 2026 04:13
Peyton-Spencer pushed a commit that referenced this pull request Feb 14, 2026
…omment

Database improvements from stability audit:

1. **Transaction Support for deleteTask (MEDIUM)**
   - Wrap DELETE operations in explicit transaction
   - Ensures both child and parent deletions succeed atomically
   - Prevents partial deletion leaving orphaned task_run_logs

2. **SQL Injection Safety Documentation (HIGH)**
   - Add security comment to updateTask explaining safety assumptions
   - Document that field names are hardcoded (not user-controlled)
   - Warn future maintainers about SQL injection risks if logic changes

Impact:
- Prevents database corruption from partial task deletions
- Documents security assumptions for future code reviewers
- Hardens codebase against accidental SQL injection introduction

Related:
- Audit report: nanoclaw-stability-audit-2026-02-14.md
- Issues #3, #12 from stability audit

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Peyton-Spencer added a commit that referenced this pull request Feb 14, 2026
* fix: stability quick wins from 2026-02-14 audit

Implements three critical stability fixes identified in the audit:

1. **Unhandled Promise Rejection Handler (CRITICAL)**
   - Add process.on('unhandledRejection') to prevent crashes
   - Logs rejections instead of exiting to maintain service uptime
   - Prevents complete service outage from uncaught promise errors

2. **WhatsApp Event Listener Memory Leak (CRITICAL)**
   - Store event handlers and remove them before reconnection
   - Prevents exponential handler accumulation on reconnects
   - Fixes memory leak leading to eventual OOM crashes

3. **Group Folder Path Traversal (MEDIUM)**
   - Validate folder names with regex (alphanumeric + _ -)
   - Verify resolved paths stay within groups directory
   - Prevents malicious group registration from writing to arbitrary paths

Impact:
- Prevents process crashes from unhandled rejections
- Fixes production memory leak in WhatsApp channel
- Hardens security against path traversal attacks

Related:
- Audit report: nanoclaw-stability-audit-2026-02-14.md
- Issues #1, #4, #16 from stability audit

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add transaction support to deleteTask and SQL injection safety comment

Database improvements from stability audit:

1. **Transaction Support for deleteTask (MEDIUM)**
   - Wrap DELETE operations in explicit transaction
   - Ensures both child and parent deletions succeed atomically
   - Prevents partial deletion leaving orphaned task_run_logs

2. **SQL Injection Safety Documentation (HIGH)**
   - Add security comment to updateTask explaining safety assumptions
   - Document that field names are hardcoded (not user-controlled)
   - Warn future maintainers about SQL injection risks if logic changes

Impact:
- Prevents database corruption from partial task deletions
- Documents security assumptions for future code reviewers
- Hardens codebase against accidental SQL injection introduction

Related:
- Audit report: nanoclaw-stability-audit-2026-02-14.md
- Issues #3, #12 from stability audit

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: NanoClaw Agent <nanoclaw@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>