feat: exo-cli — management CLI for controlling a running exo cluster #1728

@ecohash-co

Motivation

The exo command starts a node — it's a long-running daemon. There's currently no CLI tool for managing a running cluster: checking status, loading/unloading models, monitoring downloads, etc.

This is the same pattern as kubectl (manages k8s) vs kubelet (runs a node), or obsidian-cli vs the Obsidian desktop app. The daemon and the management tool are fundamentally different entrypoints.

Proposal

Add exo-cli as a separate entrypoint that talks to a running exo cluster over HTTP:

exo-cli status                        # Cluster overview (nodes, models, memory)
exo-cli health                        # Quick liveness check
exo-cli nodes                         # List all nodes
exo-cli nodes <id>                    # Single node detail
exo-cli models                        # Loaded models + downloads
exo-cli models status <name>          # Poll model readiness
exo-cli models load <name>            # Load model (auto-placement)
exo-cli models load --wait <name>     # Load + block until ready
exo-cli models unload <name>          # Unload by name
exo-cli models swap <old> <new>       # Atomic unload-then-load
exo-cli models swap --wait <old> <new> # Swap + block until new model ready
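The command tree above could be wired up with argparse roughly as follows. This is a minimal sketch, not the proposed implementation: the function name build_parser, the attribute names (command, action, node_id, name, old, new), and the choice to accept --wait only on load and swap are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the exo-cli command list."""
    parser = argparse.ArgumentParser(prog="exo-cli")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("status", help="Cluster overview")
    sub.add_parser("health", help="Quick liveness check")

    # "nodes" lists all nodes; "nodes <id>" shows one node.
    nodes = sub.add_parser("nodes", help="List nodes or show one node")
    nodes.add_argument("node_id", nargs="?", default=None)

    # Bare "models" lists models, so the nested action is optional.
    models = sub.add_parser("models", help="Inspect and manage models")
    actions = models.add_subparsers(dest="action")

    status = actions.add_parser("status", help="Poll model readiness")
    status.add_argument("name")

    load = actions.add_parser("load", help="Load a model")
    load.add_argument("--wait", action="store_true")
    load.add_argument("name")

    unload = actions.add_parser("unload", help="Unload a model")
    unload.add_argument("name")

    swap = actions.add_parser("swap", help="Unload one model, load another")
    swap.add_argument("--wait", action="store_true")
    swap.add_argument("old")
    swap.add_argument("new")

    return parser
```

With this layout, `build_parser().parse_args(["models", "swap", "--wait", "a", "b"])` yields a namespace whose action is "swap" and whose wait flag is set, while a bare `models` leaves action as None so the caller can fall back to listing.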

Key features

  • --wait flag: blocks until async operations complete (model loaded, swap finished). Eliminates polling loops in scripts.
  • --json flag: machine-readable output for piping into jq or other tools.
  • --host/--port: connect to any node in the cluster (defaults to localhost:52415).
  • Human-friendly table output by default; JSON output via --json for scripting.
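One way --wait could be implemented is a bounded polling loop inside the CLI, so scripts do not have to write their own. The sketch below is illustrative only: the helper name wait_until_ready, the timeout and interval defaults, and the "ready" field in the status payload are all assumptions, since the response shape depends on the endpoints in #1727.

```python
import time
from typing import Callable, Mapping


def wait_until_ready(
    fetch_status: Callable[[], Mapping[str, object]],
    timeout: float = 600.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll fetch_status() until it reports ready or the timeout passes.

    fetch_status is any callable returning the model's status as a
    mapping; the "ready" key is a hypothetical field name.
    Returns True if the model became ready, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("ready"):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Passing the status fetcher in as a callable keeps the loop independent of the HTTP layer and makes it easy to test. A CLI built this way could exit non-zero when the function returns False, so a cron script stops before running work against a model that never loaded.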

Example: day/night model rotation cron

#!/bin/bash
# 11pm: swap to large model for overnight batch work
exo-cli models swap --wait "Qwen3-30B-A3B-4bit" "mlx-community/MiniMax-M1-80B-A45B-4bit"

# Run batch inference...
curl -X POST http://localhost:52415/v1/chat/completions -H 'Content-Type: application/json' -d '{...}'

# 6am: swap back to fast model
exo-cli models swap --wait "MiniMax-M1-80B-A45B-4bit" "mlx-community/Qwen3-30B-A3B-4bit"

Implementation

The CLI would be a thin HTTP client against the /v1/cluster/* endpoints proposed in #1727. Separate entrypoint in pyproject.toml:

[project.scripts]
exo = "exo.main:main"
exo-cli = "exo.cli.main:main"

No new dependencies beyond what exo already has (httpx or urllib for HTTP, argparse for CLI parsing).
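To illustrate how thin the client can be using only the standard library, here is a rough sketch built on urllib. The helper names build_request and call are hypothetical, and so is the example path /v1/cluster/status: the issue only states that the endpoints live under /v1/cluster/*, with the exact routes defined in #1727.

```python
import json
import urllib.request
from typing import Any, Optional


def build_request(
    host: str,
    port: int,
    path: str,
    method: str = "GET",
    payload: Optional[dict] = None,
) -> urllib.request.Request:
    """Build an HTTP request for a cluster endpoint (path is caller-supplied)."""
    url = f"http://{host}:{port}{path}"
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        # JSON-encode the body and declare its content type.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(
    host: str,
    port: int,
    path: str,
    method: str = "GET",
    payload: Optional[dict] = None,
    timeout: float = 10.0,
) -> Any:
    """Send the request and decode the JSON response body."""
    request = build_request(host, port, path, method, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `call("localhost", 52415, "/v1/cluster/status")` would return the decoded JSON from that (assumed) route, which the CLI could then print as a table or dump unchanged when --json is set.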

Relationship to #1727

This depends on the cluster management API endpoints proposed in #1727. The CLI is purely a client and doesn't touch any server-side code, so the two changes can be reviewed independently, even though the CLI needs those endpoints to be available at runtime.
