kubectl kaito deploy

Deploy a Kaito workspace for AI model inference or fine-tuning.

Synopsis

Deploy creates a new Kaito workspace resource for AI model deployment. This command supports both inference and fine-tuning scenarios:

Inference: Deploy models for real-time inference with OpenAI-compatible APIs
Fine-tuning: Fine-tune existing models with your own datasets using methods like QLoRA

The workspace will automatically provision the required GPU resources and deploy the specified model according to Kaito's preset configurations.

Usage

kaito deploy [flags]

Flags

Required Flags

Flag	Type	Description
`--workspace-name string`	string	Name of the workspace to create (required)
`--model string`	string	Model name to deploy (required)
`--instance-type string`	string	GPU instance type (e.g., Standard_NC6s_v3)

Optional Flags

Flag	Type	Default	Description

Inference-Specific Flags

These flags can only be used when --tuning is not enabled (default inference mode):

Flag	Type	Description
`--model-access-secret string`	string	Secret for private model access
`--adapters strings`	[]string	Model adapters to load
`--inference-config string`	string	Custom inference configuration (either a YAML file path or ConfigMap name)

Fine-tuning Flags

These flags can only be used when --tuning is enabled:

Flag	Type	Default	Description
`--tuning`	bool	false	Enable fine-tuning mode
`--tuning-method string`	string	qlora	Fine-tuning method (qlora, lora)
`--model-image string`	string		Custom image for the model preset
`--input-urls strings`	[]string		URLs to training data
`--input-pvc string`	string		PVC containing training data
`--output-image string`	string		Output image for fine-tuned model
`--output-pvc string`	string		PVC for output storage
`--output-image-secret string`	string		Secret for pushing output image
`--tuning-config string`	string		Custom tuning configuration

Note: You cannot mix inference and tuning flags. When --tuning is enabled, inference-specific flags (--model-access-secret, --adapters, --inference-config) cannot be used. When --tuning is not enabled, tuning-specific flags cannot be used.

Examples

Basic Inference Deployment

# Deploy Llama-3.1 8b for inference
kubectl kaito deploy --workspace-name llama-workspace \
--model llama-3.1-8b-instruct \
--model-access-secret hf-token

Inference with Custom Configuration

# Deploy with custom inference configuration from a YAML file
kubectl kaito deploy \
  --workspace-name llama-workspace \
  --model llama-3.1-8b-instruct \
  --model-access-secret hf-token \
  --inference-config config.yaml

# Example config.yaml:
# vllm:
#   cpu-offload-gb: 0
#   gpu-memory-utilization: 0.95
#   swap-space: 4
#   max-model-len: 16384

# Deploy with custom inference configuration from an existing ConfigMap
kubectl kaito deploy \
  --workspace-name llama-workspace \
  --model llama-3.1-8b-instruct \
  --model-access-secret hf-token \
  --inference-config my-config

Deployment with Specific Instance Type

# Deploy with specific instance type and count  
kubectl kaito deploy \
  --workspace-name phi-workspace \
  --model phi-3.5-mini-instruct \
  --instance-type Standard_NC6s_v3 \
  --count 2

Fine-tuning Deployment

# Deploy for fine-tuning with QLoRA using URLs and custom model image
kubectl kaito deploy \
  --workspace-name tune-phi \
  --model phi-3.5-mini-instruct \
  --tuning \
  --tuning-method qlora \
  --model-image myregistry/phi-base:latest \
  --input-urls "https://example.com/data.parquet" \
  --output-image myregistry/phi-finetuned:latest

# Deploy for fine-tuning with PVC storage and custom model image
kubectl kaito deploy \
  --workspace-name tune-llama \
  --model llama-3.1-8b-instruct \
  --tuning \
  --model-image myregistry/llama-base:latest \
  --input-pvc training-data \
  --output-pvc model-output

External Access Deployment

# Deploy with load balancer for external access
kubectl kaito deploy \
  --workspace-name public-llama \
  --model llama-3.1-8b-instruct \
  --enable-load-balancer

Dry Run

# Preview what would be created
kubectl kaito deploy \
  --workspace-name test-workspace \
  --model phi-3.5-mini-instruct \
  --dry-run

Node Selector Deployment

# Deploy on specific nodes
kubectl kaito deploy \
  --workspace-name selective-workspace \
  --model llama-2-7b \
  --node-selector gpu-type=A100,zone=us-west-2a

LoadBalancer Deployment

# Deploy with LoadBalancer for external access
kubectl kaito deploy \
  --workspace-name public-llama \
  --model llama-3.1-8b-instruct \
  --enable-load-balancer

Important Notes:

The --enable-load-balancer flag adds the kaito.sh/enable-lb: "true" annotation to the workspace
This instructs the Kaito operator to create a LoadBalancer service for external access.
Only works with inference workspaces (cannot be used with --tuning)
May incur additional cloud provider costs for the LoadBalancer service

Inference Configuration Notes:

When providing a YAML file for --inference-config, the plugin will:
1. Create a ConfigMap named {workspace-name}-inference-config in the same namespace
2. Store the YAML file contents in the ConfigMap
3. Reference this ConfigMap in the workspace configuration
If a ConfigMap with the same name already exists, it will be updated with the new configuration
When providing an existing ConfigMap name, the plugin will reference it directly in the workspace configuration

Required Parameters by Mode

Inference Mode (default)

Required: --workspace-name, --model
Optional: --model-access-secret, --adapters, --inference-config, --instance-type, --count, etc.

Tuning Mode (`--tuning` enabled)

Required: --workspace-name, --model, --tuning
Required (one of): --input-urls OR --input-pvc
Required (one of): --output-image OR --output-pvc
Optional: --tuning-method, --output-image-secret, --tuning-config, --instance-type, --count, etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kubectl kaito deploy

Synopsis

Usage

Flags

Required Flags

Optional Flags

Inference-Specific Flags

Fine-tuning Flags

Examples

Basic Inference Deployment

Inference with Custom Configuration

Deployment with Specific Instance Type

Fine-tuning Deployment

External Access Deployment

Dry Run

Node Selector Deployment

LoadBalancer Deployment

Required Parameters by Mode

Inference Mode (default)

Tuning Mode (`--tuning` enabled)

FilesExpand file tree

deploy.md

Latest commit

History

deploy.md

File metadata and controls

kubectl kaito deploy

Synopsis

Usage

Flags

Required Flags

Optional Flags

Inference-Specific Flags

Fine-tuning Flags

Examples

Basic Inference Deployment

Inference with Custom Configuration

Deployment with Specific Instance Type

Fine-tuning Deployment

External Access Deployment

Dry Run

Node Selector Deployment

LoadBalancer Deployment

Required Parameters by Mode

Inference Mode (default)

Tuning Mode (--tuning enabled)

Tuning Mode (`--tuning` enabled)