A comprehensive, opinionated guide for building a production-grade homelab using Proxmox VE and Kubernetes. Born from years of running a multi-node cluster at home, this guide covers everything from hardware selection to GPU passthrough for AI inference — including the hard-won lessons you won't find in official documentation.
- Infrastructure engineers who want a real lab environment at home
- DevOps/SRE practitioners looking to sharpen Kubernetes skills on bare metal
- AI enthusiasts who want local GPU inference without cloud costs
- Self-hosters ready to graduate from a single Raspberry Pi to a proper cluster
You should be comfortable with Linux, basic networking, and the command line. Prior Kubernetes experience helps but isn't required — the guide explains concepts as they come up.
We build a full production-grade homelab stack:
Hardware (Mini PCs) → Proxmox VE Cluster → Kubernetes (kubeadm)
→ Calico CNI → Longhorn Storage → MetalLB Load Balancer
→ Traefik Ingress → ArgoCD GitOps → Prometheus/Grafana Monitoring
→ Velero Backups → GPU Passthrough → AI Inference (Ollama)
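As a taste of what the stack involves: MetalLB (above) needs a pool of LAN IPs it may hand out to `LoadBalancer` Services. A minimal sketch — the `192.168.1.240-250` range is an assumption, substitute addresses outside your router's DHCP scope:

```yaml
# IPAddressPool: the LAN addresses MetalLB may assign to
# Services of type LoadBalancer. The range is an example.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
# L2Advertisement: announce the pool via ARP on the local
# network (Layer 2 mode — no BGP-capable router required).
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - homelab-pool
```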
This is not a "click through the GUI" tutorial. It's a guide for people who want to understand why things are configured a certain way, not just how.
| Chapter | Topic | What You'll Learn |
|---|---|---|
| 01 - Hardware Selection | Choosing the right hardware | Mini PCs vs servers, CPU/RAM/storage sizing, power budgets |
| 02 - Proxmox Setup | Proxmox VE cluster | Installation, clustering, storage pools, VM vs LXC |
| 03 - Kubernetes | K8s on Proxmox | Control plane, workers, CNI, storage, load balancing |
| 04 - GitOps | ArgoCD and CI/CD | App-of-apps, repo structure, sealed secrets, GitHub Actions |
| 05 - Monitoring | Observability stack | Prometheus, Grafana, alerting, dashboards |
| 06 - Backups | Backup strategy | Velero, PBS, Longhorn snapshots, restore testing |
| 07 - GPU Passthrough | GPU for AI inference | PCI passthrough, NVIDIA drivers, Ollama on K8s |
| 08 - Gotchas | Lessons learned | The stuff that cost hours of debugging |
| Component | Recommendation | Est. Cost |
|---|---|---|
| Mini PC | AMD Ryzen 5 (5500U/5600U), 6C/12T | $180-220 |
| RAM | 32 GB DDR4 SO-DIMM (2x16 GB) | $60-80 |
| NVMe | 1 TB NVMe Gen3 | $60-70 |
| Network | Built-in 1 GbE (sufficient for single node) | $0 |
| UPS | APC BE425M (basic surge + battery) | $50-60 |
| Total | | ~$350-430 |
Good for: learning Proxmox, running a few VMs/LXCs, single-node K8s (k3s or kubeadm). Won't do real clustering but teaches the fundamentals.
| Component | Recommendation | Est. Cost |
|---|---|---|
| 3x Mini PCs | AMD Ryzen 7 (5700U/5800H), 8C/16T each | $550-700 |
| RAM | 64 GB per node (2x32 GB DDR4) | $300-400 |
| NVMe | 1 TB NVMe per node | $180-210 |
| Network | 5-port unmanaged gigabit switch | $20-30 |
| UPS | CyberPower CP1500AVRLCD | $150-180 |
| Total | | ~$1,200-1,520 |
Good for: full Proxmox cluster with quorum, 3-node K8s (1 control + 2 workers or combined), running 20-30 containers comfortably. This is the sweet spot for most homelabbers.
| Component | Recommendation | Est. Cost |
|---|---|---|
| 5-7x Mini PCs | Mix of Ryzen 7/9 models | $1,000-2,000 |
| RAM | 32-128 GB per node | $600-1,200 |
| NVMe | 1-2 TB per node | $300-700 |
| GPU Node | Desktop with PCIe slot + used Tesla P40/P100 | $400-800 |
| Network | Managed switch + optional 2.5/10 GbE | $100-300 |
| UPS | Rack UPS, 1500VA+ | $200-300 |
| NAS/Backup | Dedicated NAS or PBS node with large drives | $200-500 |
| Total | | ~$2,800-5,800 |
Good for: production-grade homelab with proper HA, dedicated GPU inference, comprehensive monitoring, and a setup that mirrors real enterprise infrastructure.
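The GPU node in this tier needs IOMMU enabled on the Proxmox host before PCI passthrough works. A hedged sketch of the kernel command line (AMD shown; use `intel_iommu=on` on Intel, then run `update-grub` and reboot — exact flags vary by kernel version):

```
# /etc/default/grub on the Proxmox host (example — adjust to your CPU)
# amd_iommu=on enables the IOMMU on AMD; iommu=pt restricts it to
# passthrough devices, avoiding overhead for everything else.
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
```

You'll also need the VFIO kernel modules (`vfio`, `vfio_iommu_type1`, `vfio_pci`) loaded via `/etc/modules`; chapter 07 walks through the full procedure.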
- Cattle, not pets — VMs and containers should be reproducible. If you can't rebuild it from config, it's technical debt.
- GitOps everything — the Git repo is the source of truth. Manual `kubectl apply` is for emergencies only.
- Security by default — unprivileged LXCs, RBAC everywhere, secrets encrypted at rest, no wildcard admin tokens.
- Document the pain — the gotchas chapter exists because every "obvious" fix cost someone hours. Write it down.
- Budget-conscious — used enterprise hardware and mini PCs beat new rack servers for homelab use. The cloud bill for equivalent compute would be $500+/month.
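The "GitOps everything" principle above boils down to manifests like this ArgoCD `Application` — the `repoURL` and `path` are hypothetical placeholders, point them at your own repo:

```yaml
# ArgoCD continuously reconciles the cluster against whatever is
# committed under deploy/monitoring in the Git repo.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: monitoring
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/homelab.git  # placeholder
    targetRevision: main
    path: deploy/monitoring
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```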
Found an error? Have a gotcha to add? PRs are welcome. Please keep the tone practical and include specific commands/configs where applicable.
MIT License — Use this however you want. Attribution appreciated but not required.