
🧠 k8s-ai

Tenant repository bootstrapped by k8s-infrastructure, containing the manifests for AI-related applications.

Overview

This repository manages AI workloads on Kubernetes using GitOps with Flux CD. It includes deployments for LLM inference services, web UIs, and model serving infrastructure.

Applications

Open WebUI

  • Path: ./apps/open-webui
  • Type: HelmRelease (ollama-webui chart)
  • Description: Web interface for interacting with LLMs
  • Features:
    • Persistent storage via PVC
    • Integration with Ollama backend
    • Model access control bypass enabled
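
As a Flux-managed app, the features above would typically be expressed in a HelmRelease like the following. This is a hypothetical sketch, not the repository's actual manifest: the namespace, chart source, and value keys are assumptions.

```yaml
# Hypothetical sketch of a Flux HelmRelease for Open WebUI.
# Namespace, HelmRepository name, and values layout are assumptions.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: open-webui
  namespace: ai
spec:
  interval: 30m
  chart:
    spec:
      chart: ollama-webui          # chart named in this README
      sourceRef:
        kind: HelmRepository
        name: open-webui
  values:
    persistence:
      enabled: true                # persistent storage via PVC
      size: 10Gi                   # size is an assumption
    ollama:
      # point the UI at the Ollama backend service (11434 is Ollama's default port)
      host: http://ollama.ai.svc.cluster.local:11434
```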

ComfyUI

  • Path: ./apps/comfyui
  • Type: Native Kubernetes resources
  • Description: Graph-based interface for Stable Diffusion
  • Features:
    • Persistent volume for model caching
    • Replication source/destination for data synchronization
    • restic backup support

opencode

  • Path: ./apps/opencode
  • Type: Native Kubernetes resources
  • Description: Coding agent and AI workspace for interactive development
  • Features:
    • NVIDIA GPU support for accelerated model training and inference
    • Persistent storage (100Gi PVC)
    • Pre-configured development environment with tools
    • restic backup support
    • Replication source/destination for data synchronization
    • Integration with GitHub, HuggingFace, and n8n via tokens
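
The GPU, storage, and token integrations listed above roughly correspond to a Deployment fragment like this. It is an illustration only: the container image, secret name, and mount path are assumptions, not the repository's actual values.

```yaml
# Hypothetical sketch of the opencode workload; image, labels,
# secret keys, and paths are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opencode
spec:
  replicas: 1
  selector:
    matchLabels:
      app: opencode
  template:
    metadata:
      labels:
        app: opencode
    spec:
      containers:
        - name: opencode
          image: ghcr.io/example/opencode:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: "1"                  # NVIDIA GPU support
          envFrom:
            - secretRef:
                name: opencode-tokens              # GitHub, HuggingFace, n8n tokens
          volumeMounts:
            - name: data
              mountPath: /workspace
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: opencode-data               # the 100Gi PVC noted above
```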

Infrastructure

Model Serving

Ollama

  • Path: ./infrastructure/ollama
  • Description: Lightweight LLM inference server
  • Features:
    • Native GPU support
    • Simple HTTP API
    • Model caching
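
The GPU support and model caching listed above usually map to a pod spec fragment like the one below. This is a sketch under assumptions: the image tag and PVC name are placeholders, though port 11434 and `/root/.ollama` are Ollama's defaults.

```yaml
# Hypothetical pod spec fragment for Ollama; image tag and claim name
# are assumptions.
containers:
  - name: ollama
    image: ollama/ollama:latest
    ports:
      - containerPort: 11434       # Ollama's default HTTP API port
    resources:
      limits:
        nvidia.com/gpu: "1"        # native GPU support
    volumeMounts:
      - name: models
        mountPath: /root/.ollama   # Ollama's default model cache location
volumes:
  - name: models
    persistentVolumeClaim:
      claimName: ollama-models     # persistent model cache
```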

llama.cpp

  • Path: ./infrastructure/llamacpp
  • Description: High-performance C/C++ inference engine optimized for CPU and GPU
  • Features:
    • Qwen3.5-35B model support with full precision
    • 256k context window for agentic AI workflows
    • StatefulSet deployment with persistent storage
    • Prometheus ServiceMonitor integration
    • Ingress routing via HTTPRoute
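
The monitoring and routing features above could be wired up as follows. This is a hedged sketch, not the repository's actual manifests: selector labels, ports, hostname, and the Gateway reference are assumptions.

```yaml
# Hypothetical ServiceMonitor + HTTPRoute for llama.cpp;
# labels, ports, hostname, and gateway name are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: llamacpp
spec:
  selector:
    matchLabels:
      app: llamacpp
  endpoints:
    - port: http
      path: /metrics        # llama.cpp server exposes Prometheus metrics here
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llamacpp
spec:
  parentRefs:
    - name: gateway         # Gateway API listener, name assumed
  hostnames:
    - llamacpp.example.com  # placeholder hostname
  rules:
    - backendRefs:
        - name: llamacpp
          port: 8080        # llama.cpp server's default port
```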

vLLM

  • Path: ./infrastructure/vllm
  • Description: High-throughput LLM serving with PagedAttention
  • Use Case: Production workloads requiring high concurrency

KServe

  • Path: ./infrastructure/kserve
  • Example: ./examples/llminferenceservice.yaml
  • Description: Kubernetes-native ML serving platform
  • Features:
    • LLMInferenceService CRD
    • Custom model templates

MCP Servers

  • MCP Kubernetes: Kubernetes model context protocol server
  • MCP Grafana: Grafana monitoring integration
  • MCP GitHub: GitHub API integration
  • MCP Photoprism: Photo management (mmontes & xiaowen)

Architecture

```
├── apps/                    # Application deployments
│   ├── comfyui/             # ComfyUI deployment
│   ├── n8n/                 # n8n workflow automation
│   ├── opencode/            # opencode AI development workspace
│   └── open-webui/          # Open WebUI deployment
├── clusters/                # Cluster-specific configurations
│   └── homelab/
│       ├── apps.yaml        # Application Kustomizations
│       ├── infrastructure.yaml
│       └── namespaces.yaml
├── examples/                # Example configurations
│   └── llminferenceservice.yaml
└── infrastructure/          # Shared infrastructure
    ├── kserve/              # KServe ML serving
    ├── vllm/                # vLLM serving engine
    ├── llamacpp/            # llama.cpp inference engine
    ├── lws/                 # LeaderWorkerSet
    ├── ollama/              # Ollama LLM backend
    └── mcp-*/               # MCP server integrations
```
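
With this layout, Flux reconciles the `apps/` tree via a Kustomization such as `clusters/homelab/apps.yaml`. The sketch below is hypothetical: the GitRepository name, intervals, and dependency are assumptions, though the paths follow the tree above.

```yaml
# Hypothetical sketch of clusters/homelab/apps.yaml;
# GitRepository name, interval, and dependsOn are assumptions.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps                 # the apps/ directory shown above
  prune: true                  # remove resources deleted from Git
  sourceRef:
    kind: GitRepository
    name: k8s-ai
  dependsOn:
    - name: infrastructure     # serve engines before the apps that use them
```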

AI Benchmarks

LLM benchmarks using llama.cpp on Kubernetes: mmontes11/llm-bench

Key benchmarks (NVIDIA RTX PRO 4000 Blackwell, 23.5 GiB VRAM):

  • Qwen3.5-35B-A3B (Q4_K_Medium): 2995 t/s prompt processing (2048-token prompt), 84.87 t/s token generation
  • GPT-OSS 20B (MXFP4 MoE): 2704 t/s prompt processing (2048-token prompt), 109.68 t/s token generation

License

MIT
