
LoRA fine-tuning example #120

Draft

pohlchri wants to merge 1 commit into main from aks-finetuning-demo


Conversation

@pohlchri

Description

Adds an end-to-end example for LoRA fine-tuning on AKS with 2 NC H100 GPUs, including:

  • Automated Azure infrastructure setup (AKS, ACR, Storage, Managed Identity)
  • GPU monitoring with DCGM metrics and Grafana dashboards
  • Fine-tuning workflow using LoRA/PEFT
  • Side-by-side inference comparison (fine-tuned vs. baseline) via a web UI
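As a rough illustration of why the LoRA/PEFT workflow above fits on a small GPU footprint: LoRA freezes a weight matrix W (d x k) and trains only two low-rank factors B (d x r) and A (r x k), so the effective weight becomes W + BA. A minimal sketch of the parameter savings, with hypothetical sizes (not taken from this PR's code):

```python
# LoRA trainable-parameter count vs. full fine-tuning.
# d, k: dimensions of one frozen weight matrix; r: LoRA rank.
# All values here are illustrative, not from the example's config.
d, k, r = 4096, 4096, 16

full_params = d * k            # parameters updated by full fine-tuning
lora_params = d * r + r * k    # parameters updated by LoRA (B and A)

print(full_params)                 # 16777216
print(lora_params)                 # 131072
print(full_params // lora_params)  # 128 -> ~128x fewer trainable params
```

With these (hypothetical) sizes, LoRA trains roughly 0.8% of the parameters per adapted matrix, which is what makes fine-tuning feasible on two GPUs.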

Use Case

Fine-tune LLMs to perform internal reasoning in a specific language (e.g., German for audit compliance) while accepting input/output in any language.
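One way to express that use case at inference time is a system prompt that pins the reasoning language while leaving the answer language free. A hypothetical sketch (the template wording, special tokens, and function name are assumptions, not from this repo):

```python
# Hypothetical prompt template: ask the model to reason internally in
# German while replying in the language of the incoming request.
def build_prompt(user_message: str) -> str:
    system = (
        "Denke intern auf Deutsch ueber die Anfrage nach "   # reason internally in German
        "und antworte in der Sprache der Anfrage."           # answer in the request's language
    )
    return f"<|system|>\n{system}\n<|user|>\n{user_message}\n<|assistant|>\n"

prompt = build_prompt("Summarise the audit findings.")
```

The fine-tuning data would then pair such prompts with responses whose chain of reasoning is in German, regardless of the input language.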

Structure

  • docker/ - Container images for training and inference
  • k8s/ - Kubernetes manifests (Jobs, Deployments)
  • scripts/ - Automated deployment scripts
  • src/ - Python training and inference code

Prerequisites

  • Azure subscription with GPU quota for Standard_NC80adis_H100_v5

@pohlchri force-pushed the aks-finetuning-demo branch 2 times, most recently from 686abc9 to 3d52738 on February 20, 2026 at 12:11
