
Commit 9b280dc

mudler authored and localai-bot committed
feat(mlx-distributed): add new MLX-distributed backend (mudler#8801)
* feat(mlx-distributed): add new MLX-distributed backend

  Add a new MLX distributed backend with support for both TCP and RDMA for
  model sharding. This implementation ties into the discovery mechanism
  already in place, and re-uses the same P2P mechanism for TCP MLX-distributed
  inferencing. The auto-parallel implementation is inspired by Exo's (which
  has been added to the acknowledgements for the great work!)

* expose a CLI to facilitate backend starting

* feat: make manual rank0 configurable via model configs

* Add missing features from mlx backend

* Apply suggestion from @mudler

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: Ettore Di Giacinto <mudler@users.noreply.github.com>
1 parent 44553f8 commit 9b280dc

36 files changed

Lines changed: 2016 additions & 73 deletions
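As background for the model sharding described in the commit message, here is a minimal, illustrative sketch of the MLX distributed primitives such a backend builds on. This is not the backend's actual code: it assumes MLX's "ring" backend (its TCP transport) and a simple row-parallel weight split.

    import mlx.core as mx

    # Initialize the process group; MLX's "ring" backend is its TCP transport.
    # Rank 0 typically acts as the coordinator in a multi-host setup.
    group = mx.distributed.init(backend="ring")
    rank, size = group.rank(), group.size()

    # Row-parallel sharded linear layer: each rank holds a horizontal slice of
    # the weight matrix and the matching slice of the activations, computes a
    # partial product locally, then an all-reduce sums the partials so every
    # rank ends up with the full output.
    def sharded_linear(x_shard: mx.array, w_shard: mx.array) -> mx.array:
        partial = x_shard @ w_shard  # local partial matmul on this rank's shard
        return mx.distributed.all_sum(partial, group=group)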

.github/workflows/backend.yml

Lines changed: 68 additions & 0 deletions
@@ -157,6 +157,19 @@ jobs:
           dockerfile: "./backend/Dockerfile.python"
           context: "./"
           ubuntu-version: '2404'
+        - build-type: ''
+          cuda-major-version: ""
+          cuda-minor-version: ""
+          platforms: 'linux/amd64'
+          tag-latest: 'auto'
+          tag-suffix: '-cpu-mlx-distributed'
+          runs-on: 'ubuntu-latest'
+          base-image: "ubuntu:24.04"
+          skip-drivers: 'true'
+          backend: "mlx-distributed"
+          dockerfile: "./backend/Dockerfile.python"
+          context: "./"
+          ubuntu-version: '2404'
         # CUDA 12 builds
         - build-type: 'cublas'
           cuda-major-version: "12"
@@ -470,6 +483,19 @@ jobs:
           dockerfile: "./backend/Dockerfile.python"
           context: "./"
           ubuntu-version: '2404'
+        - build-type: 'cublas'
+          cuda-major-version: "12"
+          cuda-minor-version: "8"
+          platforms: 'linux/amd64'
+          tag-latest: 'auto'
+          tag-suffix: '-gpu-nvidia-cuda-12-mlx-distributed'
+          runs-on: 'ubuntu-latest'
+          base-image: "ubuntu:24.04"
+          skip-drivers: 'false'
+          backend: "mlx-distributed"
+          dockerfile: "./backend/Dockerfile.python"
+          context: "./"
+          ubuntu-version: '2404'
         - build-type: 'cublas'
           cuda-major-version: "12"
           cuda-minor-version: "8"
@@ -822,6 +848,19 @@ jobs:
           backend: "mlx-audio"
           dockerfile: "./backend/Dockerfile.python"
           context: "./"
+        - build-type: 'l4t'
+          cuda-major-version: "13"
+          cuda-minor-version: "0"
+          platforms: 'linux/arm64'
+          tag-latest: 'auto'
+          tag-suffix: '-nvidia-l4t-cuda-13-arm64-mlx-distributed'
+          runs-on: 'ubuntu-24.04-arm'
+          base-image: "ubuntu:24.04"
+          skip-drivers: 'false'
+          ubuntu-version: '2404'
+          backend: "mlx-distributed"
+          dockerfile: "./backend/Dockerfile.python"
+          context: "./"
         - build-type: 'cublas'
           cuda-major-version: "13"
           cuda-minor-version: "0"
@@ -926,6 +965,19 @@ jobs:
           dockerfile: "./backend/Dockerfile.python"
           context: "./"
           ubuntu-version: '2404'
+        - build-type: 'cublas'
+          cuda-major-version: "13"
+          cuda-minor-version: "0"
+          platforms: 'linux/amd64'
+          tag-latest: 'auto'
+          tag-suffix: '-gpu-nvidia-cuda-13-mlx-distributed'
+          runs-on: 'ubuntu-latest'
+          base-image: "ubuntu:24.04"
+          skip-drivers: 'false'
+          backend: "mlx-distributed"
+          dockerfile: "./backend/Dockerfile.python"
+          context: "./"
+          ubuntu-version: '2404'
         - build-type: 'cublas'
           cuda-major-version: "13"
           cuda-minor-version: "0"
@@ -1423,6 +1475,19 @@ jobs:
           dockerfile: "./backend/Dockerfile.python"
           context: "./"
           ubuntu-version: '2204'
+        - build-type: 'l4t'
+          cuda-major-version: "12"
+          cuda-minor-version: "0"
+          platforms: 'linux/arm64'
+          tag-latest: 'auto'
+          tag-suffix: '-nvidia-l4t-mlx-distributed'
+          runs-on: 'ubuntu-24.04-arm'
+          base-image: "nvcr.io/nvidia/l4t-jetpack:r36.4.0"
+          skip-drivers: 'true'
+          backend: "mlx-distributed"
+          dockerfile: "./backend/Dockerfile.python"
+          context: "./"
+          ubuntu-version: '2204'
         # SYCL additional backends
         - build-type: 'intel'
           cuda-major-version: ""
@@ -2016,6 +2081,9 @@ jobs:
         - backend: "mlx-audio"
           tag-suffix: "-metal-darwin-arm64-mlx-audio"
           build-type: "mps"
+        - backend: "mlx-distributed"
+          tag-suffix: "-metal-darwin-arm64-mlx-distributed"
+          build-type: "mps"
         - backend: "stablediffusion-ggml"
           tag-suffix: "-metal-darwin-arm64-stablediffusion-ggml"
           build-type: "metal"

Makefile

Lines changed: 8 additions & 2 deletions
@@ -1,5 +1,5 @@
 # Disable parallel execution for backend builds
-.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/voxtral
+.NOTPARALLEL: backends/diffusers backends/llama-cpp backends/outetts backends/piper backends/stablediffusion-ggml backends/whisper backends/faster-whisper backends/silero-vad backends/local-store backends/huggingface backends/rfdetr backends/kitten-tts backends/kokoro backends/chatterbox backends/llama-cpp-darwin backends/neutts build-darwin-python-backend build-darwin-go-backend backends/mlx backends/diffuser-darwin backends/mlx-vlm backends/mlx-audio backends/mlx-distributed backends/stablediffusion-ggml-darwin backends/vllm backends/vllm-omni backends/moonshine backends/pocket-tts backends/qwen-tts backends/faster-qwen3-tts backends/qwen-asr backends/nemo backends/voxcpm backends/whisperx backends/ace-step backends/voxtral
 
 GOCMD=go
 GOTEST=$(GOCMD) test
@@ -451,6 +451,10 @@ backends/mlx-audio:
 	BACKEND=mlx-audio $(MAKE) build-darwin-python-backend
 	./local-ai backends install "ocifile://$(abspath ./backend-images/mlx-audio.tar)"
 
+backends/mlx-distributed:
+	BACKEND=mlx-distributed $(MAKE) build-darwin-python-backend
+	./local-ai backends install "ocifile://$(abspath ./backend-images/mlx-distributed.tar)"
+
 backends/stablediffusion-ggml-darwin:
 	BACKEND=stablediffusion-ggml BUILD_TYPE=metal $(MAKE) build-darwin-go-backend
 	./local-ai backends install "ocifile://$(abspath ./backend-images/stablediffusion-ggml.tar)"
@@ -495,6 +499,7 @@ BACKEND_NEMO = nemo|python|.|false|true
 BACKEND_VOXCPM = voxcpm|python|.|false|true
 BACKEND_WHISPERX = whisperx|python|.|false|true
 BACKEND_ACE_STEP = ace-step|python|.|false|true
+BACKEND_MLX_DISTRIBUTED = mlx-distributed|python|./|false|true
 
 # Helper function to build docker image for a backend
 # Usage: $(call docker-build-backend,BACKEND_NAME,DOCKERFILE_TYPE,BUILD_CONTEXT,PROGRESS_FLAG,NEEDS_BACKEND_ARG)
@@ -548,12 +553,13 @@ $(eval $(call generate-docker-build-target,$(BACKEND_NEMO)))
 $(eval $(call generate-docker-build-target,$(BACKEND_VOXCPM)))
 $(eval $(call generate-docker-build-target,$(BACKEND_WHISPERX)))
 $(eval $(call generate-docker-build-target,$(BACKEND_ACE_STEP)))
+$(eval $(call generate-docker-build-target,$(BACKEND_MLX_DISTRIBUTED)))
 
 # Pattern rule for docker-save targets
 docker-save-%: backend-images
 	docker save local-ai-backend:$* -o backend-images/$*.tar
 
-docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-voxtral
+docker-build-backends: docker-build-llama-cpp docker-build-rerankers docker-build-vllm docker-build-vllm-omni docker-build-transformers docker-build-outetts docker-build-diffusers docker-build-kokoro docker-build-faster-whisper docker-build-coqui docker-build-chatterbox docker-build-vibevoice docker-build-moonshine docker-build-pocket-tts docker-build-qwen-tts docker-build-faster-qwen3-tts docker-build-qwen-asr docker-build-nemo docker-build-voxcpm docker-build-whisperx docker-build-ace-step docker-build-voxtral docker-build-mlx-distributed
 
 ########################################################
 ### Mock Backend for E2E Tests
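The BACKEND_MLX_DISTRIBUTED line above packs the five arguments of docker-build-backend into one pipe-delimited string, per the Makefile's own usage comment. As an illustration only (not LocalAI code), here is how the fields decompose; the flag semantics in the comments are assumptions inferred from that usage comment:

    # Decomposing the pipe-delimited backend spec added in the hunk above.
    spec = "mlx-distributed|python|./|false|true"
    backend_name, dockerfile_type, build_context, progress_flag, needs_backend_arg = spec.split("|")

    print(backend_name)       # "mlx-distributed" -> docker-build-mlx-distributed target
    print(dockerfile_type)    # "python"          -> ./backend/Dockerfile.python
    print(build_context)      # "./"              -> docker build context
    print(progress_flag)      # "false"           -> PROGRESS_FLAG (assumed meaning)
    print(needs_backend_arg)  # "true"            -> NEEDS_BACKEND_ARG (assumed meaning)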

README.md

Lines changed: 1 addition & 0 deletions
@@ -482,6 +482,7 @@ LocalAI couldn't have been built without the help of great software already avai
 - https://github.com/EdVince/Stable-Diffusion-NCNN
 - https://github.com/ggerganov/whisper.cpp
 - https://github.com/rhasspy/piper
+- [exo](https://github.com/exo-explore/exo) for the MLX distributed auto-parallel sharding implementation
 
 ## 🤗 Contributors
 

backend/index.yaml

Lines changed: 81 additions & 0 deletions
@@ -259,6 +259,31 @@
     nvidia-l4t: "nvidia-l4t-mlx-audio"
     nvidia-l4t-cuda-12: "nvidia-l4t-mlx-audio"
     nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-audio"
+- &mlx-distributed
+  name: "mlx-distributed"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-mlx-distributed"
+  icon: https://avatars.githubusercontent.com/u/102832242?s=200&v=4
+  urls:
+  - https://github.com/ml-explore/mlx-lm
+  mirrors:
+  - localai/localai-backends:latest-metal-darwin-arm64-mlx-distributed
+  license: MIT
+  description: |
+    Run distributed LLM inference with MLX across multiple Apple Silicon Macs
+  tags:
+  - text-to-text
+  - LLM
+  - MLX
+  - distributed
+  capabilities:
+    default: "cpu-mlx-distributed"
+    nvidia: "cuda12-mlx-distributed"
+    metal: "metal-mlx-distributed"
+    nvidia-cuda-12: "cuda12-mlx-distributed"
+    nvidia-cuda-13: "cuda13-mlx-distributed"
+    nvidia-l4t: "nvidia-l4t-mlx-distributed"
+    nvidia-l4t-cuda-12: "nvidia-l4t-mlx-distributed"
+    nvidia-l4t-cuda-13: "cuda13-nvidia-l4t-arm64-mlx-distributed"
 - &rerankers
   name: "rerankers"
   alias: "rerankers"
@@ -791,6 +816,11 @@
   uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-audio"
   mirrors:
   - localai/localai-backends:master-metal-darwin-arm64-mlx-audio
+- !!merge <<: *mlx-distributed
+  name: "mlx-distributed-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:master-metal-darwin-arm64-mlx-distributed
 ## mlx
 - !!merge <<: *mlx
   name: "cpu-mlx"
@@ -944,6 +974,57 @@
   uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-audio"
   mirrors:
   - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-audio
+## mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cpu-mlx-distributed"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-cpu-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:latest-cpu-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cpu-mlx-distributed-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-cpu-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:master-cpu-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cuda12-mlx-distributed"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-12-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:latest-gpu-nvidia-cuda-12-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cuda12-mlx-distributed-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-12-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:master-gpu-nvidia-cuda-12-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cuda13-mlx-distributed"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-gpu-nvidia-cuda-13-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:latest-gpu-nvidia-cuda-13-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cuda13-mlx-distributed-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-gpu-nvidia-cuda-13-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:master-gpu-nvidia-cuda-13-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "nvidia-l4t-mlx-distributed"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:latest-nvidia-l4t-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "nvidia-l4t-mlx-distributed-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:master-nvidia-l4t-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cuda13-nvidia-l4t-arm64-mlx-distributed"
+  uri: "quay.io/go-skynet/local-ai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:latest-nvidia-l4t-cuda-13-arm64-mlx-distributed
+- !!merge <<: *mlx-distributed
+  name: "cuda13-nvidia-l4t-arm64-mlx-distributed-development"
+  uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-distributed"
+  mirrors:
+  - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-mlx-distributed
 - !!merge <<: *kitten-tts
   name: "kitten-tts-development"
   uri: "quay.io/go-skynet/local-ai-backends:master-kitten-tts"
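A note on the syntax above: the gallery uses a YAML anchor (&mlx-distributed) plus merge keys (!!merge <<: *mlx-distributed) so each image variant inherits the base entry and overrides only name, uri, and mirrors. An illustrative Python snippet (assuming PyYAML, which resolves merge keys) shows the effect:

    # Illustrative only: how YAML anchors plus merge keys resolve, as used
    # by the gallery entries above. Each merged entry inherits the anchor's
    # fields and overrides whatever it redeclares.
    import yaml

    doc = """
    - &base
      name: "mlx-distributed"
      license: MIT
    - <<: *base
      name: "cpu-mlx-distributed"
    """
    entries = yaml.safe_load(doc)
    print(entries[1]["name"])     # "cpu-mlx-distributed" (overridden)
    print(entries[1]["license"])  # "MIT" (inherited from the anchor)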
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+.PHONY: mlx-distributed
+mlx-distributed:
+	bash install.sh
+
+.PHONY: run
+run:
+	@echo "Running mlx-distributed..."
+	bash run.sh
+	@echo "mlx-distributed run."
+
+.PHONY: test
+test:
+	@echo "Testing mlx-distributed..."
+	bash test.sh
+	@echo "mlx-distributed tested."
+
+.PHONY: protogen-clean
+protogen-clean:
+	$(RM) backend_pb2_grpc.py backend_pb2.py
+
+.PHONY: clean
+clean: protogen-clean
+	rm -rf venv __pycache__
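The protogen-clean target above removes the generated gRPC stubs (backend_pb2.py, backend_pb2_grpc.py) that LocalAI's Python backends are built around. The sketch below shows how such stubs are typically wired into a server; the servicer, method, and message names are assumptions for illustration, not this backend's actual implementation.

    from concurrent import futures

    import grpc
    import backend_pb2        # generated message types (removed by protogen-clean)
    import backend_pb2_grpc   # generated service stubs (removed by protogen-clean)

    # Hypothetical servicer: "BackendServicer", "Health", and "Reply" assume a
    # gRPC service named Backend in the proto; treat these names as illustrative.
    class BackendServicer(backend_pb2_grpc.BackendServicer):
        def Health(self, request, context):
            return backend_pb2.Reply(message=b"OK")

    def serve(address="127.0.0.1:50051"):
        server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
        backend_pb2_grpc.add_BackendServicer_to_server(BackendServicer(), server)
        server.add_insecure_port(address)
        server.start()
        server.wait_for_termination()

    if __name__ == "__main__":
        serve()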
