Disclaimer: this is not an official recommendation, just research and exploration.
We found that the LLaVA trainer does not save all the states needed for automatic inference, which is predominantly used by third-party libraries such as mlx-vlm. We save additional metadata to the model checkpoint directory and export the vision model using coremltools.
Export the vision encoder and patch the checkpoint using the instruction below.

```bash
python export_vision_encoder.py --model-path /path/to/fastvlm-checkpoint
```

We provide a patch to mlx-vlm to support inference of FastVLM.
```bash
git clone https://github.com/Blaizzy/mlx-vlm.git
cd mlx-vlm
git checkout 1884b551bc741f26b2d54d68fa89d4e934b9a3de
git apply ../fastvlm_mlx-vlm.patch
pip install -e .
```

Export the model using the following instruction.
```bash
python -m mlx_vlm.convert --hf-path /path/to/fastvlm-checkpoint \
                          --mlx-path /path/to/exported-fastvlm \
                          --only-llm
```

To quantize the LLM, additional options can be provided as shown below. `--q-bits` specifies the number of bits per weight; the command below exports the LLM with 8-bit quantization.
```bash
python -m mlx_vlm.convert --hf-path /path/to/fastvlm-checkpoint \
                          --mlx-path /path/to/exported-fastvlm \
                          --only-llm \
                          -q \
                          --q-bits 8  # For 4-bit quantization, specify 4
```

The exported model can be used for inference in a Python environment following the instructions below.
```bash
python -m mlx_vlm.generate --model /path/to/exported-fastvlm \
                           --image /path/to/image.png \
                           --prompt "Describe the image." \
                           --max-tokens 256 \
                           --temp 0.0
```
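For use from a Python script rather than the CLI, the same model can in principle be driven through the mlx-vlm Python API. The sketch below is only an outline based on the upstream mlx-vlm README: the exact import paths, helper names (`load`, `load_config`, `apply_chat_template`, `generate`), and argument order vary between mlx-vlm revisions, so verify them against the pinned commit before relying on this.

```python
# Sketch: programmatic inference with the patched mlx-vlm package.
# NOTE: the imports and the generate() signature follow recent upstream
# mlx-vlm documentation and may differ in the pinned commit; treat this as
# a starting point rather than a drop-in script.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "/path/to/exported-fastvlm"
model, processor = load(model_path)
config = load_config(model_path)

images = ["/path/to/image.png"]
prompt = apply_chat_template(processor, config, "Describe the image.", num_images=len(images))

# Roughly equivalent to the CLI call above: up to 256 new tokens, greedy decoding.
output = generate(model, processor, prompt, images, max_tokens=256, temp=0.0, verbose=True)
print(output)
```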
We noticed that sometimes `config.json` for the LLaVA model incorrectly sets the value of `tie_word_embeddings`. This causes the following error during conversion: `ValueError: Received parameters not in model: language_model.lm_head.weight`. If you encounter this error, set the value of `tie_word_embeddings` accordingly.
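As an illustration, a small script along the lines below can flip the flag in place. The checkpoint path is a placeholder, and whether the value should be `true` or `false` depends on whether your checkpoint actually ties the LM head to the input embeddings.

```python
# Illustrative fix: rewrite tie_word_embeddings in the checkpoint's config.json.
# The path and target value are placeholders; pick the value that matches how
# the checkpoint was trained (a standalone lm_head weight usually means False).
import json
from pathlib import Path

config_path = Path("/path/to/fastvlm-checkpoint") / "config.json"
config = json.loads(config_path.read_text())
config["tie_word_embeddings"] = False
config_path.write_text(json.dumps(config, indent=2) + "\n")
```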