diff --git a/README.md b/README.md
index 71a0fe5..a6ceed9 100644
--- a/README.md
+++ b/README.md
@@ -113,6 +113,7 @@ An awesome & curated list of the best LLMOps tools for developers.
 | [Infinity](https://github.com/michaelfeil/infinity) | Rest API server for serving text-embeddings | ![GitHub Badge](https://img.shields.io/github/stars/michaelfeil/infinity.svg?style=flat-square) |
 | [Modelz-LLM](https://github.com/tensorchord/modelz-llm) | OpenAI compatible API for LLMs and embeddings (LLaMA, Vicuna, ChatGLM and many others) | ![GitHub Badge](https://img.shields.io/github/stars/tensorchord/modelz-llm.svg?style=flat-square) |
 | [Ollama](https://github.com/jmorganca/ollama) | Serve Llama 2 and other large language models locally from command line or through a browser interface. | ![GitHub Badge](https://img.shields.io/github/stars/jmorganca/ollama.svg?style=flat-square) |
+| [Rapid-MLX](https://github.com/raullenchai/Rapid-MLX) | OpenAI-compatible LLM inference server for Apple Silicon built on MLX, with tool calling and prompt caching; reports a 2-4x speedup over Ollama. | ![GitHub Badge](https://img.shields.io/github/stars/raullenchai/Rapid-MLX.svg?style=flat-square) |
 | [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) | Inference engine for TensorRT on Nvidia GPUs | ![GitHub Badge](https://img.shields.io/github/stars/NVIDIA/TensorRT-LLM.svg?style=flat-square) |
 | [text-generation-inference](https://github.com/huggingface/text-generation-inference) | Large Language Model Text Generation Inference | ![GitHub Badge](https://img.shields.io/github/stars/huggingface/text-generation-inference.svg?style=flat-square) |
 | [text-embeddings-inference](https://github.com/huggingface/text-embeddings-inference) | Inference for text-embedding models | ![GitHub Badge](https://img.shields.io/github/stars/huggingface/text-embeddings-inference.svg?style=flat-square) |