A FastAPI-based mock LLM API server that simulates multiple Large Language Model API providers.
Supported backends:
| Backend | Endpoints |
|---|---|
| vLLM | • /v1/chat/completions • /v1/models • /health |
| Mistral | • /v1/chat/completions • /v1/models • /v1/embeddings |
| Text Embeddings Inference | • /v1/embeddings • /health • /info • /rerank |
- Install the package:
pip install git+https://github.com/etalab-ia/openmockllm.git
- Run the server:
openmockllm
| Argument | Type | Default | Description |
|---|---|---|---|
| --backend | str | vllm | Backend to use: vllm, mistral, or tei |
| --port | int | 8000 | Port to run the server on |
| --max-context | int | 128000 | Maximum context length |
| --owned-by | str | OpenMockLLM | Owner of the API |
| --model-name | str | openmockllm | Model name to return in responses |
| --embedding-dimension | int | 1024 | Embedding dimension |
| --api-key | str | None | API key for authentication |
| --tiktoken-encoder | str | cl100k_base | Tiktoken encoder |
| --faker-langage | str | fr_FR | Language used for generating prompt responses |
| --faker-seed | str | None | Seed for Faker generation |
| --simulate-latency | flag | False | Simulate latency |
| --reference-tps | int | 100 | Reference tokens per second for latency simulation |
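--simulate-latency paired with --reference-tps presumably delays responses in proportion to the number of generated tokens. A minimal sketch of that relationship (the function name and formula are assumptions, not the project's implementation):

```python
def simulated_delay(completion_tokens: int, reference_tps: int = 100) -> float:
    """Seconds to wait so the mock appears to generate at ~reference_tps tokens/s."""
    if reference_tps <= 0:
        raise ValueError("reference_tps must be positive")
    return completion_tokens / reference_tps

# 200 tokens at the default 100 tokens/s take about 2 seconds.
print(simulated_delay(200))  # 2.0
```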
| Argument | Type | Default | Description |
|---|---|---|---|
| --payload-limit | int | 2000000 | Payload size limit in bytes (2 MB) |
| --max-client-batch-size | int | 32 | Maximum number of inputs per request |
| --auto-truncate | flag | False | Automatically truncate inputs longer than the maximum size |
| --max-batch-tokens | int | 16384 | Maximum total tokens in a batch |
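These TEI-style limits amount to simple checks on each request. A sketch of plausible validation logic (function names and error messages are assumptions, not the project's code):

```python
def validate_batch(token_counts: list[int],
                   max_batch_tokens: int = 16384,
                   max_client_batch_size: int = 32) -> None:
    # Cap both the number of inputs per request and the total tokens in a batch.
    if len(token_counts) > max_client_batch_size:
        raise ValueError(f"batch of {len(token_counts)} inputs exceeds max_client_batch_size")
    total = sum(token_counts)
    if total > max_batch_tokens:
        raise ValueError(f"{total} tokens exceed max_batch_tokens")

def truncate(tokens: list[int], max_context: int, auto_truncate: bool) -> list[int]:
    # With --auto-truncate, over-long inputs are clipped instead of rejected.
    if len(tokens) <= max_context:
        return tokens
    if auto_truncate:
        return tokens[:max_context]
    raise ValueError("input longer than max context and auto_truncate is off")
```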
- Streaming response:
curl -N -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{ "model": "openmockllm", "messages": [{"role": "user", "content": "Bonjour"}], "stream": true }'
- Non-streaming response:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{ "model": "openmockllm", "messages": [{"role": "user", "content": "Bonjour"}], "stream": false }'
# Generate embeddings
curl -X POST http://localhost:8002/v1/embeddings \
-H "Content-Type: application/json" \
-d '{ "input": "Hello, world!", "model": "openmockllm" }'
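A mock embeddings endpoint has to return vectors of the configured --embedding-dimension; one common trick is to derive them deterministically from the input text so the same input always yields the same vector. A sketch of such a generator (an illustration, not the server's actual algorithm):

```python
import hashlib
import struct

def mock_embedding(text: str, dimension: int = 1024) -> list[float]:
    # Deterministic pseudo-embedding: hash the input with a counter,
    # then map each 4-byte chunk of the digest to a float in [-1, 1).
    out: list[float] = []
    counter = 0
    while len(out) < dimension:
        digest = hashlib.sha256(f"{text}:{counter}".encode()).digest()
        for i in range(0, len(digest) - 3, 4):
            (n,) = struct.unpack("<I", digest[i:i + 4])
            out.append(n / 2**31 - 1.0)
            if len(out) == dimension:
                break
        counter += 1
    return out
```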
# Get model info
curl http://localhost:8002/info
# Rerank documents
curl -X POST http://localhost:8002/rerank \
-H "Content-Type: application/json" \
-d '{ "query": "What is Deep Learning?", "texts": ["Deep Learning is...", "Machine Learning is..."] }'
Contributions are welcome! Please feel free to submit a Pull Request.
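For reference, a /rerank-style response pairs each document index with a relevance score, sorted by score. A toy scorer illustrating that shape (word overlap is an arbitrary stand-in, not the server's scoring):

```python
def mock_rerank(query: str, texts: list[str]) -> list[dict]:
    # Toy relevance: fraction of query words that appear in each text.
    query_words = set(query.lower().split())
    results = []
    for index, text in enumerate(texts):
        overlap = sum(w in text.lower() for w in query_words)
        results.append({"index": index, "score": overlap / max(len(query_words), 1)})
    return sorted(results, key=lambda r: r["score"], reverse=True)
```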
- Install the dependencies:
pip install -e ".[dev]"
- Run a server:
python -m openmockllm.main --reload --backend mistral
- From openapi.json file:
BACKEND=tei
datamodel-codegen --input docs/${BACKEND} --input-file-type openapi --output openmockllm/${BACKEND}/schemas.py --output-model-type pydantic_v2.BaseModel --strict-nullable
Another recommended method is to use the official SDK of the backend.
- Install the dependencies:
pip install -e ".[dev]"
- Run a server:
python -m openmockllm.main --reload --backend mistral
- Run the tests:
pytest tests/test_mistral