A tool that forwards responses through an Ollama-like server, allowing you to pipe responses from LM Studio or OpenRouter to any Ollama-compatible endpoint.
PseudoLlama is a simple Express server that mimics the Ollama API. It serves content from a text file as responses to API requests, making it useful for testing applications that integrate with Ollama.
- Simulates Ollama API endpoints (`/api/chat`, `/api/generate`, etc.)
- Also supports OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, etc.)
- Web UI for editing the content and testing the server
- Supports both streaming and non-streaming responses
- Comprehensive logging of all model communications (requests and responses)
```bash
npm install
```

Start the server:

```bash
npm start
```

The server runs on port 12345 by default. This is a fixed port for testing purposes.
IMPORTANT: When connecting to this server from other tools, you must specify port 12345 in your configuration.
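For example, a client configured for Ollama's default port (11434) must target 12345 instead. A minimal sketch of resolving an endpoint against that base URL (using Node's built-in `URL`; the endpoint path is one of those listed in this README):

```javascript
// PseudoLlama listens on a fixed port (12345), not Ollama's
// default 11434, so client base URLs must be adjusted.
const BASE_URL = "http://localhost:12345";

// Resolve one of the documented endpoints against that base.
const chatUrl = new URL("/api/chat", BASE_URL).href;
console.log(chatUrl); // "http://localhost:12345/api/chat"
```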
- `POST /api/chat` - Chat completions
- `POST /api/generate` - Text generation
- `POST /api/embeddings` - Generate embeddings
- `GET /api/tags` - List available models
- `POST /api/pull` - Simulate model pulling
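As an illustration, a non-streaming `/api/chat` response wraps the configured text content in Ollama's chat response shape. This is a hedged sketch: the field names follow the public Ollama API, but whether PseudoLlama emits exactly these fields is an assumption.

```javascript
// Sketch: wrap the configured text content in an Ollama-style
// non-streaming /api/chat response. Field names follow the public
// Ollama API; PseudoLlama's exact output is an assumption.
function buildChatResponse(model, content) {
  return {
    model,
    created_at: new Date().toISOString(),
    message: { role: "assistant", content },
    done: true,
  };
}

const res = buildChatResponse("pseudollama", "Hello from the text file");
console.log(res.message.content); // "Hello from the text file"
```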
- `POST /v1/chat/completions` - Chat completions
- `POST /v1/completions` - Text completions
- `POST /v1/embeddings` - Generate embeddings
- `GET /v1/models` - List available models
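The OpenAI-compatible routes return the same content in the chat-completions envelope instead. A hedged sketch of that shape (field names follow the OpenAI API; the `id` value here is illustrative, not what the server actually generates):

```javascript
// Sketch: the same text content wrapped in an OpenAI-style
// chat.completion envelope, as served by /v1/chat/completions.
// Field names follow the OpenAI API; the id is illustrative.
function buildOpenAIChatResponse(model, content) {
  return {
    id: "chatcmpl-pseudollama",
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content },
        finish_reason: "stop",
      },
    ],
  };
}

const oaRes = buildOpenAIChatResponse("pseudollama", "Hello");
console.log(oaRes.choices[0].message.content); // "Hello"
```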
- `GET /api/server/status` - Check server status
- `POST /api/server/toggle` - Enable/disable the server
- `GET /api/content` - Get the current content
- `POST /api/content` - Update the content
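The toggle endpoint implies a small piece of in-memory state. A hypothetical sketch of that logic (the field names and actual implementation inside PseudoLlama are assumptions):

```javascript
// Hypothetical sketch of the enabled/disabled state behind
// /api/server/status and /api/server/toggle. The field names
// and real implementation in PseudoLlama are assumptions.
function makeServerState() {
  let enabled = true;
  return {
    status: () => ({ enabled }),                       // GET /api/server/status
    toggle: () => ({ enabled: (enabled = !enabled) }), // POST /api/server/toggle
  };
}

const state = makeServerState();
console.log(state.status().enabled); // true
console.log(state.toggle().enabled); // false
```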
Access the web UI by navigating to http://localhost:12345 in your browser. The UI allows you to:
- View and edit the content that will be returned by the API
- Test the API by sending a request to the server
- Enable/disable the server
PseudoLlama includes comprehensive logging of all model communications:
Basic request and response information is logged to the console when the server is running.
Complete model communications (including full request and response bodies) are logged to `logs/model_communications.log`. This is particularly useful for:
- Debugging applications that integrate with language models
- Analyzing the exact data sent to and received from models
- Understanding the structure of streaming responses
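On the streaming point: Ollama streams responses as newline-delimited JSON, one chunk per line, with the final chunk marked `done: true`. A small sketch of reassembling such a stream (the chunk shape follows Ollama's streaming format; whether PseudoLlama chunks identically is an assumption):

```javascript
// Sketch: reassemble an Ollama-style NDJSON chat stream into the
// full assistant message. The chunk shape follows Ollama's streaming
// format; PseudoLlama's exact chunking is an assumption.
function assembleStream(ndjson) {
  return ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line))
    .filter((chunk) => !chunk.done)          // final chunk carries no content
    .map((chunk) => chunk.message.content)
    .join("");
}

const stream = [
  '{"message":{"role":"assistant","content":"Hel"},"done":false}',
  '{"message":{"role":"assistant","content":"lo"},"done":false}',
  '{"done":true}',
].join("\n");

console.log(assembleStream(stream)); // "Hello"
```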
A log viewer utility is included to help analyze the logs:
```bash
# View all logs
node view-logs.js

# Show only the last 10 log entries
node view-logs.js --limit=10

# Filter logs by model
node view-logs.js --model=openrouter

# Filter logs by endpoint
node view-logs.js --endpoint=/v1/chat

# Show only requests
node view-logs.js --requests

# Show only responses
node view-logs.js --responses

# Watch for new log entries in real-time
node view-logs.js --tail

# Show help
node view-logs.js --help
```

The log files are automatically rotated when they reach 10MB to prevent excessive disk usage.
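The filtering that `view-logs.js` performs can be pictured as simple predicates over parsed log entries. A hypothetical sketch (the entry fields `type`, `model`, and `endpoint` are assumptions about the log format, for illustration only):

```javascript
// Hypothetical sketch of view-logs.js-style filtering over parsed
// log entries. The entry fields (type, model, endpoint) are
// assumptions about the log format, for illustration only.
function filterLogs(entries, { model, endpoint, type } = {}) {
  return entries.filter(
    (e) =>
      (!model || e.model === model) &&
      (!endpoint || e.endpoint.startsWith(endpoint)) &&
      (!type || e.type === type)
  );
}

const entries = [
  { type: "request", model: "openrouter", endpoint: "/v1/chat/completions" },
  { type: "response", model: "openrouter", endpoint: "/v1/chat/completions" },
  { type: "request", model: "lmstudio", endpoint: "/api/generate" },
];

// Roughly equivalent to: node view-logs.js --model=openrouter --requests
console.log(filterLogs(entries, { model: "openrouter", type: "request" }).length); // 1
```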