
feat: Add vLLM LLM support with OpenAI-compatible API#538

Open
SeasonPilot wants to merge 1 commit into agentuniverse-ai:master from SeasonPilot:feature/vllm-llm-support

Conversation


@SeasonPilot SeasonPilot commented Dec 10, 2025

When submitting a PR, please confirm the following points and put [x] in the boxes one by one.

Checklist

  • I have read and understood the contributor guidelines.
  • I have checked for duplicate features related to this request and communicated with the project maintainers.
  • I accept the maintainers' suggestions to modify or close this PR.
  • I have submitted the test files and can provide screenshots of the test results (required for feature changes and bug fixes; otherwise as needed).
  • I have added or modified the documentation related to this PR (optional, as needed).
  • I have added example code and usage notes where needed (optional, as needed).

Please fill in the specific details of this PR:

  • Implemented vLLM LLM integration with OpenAI-compatible API support for high-performance inference (up to 24x faster than HuggingFace Transformers, with 50-70% memory reduction)
  • Added a VLLMOpenAIStyleLLM class extending OpenAIStyleLLM with vLLM-specific parameters (beam_search, best_of, length_penalty, early_stopping) and pre-configured context lengths for 30+ popular models, including the Llama 2/3.1/3.2, Mistral/Mixtral, Qwen 2/2.5, Yi, DeepSeek, and Phi families
  • Created complete example configurations (Llama 3.1 8B/70B, Qwen 2.5 7B) and a deployment guide covering basic server setup, Docker deployment, multi-GPU tensor parallelism, quantization options, performance tuning, and troubleshooting
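For illustration, the vLLM-specific knobs listed above ride on top of an otherwise standard OpenAI-style chat-completions payload. The sketch below is not this PR's implementation; the helper name is hypothetical, and the exact extra fields a given vLLM server version accepts should be checked against its documentation:

```python
import json

# Hypothetical sketch: attach vLLM-specific sampling parameters to a
# standard OpenAI-style chat-completions payload. Knobs are only added
# when explicitly set, so the same payload stays valid against a plain
# OpenAI endpoint that would reject unknown fields.
def build_chat_request(model, messages, *, best_of=None,
                       length_penalty=None, use_beam_search=None,
                       early_stopping=None, **openai_params):
    payload = {"model": model, "messages": messages, **openai_params}
    vllm_extras = {
        "best_of": best_of,
        "length_penalty": length_penalty,
        "use_beam_search": use_beam_search,
        "early_stopping": early_stopping,
    }
    for key, value in vllm_extras.items():
        if value is not None:
            payload[key] = value
    return payload

payload = build_chat_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2, best_of=4, use_beam_search=True)
print(json.dumps(payload, indent=2))
```

Keeping the extras optional also means one request builder can serve both vLLM and hosted OpenAI-compatible backends.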

Please provide the path of test files and submit screenshots or files of the test results (fill in as needed):
[Screenshot: test results]

  • Test file path: tests/test_agentuniverse/unit/llm/test_vllm_openai_style_llm.py
  • Test results: ✅ 7/7 tests passed in 0.97s
    • test_initialization - Validates VLLMOpenAIStyleLLM initialization with correct parameters
    • test_max_context_length - Verifies context length retrieval for Llama 3.1 (131072 tokens)
    • test_vllm_specific_parameters - Confirms vLLM-specific parameters are properly set
    • test_get_num_tokens - Tests token counting functionality
    • test_llama_context_lengths - Validates Llama family context lengths (2/3.1/3.2)
    • test_mistral_context_lengths - Validates Mistral/Mixtral family context lengths
    • test_qwen_context_lengths - Validates Qwen 2/2.5 family context lengths
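The three context-length tests above boil down to lookups in a static model-to-context-length table. A minimal sketch of that idea follows; the Llama 3.1 value (131072) comes from the test description above, while the other entries and the fallback are illustrative placeholders, not the PR's actual table:

```python
# Illustrative subset of a model -> max-context-length table; the real
# implementation in this PR reportedly covers 30+ models. Only the
# Llama 3.1 value is taken from the PR's test description; the rest are
# commonly cited defaults and should be treated as placeholders.
MAX_CONTEXT_LENGTHS = {
    "meta-llama/Llama-2-7b-chat-hf": 4096,
    "meta-llama/Meta-Llama-3.1-8B-Instruct": 131072,
    "Qwen/Qwen2.5-7B-Instruct": 32768,
}

DEFAULT_CONTEXT_LENGTH = 4096  # conservative fallback for unknown models

def max_context_length(model_name: str) -> int:
    # Fall back to a safe default rather than failing on unseen models.
    return MAX_CONTEXT_LENGTHS.get(model_name, DEFAULT_CONTEXT_LENGTH)

# Mirrors test_max_context_length from the suite above.
assert max_context_length("meta-llama/Meta-Llama-3.1-8B-Instruct") == 131072
```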

Please list the names of the docs that were added or modified in this PR (fill in as needed):

  • agentuniverse/llm/default/vllm_openai_style_llm.py - Core implementation with comprehensive docstrings and usage examples (230 lines)
  • agentuniverse/llm/default/vllm_openai_style_llm.yaml - YAML configuration template
  • examples/sample_standard_app/intelligence/agentic/llm/buildin/vllm/README.md - Comprehensive deployment guide (268 lines) covering installation, 4 deployment options, configuration examples, performance tuning, troubleshooting, and cost comparison
  • examples/sample_standard_app/intelligence/agentic/llm/buildin/vllm/vllm_llama_3_1_8b.yaml - Basic Llama 3.1 8B configuration example
  • examples/sample_standard_app/intelligence/agentic/llm/buildin/vllm/vllm_llama_3_1_70b.yaml - Advanced Llama 3.1 70B configuration with beam search optimization
  • examples/sample_standard_app/intelligence/agentic/llm/buildin/vllm/vllm_qwen_2_5_7b.yaml - Qwen 2.5 7B configuration for multilingual support (Chinese/English)

Related Issue: #250

Implemented comprehensive vLLM integration for high-performance LLM inference:

Features:
- OpenAI-compatible API integration via VLLMOpenAIStyleLLM class
- Support for vLLM-specific parameters (beam_search, best_of, length_penalty, early_stopping)
- Pre-configured context lengths for 30+ popular models (Llama, Mistral, Qwen, Yi, DeepSeek, Phi)
- Full async/sync and streaming/non-streaming support
- Environment variable configuration (VLLM_API_BASE, VLLM_API_KEY)
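The environment-variable wiring can be sketched as below. The variable names come from this PR; the localhost fallback and dummy API key are this sketch's assumptions (a vLLM server conventionally serves on port 8000 and often runs without authentication), not necessarily the PR's defaults:

```python
import os

# Resolve the vLLM endpoint from the environment variables named in the
# PR description. The fallbacks are assumptions of this sketch:
# http://localhost:8000/v1 is vLLM's conventional local serving address,
# and "EMPTY" is a common dummy key for unauthenticated local servers.
def resolve_vllm_endpoint():
    api_base = os.environ.get("VLLM_API_BASE", "http://localhost:8000/v1")
    api_key = os.environ.get("VLLM_API_KEY", "EMPTY")
    return api_base, api_key

base, key = resolve_vllm_endpoint()
print(base, key)
```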

Implementation:
- Core: agentuniverse/llm/default/vllm_openai_style_llm.py (233 lines)
- Config: agentuniverse/llm/default/vllm_openai_style_llm.yaml
- Tests: tests/test_agentuniverse/unit/llm/test_vllm_openai_style_llm.py (7 tests, all passing)

Examples:
- Basic Llama 3.1 8B configuration
- Advanced Llama 3.1 70B with beam search optimization
- Qwen 2.5 7B for multilingual support
- Comprehensive README with deployment guides and troubleshooting
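The streaming support listed under Features ultimately means consuming the server-sent-events stream that OpenAI-compatible endpoints (including vLLM's) emit. A stdlib-only sketch of the parsing step, where the chunk layout follows the OpenAI chat-completions streaming format:

```python
import json

# Minimal parser for the "data: {...}" server-sent-events lines emitted
# by OpenAI-compatible streaming endpoints. `lines` is any iterable of
# decoded text lines from the HTTP response body.
def iter_stream_chunks(lines):
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keepalive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # end-of-stream sentinel
            return
        chunk = json.loads(data)
        # Each chunk carries an incremental "delta" with new tokens.
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_chunks(sample)))  # -> Hello
```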

Performance Benefits:
- Up to 24x faster inference compared to HuggingFace Transformers
- 50-70% memory reduction via PagedAttention
- 60-80% cost savings vs cloud APIs

Related to issue agentuniverse-ai#250