ci: add vLLM gRPC e2e tests#158

Merged
key4ng merged 23 commits into main from vllm-grpc-ci
Jan 29, 2026

Conversation

@key4ng key4ng (Collaborator) commented Jan 22, 2026

Description

Add end-to-end tests for the vLLM gRPC backend to ensure gateway compatibility with vLLM's gRPC server.

Problem

The gateway supports the vLLM gRPC backend but lacked automated e2e tests to verify functionality and catch regressions.

Solution

Add a comprehensive test suite covering chat completions, streaming, structured output, and health checks for the vLLM gRPC backend, with proper CI integration.

Changes

  • Add e2e_test/vllm_grpc/ test suite (basic, streaming, health tests)
  • Add vLLM runtime support in runtime_registry.py (sketched below)
  • Add vLLM model specs (llama-1b-vllm, llama-8b-vllm)
  • Add vllm-grpc job to CI workflow with flash-attn/flashinfer dependencies
  • Add request timeouts to prevent test hangs
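
A minimal sketch of what the runtime dispatch in runtime_registry.py could look like. build_worker_cmd and the runtime names come from this PR, but the spec fields, the vLLM entry point, and the flags below are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of runtime_registry.py's dispatch. The vLLM gRPC
# entry point module is a placeholder assumption; the SGLang launcher
# and its flags are the standard ones.
from dataclasses import dataclass


@dataclass
class ModelSpec:
    model_path: str
    runtime: str = "sglang"  # "sglang" | "vllm" | "trt"


def build_worker_cmd(spec: ModelSpec, port: int) -> list[str]:
    """Build the worker launch command for a model based on its runtime."""
    if spec.runtime == "sglang":
        return [
            "python", "-m", "sglang.launch_server",
            "--model-path", spec.model_path,
            "--port", str(port),
        ]
    if spec.runtime == "vllm":
        # Placeholder module path; the real vLLM gRPC server entry point
        # may differ.
        return [
            "python", "-m", "vllm_grpc_server",
            "--model", spec.model_path,
            "--port", str(port),
        ]
    raise ValueError(f"unknown runtime: {spec.runtime!r}")
```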

Test Plan

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated

@github-actions github-actions bot added the ci (CI/CD configuration changes) and tests (Test changes) labels on Jan 22, 2026
@gemini-code-assist

Summary of Changes

Hello @key4ng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's capability by integrating vLLM gRPC models and establishing a robust framework for managing diverse model runtimes. The changes involve abstracting the model worker command generation into a dedicated registry, updating model specifications to include runtime information, and introducing a comprehensive set of end-to-end tests to validate the new vLLM gRPC functionality across various scenarios, from basic chat to advanced streaming and structured output.

Highlights

  • vLLM gRPC Integration: Introduced support for vLLM gRPC models, enabling the system to launch and interact with vLLM servers using the gRPC protocol.
  • Runtime Registry Abstraction: Implemented a new runtime_registry.py module to abstract the process of building worker launch commands, allowing for flexible integration of different model runtimes (e.g., SGLang, vLLM, TensorRT) without modifying core model launching logic.
  • New vLLM Model Specifications: Added new model entries in model_specs.py specifically for vLLM gRPC, including 'llama-8b-vllm', 'llama-1b-vllm', and 'qwen-7b-vllm', each configured with the 'vllm' runtime.
  • Comprehensive E2E Tests for vLLM gRPC: Added a dedicated suite of end-to-end tests under e2e_test/vllm_grpc/ covering basic chat completions (streaming and non-streaming), various sampling parameters, structured output (JSON, regex), health checks, worker status, and streaming edge cases (a sketch follows this list).
  • Refactored Model Launching: Modified model_pool.py to utilize the new build_worker_cmd function from the runtime registry, centralizing and simplifying the logic for launching model workers based on their specified runtime.
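
To make the shape of those e2e cases concrete, here is a hedged sketch of a non-streaming chat completion test with an explicit request timeout (matching the "request timeouts to prevent test hangs" change). The gateway_url fixture is hypothetical, and the model id mirrors one mentioned in this PR:

```python
# Hypothetical e2e test sketch; gateway_url is an assumed fixture and
# the response shape follows the OpenAI-compatible chat completions API.
import requests


def test_chat_completion_non_streaming(gateway_url: str) -> None:
    resp = requests.post(
        f"{gateway_url}/v1/chat/completions",
        json={
            "model": "llama-1b-vllm",
            "messages": [{"role": "user", "content": "Say hello."}],
            "max_tokens": 16,
        },
        timeout=60,  # request timeout so a hung backend fails fast
    )
    resp.raise_for_status()
    body = resp.json()
    assert body["choices"][0]["message"]["content"]
```
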
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/e2e-vllm-grpc.yml

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the vLLM gRPC backend, which is a significant and well-executed addition. The refactoring to create a runtime_registry is a great design choice that makes the testing infrastructure much more extensible for future backends. The new e2e tests for vLLM are comprehensive, covering basic functionality, streaming, health checks, and various sampling parameters.

I've provided a few suggestions to improve code readability and remove redundancy in the test files and helper functions. Overall, this is a high-quality contribution.

@key4ng key4ng changed the title from "add vllm grpc test" to "ci: add vllm grpc e2e test" on Jan 24, 2026
@key4ng key4ng changed the title from "ci: add vllm grpc e2e test" to "ci: add vLLM gRPC e2e tests" on Jan 24, 2026
@key4ng key4ng marked this pull request as ready for review January 24, 2026 00:17
@slin1237 slin1237 (Collaborator) commented

Instead of another file, can we add an additional mark to the existing test cases, so that they default to SGLang and also spin up vLLM when it is defined? Another test file is hard to scale and maintain.
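
One way to sketch that suggestion with pytest, parametrizing the existing tests over runtimes instead of keeping a parallel file (the environment variable and fixture name are hypothetical, not from this repo):

```python
# Hypothetical sketch of running the same tests against multiple
# runtimes: default to SGLang, and also spin up vLLM when enabled.
import os

import pytest

RUNTIMES = ["sglang"]
if os.environ.get("E2E_VLLM") == "1":  # assumed opt-in switch
    RUNTIMES.append("vllm")


@pytest.fixture(params=RUNTIMES)
def runtime(request):
    return request.param


def test_chat_completion(runtime):
    # The same test body runs once per enabled runtime; `runtime` would
    # select which backend the model pool launches.
    ...
```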

@github-actions github-actions bot added and then removed the ci (CI/CD configuration changes) label on Jan 27, 2026
@key4ng key4ng (Collaborator, Author) commented Jan 28, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces end-to-end tests for the vLLM gRPC backend, a valuable addition for ensuring gateway compatibility and catching regressions. The implementation is well-structured, including logic to handle vLLM-specific features and a fallback gRPC health check. I've provided a few suggestions to enhance code style and error handling, such as moving imports to the top of files for better organization and making exception handling more specific for easier debugging. Overall, this is a solid contribution that significantly improves test coverage.
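
For reference, a fallback health check over the standard gRPC health protocol (grpc.health.v1) could look like the sketch below; the empty service name and target handling are assumptions, not the code from this PR:

```python
# Hypothetical fallback gRPC health check using grpcio-health-checking.
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc


def grpc_health_ok(target: str, timeout: float = 5.0) -> bool:
    """Return True if the gRPC server at `target` reports SERVING."""
    with grpc.insecure_channel(target) as channel:
        stub = health_pb2_grpc.HealthStub(channel)
        try:
            resp = stub.Check(
                health_pb2.HealthCheckRequest(service=""), timeout=timeout
            )
        except grpc.RpcError:
            return False
    return resp.status == health_pb2.HealthCheckResponse.SERVING
```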

@key4ng key4ng merged commit aff6273 into main Jan 29, 2026
14 checks passed
@key4ng key4ng deleted the vllm-grpc-ci branch January 29, 2026 01:48