ci: add vLLM gRPC e2e tests#158

Merged
key4ng merged 23 commits into main from vllm-grpc-ci
Jan 29, 2026

Conversation

@key4ng key4ng (Collaborator) commented Jan 22, 2026

Description

Add end-to-end tests for the vLLM gRPC backend to ensure gateway compatibility with vLLM's gRPC server.

Problem

The gateway supports the vLLM gRPC backend but lacked automated e2e tests to verify functionality and catch regressions.

Solution

Add a comprehensive test suite covering chat completions, streaming, structured output, and health checks for the vLLM gRPC backend, with proper CI integration.

Changes

  • Add e2e_test/vllm_grpc/ test suite (basic, streaming, health tests)
  • Add vLLM runtime support in runtime_registry.py (sketched below)
  • Add vLLM model specs (llama-1b-vllm, llama-8b-vllm)
  • Add vllm-grpc job to CI workflow with flash-attn/flashinfer dependencies
  • Add request timeouts to prevent test hangs
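
A minimal sketch of what the runtime dispatch in runtime_registry.py could look like. build_worker_cmd and the runtime names come from this PR, but the spec fields, the vLLM entry point, and the flags below are illustrative assumptions, not the actual implementation:

```python
# Hypothetical sketch of runtime_registry.py's dispatch. The vLLM gRPC
# entry point module is a placeholder assumption; the SGLang launcher
# and its flags are the standard ones.
from dataclasses import dataclass


@dataclass
class ModelSpec:
    model_path: str
    runtime: str = "sglang"  # "sglang" | "vllm" | "trt"


def build_worker_cmd(spec: ModelSpec, port: int) -> list[str]:
    """Build the worker launch command for a model based on its runtime."""
    if spec.runtime == "sglang":
        return [
            "python", "-m", "sglang.launch_server",
            "--model-path", spec.model_path,
            "--port", str(port),
        ]
    if spec.runtime == "vllm":
        # Placeholder module path; the real vLLM gRPC server entry point
        # may differ.
        return [
            "python", "-m", "vllm_grpc_server",
            "--model", spec.model_path,
            "--port", str(port),
        ]
    raise ValueError(f"unknown runtime: {spec.runtime!r}")
```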

Test Plan

Checklist
  • cargo +nightly fmt passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • (Optional) Documentation updated

@github-actions github-actions bot added the ci (CI/CD configuration changes) and tests (Test changes) labels on Jan 22, 2026
@gemini-code-assist

Summary of Changes

Hello @key4ng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's capability by integrating vLLM gRPC models and establishing a robust framework for managing diverse model runtimes. The changes involve abstracting the model worker command generation into a dedicated registry, updating model specifications to include runtime information, and introducing a comprehensive set of end-to-end tests to validate the new vLLM gRPC functionality across various scenarios, from basic chat to advanced streaming and structured output.

Highlights

  • vLLM gRPC Integration: Introduced support for vLLM gRPC models, enabling the system to launch and interact with vLLM servers using the gRPC protocol.
  • Runtime Registry Abstraction: Implemented a new runtime_registry.py module to abstract the process of building worker launch commands, allowing for flexible integration of different model runtimes (e.g., SGLang, vLLM, TensorRT) without modifying core model launching logic.
  • New vLLM Model Specifications: Added new model entries in model_specs.py specifically for vLLM gRPC, including 'llama-8b-vllm', 'llama-1b-vllm', and 'qwen-7b-vllm', each configured with the 'vllm' runtime.
  • Comprehensive E2E Tests for vLLM gRPC: Added a dedicated suite of end-to-end tests under e2e_test/vllm_grpc/ covering basic chat completions (streaming and non-streaming), various sampling parameters, structured output (JSON, regex), health checks, worker status, and streaming edge cases (a sketch follows this list).
  • Refactored Model Launching: Modified model_pool.py to utilize the new build_worker_cmd function from the runtime registry, centralizing and simplifying the logic for launching model workers based on their specified runtime.
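
To make the shape of those e2e cases concrete, here is a hedged sketch of a non-streaming chat completion test with an explicit request timeout (matching the "request timeouts to prevent test hangs" change). The gateway_url fixture is hypothetical, and the model id mirrors one mentioned in this PR:

```python
# Hypothetical e2e test sketch; gateway_url is an assumed fixture and
# the response shape follows the OpenAI-compatible chat completions API.
import requests


def test_chat_completion_non_streaming(gateway_url: str) -> None:
    resp = requests.post(
        f"{gateway_url}/v1/chat/completions",
        json={
            "model": "llama-1b-vllm",
            "messages": [{"role": "user", "content": "Say hello."}],
            "max_tokens": 16,
        },
        timeout=60,  # request timeout so a hung backend fails fast
    )
    resp.raise_for_status()
    body = resp.json()
    assert body["choices"][0]["message"]["content"]
```
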
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/e2e-vllm-grpc.yml

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for the vLLM gRPC backend, which is a significant and well-executed addition. The refactoring to create a runtime_registry is a great design choice that makes the testing infrastructure much more extensible for future backends. The new e2e tests for vLLM are comprehensive, covering basic functionality, streaming, health checks, and various sampling parameters.

I've provided a few suggestions to improve code readability and remove redundancy in the test files and helper functions. Overall, this is a high-quality contribution.

@key4ng key4ng changed the title from "add vllm grpc test" to "ci: add vllm grpc e2e test" on Jan 24, 2026
@key4ng key4ng changed the title from "ci: add vllm grpc e2e test" to "ci: add vLLM gRPC e2e tests" on Jan 24, 2026
@key4ng key4ng marked this pull request as ready for review January 24, 2026 00:17
@slin1237 slin1237 (Collaborator) commented

Instead of another file, can we add an additional mark to the existing test cases, so that they default to SGLang and also spin up vLLM when it is defined? Another test file is hard to scale and maintain.
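
One way to sketch that suggestion with pytest, parametrizing the existing tests over runtimes instead of keeping a parallel file (the environment variable and fixture name are hypothetical, not from this repo):

```python
# Hypothetical sketch of running the same tests against multiple
# runtimes: default to SGLang, and also spin up vLLM when enabled.
import os

import pytest

RUNTIMES = ["sglang"]
if os.environ.get("E2E_VLLM") == "1":  # assumed opt-in switch
    RUNTIMES.append("vllm")


@pytest.fixture(params=RUNTIMES)
def runtime(request):
    return request.param


def test_chat_completion(runtime):
    # The same test body runs once per enabled runtime; `runtime` would
    # select which backend the model pool launches.
    ...
```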

@github-actions github-actions bot added and then removed the ci (CI/CD configuration changes) label on Jan 27, 2026
@key4ng key4ng (Collaborator, Author) commented Jan 28, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces end-to-end tests for the vLLM gRPC backend, a valuable addition for ensuring gateway compatibility and catching regressions. The implementation is well-structured, including logic to handle vLLM-specific features and a fallback gRPC health check. I've provided a few suggestions to enhance code style and error handling, such as moving imports to the top of files for better organization and making exception handling more specific for easier debugging. Overall, this is a solid contribution that significantly improves test coverage.
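
For reference, a fallback health check over the standard gRPC health protocol (grpc.health.v1) could look like the sketch below; the empty service name and target handling are assumptions, not the code from this PR:

```python
# Hypothetical fallback gRPC health check using grpcio-health-checking.
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc


def grpc_health_ok(target: str, timeout: float = 5.0) -> bool:
    """Return True if the gRPC server at `target` reports SERVING."""
    with grpc.insecure_channel(target) as channel:
        stub = health_pb2_grpc.HealthStub(channel)
        try:
            resp = stub.Check(
                health_pb2.HealthCheckRequest(service=""), timeout=timeout
            )
        except grpc.RpcError:
            return False
    return resp.status == health_pb2.HealthCheckResponse.SERVING
```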

@key4ng key4ng merged commit aff6273 into main Jan 29, 2026
14 checks passed
@key4ng key4ng deleted the vllm-grpc-ci branch January 29, 2026 01:48