[TRTLLM-10076][feat] Serve CLI improvements: renames, new flags, and mm_embedding_serve enhancements #12105
📝 Walkthrough
The changes add support for a new
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tensorrt_llm/commands/serve.py`:
- Around line 1122-1127: The disaggregated_mpi_worker entrypoint (function
disaggregated_mpi_worker) should mirror the deprecation behavior of
disaggregated by detecting if "--config_file" was passed and emitting a
DeprecationWarning; add a check using sys.argv to see if "--config_file" is
present and call warnings.warn(..., DeprecationWarning, stacklevel=2)
immediately after the disaggregated_mpi_worker docstring/entry log so users see
the same deprecation message as the disaggregated command.
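The check suggested above could be sketched roughly as follows. This is a minimal, standalone illustration and not the code in `serve.py`: the helper name `warn_if_deprecated_flag` and the warning text are hypothetical, and only the `sys.argv` scan and the `warnings.warn(..., DeprecationWarning, stacklevel=2)` call come from the review comment.

```python
import sys
import warnings

DEPRECATED_FLAG = "--config_file"  # flag named in the review comment
REPLACEMENT_FLAG = "--config"      # alias named in the PR description


def warn_if_deprecated_flag(argv=None):
    """Hypothetical helper: warn if the deprecated flag appears in argv.

    Returns True when a DeprecationWarning was emitted, False otherwise.
    """
    argv = sys.argv[1:] if argv is None else argv
    # Match both "--config_file path" and "--config_file=path".
    used = any(
        arg == DEPRECATED_FLAG or arg.startswith(DEPRECATED_FLAG + "=")
        for arg in argv
    )
    if used:
        # Assumed wording; the real message should match the one
        # already emitted by the disaggregated command.
        warnings.warn(
            f"{DEPRECATED_FLAG} is deprecated; use {REPLACEMENT_FLAG} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    return used
```

Calling such a helper at the top of the entrypoint keeps the check independent of how the CLI framework maps both spellings onto one parameter.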
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c1f1f5fa-f1ab-4275-9a8a-9c393cfda74f
📒 Files selected for processing (2)
- tensorrt_llm/commands/serve.py
- tensorrt_llm/llmapi/llm_args.py
b398e02 to 57a6655
… improvements: renames, new flags, and mm_embedding_serve enhancements

- TRTLLM-10076: Update --tokenizer description for PyTorch backend, add --hf_revision alias for --revision with deprecation warning, support hf_revision key in YAML config, add --enable_attention_dp flag
- TRTLLM-10079: mm_embedding_serve: add --config alias for --extra_encoder_options, expose --hf_revision, --free_gpu_memory_fraction, --tensor_parallel_size
- TRTLLM-10229: Add --config alias for --config_file in disaggregated and disaggregated_mpi_worker commands
- TRTLLM-10078: Improve --server_role help message with role descriptions

Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Made-with: Cursor
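As an illustration of the "alias with deprecation warning" pattern this commit describes for `--hf_revision` and `--revision`, here is a minimal sketch using the standard-library `argparse`. It is not the PR's implementation, and the actual serve command may be built on a different CLI framework; the `DeprecatedAlias` class, the `build_parser` function, the `serve-sketch` program name, and the warning text are all hypothetical.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """Hypothetical action: store the value, warn about the old spelling."""

    def __init__(self, option_strings, dest, preferred=None, **kwargs):
        self.preferred = preferred
        super().__init__(option_strings, dest, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        warnings.warn(
            f"{option_string} is deprecated; use {self.preferred} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(namespace, self.dest, values)


def build_parser():
    parser = argparse.ArgumentParser(prog="serve-sketch")
    # Preferred spelling: stored silently under `hf_revision`.
    parser.add_argument("--hf_revision", dest="hf_revision", default=None)
    # Deprecated spelling: same destination, but warns when used.
    parser.add_argument(
        "--revision",
        dest="hf_revision",
        action=DeprecatedAlias,
        preferred="--hf_revision",
        default=argparse.SUPPRESS,
    )
    return parser
```

Both spellings write to the same destination, so downstream code reads a single `hf_revision` value regardless of which flag the user passed.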
57a6655 to 1386e53
/bot run
PR_Github #38560 [ run ] triggered by Bot. Commit:
PR_Github #38560 [ run ] completed with state
/bot run
PR_Github #38585 [ run ] triggered by Bot. Commit:
Summary by CodeRabbit
Release Notes
New Features
- Added `--enable_attention_dp` flag to the serve command for distributed attention processing.

Deprecations
- Use `--hf_revision` instead of `--revision`.
- Use `--config` instead of `--extra_encoder_options` and `--config_file`.

Improvements
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.