[model-gateway] Fix tokenizer caching and improve error handling #15695
Conversation
Summary of Changes
Hello @slin1237, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the model-gateway's performance and reliability. By implementing tokenizer caching, it optimizes the request processing pipeline, reducing overhead. Concurrently, it bolsters error handling, particularly for model identification, preventing runtime panics and providing clearer error messages. The introduction of a dedicated constant for unknown model IDs further refines code quality and consistency across the system.
Code Review
This pull request introduces several valuable improvements. Caching the tokenizer in the RequestContext is a great performance optimization that reduces redundant lookups. Replacing .unwrap() calls with proper error handling significantly improves the robustness of the code. Furthermore, introducing the UNKNOWN_MODEL_ID constant and fixing incorrect function names in logs enhance code quality and maintainability. The changes are well-implemented and clearly beneficial. I have one suggestion to further improve maintainability by reducing some code duplication.
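To make the error-handling point concrete, here is a minimal, hypothetical sketch of what replacing an .unwrap() on model_id with a typed error could look like. The RequestContext fields, GatewayError type, and resolve_model_id function are illustrative assumptions, not the gateway's actual code.

```rust
// Hypothetical sketch; the real RequestContext and error type likely differ.
struct RequestContext {
    model_id: Option<String>,
}

#[derive(Debug)]
enum GatewayError {
    MissingModelId,
}

// Before: ctx.model_id.as_ref().unwrap() would panic when no model is set.
// After: surface a typed error the caller can map to a client-facing response.
fn resolve_model_id(ctx: &RequestContext) -> Result<&str, GatewayError> {
    ctx.model_id.as_deref().ok_or(GatewayError::MissingModelId)
}
```

The benefit over .unwrap() is that a missing model ID becomes a recoverable error path (e.g. a 4xx response) rather than a panic that takes down the request handler.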
Force-pushed from 19ceb70 to 9ef204a (Compare)
- Cache tokenizer in RequestContext to avoid redundant registry lookups (4 lookups per request -> 1 lookup, reused across pipeline stages)
- Replace dangerous .unwrap() on model_id with proper error handling
- Add UNKNOWN_MODEL_ID constant to eliminate magic strings
- Fix copy-paste errors in error log function names
- Remove unused tokenizer() helper method (dead code)

Files changed:
- context.rs: Add tokenizer field to ProcessingState, add tokenizer_arc() helper
- chat/preparation.rs, generate/preparation.rs: Cache tokenizer, fix model_id unwrap
- chat/response_processing.rs, generate/response_processing.rs: Use cached tokenizer
- core/mod.rs: Add UNKNOWN_MODEL_ID constant
- model_card.rs, dispatch_metadata.rs, bucket.rs, cache_aware.rs: Use constant
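As a rough illustration of the caching and constant changes listed above, the sketch below caches a tokenizer handle on the processing state, exposes it through a small tokenizer_arc() helper, and uses a shared UNKNOWN_MODEL_ID constant. The type definitions, the constant's value, and model_label are assumptions drawn from the file list, not the real implementation.

```rust
use std::sync::Arc;

// Hypothetical stand-ins; the gateway's actual types and constant value may differ.
pub const UNKNOWN_MODEL_ID: &str = "unknown";

struct Tokenizer;

struct ProcessingState {
    // Looked up from the tokenizer registry once during request preparation,
    // then reused by later pipeline stages instead of repeating the lookup.
    tokenizer: Option<Arc<Tokenizer>>,
}

impl ProcessingState {
    /// Cheap accessor: cloning an Arc only bumps the reference count.
    fn tokenizer_arc(&self) -> Option<Arc<Tokenizer>> {
        self.tokenizer.clone()
    }
}

fn model_label(model_id: Option<&str>) -> &str {
    // Use the shared constant instead of scattering "unknown" magic strings.
    model_id.unwrap_or(UNKNOWN_MODEL_ID)
}
```

Storing the tokenizer as an Arc keeps the per-stage cost to a reference-count bump, which matches the stated goal of collapsing four registry lookups per request into one.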
Force-pushed from bd22f42 to 74ce93c (Compare)
Force-pushed from 74ce93c to 2473ccd (Compare)