
adjust dynamic vs static outputs comparison in test_lora_update.py #11884

Merged: Fridge003 merged 6 commits into sgl-project:main from glenliu21:test_lora_update_fix on Oct 24, 2025


Conversation

@glenliu21 (Contributor) commented:

Motivation

Currently, test_lora_update.py requires that dynamic outputs be exactly equivalent to static outputs. This check is too strict, and it occasionally fails.

For example, I ran with the following hardware:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H200                    On  |   00000000:26:00.0 Off |                    0 |
| N/A   26C    P0             75W /  700W |       0MiB / 143771MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

and got the error:

AssertionError: 'the [55 chars] It involves the use of algorithms, data struc[86 chars]s to' != 'the [55 chars] It is a broad field that encompasses a wide r[74 chars]orks'
- the theory, design, and implementation of computer systems. It involves the use of algorithms, data structures, and software engineering techniques to develop efficient and effective solutions to
+ the theory, design, and implementation of computer systems. It is a broad field that encompasses a wide range of topics, including algorithms, computer architecture, computer networks
 : Output mismatch at batch 1, prompt 11:
- Dynamic: 'the theory, design, and implementation of computer systems. It involves the use of algorithms, data structures, and software engineering techniques to develop efficient and effective solutions to'
- Static: 'the theory, design, and implementation of computer systems. It is a broad field that encompasses a wide range of topics, including algorithms, computer architecture, computer networks'
E
======================================================================
ERROR: test_dynamic_lora_update_engine (__main__.TestLoRADynamicUpdate)
Test dynamic LoRA updates in engine mode.
----------------------------------------------------------------------
AssertionError: 'the [55 chars] It involves the use of algorithms, data struc[86 chars]s to' != 'the [55 chars] It is a broad field that encompasses a wide r[74 chars]orks'
- the theory, design, and implementation of computer systems. It involves the use of algorithms, data structures, and software engineering techniques to develop efficient and effective solutions to
+ the theory, design, and implementation of computer systems. It is a broad field that encompasses a wide range of topics, including algorithms, computer architecture, computer networks
 : Output mismatch at batch 1, prompt 11:
- Dynamic: 'the theory, design, and implementation of computer systems. It involves the use of algorithms, data structures, and software engineering techniques to develop efficient and effective solutions to'
- Static: 'the theory, design, and implementation of computer systems. It is a broad field that encompasses a wide range of topics, including algorithms, computer architecture, computer networks'

Because the rest of the test already verifies that LoRA adapters are loaded and unloaded correctly, this check should serve as a sanity check rather than a rigorous equivalence test.

Modifications

Instead of comparing outputs for exact equality, the test now requires that the fraction of overlapping words exceed a threshold (currently set to 90%). Though this should reduce the number of false negatives, the test could still be flaky, and in the future we could consider removing this check entirely. A minimal sketch of what such an overlap check could look like is shown below.
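The sketch below is only illustrative: the function name, threshold constant, and sample strings are not the exact code added to test_lora_update.py, and it ignores the empty-output edge case discussed in the review below.

```python
def word_overlap_ratio(dynamic_output: str, static_output: str) -> float:
    """Fraction of the static output's words that also appear in the dynamic output."""
    dynamic_words = set(dynamic_output.split())
    static_words = set(static_output.split())
    return len(dynamic_words & static_words) / len(static_words)


OVERLAP_THRESHOLD = 0.9  # the PR sets the threshold at 90%

ratio = word_overlap_ratio(
    "It involves the use of algorithms, data structures, and software engineering",
    "It involves the use of algorithms, data structures, and software design",
)
print(f"overlap: {ratio:.2f}, passes: {ratio >= OVERLAP_THRESHOLD}")  # overlap: 0.91, passes: True
```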

Checklist

@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @glenliu21, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses the flakiness of the test_lora_update.py test by relaxing the output comparison criteria. Previously, the test demanded exact matches between dynamic and static outputs, which proved too strict and led to intermittent failures. The updated approach now validates output similarity based on a 90% word overlap threshold, ensuring the test remains effective without being overly restrictive.

Highlights

  • Relaxed Output Comparison: The test test_lora_update.py no longer requires exact equivalence between dynamic and static outputs, addressing flakiness due to strict comparison.
  • Word Overlap Threshold: Output comparison now uses a word overlap threshold of 90% to determine similarity, making the test more robust while still ensuring correctness.

@gemini-code-assist (bot) left a comment:


Code Review

This pull request adjusts the comparison of dynamic and static outputs in test_lora_update.py to be less strict, using a word overlap threshold instead of an exact match. This is a good approach to make the test more robust against minor generation differences. I've identified a potential ZeroDivisionError in the new similarity calculation and suggest a fix. I've also proposed a small improvement to the assertion's error message for better readability.
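A rough illustration of both review points, assuming the similarity is computed as a simple word-overlap ratio as in the sketch above (this is not the reviewer's actual suggested diff): check the denominator before dividing, and put the computed ratio and both outputs in the assertion message.

```python
def assert_outputs_similar(dynamic_output: str, static_output: str, threshold: float = 0.9) -> None:
    dynamic_words = set(dynamic_output.split())
    static_words = set(static_output.split())
    # Guard the denominator so an empty static output cannot raise ZeroDivisionError.
    if not static_words:
        overlap = 1.0 if not dynamic_words else 0.0
    else:
        overlap = len(dynamic_words & static_words) / len(static_words)
    # Readable failure message: show the ratio, the threshold, and both outputs.
    assert overlap >= threshold, (
        f"Word overlap {overlap:.2f} is below threshold {threshold:.2f}\n"
        f"  Dynamic: {dynamic_output!r}\n"
        f"  Static:  {static_output!r}"
    )
```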

@Fridge003 (Collaborator) commented:

Maybe we can use Rouge-L as the metric for comparing output? Like this one:
https://github.com/sgl-project/sglang/blob/main/test/srt/lora/utils.py#L274
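For reference, a ROUGE-L style similarity (longest common subsequence over words) could look roughly like the sketch below; this is a generic illustration, not the actual helper in test/srt/lora/utils.py.

```python
def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between two strings, computed over whitespace-separated words."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 1.0 if cand == ref else 0.0
    # Dynamic-programming table for the longest-common-subsequence length.
    dp = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
    for i, cw in enumerate(cand, 1):
        for j, rw in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if cw == rw else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(cand)][len(ref)]
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```

Unlike a bag-of-words overlap, ROUGE-L rewards long in-order matches, so an output that merely reshuffles the same words scores lower than one that diverges only near the end.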

Fridge003 merged commit fc86b18 into sgl-project:main on Oct 24, 2025
136 of 145 checks passed
glenliu21 deleted the test_lora_update_fix branch on October 25, 2025 at 21:45
