[ci] update baseline and fix lmdeploy version #2098
MaiziXiao merged 6 commits into open-compass:main from
Pull Request Overview
This PR updates baseline metrics across several test configuration files and adjusts the CI workflow for lmdeploy, including changes to default values and retry behavior.
- Updated the default value for building lmdeploy and increased retry attempts in CI workflows.
- Modified multiple baseline accuracy values in various YAML scripts to reflect new performance figures.
- Adjusted job conditions in CI to depend on the successful preparation of the environment.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| .github/workflows/daily-run-test.yml | Updated default value for lmdeploy build, increased retry attempts, and enhanced job conditions. |
| .github/scripts/oc_score_baseline_testrange.yaml | Revised baseline accuracy values for various models. |
| .github/scripts/oc_score_baseline_fullbench.yaml | Updated numerous metrics values across fullbench tests. |
| .github/scripts/oc_score_baseline.yaml | Adjusted demo accuracy and related metrics for lmdeploy tests. |
Comments suppressed due to low confidence (3)
.github/workflows/daily-run-test.yml:20
- Changing the default for 'build lmdeploy' from true to false may impact downstream workflows; please confirm that this is the intended behavior.
default: false
.github/scripts/oc_score_baseline_testrange.yaml:15
- [nitpick] Confirm that the updated baseline value for 'gsm8k_accuracy' reflects the new performance expectations and is consistent with the results from recent tests.
gsm8k_accuracy: 34.38
.github/scripts/oc_score_baseline.yaml:12
- [nitpick] Please verify that the updated 'demo_gsm8k_accuracy' metric aligns with the current testing criteria for lmdeploy integration.
demo_gsm8k_accuracy: 84.38
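To make the baseline comments above concrete, here is a minimal sketch of how a measured score might be checked against a stored baseline. The two baseline numbers are the ones quoted in the comments; the flat dictionary layout, the function name `within_baseline`, and the 1.0-point tolerance are assumptions for illustration, not the repository's actual comparison logic.

```python
import math

# Baseline values quoted in the review comments above.
# The flat layout is illustrative; the real YAML files may nest these differently.
BASELINE = {
    "gsm8k_accuracy": 34.38,
    "demo_gsm8k_accuracy": 84.38,
}


def within_baseline(metric: str, measured: float, abs_tol: float = 1.0) -> bool:
    """Return True when `measured` is within `abs_tol` points of the baseline.

    `abs_tol` is a hypothetical tolerance chosen for this sketch.
    """
    return math.isclose(measured, BASELINE[metric], rel_tol=0.0, abs_tol=abs_tol)


if __name__ == "__main__":
    print(within_baseline("gsm8k_accuracy", 34.0))
    print(within_baseline("demo_gsm8k_accuracy", 80.0))
```

A check of this shape is why baselines need refreshing when a dependency such as lmdeploy changes version: a stale baseline makes an otherwise healthy run fall outside the tolerance.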
daily_run_test_volc:
- if: ${{!cancelled()}}
+ if: ${{!cancelled() && contains(needs.prepare_env.result, 'success')}}
[nitpick] The added condition relies on a substring match of the job result; consider using a more explicit status check if supported by the CI system to improve clarity.
Suggested change:
- if: ${{!cancelled() && contains(needs.prepare_env.result, 'success')}}
+ if: ${{!cancelled() && needs.prepare_env.result == 'success'}}
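The difference between the two forms of the condition can be illustrated with a small Python model. It assumes the documented set of job result values (`success`, `failure`, `cancelled`, `skipped`) and approximates GitHub's case-insensitive string handling with `lower()`. The helper names are hypothetical, and the model does not run the real expression engine.

```python
# Result values a job can report through `needs.<job_id>.result`.
RESULTS = ("success", "failure", "cancelled", "skipped")


def contains_success(result: str) -> bool:
    """Model of contains(result, 'success'): a case-insensitive substring match."""
    return "success" in result.lower()


def equals_success(result: str) -> bool:
    """Model of result == 'success': a case-insensitive equality check."""
    return result.lower() == "success"


# For every real result value the two checks give the same answer.
agree_on_real_values = all(
    contains_success(r) == equals_success(r) for r in RESULTS
)

# A made-up value (not a real job result) shows where a substring match
# would be looser than an equality check.
hypothetical = "unsuccessful"
differs_on_hypothetical = contains_success(hypothetical) != equals_success(hypothetical)

if __name__ == "__main__":
    print(agree_on_real_values, differs_on_hypothetical)
```

Under these assumptions the two conditions behave identically today, so the suggested `==` form mainly buys readability and protection against any future value that happens to contain the word.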
Outdated models such as gemma2, internlm2, and deepseek v2 can be removed in a later update.
* update * update * update * update * update * update
Thanks for your contribution; we appreciate it a lot. The following instructions will help keep your pull request healthy and make it easier to get feedback. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.
Motivation
Please describe the motivation of this PR and the goal you want to achieve through this PR.
Modification
Please briefly describe what modification is made in this PR.
BC-breaking (Optional)
Does the modification introduce changes that break the backward compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.
Use cases (Optional)
If this PR introduces a new feature, it is better to list some use cases here and update the documentation.
Checklist
Before PR:
After PR: