create a new accuracy eval script for official README.md eval accuracy #3449
Summary:

Creates a standalone eval script for generating accuracy metrics for the quantization README.md, based on the HuggingFace model definition of Llama 3.1 8B.

Why a new script?

1. The current `prod` script in https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py uses a custom model definition that predates the HF integration; it's better to use HF's model definition now.
2. We have HummingBird scripts in https://github.com/pytorch/ao/tree/40c4f44677ae11166c3dcfbb9189cfa78789390c/.github/scripts/torchao_model_releases, but they seem pretty verbose and hard to use/modify.
3. We have https://github.com/pytorch/ao/blob/main/benchmarks/_models/eval_hf_models.py, which I copy-pasted and modified for the current PR. That script didn't work as-is for various reasons and also seemed hard to use/modify; for the main README.md it's important to have a very simple standalone script.

We should probably do a pass on the naming before landing.

Future work:

1. add metrics for `int4_weight_only_hqq` (needs to run on A100)
2. add metrics for `mxfp8` and `nvfp4` (needs to run on B200)
3. make the parsing of logs automated
4. also add a similar script for performance benchmarks, using vLLM
5. delete https://github.com/pytorch/ao/blob/main/torchao/_models/llama/

Test Plan:

```
// debug run on small model
with-proxy time ./benchmarks/quantization/eval_accuracy_for_readme.sh facebook/opt-125m

// real run
with-proxy time ./benchmarks/quantization/eval_accuracy_for_readme.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 39c1d72
ghstack-comment-id: 3618394399
Pull-Request: #3449
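For readers who want to reproduce this kind of measurement without the shell wrapper, here is a minimal, illustrative sketch (not the code added in this PR) of the core flow: load the HF model, apply a torchao quantization config in place, and run lm-eval. The CLI flags, the choice of float8 rowwise as the example recipe, and the wikitext task are assumptions for illustration; exact import paths can vary across torchao and lm-eval versions.

```python
# Illustrative sketch only -- not the script added in this PR.
# Assumes transformers, torchao, and lm-eval are installed; import paths may
# differ slightly across versions.
import argparse

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import lm_eval
from lm_eval.models.huggingface import HFLM

from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
    quantize_,
)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="meta-llama/Llama-3.1-8B")
    parser.add_argument("--tasks", default="wikitext")
    parser.add_argument("--quantize", action="store_true")
    args = parser.parse_args()

    model = AutoModelForCausalLM.from_pretrained(
        args.model, torch_dtype=torch.bfloat16, device_map="cuda"
    )
    tokenizer = AutoTokenizer.from_pretrained(args.model)

    if args.quantize:
        # example recipe: dynamic float8 activation + float8 weight, rowwise scales,
        # applied to the model in place
        quantize_(model, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))

    # wrap the (possibly quantized) HF model for lm-eval and run the requested tasks
    results = lm_eval.simple_evaluate(
        model=HFLM(pretrained=model, tokenizer=tokenizer),
        tasks=args.tasks.split(","),
    )
    print(results["results"])


if __name__ == "__main__":
    main()
```

Running it once without `--quantize` gives the bfloat16 baseline, and once with `--quantize` gives the quantized accuracy for the same tasks, which is the comparison the README.md table needs.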
```sh
# note:
# * `int4_groupwise_hqq_weight_float8_rowwise_activation` doesn't work with dtype_map auto: https://gist.github.com/vkuzo/6b128681b628744d445c553cdeac8a85
# * `int4_groupwise_hqq_weight_only` only works on A100
for quant_recipe in float8_rowwise int4_groupwise_weight_float8_rowwise_activation int4_groupwise_hqq_weight_only int8_rowwise_weight_only int8_rowwise; do
```
nit: int4_groupwise_weight_float8_rowwise_activation --> float8_rowwise_activation_int4_groupwise_weight to match the config name order?
I'm matching the order in https://github.com/pytorch/ao/tree/main?tab=readme-ov-file#stable-workflows
Overall, if we want to standardize this everywhere that sounds reasonable; IMO let's do that in a separate "rename-only" PR?
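For concreteness, below is a rough, hypothetical guess at how recipe labels like the ones in the loop above might map to torchao configs. The mapping and the label strings are assumptions for illustration; the PR's actual mapping may differ, and the recipes whose configs aren't visible in this diff are omitted.

```python
# Hypothetical recipe-label -> torchao config mapping, guessed from the names in
# the loop above; the PR's actual mapping may differ.
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,
    Int8DynamicActivationInt8WeightConfig,
    Int8WeightOnlyConfig,
    PerRow,
)

RECIPE_TO_CONFIG = {
    # dynamic float8 activation + float8 weight, rowwise scales
    "float8_rowwise": Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()),
    # int8 weight-only, per-channel (rowwise) scales
    "int8_rowwise_weight_only": Int8WeightOnlyConfig(),
    # dynamic int8 activation + int8 weight
    "int8_rowwise": Int8DynamicActivationInt8WeightConfig(),
    # int4_groupwise_hqq_weight_only and
    # int4_groupwise_weight_float8_rowwise_activation are omitted here: their
    # exact torchao configs aren't shown in this diff.
}
```

A rename-only PR along the lines discussed above would only change the label strings (e.g. putting the activation before the weight to mirror the torchao config class names), not the configs they point to.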