Conversation

@vkuzo (Contributor) commented Dec 5, 2025

Summary:

Creates a standalone eval script for generating accuracy metrics for the
quantization README.md, based on the HuggingFace model definition of
Llama 3.1 8B.

Why a new script?

  1. The current prod script in
     https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py
     uses a custom model definition. It predates the HF integration, so it's
     better to use HF's model definition now.
  2. We have HummingBird scripts in
     https://github.com/pytorch/ao/tree/40c4f44677ae11166c3dcfbb9189cfa78789390c/.github/scripts/torchao_model_releases,
     but they seem pretty verbose and hard to use/modify.
  3. We have
     https://github.com/pytorch/ao/blob/main/benchmarks/_models/eval_hf_models.py,
     which I copy-pasted and modified for the current PR. That script didn't
     work as-is for various reasons and also seemed hard to use/modify; for
     the main README.md it's important to have a very simple standalone
     script.

We should probably do a pass on the naming before landing.

Future work:

  1. add metrics for int4_weight_only_hqq (needs to run on A100)
  2. add metrics for 'int4 weight float8 activation' (currently doesn't work with HF accelerate)
  3. add metrics for mxfp8 and nvfp4 (needs to run on B200)
  4. automate the parsing of logs
  5. add a similar script for performance benchmarks, using vLLM
  6. delete https://github.com/pytorch/ao/blob/main/torchao/_models/llama/
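Future work item 4 (automating the parsing of logs) could start from something like the sketch below. This is not code from this PR: the table format it parses is an assumption, loosely modeled on lm-evaluation-harness result rows, and `parse_results` is a hypothetical name.

```python
def parse_results(log_text: str) -> dict[str, dict[str, float]]:
    """Collect {task: {metric: value}} from lm_eval-style result tables, e.g.:

    |wikitext  | 2|none | 0|word_perplexity|. | 9.7110|+-|   N/A|
    |winogrande| 1|none | 0|acc            |. | 0.7388|+-|0.0123|
    """
    results: dict[str, dict[str, float]] = {}
    for line in log_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # assumed columns: task, version, filter, n-shot, metric, dir, value, +-, stderr
        if len(cells) >= 7 and cells[0]:
            try:
                value = float(cells[6])
            except ValueError:
                continue  # header or separator row
            results.setdefault(cells[0], {})[cells[4]] = value
    return results

if __name__ == "__main__":
    sample = (
        "|wikitext  | 2|none | 0|word_perplexity|. | 9.7110|+-|   N/A|\n"
        "|winogrande| 1|none | 0|acc            |. | 0.7388|+-|0.0123|\n"
    )
    print(parse_results(sample))
```

The split-on-`|` approach avoids a brittle regex and silently skips header and separator rows, since their value column does not parse as a float.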

Test Plan:

```
# debug run on a small model
with-proxy time ./benchmarks/quantization/eval_accuracy_for_readme.sh facebook/opt-125m

# real run
with-proxy time ./benchmarks/quantization/eval_accuracy_for_readme.sh
```
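The driver inside such a script would presumably loop over quantization recipes for one model, with the model overridable from the CLI as in the debug run above. As a rough illustration only — the recipe names, the `MODEL_ID` variable, and the commented-out `eval_one.py` helper are all hypothetical, not names from this PR:

```shell
#!/bin/bash
set -euo pipefail

# Model defaults to Llama 3.1 8B but can be overridden by the first CLI
# argument, mirroring the debug-run usage above.
MODEL_ID="${1:-meta-llama/Llama-3.1-8B}"

# Recipe names below are placeholders modeled on torchao's README.
for quant_recipe in none float8_rowwise int8_rowwise; do
    echo "=== ${MODEL_ID} / ${quant_recipe} ==="
    # python eval_one.py --model_id "${MODEL_ID}" --quant_recipe "${quant_recipe}"
done
```

Keeping the loop this flat is what makes the script easy to modify: adding or removing a recipe is a one-word change.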

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@vkuzo (Contributor Author) commented Dec 5, 2025

pytorch-bot commented Dec 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3449

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1581808 with merge base 69ce0fd:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Dec 5, 2025
Summary: essentially the same as the PR description above.

ghstack-source-id: 39c1d72
ghstack-comment-id: 3618394399
Pull-Request: #3449
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 5, 2025
@vkuzo vkuzo added the topic: not user facing Use this tag if you don't want this PR to show up in release notes label Dec 8, 2025
[ghstack-poisoned]
vkuzo added a commit that referenced this pull request Dec 8, 2025
Summary: essentially the same as the PR description above.

ghstack-source-id: 174b317
ghstack-comment-id: 3618394399
Pull-Request: #3449
```
# note:
# * `int4_groupwise_hqq_weight_float8_rowwise_activation` doesn't work with dtype_map auto: https://gist.github.com/vkuzo/6b128681b628744d445c553cdeac8a85
# * `int4_groupwise_hqq_weight_only` only works on A100
for quant_recipe in float8_rowwise int4_groupwise_weight_float8_rowwise_activation int4_groupwise_hqq_weight_only int8_rowwise_weight_only int8_rowwise; do
```
A reviewer (Contributor) commented:

nit: int4_groupwise_weight_float8_rowwise_activation --> float8_rowwise_activation_int4_groupwise_weight to match the config name order?

@vkuzo (Contributor Author) replied:

I'm matching the order in https://github.com/pytorch/ao/tree/main?tab=readme-ov-file#stable-workflows.

Overall, if we want to standardize this everywhere, that sounds reasonable; IMO let's do that in a separate "rename-only" PR?

@jerryzh168 jerryzh168 requested a review from jainapurva December 8, 2025 18:52
vkuzo added a commit that referenced this pull request Dec 8, 2025
Summary:

#3449 is a newer version of these scripts, which uses
the HuggingFace model definition.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:
ghstack-source-id: 9d85193
ghstack-comment-id: 3628761600
Pull-Request: #3466
vkuzo added a commit that referenced this pull request Dec 8, 2025
Summary: same as the commit above.

ghstack-source-id: 0ad33cb
ghstack-comment-id: 3628761600
Pull-Request: #3466
@vkuzo vkuzo merged commit 7b65989 into main Dec 8, 2025
56 checks passed
vkuzo added a commit that referenced this pull request Dec 8, 2025
vkuzo added a commit that referenced this pull request Dec 8, 2025