
ONNX support to run the QualT5 models for inference on CPU#1

Open
sheineking wants to merge 4 commits into terrierteam:main from sheineking:onnx_qualt5

Conversation

@sheineking

  • ONNXQualT5 class for inference on CPU
  • Add utility code for exporting (to new cache directory) and loading

On a sample of 4,000 web documents, the maximum absolute deviation in quality score between the original qt5-tiny and the ONNX-version was 0.01.
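A deviation check like the one described above can be sketched as follows (a minimal illustration, assuming both models' scores are available as arrays; the values shown are hypothetical, not the PR's actual scores):

```python
import numpy as np

def max_abs_deviation(scores_a, scores_b):
    """Maximum absolute difference between two score arrays."""
    a = np.asarray(scores_a, dtype=np.float64)
    b = np.asarray(scores_b, dtype=np.float64)
    return float(np.max(np.abs(a - b)))

# Hypothetical example scores for original vs. ONNX model:
orig = [0.91, 0.42, 0.13]
onnx = [0.905, 0.425, 0.128]
print(max_abs_deviation(orig, onnx))  # ~0.005
```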

@cmacdonald
Contributor

Great idea, thank you. Presumably this is faster on CPU than normal inference? Do you have timings on the same 4000 docs, say?

Can we make the onnx dependencies optional? We normally do that with pyproject.toml. Do you need an example?

@sheineking
Author

Thanks a lot for the quick response. Yes, it is faster on CPU. I have the timings only in the context of the OWS pipeline Resilipipe, building on the module that Ariane developed. This comes with the following limitations:

  1. The timings are based on all processing steps in the pipeline
  2. Texts are processed individually rather than in batches

The different setups I tried are:

  • qt5-tiny on CPU: ~0.7 records / s (Tested only on 100 records)
  • qt5-tiny on GPU: ~55 records / s (Tested on all 4,000 records)
  • qt5-tiny with ONNX: ~50 records / s (Tested on all 4,000 records)
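Per-item processing, as in point 2 above, forgoes the throughput gains of batched inference. A minimal chunking helper (hypothetical, not part of the PR) that would let texts be fed to the model in batches:

```python
def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"doc {i}" for i in range(10)]
sizes = [len(b) for b in batched(texts, 4)]
print(sizes)  # [4, 4, 2]
```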

I will do a separate comparison and update the results.

And yes, I will add a pyproject.toml to make the onnx dependencies optional.

@sheineking
Author

I added the dependencies as an optional extra in setup.py and removed the eager import from `__init__.py`, so the ONNX dependencies are now optional.
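An optional-dependency setup of this kind typically uses `extras_require`; a sketch of what the setup.py change might look like (the extra name "onnx" and the unpinned requirements are assumptions, not the PR's exact contents):

```python
# setup.py fragment -- extra name and requirements are illustrative
setup(
    name="pyterrier-quality",
    ...,
    extras_require={
        "onnx": ["onnx", "onnxruntime"],
    },
)
```

Users who want ONNX support would then install with `pip install pyterrier-quality[onnx]`, while the base install stays free of the ONNX dependencies.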

@cmacdonald
Contributor

Thanks. Given the context, I'll let @seanmacavaney give a final review.

@cmacdonald cmacdonald closed this Sep 19, 2025
@cmacdonald cmacdonald reopened this Sep 19, 2025
@sheineking
Author

sheineking commented Sep 19, 2025

And here are the updated times when applying only QualT5.transform on batches of webpage text. The model is qt5-tiny for all three settings.

```python
qmodel = QualT5("pyterrier-quality/qt5-tiny")
df = qmodel.transform(df)
```

  • GPU: 4,000 records in 2.08 s
  • CPU: 400 records in 117.5 s
  • ONNX: 400 records in 4.47 s

- Support ONNX session options
- Move expensive imports to model export
- Create separate package for ONNX code
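The "ONNX session options" mentioned in the commit notes map onto onnxruntime's `SessionOptions` API; a configuration sketch (the thread count and optimization level below are illustrative defaults, not the PR's actual choices):

```python
import onnxruntime as ort

# Session options control CPU threading and graph optimizations;
# these values are assumptions for illustration only.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 4
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# The options are then passed when constructing the inference session:
# session = ort.InferenceSession("qt5-tiny.onnx", sess_options=opts,
#                                providers=["CPUExecutionProvider"])
```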