ONNX support to run the QualT5 models for inference on CPU #1
sheineking wants to merge 4 commits into terrierteam:main
Conversation
Great idea, thank you. Presumably this is faster on CPU than normal inference? Do you have timings on the same 4,000 docs, say? Can we make the ONNX dependencies optional? We normally do that with pyproject.toml. Do you need an example?
Thanks a lot for the quick response. Yes, it is faster on CPU. I only have timings in the context of the OWS pipeline Resilipipe, building on the module that Ariane developed. This comes with the following limitations:

The different setups I tried are:

I will do a separate comparison and update the results. And yes, I will add a pyproject.toml entry to make the ONNX dependencies optional.
I added the dependencies as an optional extra in setup.py and removed the import from the `__init__.py` to keep ONNX optional.
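A pattern like the following is one way to express this; the extras group name, dependency list, and package name below are assumptions for illustration, not necessarily what this PR uses:

```python
# Hypothetical sketch of optional ONNX extras as they might appear in
# setup.py (or the [project.optional-dependencies] table of pyproject.toml).
extras_require = {
    "onnx": [
        "onnx",         # model export format
        "onnxruntime",  # CPU inference engine
    ],
}
# Passed as setup(..., extras_require=extras_require), users would then
# install the extras with: pip install <package>[onnx]


def require_onnxruntime():
    """Import onnxruntime lazily so the base install works without it."""
    try:
        import onnxruntime
    except ImportError as err:
        raise ImportError(
            "ONNX support requires the optional dependencies; "
            "install them with: pip install <package>[onnx]"
        ) from err
    return onnxruntime
```

Keeping the import out of `__init__.py` and behind a helper like `require_onnxruntime()` means the error only surfaces when ONNX functionality is actually requested.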
Thanks. Given the context, I'll let @seanmacavaney give a final review.
And here are the updated times when applying only
- Support ONNX session options
- Move expensive imports to model export
- Create separate package for ONNX code
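For context, supporting session options typically means letting callers pass an `onnxruntime.SessionOptions` through to the inference session. A minimal configuration sketch (the model path and thread counts here are illustrative assumptions, not values from this PR):

```python
# Configuration sketch: building a CPU-only onnxruntime session with
# explicit session options. Assumes onnxruntime is installed and that
# "model.onnx" is a hypothetical exported QualT5 model.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # parallelism within a single operator
opts.inter_op_num_threads = 1  # parallelism across independent operators
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

session = ort.InferenceSession(
    "model.onnx",  # hypothetical path to the exported model
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)
```

Exposing these options matters on CPU because thread counts and graph optimization level have a large effect on throughput.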
On a sample of 4,000 web documents, the maximum absolute deviation in quality score between the original qt5-tiny model and its ONNX version was 0.01.
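The agreement check described above can be sketched as a simple per-document comparison; the score lists below are made-up examples, not the PR's actual data:

```python
# Illustrative max-absolute-deviation check between two model versions,
# mirroring the comparison described above. Scores are invented examples.

def max_abs_deviation(scores_a, scores_b):
    """Largest absolute per-document score difference between two lists."""
    assert len(scores_a) == len(scores_b), "score lists must align"
    return max(abs(a - b) for a, b in zip(scores_a, scores_b))

original_scores = [0.91, 0.42, 0.77, 0.13]  # e.g. PyTorch qt5-tiny (made up)
onnx_scores     = [0.90, 0.42, 0.78, 0.13]  # e.g. ONNX export (made up)
```

A deviation this small is generally attributable to floating-point differences between runtimes rather than a faulty export.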