Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.1 by renovate[bot] · Pull Request #35 · bjorncs/vespa

renovate · 2023-07-06T08:36:18Z

ℹ️ Note

This PR body was truncated due to platform limits.

This PR contains the following updates:

Package	Change	Age	Confidence
com.microsoft.onnxruntime:onnxruntime (source)	`1.13.1` → `1.24.1`
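
This update spans many minor releases, so version strings are safer compared numerically than as text (as text, "1.9.0" sorts after "1.13.1"). The following is a minimal sketch of such a comparison; the helper names `parse_version` and `minor_jump` are hypothetical and not part of Renovate or ONNX Runtime.

```python
def parse_version(text):
    """Split a dotted version such as '1.24.1' (or 'v1.24.1') into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def minor_jump(old, new):
    """Return how many minor releases separate two versions that share a major number."""
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v[0] != new_v[0]:
        raise ValueError("major versions differ; counting minor releases is not meaningful")
    return new_v[1] - old_v[1]


# Versions taken from the table above.
print(parse_version("1.13.1") < parse_version("1.24.1"))
print(minor_jump("1.13.1", "1.24.1"))
```

A large jump like this one is a hint to read the announcements of every intermediate release, not only the latest.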

Release Notes

microsoft/onnxruntime (com.microsoft.onnxruntime:onnxruntime)

`v1.24.1`: ONNX Runtime v1.24.1

Compare Source

📢 Announcements & Breaking Changes

Platform Support Changes

Python 3.10 wheels are no longer published — Please upgrade to Python 3.11+
Python 3.14 support added
Free-threaded Python (PEP 703) — Added support for Python 3.13t and 3.14t in Linux (#26786)
x86_64 binaries for macOS/iOS are no longer provided and minimum macOS is raised to 14.0
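
The platform changes above can be checked before upgrading. The sketch below uses only the Python standard library; the function name `check_platform` and its message strings are assumptions made for illustration, and the Python check matters only where the Python wheels are used (this PR itself updates the Java artifact).

```python
import platform
import sys


def check_platform(python_version=None, system=None, machine=None, macos_version=None):
    """Return a list of detected problems; an empty list means none were found."""
    python_version = tuple(python_version or sys.version_info[:2])
    system = system or platform.system()
    machine = machine or platform.machine()
    problems = []

    if python_version < (3, 11):
        problems.append("Python 3.11+ is needed for the published wheels")

    if system == "Darwin":
        if machine == "x86_64":
            problems.append("x86_64 macOS binaries are no longer provided")
        if macos_version is None:
            macos_version = platform.mac_ver()[0]
        # Compare only the major component, e.g. "13.6.1" -> 13.
        if macos_version and int(macos_version.split(".")[0]) < 14:
            problems.append("macOS 14.0 or later is required")

    return problems


for problem in check_platform():
    print("warning:", problem)
```

Passing explicit arguments, for example `check_platform((3, 10), "Linux", "x86_64")`, lets the same check run for a target machine other than the current one.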

API Version

ORT_API_VERSION updated to 24 (#26418)

✨ New Features

🤖 Execution Provider (EP) Plugin API

A major infrastructure enhancement enabling plugin-based EPs with dynamic loading:

Initial kernel-based EP support (#26206)
Weight pre-packing support for plugin EPs (#26754)
EP Context model support (#25124)
Control flow kernel APIs (#26927)
OrtKernelInfo APIs for kernel-based plugin EPs (#26803)

🔧 Core APIs

OrtApi::CreateEnvWithOptions() and OrtEpApi::GetEnvConfigEntries() (#26971)
EP Device Compatibility APIs (#26922)
External Resource Importer API for D3D12 shared resources (#26828)
Session config access from KernelInfo (#26589)

📊 Dependencies & Integration

ONNX upgraded to 1.20.1 (#26579)
Protobuf updated from 3.20.3 → 4.25.8 (#26910)
CUDA Graph enabled by default (#26929)

🖥️ Execution Provider Updates

NVIDIA

CUDA EP: Flash Attention updates, GQA kernel fusion, BF16 support for MoE/qMoE/MatMulNBits, CUDA 13.0 support
TensorRT EP: Upgraded to TensorRT 10.14, automatic plugin loading, NVFP4 custom ops
TensorRT RTX EP: RTX runtime caching, CUDA graph support, BFloat16, memory-mapped engines

Qualcomm QNN EP

QNN SDK upgraded to 2.42.0 with new ops (RMSNorm, ScatterElements, GatherND, STFT, RandomUniformLike)
Gelu pattern fusion, LPBQ quantization support, ARM64 wheel builds, v81 device support

Intel & AMD

OpenVINO EP: Upgraded to 2025.4.1
VitisAI EP: External EP loader, compiled model compatibility API
MIGraphX EP: QuickGelu, multihead attention, QLinear pooling ops

ArmNN EP

Arm is formally deprecating the Arm NN Execution Provider (EP) in ONNX Runtime. The Arm NN EP is still experimental and depends on technology that is no longer actively maintained. Keeping it available now only adds complexity and potential confusion for users.

What to expect:

Effective immediately, the Arm NN EP is deprecated and will no longer be maintained
All build options, documentation, and examples referencing ArmNN will be removed once the upstream change merges; the removal will appear in the first ONNX Runtime release that includes that change. We will confirm the release number as soon as it is known
Builds that still rely on Arm NN-specific options (for example --use_armnn) will fail after the change lands, so please adjust configurations in advance
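
To prepare for the removal, build scripts can be scanned for the Arm NN option in advance. This is a minimal sketch that looks only for the `--use_armnn` flag named above; the function name `find_armnn_options` and the sample command line are illustrative assumptions.

```python
import shlex


def find_armnn_options(command_line):
    """Return the Arm NN-specific tokens found in a build command line."""
    tokens = shlex.split(command_line)
    return [
        token
        for token in tokens
        if token == "--use_armnn" or token.startswith("--use_armnn=")
    ]


# Hypothetical build invocation used only to exercise the helper.
sample = "./build.sh --config Release --use_armnn --parallel"
print(find_armnn_options(sample))
```

A non-empty result marks a configuration that should be adjusted before the upstream removal lands.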

🌐 Web & JavaScript

WebGPU EP: Flash Attention optimizations, graph capture, Split-K MatMul, qMoE support, WGSL templates
WebNN EP: GQA local attention, GatherBlockQuantized, ConvInteger/MatMulInteger
Node.js/React Native: Node.js v22, JSI for React Native, JSPI build support

🧠 CPU Improvements

KleidiAI: SME1/SME2 Convolution and SGemm kernels, FP32 Gemv, Windows/Arm support
New ops: MoE/qMoE kernels, RotaryEmbeddings opset 23, LayerNorm/RMSNorm broadcasting
Platform support: S390x SIMD, LoongArch64 4-bit quantization, FP16 inference improvements
ARM NCHWc layout support: NCHWc layout support for potential performance improvement of Conv models. Needs building from source with --enable_arm_neon_nchwc to enable this feature (#25580 #26838 #26691 #26171). This feature may be turned ON by default in a future release based on community feedback.
ARM perf improvements: Dedicated depthwise conv kernel (#26688) and SiLU activation perf improvement (#26753)

🔌 Language Bindings

C#

.NET 9.0 MAUI targets (#26463)

Python

add_external_initializers_from_files (#26012)

Java

Auto EP and compile model support (#25131)
OrtCompiledModelCompatibility (#26028)

🐛 Bug Fixes

Critical Fixes

DoS vulnerability in FuseReluClip (#26878)
Security issue loading arbitrary files as external data (#26776)
Memory leak fix for KernelContext_GetAllocator (#26883)
Local Attention off-by-1 bug (#25927)

EP-Specific Fixes

[QNN] Clip op with min/max from QDQ (#26601)
[CoreML] Gather fp16 support (#26442)

🙏 Contributors

Thanks to our 170 contributors for this release!

@fs-eire, @tianleiwu, @edgchen1, @qjia7, @yuslepukhin, @hariharans29, @Honry, @qti-yuduo, @adrianlizarraga, @snnn, @eserscor, @vraspar, @xiaofeihan1, @guschmue, @daijh, @quic-muchhsu, @qti-jkilpatrick, @tirupath-qti, @Jiawei-Shao, @qti-hungjuiw, @quic-ashwshan, @titaiwangms, @qti-mattsinc, @chilo-ms, @jchen10, @xhcao, @skottmckay, @quic-calvnguy, @JonathanC-ARM, @Rohanjames1997, @sushraja-msft, @jambayk, @adrastogi, @xenova, @quic-tirupath, @justinchuby, @HectorSVC, @kunal-vaishnavi, @wenqinI, @prathikr, @baijumeswani, @preetha-intel, @jatinwadhwa921, @umangb-09, @qti-ashwshan, @carzh, @bachelor-dou, @ranjitshs, @gedoensmax, @xadupre, @nenad1002, @TedThemistokleous, @keshavv27, @zpye, @jnagi-intel, @jiafatom, @mingyueliuh, @Colm-in-Arm, @borg323, @chunghow-qti, @Craigacp, @BODAPATIMAHESH, @AlekseiNikiforovIBM, @hans00, @thevishalagarwal, @MaanavD, @qti-kromero, @damdoo01-arm, @BoarQing, @naomiOvad, @yuhuchua-qti, @hadiFute, @vishalpandya1990, @rivkastroh, @minfhong-qti, @kuanyul-qti, @xieofxie, @ankitm3k, @RyanMetcalfeInt8, @MayureshV1, @bopeng1234, @vthaniel, @mdvoretc-intel, @ericcraw, @javier-intel, @saurabhkale17, @sfatimar, @Kotomi-Du, @intbf, @n1harika, @TejalKhade28, @gupta-pallavi, @cbourjau, @nieubank, @r-devulap, @wszqkzqk, @sanketkaleoss, @amancini-N, @fanchenkong1, @meakbiyik, @hisham-hchowdhu, @shaoboyan091, @Stonesjtu, @qwu16, @wangw-1991, @bonktree, @naetherm, @nikhilfujitsu, @Panxuefeng-loongson, @selenayang888, @moyo1997, @chwarr, @patryk-kaiser-ARM, @fdwr, @SavaLione, @shiyi9801, @mcost45, @aciddelgado, @prudhvi-qti, @Jonahcb, @lifang-zhang, @zhaoxul-qti, @gaugarg-nv, @cocotdf, @WangFengtu1996, @orlmon01, @weidu-tpvision, @theHamsta, @kevinch-nv, @XXXXRT666, @movedancer, @melkap01-Arm, @KingSora, @urpetkov-amd, @junchao-loongson, @jixiongdeng, @wcy123, @GrigoryEvko, @anujj, @peishenyan, @quic-ankus, @jchen351, @yihonglyu, @satyajandhyala, @co63oc, @mschofie, @quic-ashigarg, @asoldano, @nproshun, @jiangzhaoming, @seungtaek94, @liqunfu, @jaholme, 
@hanbitmyths, @quic-boyuc, @rM-planet, @qti-vaiskv, @AndreyOrb, @pkubaj, @xhan65, @Jaswanth51, @quic-hungjuiw, @jywu-msft, @mklimenk, @derdeljan-msft, @ianfhunter, @NingW101, @feich-ms, @Akupadhye, @wschin

Full Changelog: v1.23.2...rel-1.24.1

`v1.23.2`: ONNX Runtime v1.23.2

Compare Source

`v1.23.1`: ONNX Runtime v1.23.1

Compare Source

What's Changed

Fix Attention GQA implementation on CPU (#25966)
Address GetMemInfo edge cases (#26021)
Implement new Python APIs (#25999)
MemcpyFromHost and MemcpyToHost support for plugin EPs (#26088)
[TRT RTX EP] Fix bug for generating the correct subgraph in GetCapability (#26132)
add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart (#25590)
[build] fix WebAssembly build on macOS/arm64 (#25653)
[CPU] MoE Kernel (#25958)
[CPU] Block-wise QMoE kernel for CPU (#26009)
[C#] Implement missing APIs (#26101)
Regenerate test model with ONNX IR < 12 (#26149)
[CPU] Fix compilation errors because of unused variables (#26147)
[EP ABI] Check if nodes specified in GetCapability() have already been assigned (#26156)
[QNN EP] Add dynamic option to set HTP performance mode (#26135)

Full Changelog: microsoft/onnxruntime@v1.23.0...v1.23.1

`v1.23.0`: ONNX Runtime v1.23.0

Compare Source

Announcements

This release introduces the Execution Provider (EP) Plugin API, a new infrastructure for building plugin-based EPs. (#24887, #25137, #25124, #25147, #25127, #25159, #25191, #2524)
This release introduces the ability to dynamically download and install execution providers. This feature is available only in the WinML build and requires Windows 11 version 25H2 or later. To use it, C/C++/C# users should use the builds distributed through the Windows App SDK, and Python users should install the onnxruntime-winml package (to be published soon). We encourage users who can upgrade to the latest Windows 11 to use the WinML build to take advantage of this enhancement.

Upcoming Changes

The next release will stop providing x86_64 binaries for macOS and iOS operating systems.
The next release will increase the minimum supported macOS version from 13.4 to 14.0.
The next release will stop providing python 3.10 wheels.

Execution & Core Optimizations

Shutdown logic on Windows is simplified

On Windows, some global objects are now left undestroyed when ONNX Runtime detects that the process is shutting down (#24891). This does not cause a memory leak, because the operating system reclaims all of a process's memory when it ends. The change reduces the chance of crashes on process exit.

AutoEP/Device Management

ONNX Runtime can now automatically discover computing devices and select the best EPs to download and register. The EP download feature currently works only on Windows 11 version 25H2 or later.

Execution Provider (EP) Updates

The ROCm EP was removed from the source tree. Users are advised to use the MIGraphX or Vitis AI EPs from AMD instead.
A new EP, Nvidia TensorRT RTX, was added.

Web

EMSDK is upgraded from 4.0.4 to 4.0.8

WebGPU EP

Added WGSL template support.

QNN EP

SDK Update: Added support for QNN SDK 2.37.

KleidiAI

Enhanced performance for SGEMM, IGEMM, and Dynamic Quantized MatMul operations, especially for Conv2D operators on hardware that supports SME2 (Scalable Matrix Extension v2).

Known Problems

A KleidiAI-related change in build.py may cause build failures when cross-compiling (#26175).

Contributions

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:

@1duo, @Akupadhye, @amarin16, @AndreyOrb, @ankan-ban, @ankitm3k, @anujj, @aparmp-quic, @arnej27959, @bachelor-dou, @benjamin-hodgson, @Bonoy0328, @chenweng-quic, @chuteng-quic, @clementperon, @co63oc, @daijh, @damdoo01-arm, @danyue333, @fanchenkong1, @gedoensmax, @genarks, @gnedanur, @Honry, @huaychou, @ianfhunter, @ishwar-raut1, @jing-bao, @joeyearsley, @johnpaultaken, @jordanozang, @JulienMaille, @keshavv27, @kevinch-nv, @khoover, @krahenbuhl, @kuanyul-quic, @mauriciocm9, @mc-nv, @minfhong-quic, @mingyueliuh, @MQ-mengqing, @NingW101, @notken12, @omarhass47, @peishenyan, @pkubaj, @qc-tbhardwa, @qti-jkilpatrick, @qti-yuduo, @quic-ankus, @quic-ashigarg, @quic-ashwshan, @quic-calvnguy, @quic-hungjuiw, @quic-tirupath, @qwu16, @ranjitshs, @saurabhkale17, @schuermans-slx, @sfatimar, @stefantalpalaru, @sunnyshu-intel, @TedThemistokleous, @thevishalagarwal, @toothache, @umangb-09, @vatlark, @VishalX, @wcy123, @xhcao, @xuke537, @zhaoxul-qti

`v1.22.0`: ONNX Runtime v1.22

Compare Source

Announcements

This release introduces new APIs for Model Editor, Auto EP infrastructure, and AOT Compile.
ONNX Runtime GPU packages require CUDA 12.x; packages built for CUDA 11.x are no longer published.
The min supported Windows version is now 10.0.19041.

GenAI & Advanced Model Features

Constrained Decoding: Introduced new capabilities for constrained decoding, offering more control over generative AI model outputs.

Execution & Core Optimizations

Core

Auto EP Selection Infrastructure: Added foundational infrastructure to enable automatic selection of Execution Providers via selection policies, aiming to simplify configuration and optimize performance. (Pull Request #24430)
Compile API: Introduced new APIs to support explicit compilation of ONNX models.
- See: OrtCompileApi Struct Reference (Assuming a similar link structure for future documentation)
- See: EP Context Design (Assuming a similar link structure for future documentation)
Model Editor API: Introduced APIs for creating or editing ONNX models.
- See: OrtModelEditorApi

Execution Provider (EP) Updates

CPU EP/MLAS

KleidiAI Integration: Integrated KleidiAI into ONNX Runtime/MLAS for enhanced performance on Arm architectures.
MatMulNBits Support: Added support for MatMulNBits, enabling matrix multiplication with weights quantized to 8 bits.
GroupQueryAttention optimizations and enhancements

OpenVINO EP

Added support up to OpenVINO 2025.1
Introduced Intel compiler level optimizations for QDQ models.
Added support to select Intel devices based on LUID
Load_config feature improvement to support AUTO, HETERO and MULTI plugin.
misc bugfixes/optimizations
For detailed updates, refer to Pull Request #24394: ONNXRuntime OpenVINO - Release 1.22

QNN EP

SDK Update: Added support for QNN SDK 2.33.2.
operator updates/support to Sum, Softmax, Upsample, Expand, ScatterND, Einsum
QNN EP can be built as shared or static library.
enable QnnGpu backend
For detailed updates, refer to recent QNN-tagged PRs.

TensorRT EP

TensorRT Version: Added support for TensorRT 10.9.
- Note for onnx-tensorrt open-source parser users: Please check here for specific requirements (Referencing 1.21 link as a placeholder, this should be updated for 1.22).
New Features:
- EP option to enable TRT Preview Feature
- Support to load TensorRT V3 plugin
Bug Fixes:
- Resolved an issue related to multithreading scenarios.
- Fixed incorrect GPU usage that affected both TensorRT EP and CUDA EP.

NV TensorRT RTX EP

New Execution Provider: Introduced a new Execution Provider specifically for Nvidia RTX GPUs, leveraging TensorRT for optimized performance.

CUDA EP

MatMulNBits Enhancement: Added support for 8-bit weight-only quantization in MatMulNBits.
Bug Fixes:
- Fixed incorrect GPU usage (also mentioned under TensorRT EP).

VitisAI EP

Miscellaneous bug fixes and improvements.

Infrastructure & Build Improvements

Build System & Packages

QNN Nuget Package: The QNN Nuget package is now built as ARM64x.

Dependencies / Version Updates

CUDA Version Update: This release includes an update to the CUDA version. Users should consult the documentation for specific version requirements. CUDA 11 based GPU packages no longer published.

Web

WebGPU Expansion:
- Added WebGPU support to the node.js package (Windows and macOS).
- Enabled WebGPU when building from source for macOS, Linux, and Windows.

Mobile

No major updates of note this release.

Contributions

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:

Yulong Wang, Jian Chen, Changming Sun, Satya Kumar Jandhyala, Hector Li, Prathik Rao, Adrian Lizarraga, Jiajia Qin, Scott McKay, Jie Chen, Tianlei Wu, Edward Chen, Wanming Lin, xhcao, vraspar, Dmitri Smirnov, Jing Fang, Yifan Li, Caroline Zhu, Jianhui Dai, Chi Lo, Guenther Schmuelling, Ryan Hill, Sushanth Rajasankar, Yi-Hong Lyu, Ankit Maheshkar, Artur Wojcik, Baiju Meswani, David Fan, Enrico Galli, Hans, Jambay Kinley, John Paul, Peishen Yan, Yateng Hong, amarin16, chuteng-quic, kunal-vaishnavi, quic-hungjuiw, Alessio Soldano, Andreas Hussing, Ashish Garg, Ashwath Shankarnarayan, Chengdong Liang, Clément Péron, Erick Muñoz, Fanchen Kong, George Wu, Haik Silm, Jagadish Krishnamoorthy, Justin Chu, Karim Vadsariya, Kevin Chen, Mark Schofield, Masaya, Kato, Michael Tyler, Nenad Banfic, Ningxin Hu, Praveen G, Preetha Veeramalai, Ranjit Ranjan, Seungtaek Kim, Ti-Tai Wang, Xiaofei Han, Yueqing Zhang, co63oc, derdeljan-msft, genmingz@AMD, jiangzhaoming, jing-bao, kuanyul-quic, liqun Fu, minfhong-quic, mingyue, quic-tirupath, quic-zhaoxul, saurabh, selenayang888, sfatimar, sheetalarkadam, virajwad, zz002, Ștefan Talpalaru

`v1.21.1`: ONNX Runtime v1.21.1

Compare Source

What's new?

Extend CMAKE_CUDA_FLAGS with all Blackwell compute capacity #23928 - @yf711
[ARM CPU] Fix fp16 const initialization on no-fp16 platform #23978 - @fajin-corp
[TensorRT EP] Call cudaSetDevice at compute function for handling multithreading scenario #24010 - @chilo-ms
Fix attention bias broadcast #24017 - @tianleiwu
Deleted the constant SKIP_CUDA_TEST_WITH_DML #24113 - @CodingSeaotter
[QNN EP] ARM64EC python package remove --vcpkg in build #24174 - @jywu-msft
[wasm] remove --vcpkg in wasm build #24179 - @fs-eire

`v1.21.0`: ONNX Runtime v1.21.0

Compare Source

Announcements

No large announcements of note this release! We've made a lot of small refinements to streamline your ONNX Runtime experience.

GenAI & Advanced Model Features

Enhanced Decoding & Pipeline Support

Added "chat mode" support for CPU, GPU, and WebGPU.
Provided support for decoder model pipelines.
Added support for Java API for MultiLoRA.

API & Compatibility Updates

Chat mode introduced breaking changes in the API (see migration guide).

Bug Fixes for Model Output

Fixed Phi series garbage output issues with long prompts.
Resolved gibberish issues with top_k on CPU.

Execution & Core Optimizations

Core Refinements

Reduced default logger usage for improved efficiency (#23030).
Fixed a visibility issue in the thread pool (#23098).

Execution Provider (EP) Updates

General

Removed TVM EP from the source tree (#22827).
Marked NNAPI EP for deprecation (following Google's deprecation of NNAPI).
Fixed a DLL delay loading issue that impacts WebGPU EP and DirectML EP's usability on Windows (#23111, #23227)

TensorRT EP Improvements

Added support for TensorRT 10.8.
- onnx-tensorrt open-source parser user: please check here for requirement.
Assigned DDS ops (NMS, RoiAlign, NonZero) to TensorRT by default.
Introduced option trt_op_types_to_exclude to exclude specific ops from TensorRT assignment.
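
As a rough illustration of the `trt_op_types_to_exclude` option, the sketch below builds a providers list for the Python API without importing onnxruntime. The helper name `tensorrt_providers` is hypothetical, and passing the op types as one comma-separated string is an assumption that should be checked against the TensorRT EP documentation.

```python
def tensorrt_providers(excluded_op_types):
    """Build a providers list asking the TensorRT EP to skip the given op types."""
    # Assumption: the option value is a single comma-separated string of op type names.
    options = {"trt_op_types_to_exclude": ",".join(excluded_op_types)}
    return [
        ("TensorrtExecutionProvider", options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]


# The DDS ops named above, spelled as ONNX op types.
providers = tensorrt_providers(["NonMaxSuppression", "NonZero", "RoiAlign"])
print(providers)

# With onnxruntime installed, the list would typically be passed as:
#   session = onnxruntime.InferenceSession("model.onnx", providers=providers)
```

Listing the CUDA and CPU providers after TensorRT gives the excluded ops somewhere to fall back to.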

CUDA EP Improvements

Added a python API preload_dlls to coexist with PyTorch.
Miscellaneous enhancements for Flux model inference.
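
A hedged sketch of how the `preload_dlls` Python API might be called defensively, so that the same code also runs against older packages that lack it. The wrapper name `preload_cuda_dlls` is hypothetical; only the `preload_dlls` name comes from the notes above, and it is called here without arguments.

```python
def preload_cuda_dlls(ort_module):
    """Call ort_module.preload_dlls() when it exists; return True if it was called."""
    preload = getattr(ort_module, "preload_dlls", None)
    if preload is None:
        # Older onnxruntime packages do not expose this helper.
        return False
    preload()
    return True


# Typical use (requires the onnxruntime GPU package to be installed):
#   import onnxruntime
#   preload_cuda_dlls(onnxruntime)
```

Taking the module as a parameter keeps the wrapper importable and testable on machines without a GPU build.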

QNN EP Improvements

Introduced QNN shared memory support.
Improved performance for AI Hub models.
Added support for QAIRT/QNN SDK 2.31.
Added Python 3.13 package.
Miscellaneous bug fixes and enhancements.
QNN EP is now built as a shared library/DLL by default. To retain previous build behavior, use build option --use_qnn static_lib.

DirectML EP Support & Upgrades

Updated DirectML version from 1.15.2 to 1.15.4 (#22635).

OpenVINO EP Improvements

Introduced OpenVINO EP Weights Sharing feature.
Added support for various contrib Ops in OVEP:
- SkipLayerNormalization, MatMulNBits, FusedGemm, FusedConv, EmbedLayerNormalization, BiasGelu, Attention, DynamicQuantizeMatMul, FusedMatMul, QuickGelu, SkipSimplifiedLayerNormalization
Miscellaneous bug fixes and improvements.

VitisAI EP Improvements

Miscellaneous bug fixes and improvements.

Mobile Platform Enhancements

CoreML Updates

Added support for caching generated CoreML models.

Extensions & Tokenizer Improvements

Expanded Tokenizer Support

Now supports more tokenizer models, including ChatGLM, Baichuan2, Phi-4, etc.
Added full Phi-4 pre/post-processing support for text, vision, and audio.
Introduced RegEx pattern loading from tokenizer.json.

Image Codec Enhancements

ImageCodec now links to native APIs if available; otherwise, falls back to built-in libraries.

Unified Tokenizer API

Introduced a new tokenizer op schema to unify the tokenizer codebase.
Added support for loading tokenizer data from a memory blob in the C API.

Infrastructure & Build Improvements

Runtime Requirements

All prebuilt Windows packages now require VC++ Runtime version >= 14.40 (instead of 14.38). If your VC++ runtime version is lower than that, you may see a crash when ONNX Runtime is initializing. See https://github.com/microsoft/STL/wiki/Changelog#vs-2022-1710 for more details.

Updated minimum iOS and Android SDK requirements to align with React Native 0.76:

iOS >= 15.1
Android API >= 24 (Android 7)

All macOS packages now require macOS version >= 13.3.

CMake File Changes

CMake Version: Increased the minimum required CMake version from 3.26 to 3.28. Added support for CMake 4.0.
Python Version: Increased the minimum required Python version from 3.8 to 3.10 for building ONNX Runtime from source.
Improved VCPKG support

Added the following cmake options for WebGPU EP

onnxruntime_USE_EXTERNAL_DAWN
onnxruntime_CUSTOM_DAWN_SRC_PATH
onnxruntime_BUILD_DAWN_MONOLITHIC_LIBRARY
onnxruntime_ENABLE_PIX_FOR_WEBGPU_EP
onnxruntime_ENABLE_DAWN_BACKEND_VULKAN
onnxruntime_ENABLE_DAWN_BACKEND_D3D12

Added cmake option onnxruntime_BUILD_QNN_EP_STATIC_LIB for building with QNN EP as a static library.
Removed cmake option onnxruntime_USE_PREINSTALLED_EIGEN.

Fixed a build issue with Visual Studio 2022 17.3 (#23911)

Modernized Build Tools

Now using VCPKG for most package builds.
Upgraded Gradle from 7.x to 8.x.
Updated JDK from 11 to 17.
Enabled onnxruntime_USE_CUDA_NHWC_OPS by default for CUDA builds.
Added support for WASM64 (build from source; no package published).

Dependency Cleanup

Removed Google’s nsync from dependencies.

Others

Updated Node.js installation script to support network proxy usage (#23231)

Web

No updates of note.

Contributors

Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:

Changming Sun, Yulong Wang, Tianlei Wu, Jian Chen, Wanming Lin, Adrian Lizarraga, Hector Li, Jiajia Qin, Yifan Li, Edward Chen, Prathik Rao, Jing Fang, shiyi, Vincent Wang, Yi Zhang, Dmitri Smirnov, Satya Kumar Jandhyala, Caroline Zhu, Chi Lo, Justin Chu, Scott McKay, Enrico Galli, Kyle, Ted Themistokleous, dtang317, wejoncy, Bin Miao, Jambay Kinley, Sushanth Rajasankar, Yueqing Zhang, amancini-N, ivberg, kunal-vaishnavi, liqun Fu, Corentin Maravat, Peishen Yan, Preetha Veeramalai, Ranjit Ranjan, Xavier Dupré, amarin16, jzm-intel, kailums, xhcao, A-Satti, Aleksei Nikiforov, Ankit Maheshkar, Javier Martinez, Jianhui Dai, Jie Chen, Jon Campbell, Karim Vadsariya, Michael Tyler, PARK DongHa, Patrice Vignola, Pranav Sharma, Sam Webster, Sophie Schoenmeyer, Ti-Tai Wang, Xu Xing, Yi-Hong Lyu, genmingz@AMD, junchao-zhao, sheetalarkadam, sushraja-msft, Akshay Sonawane, Alexis Tsogias, Ashrit Shetty, Bilyana Indzheva, Chen Feiyue, Christian Larson, David Fan, David Hotham, Dmitry Deshevoy, Frank Dong, Gavin Kinsey, George Wu, Grégoire, Guenther Schmuelling, Indy Zhu, Jean-Michaël Celerier, Jeff Daily, Joshua Lochner, Kee, Malik Shahzad Muzaffar, Matthieu Darbois, Michael Cho, Michael Sharp, Misha Chornyi, Po-Wei (Vincent), Sevag H, Takeshi Watanabe, Wu, Junze, Xiang Zhang, Xiaoyu, Xinpeng Dou, Xinya Zhang, Yang Gu, Yateng Hong, mindest, mingyue, raoanag, saurabh, shaoboyan091, sstamenk, tianf-fff, wonchung-microsoft, xieofxie, zz002

`v1.20.0`: ONNX Runtime v1.20.0

Compare Source

Release Manager: @apsonawane

Announcements

All ONNX Runtime Training packages have been deprecated. ORT 1.19.2 was the last release for which onnxruntime-training (PyPI), onnxruntime-training-cpu (PyPI), Microsoft.ML.OnnxRuntime.Training (Nuget), onnxruntime-training-c (CocoaPods), onnxruntime-training-objc (CocoaPods), and onnxruntime-training-android (Maven Central) were published.
ONNX Runtime packages will stop supporting Python 3.8 and Python 3.9. This decision aligns with NumPy Python version support. To continue using ORT with Python 3.8 and Python 3.9, you can use ORT 1.19.2 and earlier.
ONNX Runtime 1.20 CUDA packages will include new dependencies that were not required in 1.19 packages. The following dependencies are new: libcudnn_adv.so.9, libcudnn_cnn.so.9, libcudnn_engines_precompiled.so.9, libcudnn_engines_runtime_compiled.so.9, libcudnn_graph.so.9, libcudnn_heuristic.so.9, libcudnn_ops.so.9, libnvrtc.so.12, and libz.so.1.
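
One way to act on the new dependency list is to compare it against the shared libraries present on a deployment host. This is a minimal sketch under that assumption; the function name `missing_dependencies` and the directory in the usage comment are hypothetical, while the library names are copied from the announcement above.

```python
NEW_CUDA_DEPENDENCIES = [
    "libcudnn_adv.so.9",
    "libcudnn_cnn.so.9",
    "libcudnn_engines_precompiled.so.9",
    "libcudnn_engines_runtime_compiled.so.9",
    "libcudnn_graph.so.9",
    "libcudnn_heuristic.so.9",
    "libcudnn_ops.so.9",
    "libnvrtc.so.12",
    "libz.so.1",
]


def missing_dependencies(present_files):
    """Return the newly required libraries that are absent from present_files."""
    present = set(present_files)
    return [name for name in NEW_CUDA_DEPENDENCIES if name not in present]


# Example with a made-up directory listing; on a real host this could be
# os.listdir() of whichever directory holds the CUDA and cuDNN libraries.
print(missing_dependencies(["libz.so.1", "libnvrtc.so.12"]))
```

The check is by file name only, so it does not confirm that the dynamic loader can actually resolve each library.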

Build System & Packages

Python 3.13 support is included in PyPI packages.
ONNX 1.17 support will be delayed until a future release, but the ONNX version used by ONNX Runtime has been patched to include a shape inference change to the Einsum op.
DLLs in the Maven build are now digitally signed (fix for issue reported here).
(Experimental) vcpkg support added for the CPU EP. The DML EP does not yet support vcpkg, and other EPs have not been tested.

Core

MultiLoRA support.
Reduced memory utilization.
- Fixed alignment that was causing mmap to fail for external weights.
- Eliminated double allocations when deserializing external weights.
- Added ability to serialize pre-packed weights so that they don’t cause an increase in memory utilization when the model is loaded.
Support bfloat16 and float8 data types in python I/O binding API.

Performance

INT4 quantized embedding support on CPU and CUDA EPs.
Miscellaneous performance improvements and bug fixes.

EPs

CPU

FP16 support for MatMulNbits, Clip, and LayerNormalization ops.

CUDA

cuDNN frontend integration for convolution operators.
Added support of cuDNN Flash Attention and Lean Attention in MultiHeadAttention op.

TensorRT

TensorRT 10.4 and 10.5 support.

QNN

QNN HTP support for weight sharing across multiple ORT inference sessions. (See ORT QNN EP documentation for more information.)
Support for QNN SDK 2.27.

OpenVINO

Added support up to OpenVINO 2024.4.1.
Compile-time memory optimizations.
Enhancement of ORT EPContext Session option for optimized first inference latency.
Added remote tensors to ensure direct memory access for inferencing on NPU.

DirectML

DirectML 1.15.2 support.

Mobile

Improved Android QNN support, including a pre-built Maven package and various performance improvements.
FP16 support for ML Program models with CoreML EP.
FP16 XNNPACK kernels to provide a fallback option if CoreML is not available at runtime.
Initial support for using the native WebGPU EP on Android and iOS. Note: The set of initial operators is limited, and the code is available from the main branch, not ORT 1.20 packages. See #22591 for more information.

Web

Quantized embedding support.
On-demand weight loading support (offloads Wasm32 heap and enables 8B-parameter LLMs).
Integrated Intel GPU performance improvements.
Opset-21 support (Reshape, Shape, Gelu).

GenAI

MultiLoRA support.
Generations can now be terminated mid-loop.
Logit soft capping support in Group Query Attention (GQA).
Additional model support, including Phi-3.5 Vision Multi-Frame, ChatGLM3, and Nemotron-Mini.
Python package now available for Mac.
Mac / iOS now available in NuGet packages.

Full release notes for ONNX Runtime generate() API v0.5.0 can be found here.

Extensions

Tokenization performance improvements.
Support for latest Hugging Face tokenization JSON format (transformers>=4.45).
Unigram tokenization model support.
OpenCV dependency removed from C API build.

Full release notes for ONNX Runtime Extensions v0.13 can be found here.

Olive

Olive command line interface (CLI) now available with support to execute well-defined, concrete workflows without the need to create or edit configs manually.
Additional improvements, including support for YAML-based workflow configs, streamlined DataConfig management, simplified workflow configuration, and more.
Llama and Phi-3 model updates, including an updated MultiLoRA example using the ORT generate() API.
Full release notes for Olive v0.7.0 can be found here.

Contributors

Big thank you to the release manager @apsonawane, as well as @snnn, @jchen351, @sheetalarkadam, and everyone else who made this release possible!

Tianlei Wu, Yi Zhang, Yulong Wang, Scott McKay, Edward Chen, Adrian Lizarraga, Wanming Lin, Changming Sun, Dmitri Smirnov, Jian Chen, Jiajia Qin, Jing Fang, George Wu, Caroline Zhu, Hector Li, Ted Themistokleous, mindest, Yang Gu, jingyanwangms, liqun Fu, Adam Pocock, Patrice Vignola, Yueqing Zhang, Prathik Rao, Satya Kumar Jandhyala, Sumit Agarwal, Xu Xing, aciddelgado, duanshengliu, Guenther Schmuelling, Kyle, Ranjit Ranjan, Sheil Kumar, Ye Wang, kunal-vaishnavi, mingyueliuh, xhcao, zz002, 0xdr3dd, Adam Reeve, Arne H Juul, Atanas Dimitrov, Chen Feiyue, Chester Liu, Chi Lo, Erick Muñoz, Frank Dong, Jake Mathern, Julius Tischbein, Justin Chu, Xavier Dupré, Yifan Li, amarin16, anujj, chenduan-amd, saurabh, sfatimar, sheetalarkadam, wejoncy, Akshay Sonawane, AlbertGuan9527, Bin Miao, Christian Bourjau, Claude, Clément Péron, Emmanuel, Enrico Galli, Fangjun Kuang, Hann Wang, Indy Zhu, Jagadish Krishnamoorthy, Javier Martinez, Jeff Daily, Justin Beavers, Kevin Chen, Krishna Bindumadhavan, Lennart Hannink, Luis E. P., Mauricio A Rovira Galvez, Michael Tyler, PARK DongHa, Peishen Yan, PeixuanZuo, Po-Wei (Vincent), Pranav Sharma, Preetha Veeramalai, Sophie Schoenmeyer, Vishnudas Thaniel S, Xiang Zhang, Yi-Hong Lyu, Yufeng Li, goldsteinn, mcollinswisc, mguynn-intc, mingmingtasd, raoanag, shiyi, stsokolo, vraspar, wangshuai09

Full changelog: v1.19.2...v1.20.0

`v1.19.2`: ONNX Runtime v1.19.2

Compare Source

Announcements

ORT 1.19.2 is a small patch release, fixing some broken workflows and introducing bug fixes.

Build System & Packages

Fixed the signing of native DLLs.
Disabled absl symbolize in Windows Release build to avoid dependency on dbghelp.dll.

Training

Restored support for CUDA compute capability 7.0 and 7.5 with CUDA 12, and 6.0 and 6.1 with CUDA 11.
Several fixes for training CI pipelines.

Mobile

Fixed ArgMaxOpBuilder::AddToModelBuilderImpl() nullptr Node access for CoreML EP.

Generative AI

Added CUDA kernel for Phi3 MoE.
Added smooth softmax support in CUDA and CPU kernels for the GroupQueryAttention operator.
Fixed number of splits calculations in GroupQueryAttention CUDA operator.
Enabled causal support in the MultiHeadAttention CUDA operator.

Contributors

@prathikr, @mszhanyi, @edgchen1, @tianleiwu, @wangyems, @aciddelgado, @mindest, @snnn, @baijumeswani, @MaanavD

Thanks to everyone who helped ship this release smoothly!

Full Changelog: microsoft/onnxruntime@v1.19.0...v1.19.2

`v1.19.0`: ONNX Runtime v1.19.0

Compare Source

Announcements

Note that the wrong commit was initially tagged with v1.19.0. The final commit has since been correctly tagged: 26250ae. This shouldn't affect much, but sorry for the inconvenience!

Build System & Packages

Numpy support for 2.x has been added
Qualcomm SDK has been upgraded to 2.25
ONNX has been upgraded from 1.16 → 1.16.1
Default GPU packages use CUDA 12.x and cuDNN 9.x (previously CUDA 11.x/cuDNN 8.x). CUDA 11.x/cuDNN 8.x packages have moved to the aiinfra VS feed.
TensorRT 10.2 support added
Introduced Java CUDA 12 packages on Maven.
Discontinued support for Xamarin. (Xamarin reached EOL on May 1, 2024)
Discontinued support for macOS 11 and increased the minimum supported macOS version to 12. (macOS 11 reached EOL in September 2023)
Discontinued support for iOS 12 and increased the minimum supported iOS version to 13.

Core

Implemented DeformConv
Fixed big-endian and support build on AIX

Performance

Added QDQ support for INT4 quantization in CPU and CUDA Execution Providers
Implemented FlashAttention on CPU to improve performance for GenAI prompt cases
Improved INT4 performance on CPU (X64, ARM64) and NVIDIA GPUs

Execution Providers

TensorRT
- Updated to support TensorRT 10.2
- Removed calls to deprecated APIs
- Enable refittable embedded engine when ONNX model provided as byte stream
CUDA
- Upgraded cutlass to 3.5.0 for performance improvement of memory efficient attention.
- Updated MultiHeadAttention and Attention operators to be thread-safe.
- Added sdpa_kernel provider option to choose kernel for Scaled Dot-Product Attention.
- Expanded op support - Tile (bf16)
CPU
- Expanded op support - GroupQueryAttention, SparseAttention (for Phi-3 small)
QNN
- Updated to support QNN SDK 2.25
- Expanded op support - HardSigmoid, ConvTranspose 3d, Clip (int32 data), Matmul (int4 weights), Conv (int4 weights), prelu (fp16)
- Expanded fusion support – Conv + Clip/Relu fusion
OpenVINO
- Added support for OpenVINO 2024.3
- Support for enabling EpContext using session options
DirectML
- Updated DirectML from 1.14.1 → 1.15.1
- Updated ONNX opset from 17 → 20
- Opset 19 and Opset 20 are supported with known caveats:
  - GridSample 20: 5D not supported
  - DeformConv not supported

Mobile

Additional CoreML ML Program operators were added
- See supported operators list here
Fixed packaging issue with macOS framework in onnxruntime-c cocoapod
Removed Xamarin support
- Xamarin EOL was May 1, 2024
- Xamarin official support policy | .NET (microsoft.com)

Web

Updated JavaScript packaging to align with best practices; this introduces slight incompatibilities for apps that bundle onnxruntime-web
Improved CPU operator coverage for WebNN (now supported by Chrome)

Training

No specific updates

GenAI

Support for new models: Qwen, Llama 3.1, Gemma 2, Phi-3 small
Support for building quantized models with the AWQ and GPTQ methods
Performance improvements for Intel and Arm CPUs
Packaging and language bindings
- Added Java bindings (build from source)
- Separated OnnxRuntime.dll and directml.dll out of the GenAI package to improve usability
- Published packages for Windows Arm
- Added support for Android (build from source)
Bug fixes, including the long-prompt correctness issue for Phi-3.

Extensions

Added C

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

bjorncs force-pushed the master branch from 070eae9 to 6ebc257 Compare July 6, 2023 15:50

renovate bot force-pushed the renovate/onnxruntime.version branch from f8df994 to 78cd577 Compare July 6, 2023 15:59

renovate bot force-pushed the renovate/onnxruntime.version branch from 78cd577 to 7844bd6 Compare September 20, 2023 17:28

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.15.1~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.0 Sep 20, 2023

renovate bot force-pushed the renovate/onnxruntime.version branch from 7844bd6 to 7d887de Compare October 12, 2023 05:20

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.1 Oct 12, 2023

renovate bot force-pushed the renovate/onnxruntime.version branch from 7d887de to 12fd980 Compare November 10, 2023 08:15

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.1~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.2 Nov 10, 2023

renovate bot force-pushed the renovate/onnxruntime.version branch from 12fd980 to b49fce6 Compare November 21, 2023 23:50

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.2~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.3 Nov 21, 2023

renovate bot force-pushed the renovate/onnxruntime.version branch from b49fce6 to f4f9479 Compare February 4, 2024 15:01

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.16.3~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.17.0 Feb 4, 2024

renovate bot force-pushed the renovate/onnxruntime.version branch from f4f9479 to 64c33ef Compare March 1, 2024 23:44

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.17.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.17.1 Mar 1, 2024

renovate bot force-pushed the renovate/onnxruntime.version branch from 64c33ef to 7763049 Compare April 22, 2024 00:02

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.17.1~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.17.3 Apr 22, 2024

renovate bot force-pushed the renovate/onnxruntime.version branch from 7763049 to b83b0bf Compare May 23, 2024 21:01

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.17.3~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.18.0 May 23, 2024

renovate bot force-pushed the renovate/onnxruntime.version branch from b83b0bf to 82e667f Compare August 25, 2024 21:17

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.18.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.19.0 Aug 25, 2024

renovate bot force-pushed the renovate/onnxruntime.version branch from 82e667f to 3642954 Compare September 8, 2024 11:32

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.19.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.19.2 Sep 8, 2024

renovate bot force-pushed the renovate/onnxruntime.version branch from 3642954 to dcc0d72 Compare November 1, 2024 17:43

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.19.2~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.20.0 Nov 1, 2024

renovate bot force-pushed the renovate/onnxruntime.version branch from dcc0d72 to c63b039 Compare March 8, 2025 15:46

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.20.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.21.0 Mar 8, 2025

renovate bot force-pushed the renovate/onnxruntime.version branch from c63b039 to 4ed4331 Compare May 4, 2025 20:04

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.21.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.21.1 May 4, 2025

renovate bot force-pushed the renovate/onnxruntime.version branch from 4ed4331 to c90fcd2 Compare May 19, 2025 00:12

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.21.1~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.22.0 May 19, 2025

renovate bot force-pushed the renovate/onnxruntime.version branch from c90fcd2 to e22a391 Compare September 28, 2025 11:03

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.22.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.0 Sep 28, 2025

renovate bot force-pushed the renovate/onnxruntime.version branch from e22a391 to 5f1b66d Compare October 12, 2025 11:26

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.0~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.1 Oct 12, 2025

renovate bot force-pushed the renovate/onnxruntime.version branch from 5f1b66d to 05a7348 Compare November 20, 2025 00:03

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.1~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.2 Nov 20, 2025

Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.1

47d97dc

renovate bot force-pushed the renovate/onnxruntime.version branch from 05a7348 to 47d97dc Compare February 8, 2026 08:04

renovate bot changed the title ~~Update dependency com.microsoft.onnxruntime:onnxruntime to v1.23.2~~ Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.1 Feb 8, 2026

Update dependency com.microsoft.onnxruntime:onnxruntime to v1.24.1 #35
renovate[bot] wants to merge 1 commit into master from renovate/onnxruntime.version
