Skip to content

Releases: aws-neuron/aws-neuron-sdk

Neuron SDK Release - May 20, 2025

20 May 19:03
4eba310

Choose a tag to compare

With the Neuron 2.23 release, we move NxD Inference (NxDI) library out of beta. It is now recommended for all multi-chip inference use-cases. In addition, Neuron has new training capabilities, including Context Parallelism and ORPO, NKI improvements (new operators and ISA features), and new Neuron Profiler debugging and performance analysis optimizations. Finaly, Neuron now supports PyTorch 2.6 and JAX 0.5.3.

Inference: NxD Inference (NxDI) moves from beta to GA. NxDI now supports Persistent Cache to reduce compilation times, and optimizes model loading with improved weight sharding performance.

Training: NxD Training (NxDT) added Context Parallelism support (beta) for Llama models, enabling sequence lengths up to 32K. NxDT now supports model alignment, ORPO, using DPO-style datasets. NxDT has upgraded supports for 3rd party libraries, specifically: PyTorch Lightning 2.5, Transformers 4.48, and NeMo 2.1.

Neuron Kernel Interface (NKI): New support for 32-bit integer nki.language.add and nki.language.multiply on GPSIMD Engine. NKI.ISA improvements include range_select for Trainium2, fine-grained engine control, and enhanced tensor operations. New performance tuning API no_reorder has been added to enable user-scheduling of instructions. When combined with allocation, this enables software pipelining. Language consistency has been improved for arithmetic operators (+=, -=, /=, *=) across loop types, PSUM, and SBUF.

Neuron Profiler: Profiling performance has improved, allowing users to view profile results 5x times faster on average. New features include timeline-based error tracking and JSON error event reporting, supporting execution and OOB error detection. Additionally, this release improves multiprocess visualization with Perfetto.

Neuron Monitoring: Added Kubernetes context information (pod_name, namespace, and container_name) to neuron monitor prometheus output, enabling resource utilization tracking by pod, namespace, and container.

Neuron DLCs: This release updates containers with PyTorch 2.6 support for inference and training. For JAX DLC, this release adds JAX 0.5.0 training support.

Neuron DLAMIs: This release updates MultiFramework AMIs to include PyTorch 2.6, JAX 0.5, and TensorFlow 2.10 and Single Framework AMIs for PyTorch 2.6 and JAX 0.5.

Neuron SDK Release - May 12, 2025

12 May 23:10
c3ac28c

Choose a tag to compare

Neuron 2.22.1 release includes a Neuron Driver update that resolves DMA abort errors on Trainium2 devices. These errors were previously occurring in the Neuron Runtime during specific workload executions.

Neuron SDK Release - April 3, 2025

04 Apr 05:52
9f6387f

Choose a tag to compare

The Neuron 2.22 release includes performance optimizations, enhancements and new capabilities across the Neuron software stack.

For inference workloads, the NxD Inference library now supports Llama-3.2-11B model and supports multi-LoRA serving, allowing customers to load and serve multiple LoRA adapters. Flexible quantization features have been added, enabling users to specify which model layers or NxDI modules to quantize. Asynchronous inference mode has also been introduced, improving performance by overlapping Input preparation with model execution.

For training, we added LoRA supervised fine-tuning to NxD Training to enable additional model customization and adaptation.

Neuron Kernel Interface (NKI): This release adds new APIs in nki.isa, nki.language, and nki.profile. These enhancements provide customers with greater flexibility and control.

The updated Neuron Runtime includes optimizations for reduced latency and improved device memory footprint. On the tooling side, the Neuron Profiler 2.0 (beta) has added UI enhancements and new event type support.

Neuron DLCs: this release reduces DLC image size by up to 50% and enables faster build times with updated Dockerfiles structure. On the Neuron DLAMI side, new PyTorch 2.5 single framework DLAMIs have been added for Ubuntu 22.04 and Amazon Linux 2023, along with several new virtual environments within the Neuron Multi Framework DLAMIs.

Neuron SDK Release - January 14, 2025

15 Jan 06:22
ae20816

Choose a tag to compare

Neuron 2.21.1 release pins Transformers NeuronX dependency to transformers<4.48 and fixes DMA abort errors on Trn2.

Additionally, this release addresses NxD Core and Training improvements, including fixes for sequence parallel support in quantized models and a new flag for dtype control in Llama3/3.1 70B configurations. See NxD Training Release Notes (neuronx-distributed-training) for details.

NxD Inference update includes minor bug fixes for sampling parameters. See NxD Inference Release Notes.

Neuron supported DLAMIs and DLCs have been updated to Neuron 2.21.1 SDK. Users should be aware of an incompatibility between Tensorflow-Neuron 2.10 (Inf1) and Neuron Runtime 2.21 in DLAMIs, which will be addressed in the next minor release. See Neuron DLAMI Release Notes.

The Neuron Compiler includes bug fixes and performance enhancements specifically targeting the Trn2 platform.

Neuron SDK Release - December 20, 2024

21 Dec 07:34

Choose a tag to compare

Overview: Neuron 2.21.0 introduces support for AWS Trainium 2 and Trn2 instances, including the trn2.48xlarge instance type and Trn2 UltraServer. The release adds new capabilities in both training and inference of large-scale models. It introduces NxD Inference (beta), a PyTorch-based library for deployment, Neuron Profiler 2.0 (beta), and PyTorch 2.5 support across the Neuron SDK, and Logical NeuronCore Configuration (LNC) for optimizing NeuronCore allocation. The release enables Llama 3.1 405B model inference on a single trn2.48xlarge instance.

NxD Inference: NxD Inference (beta) is a new PyTorch-based inference library for deploying large-scale models on AWS Inferentia and Trainium instances. It enables PyTorch model onboarding with minimal code changes and integrates with vLLM. NxDI supports various model architectures, including Llama versions for text processing (Llama 2, Llama 3, Llama 3.1, Llama 3.2, and Llama 3.3), Llama 3.2 multimodal for multimodal tasks, and Mixture-of-Experts (MoE) model architectures including Mixtral and DBRX. The library supports quantization methods, includes dynamic sampling, and is compatible with HuggingFace checkpoints and generate() API. NxDI also supports distributed strategies including tensor parallelism and incorporates speculative decoding techniques (Draft model and EAGLE). The release includes a Llama 3.1 405B model sample on a single trn2.48xlarge instance Llama 3.1 405B model inference.

For more information, see NxD Inference documentation and check the NxD Inference Github repository: aws-neuron/neuronx-distributed-inference

Transformers NeuronX (TNx): This release introduces several new features, including flash decoding support for speculative decoding, and on-device generation in speculative decoding flows. It adds Eagle speculative decoding with greedy and lossless sampling, as well as support for CPU compilation and sharded model saving. Performance improvements include optimized MLP and QKV for Llama models with sequence parallel norm and control over concurrent compilation workers.

Training Highlights: NxD Training in this release adds support for HuggingFace Llama3/3.1 70B on trn2 instances, introduces DPO support for post-training model alignment, and adds support for Mixture-of-Experts (MoE) models including Mixtral 7B. The release includes improved checkpoint conversion capabilities and supports MoE with Tensor, Sequence, Pipeline, and Expert parallelism.

ML Frameworks: Neuron 2.21.0 adds PyTorch 2.5 coming with improved support for eager mode, FP8, and Automatic Mixed Precision capabilities. JAX support extends to version 0.4.35, including support for JAX caching APIs.

Logical NeuronCore Configuration (LNC): This release introduces LNC for Trainium2 instances, optimizing NeuronCore allocation for ML applications. LNC offers two configurations: default (LNC=2) combining two physical cores, and alternative (LNC=1) mapping each physical core individually. This feature allows users to efficiently manage resources for large-scale model training and deployment through runtime variables and compiler flags.

Neuron Profiler 2.0: The new profiler provides system and device-level profiling, timeline annotations, container integration, and support for distributed workloads. It includes trace export capabilities for Perfetto visualization and integration with JAX and PyTorch profilers, and support for Logical NeuronCore Configuration (LNC).

Neuron Kernel Interface (NKI): NKI now supports Trainium2 including Logical NeuronCore Configuration (LNC), adds SPMD capabilities for multi-core operations, and includes new modules and APIs including support for float8_e5m2 datatype.

Deep Learning Containers (DLAMIs): This release expands support for JAX 0.4 within the Multi Framework DLAMI. It also introduces NeuronX Distributed Training (NxDT), Inference (NxDI), and Core (NxD) with PyTorch 2.5 support. Additionally, a new Single Framework DLAMI for TensorFlow 2.10 on Ubuntu 22 is now available.

Deep Learning Containers (DLCs): This release introduces new DLCs for JAX 0.4 training and PyTorch 2.5.1 inference and training. All DLCs have been updated to Ubuntu 22, and the pytorch-inference-neuronx DLC now supports both NxD Inference and TNx libraries.

Documentation: Documentation updates include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and Trn2 UltraServer.

Software Maintenance: This release includes the following announcements:

  • Announcing migration of NxD Core examples from NxD Core repository to NxD Inference repository in next release
  • Announcing end of support for Neuron DET tool starting next release
  • PyTorch Neuron versions 1.9 and 1.10 no longer supported
  • Announcing end of support for PyTorch 2.1 for Trn1, Trn2 and Inf2 starting next release
  • Announcing end of support for PyTorch 1.13 for Trn1 and Inf2 starting next release
  • Announcing end of support for Python 3.8 in future releases
  • Announcing end of support for Ubuntu20 DLCs and DLAMIs

Amazon Q: Use Q Developer as your Neuron Expert for general technical guidance and to jumpstart your NKI kernel development.

Neuron SDK Release - December 3, 2024

03 Dec 20:50

Choose a tag to compare

Neuron 2.21 beta introduces support for AWS Trainium2 and Trn2 instances, including the trn2.48xlarge instance type and u-trn2 UltraServer. The release showcases Llama 3.1 405B model inference using NxD Inference on a single trn2.48xlarge instance, and FUJI 70B model training using the AXLearn library across eight trn2.48xlarge instances.

NxD Inference, a new PyTorch-based library for deploying large language models and multi-modality models, is introduced in this release. It integrates with vLLM and enables PyTorch model onboarding with minimal code changes. The release also adds support for AXLearn training for JAX models.

The new Neuron Profiler 2.0 introduced in this release offers system and device-level profiling, timeline annotations, and container integration. The profiler supports distributed workloads and provides trace export capabilities for Perfetto visualization.

The documentation has been updated to include architectural details about Trainium2 and NeuronCore-v3, along with specifications and topology information for the trn2.48xlarge instance type and u-trn2 UltraServer.

Note:
This release (Neuron 2.21 Beta) was only tested with Trn2 instances. The next release (Neuron 2.21) will support all instances (Inf1, Inf2, Trn1, and Trn2).

For access to this release (Neuron 2.21 Beta) contact your account manager.

Neuron SDK Release - November 20, 2024

21 Nov 03:15

Choose a tag to compare

Neuron 2.20.2 release fixes a stability issue in Neuron Scheduler Extension that previously caused crashes in Kubernetes (K8) deployments. See Neuron K8 Release Notes.

This release also addresses a security patch update to Neuron Driver that fixes a kernel address leak issue. See more on Neuron Driver Release Notes and Neuron Runtime Release Notes.

Addtionally, Neuron 2.20.2 release updates torch-neuronx and libneuronxla packages to add support for torch-xla 2.1.5 package which fixes checkpoint loading issues with Zero Redundancy Optimizer (ZeRO-1). See PyTorch Neuron (torch-neuronx) release notes and Neuron XLA pluggable device (libneuronxla) release notes.

Neuron supported DLAMIs and DLCs are updated with this release (Neuron 2.20.2 SDK). The Training DLC is also updated to address the version dependency issues in NxD Training library. See Neuron DLC Release Notes.

NxD Training library in Neuron 2.20.2 release is updated to transformers 4.36.0 package. See NxD Training Release Notes (neuronx-distributed-training).

Neuron SDK Release - October 25, 2024

26 Oct 04:04

Choose a tag to compare

Neuron 2.20.1 release addresses an issue with the Neuron Persistent Cache that was brought forth in 2.20 release. In the 2.20 release, the Neuron persistent cache issue resulted in a cache-miss scenario when attempting to load a previously compiled Neuron Executable File Format (NEFF) from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation. This release resolves the cache-miss problem, ensuring that NEFFs can be loaded correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled using the same Neuron SDK version.

This release also addresses the excessive lock wait time issue during neuron_parallel_compile graph extraction for large cluster training. See PyTorch Neuron (torch-neuronx) release notes and Neuron XLA pluggable device (libneuronxla) release notes.

Additionally, Neuron 2.20.1 introduces new Multi Framework DLAMI for Amazon Linux 2023 (AL2023) that customers can use to easily get started with latest Neuron SDK on multiple frameworks that Neuron supports. See Neuron DLAMI Release Notes.

Neuron 2.20.1 Training DLC is also updated to pre-install the necessary dependencies and support NxD Training library out of the box. See Neuron DLC Release Notes

Neuron SDK Release - September 16th, 2024

17 Sep 01:58

Choose a tag to compare

Neuron 2.20 release introduces usability improvements and new capabilities across training and inference workloads. A key highlight is the introduction of Neuron Kernel Interface (beta). NKI, pronounced ‘Nicky’, is enabling developers to build optimized custom compute kernels for Trainium and Inferentia. Additionally, this release introduces NxD Training (beta), a PyTorch-based library enabling efficient distributed training, with a user-friendly interface compatible with NeMo. This release also introduces the support for the JAX framework (beta).

Neuron 2.20 also adds inference support for Pixart-alpha and Pixart-sigma Diffusion-Transformers (DiT) models, and adds support for Llama 3.1 8B,70B and 405B models inference supporting up to 128K context length.

Neuron SDK Release - July 19, 2024

22 Jul 17:20
800d00f

Choose a tag to compare

This release (Neuron 2.19.1) addresses an issue with the Neuron Persistent Cache that was introduced in the previous release, Neuron 2.19. The issue resulted in a cache-miss scenario when attempting to load a previously compiled Neuron Executable File Format (NEFF) from a different path or Python environment than the one used for the initial Neuron SDK installation and NEFF compilation. This release resolves the cache-miss problem, ensuring that NEFFs can be loaded correctly regardless of the path or Python environment used to install the Neuron SDK, as long as they were compiled using the same Neuron SDK version.