
[MNT] enforce strict CPU-GPU numerical parity for ROCKET (< 1e-7 divergence in Kernel creation)#3177

Closed
Adityakushwaha2006 wants to merge 14 commits into aeon-toolkit:main from Adityakushwaha2006:fix/rocket-gpu-numerical-parity

Conversation

@Adityakushwaha2006
Contributor

Reference Issues/PRs

Fixes #1248

What does this implement/fix? Explain your changes.

Executive Summary: Fixing Critical CPU-GPU Divergence (< 1e-7 Error)
This PR resolves a critical scientific reproducibility issue in ROCKETGPU. Prior to this fix, the GPU implementation exhibited massive statistical divergence from the CPU baseline (up to 68.9% mean error on multivariate data), rendering cross-platform experiments invalid.

This fix restores numerical parity by enforcing strict architectural alignment, reducing average divergence from ~11% to 0.000007% (< 10^{-7}) across all tested datasets.

1. The Problem: Why this is Critical
During an empirical audit, I discovered that ROCKETGPU produced fundamentally different results from Rocket (CPU) despite an identical random_state.

  • Multivariate Failure: On the BasicMotions dataset, specific features showed a 2,050% difference (CPU: 3.67 vs GPU: 78.88), confirming the GPU was using a mathematically distinct algorithm.
  • Univariate Failure: Even simple datasets like GunPoint showed ~9.6% divergence due to RNG desynchronization.

2. Root Cause Analysis & Fixes
I identified and resolved three specific structural defects:

  • Fixed "Two-Loop" RNG Mismatch:

  • Issue: The CPU pre-generates all channel counts in a primary loop, then draws weights in a secondary loop. The GPU used a single loop, causing the RNG sequence to desynchronize partway through parameter generation.

  • Fix: Refactored _define_parameters in _rocket_gpu.py to mirror the CPU's two-loop architecture exactly.

  • Fixed Channel Sorting Bug (Multivariate Error):

  • Issue: The GPU implementation explicitly sorted the randomly selected channel indices (e.g., [4, 1, 3] -> [1, 3, 4]). Because floating-point addition is not associative, this changed the convolution's summation order, causing the 68.9% divergence in multivariate data.

  • Fix: Removed the np.sort() operation. The GPU now processes channels in the exact same random order as the CPU.
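The associativity effect is easy to reproduce in isolation (a toy sketch, not the actual convolution code; the large-magnitude values are chosen to make the effect deterministic):

```python
# Per-channel contributions for hypothetical channels 1, 3, and 4.
contrib = {1: 1e16, 3: 1.0, 4: -1e16}

cpu_order = [4, 1, 3]          # random selection order, as the CPU uses it
gpu_order = sorted(cpu_order)  # the old GPU behavior: [1, 3, 4]

cpu_sum = sum(contrib[ch] for ch in cpu_order)  # (-1e16 + 1e16) + 1.0 == 1.0
gpu_sum = sum(contrib[ch] for ch in gpu_order)  # (1e16 + 1.0) - 1e16 == 0.0
```

In the sorted order, the 1.0 is absorbed when added to 1e16 (it is below one ULP at that magnitude), so the same mathematical sum yields a different floating-point result purely from reordering.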

  • Enforced Hardware Determinism:

  • Issue: Several TensorFlow CUDA ops are non-deterministic by default.

  • Fix: Added flags to force TensorFlow to use deterministic ops (TF_DETERMINISTIC_OPS='1').
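A minimal sketch of the determinism setup (the PR names only TF_DETERMINISTIC_OPS; the cuDNN flag and the TF 2.9+ API shown here are additional options, not necessarily what this PR uses):

```python
import os

# Environment flags must be set before `import tensorflow` to take effect.
os.environ["TF_DETERMINISTIC_OPS"] = "1"    # the flag named in this PR
os.environ["TF_CUDNN_DETERMINISTIC"] = "1"  # assumption: also forces deterministic cuDNN kernels

# On TF >= 2.9 the supported API is enable_op_determinism(), called after
# import; left as a comment so this sketch does not require TensorFlow:
# import tensorflow as tf
# tf.config.experimental.enable_op_determinism()
```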

3. Verification & Results
Post-fix benchmarking confirms statistical parity is restored:
(Benchmark results attached as an image in the original PR.)

4. Backward Compatibility
Added a legacy_rng flag to ROCKETGPU:

  • legacy_rng=True: Uses old (divergent) behavior (for existing users).
  • legacy_rng=False (Default): Uses new, strictly reproducible logic.
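The compatibility-flag pattern can be sketched as follows (a hypothetical, standalone class for illustration, not the actual ROCKETGPU code; the PR's later commits mention a deprecation warning for legacy_rng=True):

```python
import warnings

class RocketGPUSketch:
    """Hypothetical stand-in illustrating the legacy_rng switch."""

    def __init__(self, legacy_rng=False):
        self.legacy_rng = legacy_rng
        if legacy_rng:
            warnings.warn(
                "legacy_rng=True reproduces the old, CPU-divergent kernel "
                "generation and exists only for backward compatibility.",
                DeprecationWarning,
                stacklevel=2,
            )

    def _define_parameters(self):
        # Dispatch to the old or new RNG scheme based on the flag.
        if self.legacy_rng:
            return "single-loop (divergent) parameter generation"
        return "two-loop (CPU-matching) parameter generation"
```

The new behavior is the default, so existing pipelines only change results if they were already relying on the divergent output and explicitly opt back in.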

Does your contribution introduce a new dependency? If yes, which one?

No.

Any other comments?

This PR prioritizes scientific correctness over speed. The strict synchronization introduces overhead for small datasets (e.g., BasicMotions), but this is an intentional trade-off to guarantee validity. Future optimization can address performance via batch parameter injection.

A new test test_rocket_cpu_gpu_parity has been added to verify divergence stays < 1e-5.
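The parity check can be sketched like this (a simplified stand-in for the actual test; `assert_parity` and the arrays are illustrative, not the real fixtures):

```python
import numpy as np

def assert_parity(cpu_features, gpu_features, tol=1e-5):
    # Maximum relative divergence between CPU and GPU feature matrices.
    denom = np.maximum(np.abs(cpu_features), 1e-12)  # guard against division by zero
    divergence = np.max(np.abs(cpu_features - gpu_features) / denom)
    assert divergence < tol, f"divergence {divergence:.2e} exceeds {tol:.2e}"

# Stand-in arrays; in the real test both would come from fitting Rocket and
# ROCKETGPU with the same random_state on the same dataset.
cpu = np.array([3.67, 1.25, -0.50])
gpu = cpu * (1.0 + 1e-8)
assert_parity(cpu, gpu)
```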

PR checklist

For all contributions
For new estimators and functions
  • I've added the estimator/function to the online [API documentation](https://www.aeon-toolkit.org/en/latest/api_reference.html).
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.
For developers with write access

- Resolved structural RNG mismatch: aligned GPU kernel generation loops to match CPU's two-loop architecture
- Fixed multivariate divergence: removed channel index sorting in GPU that altered convolution topology
- Added legacy_rng flag: ensures backward compatibility for existing users
- Configured TF determinism: eliminated non-deterministic hardware noise in CUDA operations
- Achieved < 1e-7 divergence on GunPoint, BasicMotions, and ECG200
@aeon-actions-bot aeon-actions-bot bot added maintenance Continuous integration, unit testing & package distribution transformations Transformations package labels Dec 17, 2025
@aeon-actions-bot
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ maintenance ].
I have added the following labels to this PR based on the changes made: [ transformations ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Regenerate expected results for testing
  • Push an empty commit to re-run CI checks

@Adityakushwaha2006 Adityakushwaha2006 marked this pull request as draft December 17, 2025 18:48
@Adityakushwaha2006
Contributor Author

Adityakushwaha2006 commented Dec 17, 2025

I have opened this as a Draft to share the architectural fixes.
Please ignore any CI/Linting failures for the moment.
Current State:
Correctness: The logic is verified locally and achieves strict parity (less than 10^{-7} divergence) across all test datasets.
Performance: As expected, the strict synchronization introduces overhead.
Immediate Next Steps:
I am now moving to Phase 2: Optimization to recover the speed loss. Once the performance is stabilized, I will clean up the CI checks, finalize the documentation, and mark this ready for final review.

@Adityakushwaha2006
Contributor Author

Adityakushwaha2006 commented Dec 18, 2025

Update:
I ran the current script against all 15 available test cases and noticed some concerns with the current PR code.
These tests now consist of full GPU vs CPU runs, and they still show divergence in the final feature generation.

I had earlier verified that kernel generation matches: the divergence there is below 10^-7, attributable purely to computation methods and rounding errors.
My earlier assumption was that the convolution method would have some effect on the transformation step, but that it would be far less significant than the kernel fix (if the kernels differ, everything downstream breaks and the divergence compounds). As the tests reveal, even with near-identical kernels, the transformation step introduces some divergence.
The divergence is not extreme: under 0.5% on most datasets, with some around 1%.
However, I did find an outlier: the BasicMotions dataset shows a 244% divergence, caused entirely by the transform step, since its kernel similarity (as seen above) is also below 10^-7.

Just to note before moving forward: ahead of the speed optimisation of the GPU code and the final merge, I will focus on minimizing the convolution-based divergence, which will result in an intermediate draft PR.

(Image attached in the original PR: KERNEL SIMILARITY - PARITY MATCH table.)

- Add legacy_rng parameter (default=False) for CPU parity
- Fix RNG sequence in _define_parameters() to match CPU exactly
- Achieve <10^-7 kernel divergence across 15 datasets
- Add deprecation warning for legacy_rng=True
- Update test to verify kernel parity (Phase 1 scope)
- Document known limitation: multivariate transform divergence (Phase 2)

Fixes aeon-toolkit#1248

Phase 1: Kernel generation parity - COMPLETE
Phase 2: Transform parity for multivariate - Future work
@Adityakushwaha2006 Adityakushwaha2006 force-pushed the fix/rocket-gpu-numerical-parity branch 2 times, most recently from 877e693 to c1bd777 Compare December 19, 2025 10:26
@Adityakushwaha2006 Adityakushwaha2006 marked this pull request as ready for review December 19, 2025 15:50
@Adityakushwaha2006
Contributor Author

The CI failures on this PR are not caused by the ROCKET GPU changes. They're caused by a pre-existing bug in the AEFCN network module that affects all tests across the repository.

Root Cause: AEFCNNetwork.build_network() in aeon/networks/_ae_fcn.py passes a numpy scalar to Dense(units=...), which Keras rejects.

Error: ValueError: Received an invalid value for units, expected a positive integer. Received: units=19200

Location: Line in _ae_fcn.py:
latent_space = tf.keras.layers.Dense(units=np.prod(shape_before_flattent))(flatten_layer)

Why This Blocks My PR

  • This PR only modifies: rocketGPU/_rocket_gpu.py and rocketGPU/tests/test_base_rocketGPU.py
  • CI runs the entire test suite before testing ROCKET changes
  • AEFCN tests fail early, preventing ROCKET tests from running
  • All 7 CI jobs (macOS, Ubuntu, Windows) fail on the same AEFCN error

TensorFlow's Keras Dense layer requires a Python int for the units parameter.
Using np.prod() directly returns numpy.int64, which Keras rejects with:
ValueError: Received an invalid value for units, expected a positive integer.

Fixed in 3 AE network classes:
- AEFCNNetwork: decoder Dense layer
- AEResNetNetwork: decoder Dense layer
- AEDCNNNetwork: decoder Dense layer

This fix unblocks all TF-based CI tests and notebooks.
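The dtype issue can be reproduced without Keras (the shape is illustrative, chosen to match the 19200 in the error message):

```python
import numpy as np

shape_before_flatten = (24, 800)  # illustrative shape; 24 * 800 == 19200

# np.prod returns a NumPy integer scalar, not a plain Python int.
units = np.prod(shape_before_flatten)
assert isinstance(units, np.integer) and not isinstance(units, int)

# The fix: cast to a Python int before passing it to Dense(units=...).
safe_units = int(np.prod(shape_before_flatten))
assert isinstance(safe_units, int) and safe_units == 19200
```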
@Adityakushwaha2006 Adityakushwaha2006 force-pushed the fix/rocket-gpu-numerical-parity branch from 66d929c to 7a2d0b3 Compare December 19, 2025 18:51
TensorFlow requires float32 tensors but test was passing float64.
Error: 'expected float tensor but is double tensor'

Added .astype(np.float32) conversion to test data.
@Adityakushwaha2006 Adityakushwaha2006 force-pushed the fix/rocket-gpu-numerical-parity branch from ce5082c to 31179f6 Compare December 19, 2025 19:13
TensorFlow Conv2D requires float32 tensors but numpy operations
were upcasting to float64, causing errors across all platforms.

Fixed two sources of float64 upcasting:
1. Kernel normalization: Added .astype(np.float32) after reshape
2. Bias generation: Wrapped np.random.uniform() with np.float32()

This ensures all tensors passed to TF operations are float32.
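Both casts can be demonstrated with NumPy alone (a sketch; variable names and the normalization formula are illustrative stand-ins for the actual kernel code):

```python
import numpy as np

# 1. np.random.uniform always returns float64, regardless of its arguments:
bias64 = np.random.uniform(-1.0, 1.0, size=4)
assert bias64.dtype == np.float64

# Fix: wrap the draw in np.float32 so downstream TF ops receive float32.
bias32 = np.float32(np.random.uniform(-1.0, 1.0))
assert isinstance(bias32, np.float32)

# 2. Kernel normalization: cast after reshape so the result stays float32.
kernel = np.random.normal(size=9).reshape(3, 3).astype(np.float32)
normalized = ((kernel - kernel.mean()) / (np.abs(kernel).sum() + 1e-8)).astype(np.float32)
assert normalized.dtype == np.float32
```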
@Adityakushwaha2006 Adityakushwaha2006 force-pushed the fix/rocket-gpu-numerical-parity branch from 40dabeb to 6aec859 Compare December 19, 2025 19:50
TensorFlow automatically converts numpy arrays to float64 tensors by default,
even if the numpy array is float32. This was causing Conv2D dtype errors.

Solution: Wrap kernel in tf.constant(..., dtype=tf.float32) before passing
to conv1d to ensure TensorFlow receives float32 tensor.

Also added float32 casts to kernel normalization and bias generation as
defense-in-depth, though the tf.constant conversion is the critical fix.

Tested locally - dtype errors resolved.
The same PPV/Max features were being appended twice to output_features_filter,
causing output to have 2x the expected number of samples (e.g., 20 instead of 10).

Removed duplicate append block.
@Adityakushwaha2006 Adityakushwaha2006 force-pushed the fix/rocket-gpu-numerical-parity branch from 40fc8f2 to 7a069cc Compare December 19, 2025 20:43
@Adityakushwaha2006
Contributor Author

Note for the reviewer:
This PR also contains the fix implemented in PR #3182. Since #3182 is urgent, if it is merged first I can rebase this PR if needed and re-run the CI.

@hadifawaz1999
Member

You can update the branch to main; the PR for the AE fix has been merged.

@Adityakushwaha2006 Adityakushwaha2006 force-pushed the fix/rocket-gpu-numerical-parity branch from c60ce8a to 68f789f Compare December 20, 2025 17:42
@Adityakushwaha2006 Adityakushwaha2006 force-pushed the fix/rocket-gpu-numerical-parity branch 3 times, most recently from ccdb24a to 7a069cc Compare December 20, 2025 18:33
Adityakushwaha2006 and others added 3 commits December 21, 2025 00:04
Added tf.random.set_seed(1) to 3 test files to resolve RuntimeError
when TensorFlow determinism is enabled but no seed is set.

Files fixed:
- aeon/classification/shapelet_based/tests/test_ls.py
- aeon/regression/deep_learning/tests/test_deep_regressor_base.py
- aeon/networks/tests/test_deepar.py

All tests now pass locally with determinism enabled.
@Adityakushwaha2006 Adityakushwaha2006 changed the title [MNT] enforce strict CPU-GPU numerical parity for ROCKET (< 1e-7 divergence) [MNT] enforce strict CPU-GPU numerical parity for ROCKET (< 1e-7 divergence in Kernel creation) Dec 22, 2025
@Adityakushwaha2006
Contributor Author

Adityakushwaha2006 commented Dec 22, 2025

(STILL REVIEW READY)

Hey, I found a bug in the implementation while benchmarking performance.
It arises from kernel divergence on multivariate datasets.

This does not measurably change the performance results in this PR; multivariate datasets were producing divergence either way. It is just another area I will focus my analysis on.

The current PR does ensure that univariate kernel generation and feature generation achieve parity.

UPDATE:

It was just an ordering issue in how features are stored on GPU vs CPU (really a benchmarking issue): the kernel values were the same, only their ordering differed, so both produce the same kernels. This causes no issue for users or the implementation, but to make testing easier I will align the ordering in future PRs where I work on transform-step parity.

(Noting this because the same effect may show up in your validation and testing scripts as well.)

@Adityakushwaha2006
Contributor Author

This PR is superseded by PR #3211, which uses a CuPy backend.
Given the current limitations of the TensorFlow approach, it holds no merit compared to the 100% parity, ~100x computation speedup, and full multivariate support offered by the CuPy GPU variant.

Closing this PR for that reason @hadifawaz1999.

@Adityakushwaha2006 Adityakushwaha2006 marked this pull request as draft January 4, 2026 22:41
@Adityakushwaha2006
Contributor Author

Currently drafted; I will confirm and complete this PR based on the decisions made in the discussions of PR #3211.
Please hold off on review until it is undrafted.

@Adityakushwaha2006
Contributor Author

Closing this PR. This implementation has gotten messy and is not the best way to approach these changes, especially after the discussions in the related PR. I will revisit this later with a better, cleaner approach.


Labels

maintenance Continuous integration, unit testing & package distribution transformations Transformations package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[MNT] Test consistency between CPU & GPU version of ROCKET

2 participants