[MNT] enforce strict CPU-GPU numerical parity for ROCKET (< 1e-7 divergence in Kernel creation)#3177
Conversation
- Resolved structural RNG mismatch: aligned GPU kernel generation loops to match CPU's two-loop architecture - Fixed multivariate divergence: removed channel index sorting in GPU that altered convolution topology - Added legacy_rng flag: ensures backward compatibility for existing users - Configured TF determinism: eliminated non-deterministic hardware noise in CUDA operations - Achieved < 1e-7 divergence on GunPoint, BasicMotions, and ECG200
Thank you for contributing to
|
|
I have opened this as a Draft to share the architectural fixes. |
- Add legacy_rng parameter (default=False) for CPU parity - Fix RNG sequence in _define_parameters() to match CPU exactly - Achieve <10^-7 kernel divergence across 15 datasets - Add deprecation warning for legacy_rng=True - Update test to verify kernel parity (Phase 1 scope) - Document known limitation: multivariate transform divergence (Phase 2) Fixes aeon-toolkit#1248 Phase 1: Kernel generation parity - COMPLETE Phase 2: Transform parity for multivariate - Future work
877e693 to
c1bd777
Compare
|
The CI failures on this PR are not caused by the ROCKET GPU changes. They're caused by a pre-existing bug in the AEFCN network module that affects all tests across the repository. Root Cause: AEFCNNetwork.build_network() in aeon/networks/_ae_fcn.py passes a numpy scalar to Dense(units=...), which Keras rejects. Error: ValueError: Received an invalid value for units, expected a positive integer. Received: units=19200 Location: Line in _ae_fcn.py: Why This Blocks My PR
|
TensorFlow Keras Dense layer requires Python int for units parameter. Using np.prod() directly returns numpy.int64 which Keras rejects with: ValueError: Received an invalid value for units, expected a positive integer. Fixed in 3 AE network classes: - AEFCNNetwork: decoder Dense layer - AEResNetNetwork: decoder Dense layer - AEDCNNNetwork: decoder Dense layer This fix unblocks all TF-based CI tests and notebooks.
b5ec7e8 to
8798075
Compare
66d929c to
7a2d0b3
Compare
TensorFlow requires float32 tensors but test was passing float64. Error: 'expected float tensor but is double tensor' Added .astype(np.float32) conversion to test data.
ce5082c to
31179f6
Compare
TensorFlow Conv2D requires float32 tensors but numpy operations were upcasting to float64, causing errors across all platforms. Fixed two sources of float64 upcasting: 1. Kernel normalization: Added .astype(np.float32) after reshape 2. Bias generation: Wrapped np.random.uniform() with np.float32() This ensures all tensors passed to TF operations are float32.
40dabeb to
6aec859
Compare
TensorFlow automatically converts numpy arrays to float64 tensors by default, even if the numpy array is float32. This was causing Conv2D dtype errors. Solution: Wrap kernel in tf.constant(..., dtype=tf.float32) before passing to conv1d to ensure TensorFlow receives float32 tensor. Also added float32 casts to kernel normalization and bias generation as defense-in-depth, though the tf.constant conversion is the critical fix. Tested locally - dtype errors resolved.
The same PPV/Max features were being appended twice to output_features_filter, causing output to have 2x the expected number of samples (e.g., 20 instead of 10). Removed duplicate append block.
40fc8f2 to
7a069cc
Compare
|
Note for the reviewer: |
|
you can update branch to main, the PR for the AE fix got merged |
c60ce8a to
68f789f
Compare
ccdb24a to
7a069cc
Compare
Added tf.random.set_seed(1) to 3 test files to resolve RuntimeError when TensorFlow determinism is enabled but no seed is set. Files fixed: - aeon/classification/shapelet_based/tests/test_ls.py - aeon/regression/deep_learning/tests/test_deep_regressor_base.py - aeon/networks/tests/test_deepar.py All tests now pass locally with determinism enabled.
|
(STILL REVIEW READY) Hey , found a Bug in the implementation while i was playing around with benchmarking performance. This does not change anything quantifiably for the performance based on this PR tho, multivariate datasets were producing divergence either way, this is just another place i am going to focus my analysis into. The current PR , does ensure that univariate kernel generation and feature generation achieve parity . UPDATE:It was just a simple ordering issue in the feature storage in GPU and CPU (just a benchmarking issue actually) , the kernel values were same , their ordering differed in both , so they are producing the same kernels . This causes no issue for users or implementation, however to make testing easy , ill change ordering in future PRs where i work on transform step parity achievement. (informing as the same might be visible in your validation and testing scripts aswell) |
80a032d to
31c3835
Compare
|
This PR is superseeded by PR #3211 , with a CuPy backend. Closing this PR due to the same @hadifawaz1999. |
|
Currently Drafted , ill confirm and complete this PR based upon the decision made in discussions of PR #3211 . |
7505511 to
8fca6ab
Compare
|
Closing this PR , this implementation has gotten messy and isnt the best way to go about the changes , much so after the discussions in the related PR. I shall take a look at this later , with a better and cleaner approach. |

Reference Issues/PRs
-Fixes #1248
What does this implement/fix? Explain your changes.
Executive Summary: Fixing Critical CPU-GPU Divergence (< 1e-7 Error)
This PR resolves a critical scientific reproducibility issue in
ROCKETGPU. Prior to this fix, the GPU implementation exhibited massive statistical divergence from the CPU baseline (up to 68.9% mean error on multivariate data), rendering cross-platform experiments invalid.This fix restores numerical parity by enforcing strict architectural alignment, reducing average divergence from ~11% to 0.000007% (< 10^{-7}) across all tested datasets.
1. The Problem: Why this is Critical
During an empirical audit, I discovered
ROCKETGPUproduced fundamentally different results thanRocket(CPU) despite identicalrandom_state.BasicMotionsdataset, specific features showed a 2,050% difference (CPU: 3.67 vs GPU: 78.88), confirming the GPU was using a mathematically distinct algorithm.GunPointshowed ~9.6% divergence due to RNG desynchronization.2. Root Cause Analysis & Fixes
I identified and resolved three specific structural defects:
Fixed "Two-Loop" RNG Mismatch:
Issue: The CPU pre-generates all channel counts in a primary loop, then weights in a secondary loop. The GPU used a single loop, causing the RNG sequence to desynchronize at Call [ENH] refactor BaseClassifier attribute _threads_to_use to _n_jobs #102.
Fix: Refactored
_define_parametersin_rocket_gpu.pyto mirror the CPU's two-loop architecture exactly.Fixed Channel Sorting Bug (Multivariate Error):
Issue: The GPU implementation explicitly sorted randomly selected channel indices (e.g.,
[4, 1, 3]->[1, 3, 4]). Due to floating-point associativity, this changed the convolution's topological summation, causing the 68.9% divergence in multivariate data.Fix: Removed the
np.sort()operation. The GPU now processes channels in the exact same random order as the CPU.Enforced Hardware Determinism:
Issue: CUDA kernels are non-deterministic by default.
Fix: Added flags to force TensorFlow to use deterministic ops (
TF_DETERMINISTIC_OPS='1').3. Verification & Results

Post-fix benchmarking confirms statistical parity is restored:
4. Backward Compatibility
Added a
legacy_rngflag toROCKETGPU:legacy_rng=True: Uses old (divergent) behavior (for existing users).legacy_rng=False(Default): Uses new, strictly reproducible logic.Does your contribution introduce a new dependency? If yes, which one?
No.
Any other comments?
This PR prioritizes scientific correctness over speed. The strict synchronization introduces overhead for small datasets (e.g.,
BasicMotions), but this is an intentional trade-off to guarantee validity. Future optimization can address performance via batch parameter injection.A new test
test_rocket_cpu_gpu_parityhas been added to verify divergence stays< 1e-5.PR checklist
For all contributions
For new estimators and functions
__maintainer__at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.For developers with write access