Skip to content

Don't keep box muller transform state between kernel launches#649

Merged
neworderofjamie merged 6 commits intomasterfrom
xorwow_mem_reduce
Jan 28, 2025
Merged

Don't keep box muller transform state between kernel launches#649
neworderofjamie merged 6 commits intomasterfrom
xorwow_mem_reduce

Conversation

@neworderofjamie
Copy link
Contributor

@neworderofjamie neworderofjamie commented Jan 13, 2025

This builds on the new hybrid HIP/CUDA backend from #647 so review that first!

As described in #648, beyond its (arguably excessive) 192 bit of state, the curand XORWOW RNG used to provide randomness for neuron and custom connectivity updates also stores 160 bits of box muller transform state in the curandState struct (BM draws two numbers and produces two normally distributed values so this state is used to cache one of those results for subsequent calls to curand_normal).

In this PR, when using CUDA and HIP backends, we create our own XORWowStateInternal struct in definitions.h without the BM state and store this in memory. At the start of the neuron and custom connectivity update kernels, we copy the fields from the XORWowStateInternal struct into a local curandState and, at the end, we copy them back.

Excitingly, because we are very memory bandwidth bound, this makes the neuron kernel on the cortical microcircuit model about 60% faster. On my A5000 (running for 1 second):

Before[seconds] After [seconds]
neuron update 0.18 0.11
presynaptic update 0.23 0.24

@codecov
Copy link

codecov bot commented Jan 13, 2025

Codecov Report

Attention: Patch coverage is 98.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.97%. Comparing base (54101ca) to head (beb19d4).
Report is 19 commits behind head on master.

Files with missing lines Patch % Lines
include/genn/genn/code_generator/environment.h 96.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #649      +/-   ##
==========================================
+ Coverage   88.62%   88.97%   +0.35%     
==========================================
  Files         111      108       -3     
  Lines       15062    14768     -294     
==========================================
- Hits        13348    13140     -208     
+ Misses       1714     1628      -86     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@neworderofjamie neworderofjamie force-pushed the xorwow_mem_reduce branch 3 times, most recently from e61dc4b to 1294128 Compare January 15, 2025 12:55
…ase`` entries as well as initialisers for cleanup code
…IP backend

* Write struct to definitions
* Use new class for allocating memory and struct fields
* Reimplemented population RNG preamble and postamble in ``BackendSIMT::getPopulationRNG`` using new destructor mechanism to copy from and to internal struct
Copy link
Member

@tnowotny tnowotny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow ... what a detail and what an effect. It doesn't necessarily look very elegant but the savings seem worth it!

Base automatically changed from hip to master January 28, 2025 19:13
@neworderofjamie neworderofjamie merged commit e9e40b9 into master Jan 28, 2025
2 checks passed
@neworderofjamie neworderofjamie deleted the xorwow_mem_reduce branch January 28, 2025 19:15
This was referenced Sep 12, 2025
@neworderofjamie neworderofjamie mentioned this pull request Oct 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants