Don't keep box muller transform state between kernel launches#649
Merged
neworderofjamie merged 6 commits intomasterfrom Jan 28, 2025
Merged
Don't keep box muller transform state between kernel launches#649neworderofjamie merged 6 commits intomasterfrom
neworderofjamie merged 6 commits intomasterfrom
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #649 +/- ##
==========================================
+ Coverage 88.62% 88.97% +0.35%
==========================================
Files 111 108 -3
Lines 15062 14768 -294
==========================================
- Hits 13348 13140 -208
+ Misses 1714 1628 -86 ☔ View full report in Codecov by Sentry. |
e61dc4b to
1294128
Compare
…ase`` entries as well as initialisers for cleanup code
…IP backend * Write struct to definitions * Use new class for allocating memory and struct fields * Reimplemented population RNG preamble and postamble in ``BackendSIMT::getPopulationRNG`` using new destructor mechanism to copy from and to internal struct
… generated in correct scope
1294128 to
beb19d4
Compare
tnowotny
approved these changes
Jan 28, 2025
Member
tnowotny
left a comment
There was a problem hiding this comment.
Wow ... what a detail and what an effect. It doesn't necessarily look very elegant but the savings seem worth it!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This builds on the new hybrid HIP/CUDA backend from #647 so review that first!
As described in #648, beyond its (arguably excessive) 192 bit of state, the curand XORWOW RNG used to provide randomness for neuron and custom connectivity updates also stores 160 bits of box muller transform state in the
curandStatestruct (BM draws two numbers and produces two normally distributed values so this state is used to cache one of those results for subsequent calls tocurand_normal).In this PR, when using CUDA and HIP backends, we create our own
XORWowStateInternalstruct in definitions.h without the BM state and store this in memory. At the start of the neuron and custom connectivity update kernels, we copy the fields from theXORWowStateInternalstruct into a localcurandStateand, at the end, we copy them back.Excitingly, because we are very memory bandwidth bound, this makes the neuron kernel on the cortical microcircuit model about 60% faster. On my A5000 (running for 1 second):