Stop overflow of aggregatedCounts when storing 4B+ values#70
Closed
colings86 wants to merge 1 commit intotdunning:masterfrom
colings86:fix/manyValuesBug
Closed
Stop overflow of aggregatedCounts when storing 4B+ values#70colings86 wants to merge 1 commit intotdunning:masterfrom colings86:fix/manyValuesBug
colings86 wants to merge 1 commit intotdunning:masterfrom
colings86:fix/manyValuesBug
Conversation
Previous to this change aggregatedCounts could overflow if more than `2 * Integer.MAX_VALUE` values were added to the tree. this is because the aggregatedCounts stored in the tree which contains, for each node, the sum of the count for the node and its children was stored as an int[]. When more than `2 * Integer.MAX_VALUE` values are stored in the tree the counts of especially the nodes which are the direct children of the root node can overflow and cause strange result when quantiles for that node or to the right of that node are requested. Technically the root node's aggregated count will overflow when more than `1 * Integer.MAX_VALUE` values are inserted but this has no effect on the results since the aggregatedCount for the root node is never used. The solution is to store the aggregatedCounts as a long[] which will increase the memory usage per centroid to increase from 29 bytes (13 bytes from IntAVLTree and 16 bytes from AVLGroupTree) to 33 bytes so will increase the memory usage by ~12%.
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Previous to this change aggregatedCounts could overflow if more than
2 * Integer.MAX_VALUEvalues were added to the tree. this is becausethe aggregatedCounts stored in the tree which contains, for each node,
the sum of the count for the node and its children was stored as an int[].
When more than
2 * Integer.MAX_VALUEvalues are stored in the tree thecounts of especially the nodes which are the direct children of the root
node can overflow and cause strange result when quantiles for that node
or to the right of that node are requested.
Technically the root node's aggregated count will overflow when more
than
1 * Integer.MAX_VALUEvalues are inserted but this has no effecton the results since the aggregatedCount for the root node is never used.
The solution is to store the aggregatedCounts as a long[] which will
increase the memory usage per centroid to increase from 29 bytes (13
bytes from IntAVLTree and 16 bytes from AVLGroupTree) to 33 bytes so
will increase the memory usage by ~12%.