Skip to content

Stop overflow of aggregatedCounts when storing 4B+ values#70

Closed
colings86 wants to merge 1 commit intotdunning:masterfrom
colings86:fix/manyValuesBug
Closed

Stop overflow of aggregatedCounts when storing 4B+ values#70
colings86 wants to merge 1 commit intotdunning:masterfrom
colings86:fix/manyValuesBug

Conversation

@colings86
Copy link
Copy Markdown

Previous to this change aggregatedCounts could overflow if more than
2 * Integer.MAX_VALUE values were added to the tree. this is because
the aggregatedCounts stored in the tree which contains, for each node,
the sum of the count for the node and its children was stored as an int[].
When more than 2 * Integer.MAX_VALUE values are stored in the tree the
counts of especially the nodes which are the direct children of the root
node can overflow and cause strange result when quantiles for that node
or to the right of that node are requested.

Technically the root node's aggregated count will overflow when more
than 1 * Integer.MAX_VALUE values are inserted but this has no effect
on the results since the aggregatedCount for the root node is never used.

The solution is to store the aggregatedCounts as a long[] which will
increase the memory usage per centroid to increase from 29 bytes (13
bytes from IntAVLTree and 16 bytes from AVLGroupTree) to 33 bytes so
will increase the memory usage by ~12%.

Previous to this change aggregatedCounts could overflow if more than
`2 * Integer.MAX_VALUE` values were added to the tree. this is because
the aggregatedCounts stored in the tree which contains, for each node,
the sum of the count for the node and its children was stored as an int[].
When more than `2 * Integer.MAX_VALUE` values are stored in the tree the
counts of especially the nodes which are the direct children of the root
node can overflow and cause strange result when quantiles for that node
or to the right of that node are requested.

Technically the root node's aggregated count will overflow when more
than `1 * Integer.MAX_VALUE` values are inserted but this has no effect
on the results since the aggregatedCount for the root node is never used.

The solution is to store the aggregatedCounts as a long[] which will
increase the memory usage per centroid to increase from 29 bytes (13
bytes from IntAVLTree and 16 bytes from AVLGroupTree) to 33 bytes so
will increase the memory usage by ~12%.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant