Skip to content

protect against register overflow#7

Open
tsenart wants to merge 2 commits intoseiflotfy:masterfrom
tsenart:master
Open

protect against register overflow#7
tsenart wants to merge 2 commits intoseiflotfy:masterfrom
tsenart:master

Conversation

@tsenart
Copy link
Contributor

@tsenart tsenart commented Mar 4, 2025

Adds overflow protection to the Count-Min-Log sketch implementation. Previously, incrementing a register at its maximum value would wrap around to zero, leading to incorrect counts. Now, registers saturate at their maximum value and updates that would cause overflow are rejected.

The changes:

  • Check for overflow before incrementing in Update
  • Add overflow protection to BulkUpdate
  • Simplify pointValue by removing redundant zero check
  • Handle overflow in value calculation
  • Add comprehensive tests for overflow behavior

This is a critical fix for correctness when dealing with high-frequency items, particularly when using smaller register sizes like uint8.

Also, replace fixed-width encoding with variable-length integers for sketch serialization.
The new format uses a fixed 16-byte header followed by varint-encoded registers. Note: This is a breaking change to the binary format.

Adds overflow protection to the Count-Min-Log sketch implementation.
Previously, incrementing a register at its maximum value would wrap
around to zero, leading to incorrect counts. Now, registers saturate
at their maximum value and updates that would cause overflow are
rejected.

The changes:
- Check for overflow before incrementing in Update
- Add overflow protection to BulkUpdate
- Handle overflow in value calculation
- Add comprehensive tests for overflow behavior

This is a critical fix for correctness when dealing with high-frequency
items, particularly when using smaller register sizes like uint8.
Replace fixed-width encoding with variable-length integers for sketch
serialization. This change:

- Removes unsafe.Sizeof dependency
- Reduces serialized size for sparse sketches
- Adds bounds checking for register capacity
- Simplifies encoding/decoding logic
- Adds fuzzing for serialization correctness

The new format uses a fixed 16-byte header followed by varint-encoded
registers. Note: This is a breaking change to the binary format.

Added benchmarks show the impact on serialization performance and size.

```
goos: darwin
goarch: arm64
pkg: github.com/seiflotfy/count-min-log
cpu: Apple M3 Max
                                                   │   before    │                after                 │
                                                   │   sec/op    │    sec/op     vs base                │
Sketch/fn=MarshalBinary/eps=0.010/delta=0.010-16     1.631µ ± 2%    1.137µ ± 6%   -30.27% (p=0.002 n=6)
Sketch/fn=UnmarshalBinary/eps=0.010/delta=0.010-16   925.8n ± 1%   2264.5n ± 1%  +144.59% (p=0.002 n=6)
Sketch/fn=MarshalBinary/eps=0.050/delta=0.010-16     344.4n ± 7%    243.8n ± 4%   -29.20% (p=0.002 n=6)
Sketch/fn=UnmarshalBinary/eps=0.050/delta=0.010-16   255.5n ± 4%    522.5n ± 7%  +104.52% (p=0.002 n=6)
Sketch/fn=MarshalBinary/eps=0.100/delta=0.010-16     183.1n ± 2%    130.6n ± 1%   -28.72% (p=0.002 n=6)
Sketch/fn=UnmarshalBinary/eps=0.100/delta=0.010-16   166.1n ± 1%    301.6n ± 2%   +81.61% (p=0.002 n=6)
geomean                                              399.1n         484.4n        +21.37%

                                                 │    before    │                after                │
                                                 │    bytes     │    bytes      vs base               │
Sketch/fn=MarshalBinary/eps=0.010/delta=0.010-16   2.672Ki ± 0%   1.344Ki ± 0%  -49.71% (p=0.002 n=6)
Sketch/fn=MarshalBinary/eps=0.050/delta=0.010-16     566.0 ± 0%     291.0 ± 0%  -48.59% (p=0.002 n=6)
Sketch/fn=MarshalBinary/eps=0.100/delta=0.010-16     296.0 ± 0%     156.0 ± 0%  -47.30% (p=0.002 n=6)
geomean                                              771.0          396.8       -48.54%

                                                   │    before    │                 after                 │
                                                   │     B/op     │     B/op      vs base                 │
Sketch/fn=MarshalBinary/eps=0.010/delta=0.010-16     3.000Ki ± 0%   1.375Ki ± 0%  -54.17% (p=0.002 n=6)
Sketch/fn=UnmarshalBinary/eps=0.010/delta=0.010-16   3.172Ki ± 0%   3.172Ki ± 0%        ~ (p=1.000 n=6) ¹
Sketch/fn=MarshalBinary/eps=0.050/delta=0.010-16       576.0 ± 0%     320.0 ± 0%  -44.44% (p=0.002 n=6)
Sketch/fn=UnmarshalBinary/eps=0.050/delta=0.010-16     752.0 ± 0%     752.0 ± 0%        ~ (p=1.000 n=6) ¹
Sketch/fn=MarshalBinary/eps=0.100/delta=0.010-16       320.0 ± 0%     160.0 ± 0%  -50.00% (p=0.002 n=6)
Sketch/fn=UnmarshalBinary/eps=0.100/delta=0.010-16     464.0 ± 0%     464.0 ± 0%        ~ (p=1.000 n=6) ¹
geomean                                                928.7          658.7       -29.07%
¹ all samples are equal

                                                   │   before   │               after                │
                                                   │ allocs/op  │ allocs/op   vs base                │
Sketch/fn=MarshalBinary/eps=0.010/delta=0.010-16     1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=6) ¹
Sketch/fn=UnmarshalBinary/eps=0.010/delta=0.010-16   5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=6) ¹
Sketch/fn=MarshalBinary/eps=0.050/delta=0.010-16     1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=6) ¹
Sketch/fn=UnmarshalBinary/eps=0.050/delta=0.010-16   5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=6) ¹
Sketch/fn=MarshalBinary/eps=0.100/delta=0.010-16     1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=6) ¹
Sketch/fn=UnmarshalBinary/eps=0.100/delta=0.010-16   5.000 ± 0%   5.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                                              2.236        2.236       +0.00%
¹ all samples are equal
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant