Format: Implement LZ offset bias (+1) to eliminate zero-offset attack vectors #104

Draft
hellobertrand wants to merge 30 commits into main from feat/offset-pos

Conversation

@hellobertrand
Owner

This PR introduces an offset bias to the ZXC format's LZ77 sequences. Offsets are now encoded as actual_offset - 1 and decoded as stored_value + 1 (defined via a new constant ZXC_LZ_OFFSET_BIAS).
This fundamentally changes the format's safety guarantees by making offset == 0 impossible by construction. A crafted 0 value in the offset stream now safely translates to a decoded offset of 1 (which acts as a valid Run-Length Encoding / RLE operation), dismantling a major class of out-of-bounds / uninitialized memory read vulnerabilities at the structural level.

Impact & Benchmarks

  • Security: Attack surface for fuzzers is measurably reduced. Zero reliance on runtime conditionals to prevent offset-0 crashes.
  • Format compatibility: Format-breaking change. Previous versions of the decoder will fail to decompress new files correctly (and vice versa).

@codecov

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 5 lines in your changes missing coverage. Please review.

Files with missing lines    Patch %    Lines
src/lib/zxc_driver.c        71.42%     4 Missing ⚠️
src/lib/zxc_dispatch.c      96.00%     1 Missing ⚠️


@hellobertrand hellobertrand marked this pull request as draft February 20, 2026 08:07
@hellobertrand hellobertrand force-pushed the feat/offset-pos branch 2 times, most recently from 2a65c1f to bc6e9b1 on February 21, 2026 08:55
* Improves CLI benchmark mode

Updates the benchmark mode to run for a specified duration
instead of a fixed number of iterations. It now measures and
reports the best (fastest) compression and decompression speeds achieved during the test. Also prints number of iterations
performed.

This change provides more consistent and reliable benchmark
results, as it accounts for variations in system performance
and allows for longer test runs to capture peak speeds.

* Refines benchmark output and speed calculation

Updates the benchmark to report speeds in MB/s. Removes warm-up iterations,
as they do not significantly impact results, and simplifies the JSON and
standard output messages.

* Adds progress updates for benchmark runs

Improves the benchmark tool's user experience by adding progress updates during compression and decompression.

The updates display the number of iterations and the elapsed time, giving the user a better indication of the tool's progress.

This change only affects the CLI tool when JSON output is disabled and the quiet mode is not enabled.

* Improves benchmark mode output and const correctness

Fixes test_cli.sh in JSON benchmark mode.

Also, adds `const` to several variables within the benchmark
function to reflect that they are not modified after initialization,
improving code readability and preventing accidental modification.
* Introduces a comprehensive set of error codes

Introduces a comprehensive set of error codes to the ZXC library,
enhancing error reporting and handling. Functions now return negative
zxc_error_t codes on failure, allowing for more precise error diagnosis.

Includes a utility function, zxc_error_name(), to retrieve
human-readable error messages for debugging.

* Adds a unit test for error code name lookup

Adds a new unit test to verify the functionality of the error code name
lookup. This test suite covers a wide range of error conditions, including
null pointers, zero-sized buffers, insufficient buffer capacity, corrupted
headers, truncated data, size mismatches, checksum failures, and other
potential issues.

It ensures the functions correctly identify and report these errors by
returning appropriate error codes or negative values.
* Adds stream error-code unit tests

Adds unit tests to validate stream-based compression and decompression
error codes. These tests cover scenarios such as null input, small files,
bad magic numbers, corrupted footers and checksums, and truncated files.

* Improves error code testing reliability

Improves error code testing by reducing the destination buffer size to
trigger specific error conditions more reliably and consistently.

* Makes zxc_error_name take a const int

Updates the `zxc_error_name` function to accept a constant integer to
prevent potential unintended modifications of the error code.

* Returns ZXC_OK consistently on success

Ensures all functions return the ZXC_OK macro upon successful completion,
improving code readability and consistency across the codebase.

* Fixes documentation formatting inconsistencies

Addresses inconsistencies in documentation regarding code formatting and
parameter ranges across various header and source files. Specifically, it
replaces en-dashes with hyphens in numeric ranges and ensures consistent
use of symbols.

* Refines documentation and macros

Refines documentation by correcting minor inaccuracies and improving
clarity. Replaces `@typedef` with `@struct` for struct documentation.
Simplifies code by removing redundant `#ifndef` guards around the
`ZXC_DEPRECATED` macro. Adds branch prediction hints and memory
alignment/optimization macros to improve performance.

* Adds a Doxyfile

Introduces a Doxyfile to automate the generation of API documentation.
This configuration file sets up Doxygen to extract documentation from the
source code, improving maintainability and usability of the library.

* Adds Rust error handling

Introduces comprehensive error handling for the zxc library. Defines
specific error codes and maps them to Rust `Error` enum variants, allowing
for more precise error reporting and handling in Rust code. Also includes
descriptive error messages for better debugging and user experience.

* Checks fread results in error handling tests

Ensures that the `fread` calls in the error handling tests are successful.
If `fread` fails, the function now prints an error message, frees
allocated memory, closes the file, and returns, preventing potential
issues with subsequent operations on incomplete data.

* Fixes numeric block probing

Corrects the numeric block probing logic to accurately handle block sizes
that are multiples of `uint32_t`, ensuring proper detection of numeric
arrays.

* Bounds-checks decoded value counts

Adds a check to ensure that the number of values to be decoded does not
exceed the number of values remaining. This prevents potential data
corruption issues during decompression.

* Fixes a varint reader out-of-bounds read

Ensures that the varint reader advances the pointer to `end + 1` instead
of potentially stopping within the bounds of `end` when insufficient bytes
are available, preventing a potential out-of-bounds read in subsequent
operations.
* Fixes a flawed offset validation

Corrects an offset validation to prevent potential out-of-bounds memory
access during decompression. The validation logic was flawed, leading to
incorrect offset checks.

* Hardens output path handling

Addresses potential buffer overflows by using PATH_MAX for temporary
buffers in output path validation and generation. Also adds an explicit
check for output path length to prevent truncation when the
auto-generated output path would exceed the buffer size.
* Bounds-checks NUM block decoding

Adds a check to ensure the number of values to decode does not exceed the
remaining values in the block, preventing potential data corruption.
Ensures correct buffer size calculation when handling NUM blocks in
compression and decompression routines.

* Implements the LZ offset bias

Implements a bias in the offset stream, subtracting 1 during compression
and adding 1 during decompression. This ensures that offsets are always
non-zero, preventing potential division-by-zero errors and memory safety
issues when decoding. This change also simplifies the decoder logic by
removing an explicit zero-offset check.

* Bumps the file format version to 5

Updates the file format version to 5 in the documentation and internal
header file. Applies a constant bias to LZ77 offsets during both encoding
and decoding to ensure that offsets are always positive.

* Unifies NEON64 and NEON32 implementations

Unifies the NEON64 and NEON32 implementations by using a single
conditional compilation flag, avoiding redundant code and enabling ARMv7
support.

* Expands numeric test data generation

Adds new numeric data generation functions to improve test coverage:

- `gen_num_data_zero`: Generates data with zero deltas.
- `gen_num_data_small`: Generates data with small alternating deltas.
- `gen_num_data_large`: Generates data with very large deltas.

Adds a large file test case (15 MB) to stress block boundaries, including
NUM block testing. This helps ensure the compression and decompression
logic correctly handles multi-block scenarios, particularly with different
data patterns.