Format: Implement LZ offset bias (+1) to eliminate zero-offset attack vectors #104

Draft
hellobertrand wants to merge 30 commits into main from feat/offset-pos

Conversation

@hellobertrand
Owner

This PR introduces an offset bias to the ZXC format's LZ77 sequences. Offsets are now encoded as actual_offset - 1 and decoded as stored_value + 1 (defined via a new constant ZXC_LZ_OFFSET_BIAS).
This fundamentally changes the format's safety guarantees by making offset == 0 impossible by construction. A crafted 0 value in the offset stream now safely translates to a decoded offset of 1 (which acts as a valid Run-Length Encoding / RLE operation), dismantling a major class of out-of-bounds / uninitialized memory read vulnerabilities at the structural level.

Impact & Benchmarks

  • Security: Attack surface for fuzzers is measurably reduced. Zero reliance on runtime conditionals to prevent offset-0 crashes.
  • Format compatibility: Format-breaking change. Previous versions of the decoder will fail to decompress new files correctly (and vice versa).

@codecov

codecov bot commented Feb 20, 2026

Codecov Report

❌ Patch coverage is 94.73684% with 5 lines in your changes missing coverage. Please review.

Files with missing lines    Patch %    Lines
src/lib/zxc_driver.c        71.42%     4 Missing ⚠️
src/lib/zxc_dispatch.c      96.00%     1 Missing ⚠️


@hellobertrand hellobertrand marked this pull request as draft February 20, 2026 08:07
@hellobertrand hellobertrand force-pushed the feat/offset-pos branch 2 times, most recently from 2a65c1f to bc6e9b1 on February 21, 2026 08:55
* Improves CLI benchmark mode

Updates the benchmark mode to run for a specified duration
instead of a fixed number of iterations. It now measures and
reports the best (fastest) compression and decompression speeds achieved during the test. Also prints number of iterations
performed.

This change provides more consistent and reliable benchmark
results, as it accounts for variations in system performance
and allows for longer test runs to capture peak speeds.

* Refines benchmark output and speed calculation

Updates the benchmark to report speeds in MB/s. Removes warm-up iterations,
as they do not significantly impact results, and simplifies the JSON and
standard output messages.

* Adds progress updates for benchmark runs

Improves the benchmark tool's user experience by adding progress updates during compression and decompression.

The updates display the number of iterations and the elapsed time, giving the user a better indication of the tool's progress.

This change only affects the CLI tool when JSON output is disabled and the quiet mode is not enabled.

* Improves benchmark mode output and const correctness

Fixes test_cli.sh in JSON benchmark mode.

Also, adds `const` to several variables within the benchmark
function to reflect that they are not modified after initialization,
improving code readability and preventing accidental modification.
* Introduces a comprehensive set of error codes

Introduces a comprehensive set of error codes to the ZXC library,
enhancing error reporting and handling. Functions now return negative
zxc_error_t codes on failure, allowing for more precise error diagnosis.

Includes a utility function, zxc_error_name(), to retrieve
human-readable error messages for debugging.

* Adds a unit test for error code name lookup

Adds a new unit test to verify the functionality of the error code name
lookup. This test suite covers a wide range of error conditions, including
null pointers, zero-sized buffers, insufficient buffer capacity, corrupted
headers, truncated data, size mismatches, checksum failures, and other
potential issues.

It ensures the functions correctly identify and report these errors by
returning appropriate error codes or negative values.
* Adds stream error-code unit tests

Adds unit tests to validate stream-based compression and decompression
error codes. These tests cover scenarios such as null input, small files,
bad magic numbers, corrupted footers and checksums, and truncated files.

* Improves error code testing reliability

Improves error code testing by reducing the destination buffer size to
trigger specific error conditions more reliably and consistently.

* Makes zxc_error_name take a const int

Updates the `zxc_error_name` function to accept a constant integer to
prevent potential unintended modifications of the error code.

* Returns ZXC_OK consistently on success

Ensures all functions return the ZXC_OK macro upon successful completion,
improving code readability and consistency across the codebase.

* Fixes documentation formatting inconsistencies

Addresses inconsistencies in documentation regarding code formatting and
parameter ranges across various header and source files. Specifically, it
replaces en-dashes with hyphens in numeric ranges and ensures consistent
use of symbols.

* Refines documentation and macros

Refines documentation by correcting minor inaccuracies and improving
clarity. Replaces `@typedef` with `@struct` for struct documentation.
Simplifies code by removing redundant `#ifndef` guards around the
`ZXC_DEPRECATED` macro. Adds branch prediction hints and memory
alignment/optimization macros to improve performance.

* Adds a Doxyfile

Introduces a Doxyfile to automate the generation of API documentation.
This configuration file sets up Doxygen to extract documentation from the
source code, improving maintainability and usability of the library.

* Adds Rust error handling

Introduces comprehensive error handling for the zxc library. Defines
specific error codes and maps them to Rust `Error` enum variants, allowing
for more precise error reporting and handling in Rust code. Also includes
descriptive error messages for better debugging and user experience.

* Checks fread results in error handling tests

Ensures that the `fread` calls in the error handling tests are successful.
If `fread` fails, the function now prints an error message, frees
allocated memory, closes the file, and returns, preventing potential
issues with subsequent operations on incomplete data.

* Fixes numeric block probing

Corrects the numeric block probing logic to accurately handle block sizes
that are multiples of `uint32_t`, ensuring proper detection of numeric
arrays.

* Bounds-checks decoded value counts

Adds a check to ensure that the number of values to be decoded does not
exceed the number of values remaining. This prevents potential data
corruption issues during decompression.

* Fixes a varint reader out-of-bounds read

Ensures that the varint reader advances the pointer to `end + 1` instead
of potentially stopping within the bounds of `end` when insufficient bytes
are available, preventing a potential out-of-bounds read in subsequent
operations.
* Fixes a flawed offset validation

Corrects an offset validation to prevent potential out-of-bounds memory
access during decompression. The validation logic was flawed, leading to
incorrect offset checks.

* Hardens output path handling

Addresses potential buffer overflows by using PATH_MAX for temporary
buffers in output path validation and generation. Also adds an explicit
check for output path length to prevent truncation when the
auto-generated output path would exceed the buffer size.
* Bounds-checks NUM block decoding

Adds a check to ensure the number of values to decode does not exceed the
remaining values in the block, preventing potential data corruption.
Ensures correct buffer size calculation when handling NUM blocks in
compression and decompression routines.

* Implements the LZ offset bias

Implements a bias in the offset stream, subtracting 1 during compression
and adding 1 during decompression. This ensures that offsets are always
non-zero, preventing potential division-by-zero errors and memory safety
issues when decoding. This change also simplifies the decoder logic by
removing an explicit zero-offset check.

* Bumps the file format version to 5

Updates the file format version to 5 in the documentation and internal
header file. Applies a constant bias to LZ77 offsets during both encoding
and decoding to ensure that offsets are always positive.

* Unifies NEON64 and NEON32 implementations

Unifies the NEON64 and NEON32 implementations by using a single
conditional compilation flag, avoiding redundant code and enabling ARMv7
support.

* Expands numeric test data generation

Adds new numeric data generation functions to improve test coverage:

- `gen_num_data_zero`: Generates data with zero deltas.
- `gen_num_data_small`: Generates data with small alternating deltas.
- `gen_num_data_large`: Generates data with very large deltas.

Adds a large file test case (15 MB) to stress block boundaries, including
NUM block testing. This helps ensure the compression and decompression
logic correctly handles multi-block scenarios, particularly with different
data patterns.