Format: Implement LZ offset bias (+1) to eliminate zero-offset attack vectors #104
Draft
hellobertrand wants to merge 30 commits into main from
Updated from `2a65c1f` to `bc6e9b1`.
* Improves CLI benchmark mode: the benchmark now runs for a specified duration instead of a fixed number of iterations, measures and reports the best (fastest) compression and decompression speeds achieved during the run, and prints the number of iterations performed. This gives more consistent and reliable results, because it accounts for variations in system performance and allows longer runs to capture peak speeds.
* Refines benchmark output and speed calculation: speeds are reported in MB/s, warm-up iterations are removed because they did not significantly affect results, and the JSON and standard output messages are simplified.
* Adds progress updates for benchmark runs: during compression and decompression the tool displays the number of iterations and the elapsed time, giving the user a better indication of progress. This only affects the CLI when JSON output is disabled and quiet mode is off.
* Improves benchmark mode output and const correctness: fixes `test_cli.sh` in JSON benchmark mode, and adds `const` to several variables in the benchmark function that are not modified after initialization, improving readability and preventing accidental modification.
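A minimal sketch of the duration-based loop described above, assuming a POSIX `clock_gettime(CLOCK_MONOTONIC)` timer and 1 MB = 10^6 bytes. `bench_fn`, `run_timed_bench` and the `sum_bytes` stand-in workload are hypothetical names, not the CLI's actual code: each iteration is timed on its own and only the fastest speed is kept, alongside the iteration count.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <stddef.h>
#include <time.h>

/* Hypothetical workload type: processes `len` bytes once. */
typedef void (*bench_fn)(const unsigned char *buf, size_t len);

typedef struct {
    double best_mbps;    /* fastest single-iteration throughput */
    unsigned iterations; /* iterations completed before the deadline */
} bench_result;

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Throughput in MB/s, assuming 1 MB = 1,000,000 bytes. */
static double mb_per_s(size_t bytes, double seconds) {
    return seconds > 0.0 ? ((double)bytes / 1e6) / seconds : 0.0;
}

/* Calls `fn` until `duration_s` seconds have elapsed (at least once)
 * and keeps only the best per-iteration speed. */
static bench_result run_timed_bench(bench_fn fn, const unsigned char *buf,
                                    size_t len, double duration_s) {
    bench_result r = {0.0, 0};
    const double start = now_seconds();
    do {
        const double t0 = now_seconds();
        fn(buf, len);
        const double speed = mb_per_s(len, now_seconds() - t0);
        if (speed > r.best_mbps) r.best_mbps = speed;
        r.iterations++;
    } while (now_seconds() - start < duration_s);
    return r;
}

/* Stand-in workload (a byte sum) in place of compress/decompress. */
static volatile unsigned long sink;
static void sum_bytes(const unsigned char *buf, size_t len) {
    unsigned long s = 0;
    for (size_t i = 0; i < len; i++) s += buf[i];
    sink = s;
}
```

Keeping the best iteration rather than the mean filters out runs slowed by scheduling or frequency changes, which matches the stated goal of capturing peak speeds.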
Introduces a comprehensive set of error codes to the ZXC library, enhancing error reporting and handling. Functions now return negative zxc_error_t codes on failure, allowing for more precise error diagnosis. Includes a utility function, zxc_error_name(), to retrieve human-readable error messages for debugging.
Adds a new unit test to verify the functionality of the error code name lookup.
This test suite covers a wide range of error conditions, including null pointers, zero-sized buffers, insufficient buffer capacity, corrupted headers, truncated data, size mismatches, checksum failures, and other potential issues. It ensures the functions correctly identify and report these errors by returning appropriate error codes or negative values.
Adds unit tests to validate stream-based compression and decompression error codes. These tests cover scenarios such as null input, small files, bad magic numbers, corrupted footers and checksums, and truncated files.
Improves error code testing by reducing the destination buffer size to trigger specific error conditions more reliably and consistently.
Updates the `zxc_error_name` function to accept a constant integer to prevent potential unintended modifications of the error code.
Ensures all functions return the ZXC_OK macro upon successful completion, improving code readability and consistency across the codebase.
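The error-handling convention in these commits (`ZXC_OK` on success, negative codes on failure, a name lookup taking a `const int`) could look roughly like the sketch below. Every identifier uses a hypothetical `EX_`/`ex_` prefix and the codes are placeholders, since the actual `zxc_error_t` values are not listed here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical codes following the described convention: 0 on success,
 * negative values on failure. Names and values are placeholders. */
typedef enum {
    EX_OK                =  0,
    EX_ERR_NULL_INPUT    = -1,
    EX_ERR_DST_TOO_SMALL = -2,
    EX_ERR_BAD_HEADER    = -3,
    EX_ERR_CHECKSUM      = -4
} ex_error_t;

/* Maps a code to a readable name; unknown codes get a fallback string. */
static const char *ex_error_name(const int code) {
    switch (code) {
    case EX_OK:                return "OK";
    case EX_ERR_NULL_INPUT:    return "NULL_INPUT";
    case EX_ERR_DST_TOO_SMALL: return "DST_TOO_SMALL";
    case EX_ERR_BAD_HEADER:    return "BAD_HEADER";
    case EX_ERR_CHECKSUM:      return "CHECKSUM_MISMATCH";
    default:                   return "UNKNOWN_ERROR";
    }
}

/* Example API: returns bytes written (>= 0) or a negative ex_error_t. */
static long ex_copy(void *dst, size_t dst_cap, const void *src, size_t n) {
    if (dst == NULL || src == NULL) return EX_ERR_NULL_INPUT;
    if (n > dst_cap) return EX_ERR_DST_TOO_SMALL;
    memcpy(dst, src, n);
    return (long)n;
}
```

Returning byte counts and error codes through one signed value keeps call sites simple (`if (r < 0)`), which is the pattern the error-code tests above exercise with null pointers and undersized buffers.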
Addresses inconsistencies in documentation regarding code formatting and parameter ranges across various header and source files. Specifically, it replaces en-dashes with hyphens in numeric ranges and ensures consistent use of symbols.
Refines documentation by correcting minor inaccuracies and improving clarity. Replaces `@typedef` with `@struct` for struct documentation. Simplifies code by removing redundant `#ifndef` guards around `ZXC_DEPRECATED` macro. Adds branch prediction hints and memory alignment/optimization macros to improve performance.
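A sketch of typical branch-prediction and alignment helpers, assuming GCC/Clang's `__builtin_expect` with a plain fallback elsewhere, and C11 `_Alignas` for alignment. The macro names and the 64-byte alignment are hypothetical; the PR's actual macros are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the PR's actual macro names are not shown. */
#if defined(__GNUC__) || defined(__clang__)
#  define EX_LIKELY(x)   __builtin_expect(!!(x), 1)
#  define EX_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define EX_LIKELY(x)   (x)
#  define EX_UNLIKELY(x) (x)
#endif

/* C11 _Alignas: a scratch buffer aligned to an assumed 64-byte cache line. */
typedef struct {
    _Alignas(64) uint8_t bytes[64];
} ex_scratch;

/* Example hot path: the error branch is hinted as the rare case. */
static int ex_sum_checked(const uint8_t *p, size_t n, size_t cap, uint32_t *out) {
    if (EX_UNLIKELY(p == NULL || n > cap)) return -1;
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s += p[i];
    *out = s;
    return 0;
}
```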
Introduces a Doxyfile to automate the generation of API documentation. This configuration file sets up Doxygen to extract documentation from the source code, improving maintainability and usability of the library.
Introduces comprehensive error handling for the zxc library. Defines specific error codes and maps them to Rust `Error` enum variants, allowing for more precise error reporting and handling in Rust code. Also includes descriptive error messages for better debugging and user experience.
Ensures that the `fread` calls in the error handling tests are successful. If `fread` fails, the function now prints an error message, frees allocated memory, closes the file, and returns, preventing potential issues with subsequent operations on incomplete data.
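A sketch of the checked-read pattern described above: a short `fread` is treated as a failure, and the buffer and file are released before returning. `ex_read_file` is a hypothetical helper, not the test suite's actual code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a malloc'd buffer. Every failure path,
 * including a short fread, frees the buffer, closes the file and
 * returns NULL. On success the caller owns the buffer. */
static unsigned char *ex_read_file(const char *path, size_t *out_len) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    const long sz = ftell(f);
    if (sz < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc(sz > 0 ? (size_t)sz : 1);
    if (buf == NULL) { fclose(f); return NULL; }
    if (fread(buf, 1, (size_t)sz, f) != (size_t)sz) {
        fprintf(stderr, "fread failed on %s\n", path);
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *out_len = (size_t)sz;
    return buf;
}
```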
Corrects the numeric block probing logic to accurately handle block sizes that are multiples of `uint32_t`, ensuring proper detection of numeric arrays.
Adds a check to ensure that the number of values to be decoded does not exceed the number of values remaining. This prevents potential data corruption issues during decompression.
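A sketch of the remaining-count check, assuming a delta-coded block whose header declares the total value count; `ex_num_state` and `ex_decode_run` are hypothetical, and the caller is assumed to size `out` for `total` values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state for decoding one numeric block. */
typedef struct {
    size_t total;   /* value count declared by the block header */
    size_t decoded; /* values produced so far */
} ex_num_state;

/* Decodes `n` delta-coded values into out[st->decoded...]. The run length
 * comes from untrusted input, so it is checked against the values still
 * remaining before anything is written. Returns 0 or -1. */
static int ex_decode_run(ex_num_state *st, uint32_t *out, uint32_t *prev,
                         const uint32_t *deltas, size_t n) {
    if (st->decoded > st->total || n > st->total - st->decoded) return -1;
    for (size_t i = 0; i < n; i++) {
        *prev += deltas[i];          /* reconstruct value from its delta */
        out[st->decoded++] = *prev;
    }
    return 0;
}
```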
Ensures that the varint reader advances the pointer to 'end + 1' instead of potentially stopping within the bounds of 'end' when insufficient bytes are available, preventing a potential out-of-bounds read in subsequent operations.
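One way to read this fix, sketched with indices instead of pointers (an assumption; it also avoids forming a pointer more than one past the end of an array, which C leaves undefined). The LEB128-style encoding is likewise assumed, since ZXC's exact varint layout is not described here: on truncation the cursor is parked at `len + 1`, so later bounds checks fail instead of reading on.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LEB128-style reader: 7 payload bits per byte, high bit
 * set means "more bytes follow". Reads from buf[*pos .. len). */
static uint32_t ex_read_varint(const uint8_t *buf, size_t len, size_t *pos) {
    uint32_t value = 0;
    unsigned shift = 0;
    size_t p = *pos;
    while (p < len && shift < 32) {
        const uint8_t byte = buf[p++];
        value |= (uint32_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) {   /* last byte of this varint */
            *pos = p;
            return value;
        }
        shift += 7;
    }
    /* Input ended mid-varint (or it ran past 5 bytes): park the cursor at
     * len + 1 so every later "pos <= len" bounds check fails. */
    *pos = len + 1;
    return 0;
}
```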
Corrects an offset validation to prevent potential out-of-bounds memory access during decompression. The validation logic was flawed, leading to incorrect offset checks.
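The commit does not show the corrected condition, so the following is only a sketch of what a sound LZ77 match check usually enforces: the offset must point into output that has already been written, and the copy must fit in the remaining destination space. `ex_match_is_valid` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical LZ77 match check. `out_pos` bytes of dst (capacity
 * `dst_cap`, with out_pos <= dst_cap) are already written. */
static int ex_match_is_valid(size_t out_pos, size_t dst_cap,
                             size_t offset, size_t length) {
    /* 0 would read the byte being written; > out_pos reads before dst. */
    if (offset == 0 || offset > out_pos) return 0;
    /* Remaining-space form: cannot overflow, unlike out_pos + length. */
    if (length > dst_cap - out_pos) return 0;
    return 1;
}
```

Comparing `length` against `dst_cap - out_pos` rather than testing `out_pos + length > dst_cap` avoids an unsigned overflow when a corrupted stream supplies a huge length.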
Addresses potential buffer overflows by using PATH_MAX for temporary buffers in output path validation and generation. Also, adds an explicit check for output path length to prevent truncation when the auto-generated output path would exceed the buffer size.
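A sketch of the truncation check, assuming the output path is built with `snprintf` into a `PATH_MAX` buffer (with a fallback value where `<limits.h>` does not define it). `ex_make_output_path` is hypothetical: `snprintf` returns the length it would have written, so a value of `PATH_MAX` or more means the path was cut short.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback where <limits.h> does not define it */
#endif

/* Hypothetical helper: writes "<input><ext>" into out (PATH_MAX bytes).
 * Returns 0 on success, -1 on an encoding error or truncation. */
static int ex_make_output_path(char out[PATH_MAX], const char *input,
                               const char *ext) {
    const int n = snprintf(out, PATH_MAX, "%s%s", input, ext);
    if (n < 0 || n >= PATH_MAX) return -1; /* would not fit: refuse */
    return 0;
}
```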
Adds a check to ensure the number of values to decode does not exceed the remaining values in the block, preventing potential data corruption.
Ensures correct buffer size calculation when handling NUM blocks in compression and decompression routines.
Implements a bias in the offset stream, subtracting 1 during compression and adding 1 during decompression. This ensures that offsets are always non-zero, preventing potential division-by-zero errors and memory safety issues when decoding. This change also simplifies the decoder logic by removing an explicit zero-offset check.
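A minimal sketch of the bias, using the `ZXC_LZ_OFFSET_BIAS` name from this PR with the value 1 given in its description. The 16-bit stored offset width and the `ex_` helpers are assumptions. Widening the stored value before adding the bias is what keeps a decoded offset from wrapping back to 0, and the byte-by-byte copy shows why a decoded offset of 1 behaves as a run-length repeat. The offset is still validated against the bytes already written, because the bias alone does not stop a match from reaching before the start of the output.

```c
#include <stddef.h>
#include <stdint.h>

/* The constant name comes from this PR; the value 1 matches its
 * description. The 16-bit stored offset width is an assumption. */
#define ZXC_LZ_OFFSET_BIAS 1u

/* Encoder: actual offsets are in [1, 65536]; store offset - bias. */
static uint16_t ex_encode_offset(size_t actual_offset) {
    return (uint16_t)(actual_offset - ZXC_LZ_OFFSET_BIAS);
}

/* Decoder: widen before adding the bias, so even a crafted 0 or
 * 0xFFFF decodes to 1 or 65536 and never wraps back to 0. */
static size_t ex_decode_offset(uint16_t stored) {
    return (size_t)stored + ZXC_LZ_OFFSET_BIAS;
}

/* Copies a match byte by byte so overlapping sources work; with
 * offset 1 each byte repeats the previous one (RLE behaviour). */
static int ex_copy_match(uint8_t *dst, size_t *out_pos, size_t dst_cap,
                         uint16_t stored_offset, size_t length) {
    const size_t offset = ex_decode_offset(stored_offset); /* always >= 1 */
    if (offset > *out_pos || length > dst_cap - *out_pos) return -1;
    uint8_t *op = dst + *out_pos;
    const uint8_t *match = op - offset;
    for (size_t i = 0; i < length; i++) op[i] = match[i];
    *out_pos += length;
    return 0;
}
```

A byte-by-byte loop is the simplest copy that handles overlapping matches correctly; faster decoders typically copy in wider chunks and special-case small offsets.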
Updates the file format version to 5 in the documentation and internal header file.
Applies a constant bias to LZ77 offsets during both encoding and decoding to ensure that offsets are always positive.
This unifies NEON64 and NEON32 implementations by using a single conditional compilation flag, avoiding redundant code and enabling ARMv7 support.
Updated from `bc6e9b1` to `4558258`.
Adds new numeric data generation functions to improve test coverage:
- `gen_num_data_zero`: generates data with zero deltas.
- `gen_num_data_small`: generates data with small alternating deltas.
- `gen_num_data_large`: generates data with very large deltas.

Adds a large-file test case (15 MB) to stress block boundaries, including NUM block testing. This helps ensure that the compression and decompression logic correctly handles multi-block inputs, particularly with different data patterns.
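A sketch of the three data patterns described, using hypothetical `ex_gen_*` functions and signatures (the actual `gen_num_data_*` signatures are not shown here). Each fills `n` `uint32_t` values starting from `base`; unsigned arithmetic wraps modulo 2^32, so the large-delta pattern is well defined for any base.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero deltas: every value equals the base. */
static void ex_gen_zero_deltas(uint32_t *out, size_t n, uint32_t base) {
    for (size_t i = 0; i < n; i++) out[i] = base;
}

/* Small alternating deltas: +1, -1, +1, ... around the base. */
static void ex_gen_small_deltas(uint32_t *out, size_t n, uint32_t base) {
    uint32_t v = base;
    for (size_t i = 0; i < n; i++) {
        out[i] = v;
        v = (i % 2 == 0) ? v + 1 : v - 1;
    }
}

/* Very large deltas: alternate between base and base + 2^31, so each
 * step changes the value by 2^31 (mod 2^32). */
static void ex_gen_large_deltas(uint32_t *out, size_t n, uint32_t base) {
    for (size_t i = 0; i < n; i++)
        out[i] = (i % 2 == 0) ? base : base + UINT32_C(0x80000000);
}
```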
This PR introduces an offset bias to the ZXC format's LZ77 sequences. Offsets are now encoded as `actual_offset - 1` and decoded as `stored_value + 1` (defined via a new constant `ZXC_LZ_OFFSET_BIAS`). This fundamentally changes the format's safety guarantees by making `offset == 0` impossible by construction. A crafted `0` value in the offset stream now safely translates to a decoded offset of `1` (which acts as a valid Run-Length Encoding / RLE operation), dismantling a major class of out-of-bounds / uninitialized memory read vulnerabilities at the structural level.

Impact & Benchmarks