perf: SIMD acceleration and hot-path optimizations via Highway #663
Open
KimBioInfoStudio wants to merge 13 commits into OpenGene:master from
- Stack-allocate bloom filter positions array in duplicate.cpp (removes 2 malloc/free per read)
- Replace temp buffer with direct append in read.cpp appendToString (removes 1 malloc/free per read output)
- Use stack strings with move-to-heap handoff in peprocessor.cpp and seprocessor.cpp (removes up to 10 malloc/free per pack)

Eliminates ~600M allocator operations on a typical 100M read-pair run. Output is bit-for-bit identical.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
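A minimal sketch of the two allocation-avoidance patterns named in this commit, not the actual fastp code. The function names, the probe count `kNumProbes`, and the FNV-style hash are all hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical probe count; the real value used in duplicate.cpp is not stated here.
constexpr std::size_t kNumProbes = 4;

// A fixed-size std::array lives on the stack, so computing the bloom filter
// positions for a read needs no malloc/free pair.
std::array<std::uint64_t, kNumProbes> probePositions(const std::string& seq,
                                                     std::uint64_t tableSize) {
    std::array<std::uint64_t, kNumProbes> pos{};
    for (std::size_t k = 0; k < kNumProbes; ++k) {
        std::uint64_t h = 1469598103934665603ULL + k;  // illustrative FNV-1a style seed
        for (unsigned char c : seq) {
            h ^= c;
            h *= 1099511628211ULL;
        }
        pos[k] = h % tableSize;
    }
    return pos;
}

// Append the four FASTQ lines straight into the caller's output buffer
// instead of building a temporary string and copying it afterwards.
void appendRecord(std::string& out, const std::string& name, const std::string& seq,
                  const std::string& strand, const std::string& qual) {
    out += name;   out += '\n';
    out += seq;    out += '\n';
    out += strand; out += '\n';
    out += qual;   out += '\n';
}
```

Because the output buffer is reused across reads, its capacity grows only occasionally, which is where the per-read saving comes from.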
Integrate Google Highway (v1.3.0) as a git submodule to provide portable SIMD vectorization with runtime CPU dispatch (SSE4, AVX2, AVX-512, NEON, SVE).

Four performance-critical functions are accelerated:
- passFilter: vectorized quality threshold counting and N-base detection
- passLowComplexityFilter: vectorized adjacent-difference counting
- reverseComplement: parallel complement lookup + vector reversal
- overlap analysis: vectorized mismatch counting for PE read alignment

All unit tests pass and output is bit-identical to the scalar baseline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
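To illustrate the adjacent-difference counting, here is a portable sketch that does not use Highway. The function names and the 16-lane block width are assumptions; the inner fixed-width loop only stands in for what a single vector compare of `p + i` against `p + i + 1` would do, and the scalar version is the reference it must match.

```cpp
#include <cstddef>
#include <string>

// Scalar reference: number of positions i where seq[i] != seq[i + 1].
std::size_t countAdjacentDiffsScalar(const std::string& seq) {
    std::size_t diff = 0;
    for (std::size_t i = 0; i + 1 < seq.size(); ++i) {
        diff += (seq[i] != seq[i + 1]);
    }
    return diff;
}

// Blocked form: full blocks of kLanes adjacent pairs, then a scalar tail.
std::size_t countAdjacentDiffsBlocked(const std::string& seq) {
    constexpr std::size_t kLanes = 16;  // assumed width, e.g. one 128-bit vector of bytes
    const std::size_t pairs = seq.empty() ? 0 : seq.size() - 1;
    const char* p = seq.data();
    std::size_t diff = 0;
    std::size_t i = 0;
    for (; i + kLanes <= pairs; i += kLanes) {
        unsigned block = 0;
        // Stand-in for: compare two unaligned loads lane by lane, count the mismatches.
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            block += (p[i + lane] != p[i + lane + 1]);
        }
        diff += block;
    }
    for (; i < pairs; ++i) {  // remaining pairs that do not fill a block
        diff += (p[i] != p[i + 1]);
    }
    return diff;
}
```

Checking the blocked form against the scalar one over many lengths, including lengths that are not a multiple of the block width, is the same style of test the later SIMD unit tests describe.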
Convert three hot-path switch statements to static const lookup tables for branchless base-to-value dispatch:
- stats.cpp base2val(): BASE2VAL[256] for kmer computation
- duplicate.cpp seq2intvector(): SEQ_HASH_VAL[256] for bloom filter hashing
- polyx.cpp trimPolyX(): POLYX_BASE_IDX[256] for poly-X tail trimming

Eliminates branch mispredictions on every base in these per-read loops. All tests pass, output is bit-identical to baseline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
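A sketch of the switch-to-table conversion. The table name `BASE2VAL` comes from the commit message; the specific values (A=0, T=1, C=2, G=3, everything else -1) and the helper names are assumptions made for illustration.

```cpp
#include <array>
#include <cstdint>

// Branchy shape: one switch per base (values are assumed, see above).
inline int base2valSwitch(char base) {
    switch (base) {
        case 'A': return 0;
        case 'T': return 1;
        case 'C': return 2;
        case 'G': return 3;
        default:  return -1;
    }
}

// Build the 256-entry table once; every byte value has an entry,
// so the lookup needs no bounds check and no branch.
static std::array<std::int8_t, 256> makeBase2ValTable() {
    std::array<std::int8_t, 256> t;
    t.fill(-1);
    t['A'] = 0;
    t['T'] = 1;
    t['C'] = 2;
    t['G'] = 3;
    return t;
}

static const std::array<std::int8_t, 256> BASE2VAL = makeBase2ValTable();

inline int base2valTable(char base) {
    // Cast through unsigned char so bytes >= 0x80 do not produce a negative index.
    return BASE2VAL[static_cast<unsigned char>(base)];
}
```

The cast to `unsigned char` matters on platforms where `char` is signed; without it, a non-ASCII byte would index before the start of the table.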
- Replace all hn::Load/Store with hn::LoadU/StoreU to prevent alignment faults on AVX2/AVX-512 (std::string data is not guaranteed to be 32/64-byte aligned)
- Replace PromoteUpperTo with SumsOf2 for quality accumulation to ensure compatibility with HWY_SCALAR target
- Fix misleading "may alias src" comment on reverseComplement (in-place operation is not safe with SIMD reverse)
- Add comprehensive SIMD unit tests comparing all 4 functions against scalar reference implementations across edge cases (empty, len=1, non-aligned lengths, long strings)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 8x Eq + 8x IfThenElse comparison chain with a single And + TableLookupBytes using the low nibble of DNA base ASCII codes. DNA bases A/a(1), C/c(3), T/t(4), G/g(7) have unique low nibbles, enabling a 16-byte lookup table for complement mapping.

Also add uncompressed (fq→fq) mode to the e2e benchmark script to better isolate CPU-bound performance from gzip overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
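The low-nibble trick can be sketched in scalar form, where `c & 0x0F` plays the role of the vector And and the array index plays the role of the byte table lookup. The table and function names are hypothetical, and mapping lowercase input to uppercase complements with 'N' as the default is an assumption about the intended behavior.

```cpp
#include <cstddef>
#include <string>

// ASCII low nibbles: 'A'/'a' = 0x41/0x61 -> 1, 'C'/'c' = 0x43/0x63 -> 3,
// 'T'/'t' = 0x54/0x74 -> 4, 'G'/'g' = 0x47/0x67 -> 7. All other slots map to 'N'.
static const char kComplementByNibble[16] = {
    'N', 'T', 'N', 'G', 'A', 'N', 'N', 'C',
    'N', 'N', 'N', 'N', 'N', 'N', 'N', 'N'};

// Out-of-place reverse complement: look up each base by its low nibble
// and write it to the mirrored position in the destination.
std::string reverseComplementNibble(const std::string& src) {
    std::string dst(src.size(), 'N');
    const std::size_t n = src.size();
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned idx = static_cast<unsigned char>(src[i]) & 0x0Fu;
        dst[n - 1 - i] = kComplementByNibble[idx];
    }
    return dst;
}
```

One consequence of indexing by nibble alone is that any byte sharing a low nibble with a base is treated as that base (for example 'Q', 0x51, would map like 'A'), so the sketch assumes the input contains only A, C, G, T, N in either case.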
Remove the Highway git submodule and link against the system-installed libhwy (-lhwy). Users should ensure Highway is available via their package manager (e.g. brew install highway, apt install libhwy-dev).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These bundled zlib headers are unused since fastp switched to isa-l for gzip decoding. Remove to reduce source tree clutter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update CI to use latest runners and install Highway as system dependency. Build isa-l and libdeflate from source for consistent versions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Makefile: Linux fully static, macOS maximizes static linking (.a when available, fallback to dynamic)
- Remove separate `make static` target; `make` handles both platforms
- CI: use package manager for all deps instead of building from source
- Prefer system-installed headers for isa-l and libdeflate via __has_include, with bundled headers as fallback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
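The `__has_include` preference can be sketched as below. The macro `FASTP_SKETCH_SYSTEM_LIBDEFLATE` and the function `libdeflateHeaderSource()` are hypothetical names used only to make the chosen branch observable; the path of the bundled header is not given in this commit message, so the fallback branch only marks where that include would go.

```cpp
#include <string>

// Prefer the system-installed header when the preprocessor can find it.
#if defined(__has_include)
#  if __has_include(<libdeflate.h>)
#    include <libdeflate.h>
#    define FASTP_SKETCH_SYSTEM_LIBDEFLATE 1
#  endif
#endif

// Fallback: no system header found (or __has_include unsupported).
// The project's bundled copy would be included at this point.
#ifndef FASTP_SKETCH_SYSTEM_LIBDEFLATE
#  define FASTP_SKETCH_SYSTEM_LIBDEFLATE 0
#endif

// Reports which branch the preprocessor took, for illustration only.
std::string libdeflateHeaderSource() {
    return FASTP_SKETCH_SYSTEM_LIBDEFLATE ? "system" : "bundled";
}
```

Guarding with `defined(__has_include)` first keeps the file compiling on older preprocessors that would otherwise reject the inner expression.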
Build Highway from source into /tmp/hwy-install instead of using brew (which only provides dylib). This produces a fully static fastp binary on macOS with zero 3rd-party runtime dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add countMismatchesBounded() that exits early when mismatches exceed the limit, avoiding unnecessary work. Replace the scalar inner loop in AdapterTrimmer::trimBySequence with the SIMD version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
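A portable sketch of a bounded mismatch count with early exit. The function name is taken from the commit message, but the signature, the 16-byte block width, and the choice to check the limit once per block are assumptions; the inner loop stands in for one vector compare plus a mask population count.

```cpp
#include <cstddef>

// Returns the exact mismatch count if it is <= limit. Once the running
// count exceeds limit, returns early with some value > limit.
std::size_t countMismatchesBounded(const char* a, const char* b,
                                   std::size_t len, std::size_t limit) {
    constexpr std::size_t kBlock = 16;  // assumed block width
    std::size_t mismatches = 0;
    std::size_t i = 0;
    for (; i + kBlock <= len; i += kBlock) {
        for (std::size_t lane = 0; lane < kBlock; ++lane) {
            mismatches += (a[i + lane] != b[i + lane]);
        }
        // Checking once per block keeps the inner loop branch-free.
        if (mismatches > limit) return mismatches;
    }
    for (; i < len; ++i) {  // scalar tail
        mismatches += (a[i] != b[i]);
    }
    return mismatches;
}
```

Callers should only compare the result against `limit`; when the function exits early, the returned value is a lower bound on the true mismatch count rather than the full total.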
Ubuntu 24.04's libhwy-dev is 1.0.7 which lacks SumsOf2 (added in 1.1.0). Ubuntu's libisal-dev only ships .so (no .a), breaking -static linking. Build both from source on Ubuntu to ensure static linking works. Update README to note Highway >= 1.1.0 requirement.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
d337202 to 07354b8
Summary
- Remove unused `src/zlib/` headers

Benchmark (1M PE pairs, 4 threads, Apple M4 Pro)
Output correctness verified: all 4 output files identical between upstream and optimized builds.
Dependencies
Highway is linked as a system library (`-lhwy`). Install via:
- `brew install highway` (macOS; CI builds static from source)
- `apt install libhwy-dev` (Ubuntu 23.04+)
- `conda install -c conda-forge libhwy`

Build changes
- `make` now produces statically-linked binaries (Linux: full static, macOS: maximize static via `.a` discovery)
- System headers (`<libdeflate.h>`, `<isa-l/igzip_lib.h>`) preferred over bundled, with `__has_include` fallback
- CI: `ubuntu-latest`/`macos-latest` with package manager dependencies

Test plan
- `fastp --version` runs correctly
- SIMD unit tests (`testSimd()`)

Supersedes #662.
🤖 Generated with Claude Code