
added parallelization to the actual cross-checking part of Crosscheck… #2033

Draft
yfarjoun wants to merge 1 commit into broadinstitute:master from yfarjoun:yf_use_threads_in_crosscheck

@yfarjoun
Contributor

Summary

Parallelizes fingerprint comparisons using the existing NUM_THREADS parameter.

Previously, while fingerprint loading used multiple threads, the actual comparisons were performed sequentially. This PR adds parallelization to the two comparison methods:

  • crossCheckFingerprints() - pairwise comparison of all fingerprint groups
  • checkFingerprintsBySample() - sample-by-sample comparison when using SECOND_INPUT

Changes

  • A few thread-safety changes
  • Modified crossCheckFingerprints() to use a parallel IntStream over row indices with a custom ForkJoinPool(NUM_THREADS)
  • Modified checkFingerprintsBySample() to use parallelStream() with a custom ForkJoinPool(NUM_THREADS)
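
The row-parallel pattern described above can be sketched roughly as follows. This is not the code from this PR: the class name, `compare()`, and the `double[][]` stand-in for fingerprint groups are all hypothetical, chosen only to make the example self-contained. Note that running a parallel stream inside `ForkJoinPool.submit(...)` so that it uses that pool's workers is a widely used idiom but relies on implementation behavior rather than a documented guarantee.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class ParallelCrossCheckSketch {

    // Hypothetical stand-in for a real fingerprint comparison; here just a dot product.
    static double compare(final double[] lhs, final double[] rhs) {
        double sum = 0;
        for (int i = 0; i < lhs.length; i++) {
            sum += lhs[i] * rhs[i];
        }
        return sum;
    }

    // Fills an n-by-n matrix, parallelizing over row indices in a pool of numThreads workers.
    static double[][] crossCheck(final double[][] groups, final int numThreads)
            throws InterruptedException, ExecutionException {
        final int n = groups.length;
        final double[][] matrix = new double[n][n];
        final ForkJoinPool pool = new ForkJoinPool(numThreads);
        try {
            // Submitting the stream pipeline as a task makes it run on this pool's
            // workers instead of the common pool (implementation behavior, see lead-in).
            pool.submit(() ->
                IntStream.range(0, n).parallel().forEach(row -> {
                    for (int col = 0; col < n; col++) {
                        // Each (row, col) cell is written by exactly one task,
                        // so no synchronization is needed on the matrix.
                        matrix[row][col] = compare(groups[row], groups[col]);
                    }
                })
            ).get(); // get() blocks until all rows are done and propagates failures
        } finally {
            pool.shutdown();
        }
        return matrix;
    }

    public static void main(final String[] args) throws Exception {
        final double[][] groups = {{1, 2}, {3, 4}, {5, 6}};
        System.out.println(Arrays.deepToString(crossCheck(groups, 4)));
    }
}
```

Using a dedicated pool rather than the common pool is what lets a thread-count parameter bound the parallelism; the pool is shut down in a `finally` block so worker threads do not linger after the comparison finishes.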

Notes

  • The NUM_THREADS parameter now controls both fingerprint loading and comparisons
  • The matrix output (crosscheckMatrix) is safe without synchronization since each cell is written exactly once, by the task handling its (row, col) pair
  • Removed per-comparison progress logging since it is not meaningful with parallel execution; replaced it with a single log message at the start showing the total number of comparisons and the thread count

Test plan

  • Existing CrosscheckFingerprintsTest tests pass
  • Verify that running with NUM_THREADS > 1 produces results identical to a single-threaded run
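
One way to approach the second check is to run the same comparison with one thread and with several, then assert the results are equal. The sketch below illustrates the idea on a `parallelStream()` pipeline; it does not use the actual Picard test harness, and `compareSample()`, `checkBySample()`, and the sample names are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

final class ParallelDeterminismCheck {

    // Hypothetical per-sample comparison; stands in for real fingerprint matching.
    static String compareSample(final String sample) {
        return sample + ":" + Integer.toHexString(sample.hashCode());
    }

    // Runs the per-sample comparison on a parallel stream inside a pool of numThreads workers.
    static List<String> checkBySample(final List<String> samples, final int numThreads)
            throws InterruptedException, ExecutionException {
        final ForkJoinPool pool = new ForkJoinPool(numThreads);
        try {
            return pool.submit(() ->
                samples.parallelStream()
                       .map(ParallelDeterminismCheck::compareSample)
                       // Collecting an ordered stream preserves encounter order,
                       // so the result order does not depend on thread scheduling.
                       .collect(Collectors.toList())
            ).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(final String[] args) throws Exception {
        final List<String> samples = Arrays.asList("NA12878", "NA12891", "NA12892");
        final List<String> sequential = checkBySample(samples, 1);
        final List<String> parallel = checkBySample(samples, 4);
        if (!sequential.equals(parallel)) {
            throw new AssertionError("parallel result differs from single-threaded result");
        }
        System.out.println("results match: " + parallel.size() + " samples");
    }
}
```

Collecting into a list rather than appending to a shared collection from `forEach` keeps the output order deterministic, which is what makes a direct equality check against the single-threaded run possible.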
