Potential Performance Optimizations #740

@schnee

Preface: I recognize the nature of ESP-based mining: these efforts are extremely unlikely to result in finding a block. I recently flashed my ESP32-S3 board with NerdMiner_v2 (open source for the win!) and found that the hash rate dropped to around 30% of what I got with the closed-source firmware.

I asked GPT 5.1 Codex to examine the code and suggest performance improvements, specifically in the mining routines. Its recommendations follow. I could ask the LLM to vibe code these and then submit a PR, but a) I do not have a dev environment set up to run even rudimentary tests, b) I am not a C++ developer, and c) I do not have much understanding of the mining space. Basically, no one should accept any code from me...

I offer these for consideration.

Hashrate Optimization Opportunities

Summary

  • Focus: improve SHA-256d throughput in src/mining.cpp, src/ShaTests/nerdSHA256plus.cpp, and hardware miner paths.
  • Target: approach a 3× hashrate increase via batching, hardware pipelining, and memory-layout changes.

Key Bottlenecks

  1. Dynamic job queues (src/mining.cpp:142-205): frequent std::shared_ptr allocations and mutex contention for every 4K nonce chunk.
  2. Per-nonce schedule rebuild (src/ShaTests/nerdSHA256plus.cpp:494-575): the SHA message schedule is recomputed from scratch even though only the nonce word changes.
  3. Idle hardware accelerator (src/mining.cpp:833-889): ESP SHA engine sits idle while the CPU refills registers, losing ~40% of potential throughput.
  4. Hot data in slow memory (src/mining.cpp:612-633): block header buffers and bake tables live in heap memory that may spill to PSRAM.
  5. Expensive share validation (src/utils.cpp:78-148): floating-point difficulty checks and full hash copies run even for weak shares.

Recommendations

1. Lock-Free, Preallocated Job Queues (≈10‑15%)

  • Replace std::list<std::shared_ptr<JobRequest>> with fixed-size ring buffers in DRAM.
  • Keep 4 preallocated JobRequest structs per worker; update only nonce_start when refilling work.
  • Increase NONCE_PER_JOB_SW to ~65 536 to amortize queue management overhead.
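
To make the ring-buffer idea concrete, here is a minimal sketch of a single-producer/single-consumer queue with storage allocated once. `JobSlot` and `JobRing` are hypothetical names for illustration; the real `JobRequest` in src/mining.cpp carries more fields, and this assumes exactly one task pushes and one task pops per ring.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical, trimmed-down job descriptor (not the project's JobRequest).
struct JobSlot {
    uint32_t job_id;
    uint32_t nonce_start;
    uint32_t nonce_count;
};

// Single-producer / single-consumer ring buffer with no heap allocation.
// Capacity must be a power of two so indices can be masked instead of divided.
template <std::size_t Capacity>
class JobRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Called only by the producer. Returns false when the ring is full.
    bool push(const JobSlot& job) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = job;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Called only by the consumer. Returns false when the ring is empty.
    bool pop(JobSlot& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<JobSlot, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

A `JobRing<4>` per worker would match the "4 preallocated structs" suggestion. With more than one producer or consumer per ring this sketch is not safe and would need a different design.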

2. Batched Message Schedule for Software Miner (1.8‑2.5×)

  • Precompute the SHA message schedule for the first nonce, then update only the words affected by the nonce increment.
  • Maintain two register sets so that rounds 0‑19 for nonce N+1 are prepared while finishing rounds for nonce N.
  • The LLM estimates this removes ~60% of the per-nonce instruction count in nerd_sha256d_baked; that figure has not been measured.
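
A sketch of the schedule reuse, assuming the standard layout where the nonce is word 3 of the header's second 64-byte block. In that layout W[16] and W[17] do not depend on the nonce, and W[18] and W[19] depend on it through a single term each. The names `BakedSchedule`, `bake`, and `expand_for_nonce` are hypothetical and are not taken from nerdSHA256plus.cpp; `nonce_word` is the value as it appears in W[3], after any byte swap.

```cpp
#include <array>
#include <cstdint>

static inline uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }
static inline uint32_t sigma0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
static inline uint32_t sigma1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }

constexpr int kNonceWord = 3;  // nonce position in the second block of an 80-byte header

// Reference path: rebuild the whole 64-word schedule from scratch.
std::array<uint32_t, 64> expand_full(const std::array<uint32_t, 16>& block) {
    std::array<uint32_t, 64> w{};
    for (int i = 0; i < 16; ++i) w[i] = block[i];
    for (int i = 16; i < 64; ++i)
        w[i] = sigma1(w[i - 2]) + w[i - 7] + sigma0(w[i - 15]) + w[i - 16];
    return w;
}

// Values computed once per job (hypothetical struct).
struct BakedSchedule {
    std::array<uint32_t, 16> block;
    uint32_t w16;          // nonce-independent
    uint32_t w17;          // nonce-independent
    uint32_t w18_partial;  // W[18] without the sigma0(W[3]) term
    uint32_t w19_partial;  // W[19] without the W[3] term
};

BakedSchedule bake(const std::array<uint32_t, 16>& block) {
    BakedSchedule b{};
    b.block = block;
    b.w16 = sigma1(block[14]) + block[9] + sigma0(block[1]) + block[0];
    b.w17 = sigma1(block[15]) + block[10] + sigma0(block[2]) + block[1];
    b.w18_partial = sigma1(b.w16) + block[11] + block[2];
    b.w19_partial = sigma1(b.w17) + block[12] + sigma0(block[4]);
    return b;
}

// Per-nonce path: reuse the baked words, then expand W[20..63] as usual.
std::array<uint32_t, 64> expand_for_nonce(const BakedSchedule& b, uint32_t nonce_word) {
    std::array<uint32_t, 64> w{};
    for (int i = 0; i < 16; ++i) w[i] = b.block[i];
    w[kNonceWord] = nonce_word;
    w[16] = b.w16;
    w[17] = b.w17;
    w[18] = b.w18_partial + sigma0(nonce_word);
    w[19] = b.w19_partial + nonce_word;
    for (int i = 20; i < 64; ++i)
        w[i] = sigma1(w[i - 2]) + w[i - 7] + sigma0(w[i - 15]) + w[i - 16];
    return w;
}
```

Comparing `expand_for_nonce` against `expand_full` for a handful of nonces is a cheap correctness check. Note that words from W[20] onward still depend on the nonce, so the savings from the schedule alone are limited; the first three compression rounds also consume only W[0..2] and could be baked the same way.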

3. Pipeline the Hardware SHA Engine (1.5‑2× on S2/S3/C3)

  • Alternate two nonce slots: load digest once, then overlap register writes with accelerator execution.
  • Use DMA (sha_hal_hash_dma) for 64-byte transfers and trigger continue/start without waiting after each register write.
  • Add early hash byte checks before calling diff_from_target to avoid needless float math.

4. Keep Hot Buffers in Fast Memory & Add Fast Rejects (≈15‑20%)

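  • Tag miner_data.bytearray_blockheader, job midstate, and bake arrays with DRAM_ATTR so they stay in internal RAM. IRAM_ATTR applies to functions, so it belongs on the hot hashing routines rather than on the buffers.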
  • Extend the existing 16-bit share precheck to 24–32 bits so most hashes exit before the expensive difficulty calculation.
  • Replace floating-point diff_from_target with fixed-point math derived from nbits.
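
The integer-only check could look roughly like the sketch below. It assumes the hash and target are both held as 32-byte big-endian numbers (most significant byte first), which may not match the byte order used in src/utils.cpp. `target_from_nbits` and `meets_target` are hypothetical helpers, not existing project functions.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Expand the compact nbits encoding into a 32-byte big-endian target.
// target = mantissa * 256^(exponent - 3); the sign bit of the mantissa is ignored.
std::array<uint8_t, 32> target_from_nbits(uint32_t nbits) {
    std::array<uint8_t, 32> target{};
    const int exponent = static_cast<int>(nbits >> 24);
    const uint32_t mantissa = nbits & 0x007fffffu;
    for (int k = 0; k < 3; ++k) {
        const int offset = exponent - 3 + k;  // byte position counted from the LSB
        if (offset < 0 || offset > 31) continue;
        target[31 - offset] = static_cast<uint8_t>(mantissa >> (8 * k));
    }
    return target;
}

static inline uint32_t load_be32(const uint8_t* p) {
    return (static_cast<uint32_t>(p[0]) << 24) | (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Returns true when hash <= target, using integer comparisons only.
bool meets_target(const std::array<uint8_t, 32>& hash_be,
                  const std::array<uint8_t, 32>& target_be) {
    // Fast path: the top 32 bits decide almost every candidate.
    const uint32_t h = load_be32(hash_be.data());
    const uint32_t t = load_be32(target_be.data());
    if (h != t) return h < t;
    // Slow path: full 256-bit comparison, still without floating point.
    return std::memcmp(hash_be.data(), target_be.data(), 32) <= 0;
}
```

For example, `target_from_nbits(0x1d00ffff)` yields a target whose first four bytes are zero, so any hash with a nonzero byte among its first four is rejected after a single 32-bit comparison.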

5. Fully Utilize Both Cores / Workers (≈5‑10%)

  • Run two software workers plus one hardware worker even when hardware SHA is enabled; pin tasks per core to avoid contention.
  • Expose a workers_per_core setting so high-clock boards can scale threads within WDT limits.

Validation Checklist

  • Instrument nonce batches with esp_timer_get_time() to capture time per hash before and after each change (the call returns microseconds, not CPU cycles).
  • Monitor heap usage once ring buffers replace dynamic lists to ensure no regressions.
  • Run 30‑minute pool sessions after every major change to confirm the expected hashrate uplift and stability.
  • Add regression tests that replay known block headers through the new batched scheduler to verify correctness.
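
For the timing item, a small harness along these lines might be enough. It is a sketch: `microseconds_per_hash` is a hypothetical helper, and the clock is passed in as a callable so the same code can run on a desktop with a fake clock. On the board, `esp_timer_get_time`, which returns microseconds since boot as an `int64_t`, could be passed as `now_us`.

```cpp
#include <cstdint>

// now_us:     callable returning a monotonically increasing int64_t in microseconds.
// hash_batch: callable that hashes `nonce_count` nonces starting at `nonce_start`.
// Returns the average wall time per hash in microseconds.
template <typename NowUs, typename HashBatch>
double microseconds_per_hash(NowUs now_us, HashBatch hash_batch,
                             uint32_t nonce_start, uint32_t nonce_count) {
    if (nonce_count == 0) return 0.0;
    const int64_t t0 = now_us();
    hash_batch(nonce_start, nonce_count);
    const int64_t t1 = now_us();
    return static_cast<double>(t1 - t0) / static_cast<double>(nonce_count);
}
```

Measuring a whole batch rather than each hash keeps the timer overhead out of the result; recording the same batch size before and after a change makes the numbers comparable.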
