Preface: I recognize the nature of ESP-based mining: these efforts are extremely unlikely to result in finding a block. I recently flashed my ESP32-S3 board with NerdMiner_v2 (open source for the win!) and found that the hash rate dropped to around 30% of what the closed-source firmware achieved.
I asked GPT 5.1 Codex to examine the code and suggest performance improvements, specifically in the mining routines. Its recommendations follow. I could ask the LLM to vibe code these and then submit a PR, but a) I do not have a dev environment set up to run even rudimentary tests, b) I am not a C++ developer, and c) I do not have much understanding of the mining space. Basically, no one should accept any code from me...
I offer these for consideration.
Hashrate Optimization Opportunities
Summary
- Focus: improve SHA-256d throughput in src/mining.cpp, src/ShaTests/nerdSHA256plus.cpp, and hardware miner paths.
- Target: approach a 3× hashrate increase via batching, hardware pipelining, and memory-layout changes.
Key Bottlenecks
- Dynamic job queues (src/mining.cpp:142-205): frequent std::shared_ptr allocations and mutex contention for every 4K nonce chunk.
- Per-nonce schedule rebuild (src/ShaTests/nerdSHA256plus.cpp:494-575): the SHA message schedule is recomputed from scratch even though only the nonce word changes.
- Idle hardware accelerator (src/mining.cpp:833-889): ESP SHA engine sits idle while the CPU refills registers, losing ~40% of potential throughput.
- Hot data in slow memory (src/mining.cpp:612-633): block header buffers and bake tables live in heap memory that may spill to PSRAM.
- Expensive share validation (src/utils.cpp:78-148): floating-point difficulty checks and full hash copies run even for weak shares.
Recommendations
1. Lock-Free, Preallocated Job Queues (≈10‑15%)
- Replace std::list<std::shared_ptr<JobRequest>> with fixed-size ring buffers in DRAM.
- Keep 4 preallocated JobRequest structs per worker; update only nonce_start when refilling work.
- Increase NONCE_PER_JOB_SW to ~65,536 to amortize queue management overhead.
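To make the ring-buffer idea concrete, here is a minimal sketch of a single-producer/single-consumer queue with preallocated slots. It is not taken from the NerdMiner code: `JobSlot`, its fields, and `JobRing` are hypothetical names, and the sketch assumes one ring per worker with exactly one task pushing and one task popping.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the project's JobRequest; the fields are assumptions.
struct JobSlot {
    uint32_t job_id;
    uint32_t nonce_start;
    uint32_t nonce_count;
};

// Single-producer / single-consumer ring with preallocated slots.
// No heap allocation and no mutex: only two atomic counters are shared.
template <std::size_t Capacity>
class JobRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer side: returns false when the ring is full.
    bool push(const JobSlot& job) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = job;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false when the ring is empty.
    bool pop(JobSlot& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }

private:
    std::array<JobSlot, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

With `JobRing<4>` the four slots live inside the object itself, so declaring it as a static or global keeps it out of the heap. If more than one task pushes into the same ring, this design is no longer safe and would need a different queue.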
2. Batched Message Schedule for Software Miner (1.8‑2.5×)
- Precompute the SHA message schedule for the first nonce, then update only the words affected by the nonce increment.
- Maintain two register sets so that rounds 0‑19 for nonce N+1 are prepared while finishing rounds for nonce N.
- This removes ~60% of the per-nonce instruction count in nerd_sha256d_baked.
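The schedule idea can be illustrated with a small sketch. It assumes the standard Bitcoin layout, where the nonce is word 3 of the second 64-byte block of the header, and it takes the nonce word in the form the schedule sees it (any byte swapping is left to the caller). The names `ScheduleBase`, `prepare`, and `schedule_for_nonce` are hypothetical and do not come from nerdSHA256plus.cpp. In the expansion, words 16 and 17 do not depend on word 3 at all, and words 18 and 19 depend on it through a single term, so those parts can be computed once per job. This sketch only shows the principle and makes no claim about the ~60% figure above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

static inline uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }
static inline uint32_t s0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
static inline uint32_t s1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }

using Schedule = std::array<uint32_t, 64>;

// Reference expansion: fills w[16..63] from w[0..15].
void expand_full(Schedule& w) {
    for (int t = 16; t < 64; ++t)
        w[t] = s1(w[t - 2]) + w[t - 7] + s0(w[t - 15]) + w[t - 16];
}

// Nonce-independent pieces, computed once per job (hypothetical helper type).
struct ScheduleBase {
    Schedule w;         // w[0..17] valid; w[3] is a placeholder
    uint32_t w18_part;  // s1(w16) + w11 + w2, still missing s0(w3)
    uint32_t w19_part;  // s1(w17) + w12 + s0(w4), still missing w3
};

ScheduleBase prepare(const std::array<uint32_t, 16>& block) {
    ScheduleBase b{};
    for (std::size_t i = 0; i < 16; ++i) b.w[i] = block[i];
    b.w[16] = s1(b.w[14]) + b.w[9] + s0(b.w[1]) + b.w[0];
    b.w[17] = s1(b.w[15]) + b.w[10] + s0(b.w[2]) + b.w[1];
    b.w18_part = s1(b.w[16]) + b.w[11] + b.w[2];
    b.w19_part = s1(b.w[17]) + b.w[12] + s0(b.w[4]);
    return b;
}

// Per-nonce work: patch in the nonce-dependent terms, then expand from word 20.
void schedule_for_nonce(const ScheduleBase& b, uint32_t nonce_word, Schedule& w) {
    w = b.w;
    w[3] = nonce_word;
    w[18] = b.w18_part + s0(nonce_word);
    w[19] = b.w19_part + nonce_word;
    for (int t = 20; t < 64; ++t)
        w[t] = s1(w[t - 2]) + w[t - 7] + s0(w[t - 15]) + w[t - 16];
}
```

A correctness check for any such shortcut is to compare its output against `expand_full` for a handful of nonces, which is also a natural shape for a regression test.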
3. Pipeline the Hardware SHA Engine (1.5‑2× on S2/S3/C3)
- Alternate two nonce slots: load digest once, then overlap register writes with accelerator execution.
- Use DMA (sha_hal_hash_dma) for 64-byte transfers and trigger continue/start without waiting after each register write.
- Add early hash byte checks before calling diff_from_target to avoid needless float math.
4. Keep Hot Buffers in Fast Memory & Add Fast Rejects (≈15‑20%)
- Tag miner_data.bytearray_blockheader, job midstate, and bake arrays with DRAM_ATTR so they stay in internal RAM (IRAM_ATTR applies to functions, so use it on the hot routines that touch these buffers rather than on the data itself).
- Extend the existing 16-bit share precheck to 24–32 bits so most hashes exit before the expensive difficulty calculation.
- Replace floating-point diff_from_target with fixed-point math derived from nbits.
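As a rough illustration of the fast-reject and integer-only ideas, the sketch below expands a compact nbits value into a 256-bit target and compares a hash against it without any floating-point math. `Hash256`, `target_from_nbits`, and `meets_target` are hypothetical names, not the project's API. The sketch assumes both values are stored big-endian (byte 0 most significant), so a raw SHA-256d digest would need to be byte-reversed first, and it ignores the sign bit of the compact format. A pool share target would be handled the same way once expressed as 32 bytes.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// 256-bit value stored big-endian: byte 0 is the most significant byte.
using Hash256 = std::array<uint8_t, 32>;

// Expand compact nbits into a target: mantissa * 256^(exponent - 3).
Hash256 target_from_nbits(uint32_t nbits) {
    Hash256 target{};
    const int exponent = static_cast<int>(nbits >> 24);
    const uint32_t mantissa = nbits & 0x007fffffu;  // sign bit ignored (assumption)
    for (int i = 0; i < 3; ++i) {
        const int pos_from_lsb = exponent - 3 + i;  // byte index from the low end
        const uint8_t byte = static_cast<uint8_t>((mantissa >> (8 * i)) & 0xffu);
        if (pos_from_lsb >= 0 && pos_from_lsb < 32) {
            target[31 - pos_from_lsb] = byte;
        }
    }
    return target;
}

// Read the top 32 bits of a big-endian 256-bit value.
static inline uint32_t top32(const Hash256& v) {
    return (static_cast<uint32_t>(v[0]) << 24) | (static_cast<uint32_t>(v[1]) << 16) |
           (static_cast<uint32_t>(v[2]) << 8) | static_cast<uint32_t>(v[3]);
}

// True when hash <= target. Most candidates fail the 32-bit precheck
// and never reach the full 32-byte comparison.
bool meets_target(const Hash256& hash, const Hash256& target) {
    if (top32(hash) > top32(target)) return false;  // fast reject
    return std::memcmp(hash.data(), target.data(), hash.size()) <= 0;
}
```

For example, `target_from_nbits(0x1d00ffff)` yields a target whose first four bytes are zero followed by `ff ff`, so any hash with a nonzero byte among its top four is rejected by the precheck alone.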
5. Fully Utilize Both Cores / Workers (≈5‑10%)
- Run two software workers plus one hardware worker even when hardware SHA is enabled; pin tasks per core to avoid contention.
- Expose a workers_per_core setting so high-clock boards can scale threads within WDT limits.
Validation Checklist
- Instrument nonce batches with esp_timer_get_time() to capture time per hash before and after each change (the timer reports microseconds, so multiply by the CPU clock in MHz to get cycles/hash).
- Monitor heap usage once ring buffers replace dynamic lists to ensure no regressions.
- Run 30‑minute pool sessions after every major change to confirm the expected hashrate uplift and stability.
- Add regression tests that replay known block headers through the new batched scheduler to verify correctness.
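For the instrumentation step, the arithmetic is simple enough to sketch. The helpers below are hypothetical and take plain microsecond timestamps, such as two readings of `esp_timer_get_time()` taken before and after a nonce batch; the CPU clock in MHz is supplied by the caller.

```cpp
#include <cstdint>

// Average microseconds spent per hash over one batch.
// Returns 0.0 for an empty batch or a non-positive interval.
double us_per_hash(int64_t start_us, int64_t end_us, uint64_t hashes) {
    if (hashes == 0 || end_us <= start_us) return 0.0;
    return static_cast<double>(end_us - start_us) / static_cast<double>(hashes);
}

// Convert microseconds per hash to CPU cycles per hash.
// cpu_mhz is the core clock in MHz, e.g. 240 on many ESP32-S3 boards.
double cycles_per_hash(double us_per_hash_value, uint32_t cpu_mhz) {
    return us_per_hash_value * static_cast<double>(cpu_mhz);
}

// Hashes per second over the same interval.
double hashes_per_second(int64_t start_us, int64_t end_us, uint64_t hashes) {
    if (end_us <= start_us) return 0.0;
    return static_cast<double>(hashes) * 1e6 / static_cast<double>(end_us - start_us);
}
```

Logging these per batch before and after each change gives a like-for-like comparison that does not depend on pool-side hashrate estimates.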