Potential Performance Optimizations #740

Preface: I recognize the nature of ESP-based mining: these efforts are extremely unlikely to result in finding a block. I recently flashed my ESP32-S3 board with NerdMiner_v2 (open source for the win!) and found that the hash rate dropped to around 30% of what the closed-source firmware achieved.

I asked GPT 5.1 Codex to examine the code and suggest performance improvements, specifically in the mining routines. The recommendations follow. I could ask the LLM to vibe code these and then submit a PR, but a) I do not have a dev environment set up to run even rudimentary tests, b) I am not a C++ developer, and c) I have little understanding of the mining space. Basically, no one should accept any code from me...

I offer these for consideration.

# Hashrate Optimization Opportunities

## Summary
- Focus: improve SHA-256d throughput in `src/mining.cpp`, `src/ShaTests/nerdSHA256plus.cpp`, and hardware miner paths.
- Target: approach a 3× hashrate increase via batching, hardware pipelining, and memory-layout changes.

## Key Bottlenecks
1. **Dynamic job queues** (`src/mining.cpp:142-205`): frequent `std::shared_ptr` allocations and mutex contention for every 4K nonce chunk.
2. **Per-nonce schedule rebuild** (`src/ShaTests/nerdSHA256plus.cpp:494-575`): the SHA message schedule is recomputed from scratch even though only the nonce word changes.
3. **Idle hardware accelerator** (`src/mining.cpp:833-889`): ESP SHA engine sits idle while the CPU refills registers, losing ~40% of potential throughput.
4. **Hot data in slow memory** (`src/mining.cpp:612-633`): block header buffers and bake tables live in heap memory that may spill to PSRAM.
5. **Expensive share validation** (`src/utils.cpp:78-148`): floating-point difficulty checks and full hash copies run even for weak shares.

## Recommendations

### 1. Lock-Free, Preallocated Job Queues (≈10‑15%)
- Replace `std::list<std::shared_ptr<JobRequest>>` with fixed-size ring buffers in DRAM.
- Keep 4 preallocated `JobRequest` structs per worker; update only `nonce_start` when refilling work.
- Increase `NONCE_PER_JOB_SW` to ~65 536 to amortize queue management overhead.
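A minimal sketch of the ring-buffer idea, written as portable C++ rather than ESP-IDF code. `JobRequest` here is a hypothetical, trimmed-down stand-in for the miner's struct, `SpscRing` and `refill` are illustrative names, and the design assumes exactly one producer task and one consumer task per ring; a queue shared by several workers would need a multi-producer scheme instead.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for the miner's JobRequest.
struct JobRequest {
    uint32_t job_id = 0;
    uint32_t nonce_start = 0;
    uint32_t nonce_count = 0;
};

// Fixed-capacity single-producer/single-consumer ring buffer. All slots are
// allocated up front and the two indices are the only shared state, so no
// mutex is needed as long as exactly one task pushes and one task pops.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N > 0 && (N & (N - 1)) == 0, "capacity must be a power of two");

public:
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;  // full
        slots_[head & (N - 1)] = item;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;  // empty
        out = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // hand the slot back
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // written only by the producer
    std::atomic<std::size_t> tail_{0};  // written only by the consumer
};

constexpr uint32_t kNoncePerJob = 65536;  // mirrors the suggested NONCE_PER_JOB_SW

// Producer side: fill every free slot by copying the job template and
// advancing only nonce_start. Returns the number of chunks enqueued.
template <std::size_t N>
uint32_t refill(SpscRing<JobRequest, N>& ring, JobRequest& tmpl) {
    assert(tmpl.nonce_count > 0);
    uint32_t pushed = 0;
    while (ring.push(tmpl)) {
        tmpl.nonce_start += tmpl.nonce_count;  // wraps modulo 2^32, like the nonce
        ++pushed;
    }
    return pushed;
}
```

With a capacity of 4 and 65 536 nonces per chunk, the producer touches the queue once per 65 536 hashes rather than once per 4K, and neither side allocates or locks.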

### 2. Batched Message Schedule for Software Miner (1.8‑2.5×)
- Precompute the SHA message schedule for the first nonce, then update only the words affected by the nonce increment.
- Maintain two register sets so that rounds 0‑19 for nonce _N+1_ are prepared while finishing rounds for nonce _N_.
- This removes ~60% of the per-nonce instruction count in `nerd_sha256d_baked`.
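A sketch of the nonce-invariant part of this idea, assuming the standard layout of the second 64-byte block of an 80-byte header: words 0-2 are the merkle-root tail, time and nbits, word 3 is the nonce, and words 4-15 are fixed padding ending in the 640-bit length. The names are hypothetical and this is not the existing `nerd_sha256d_baked` code, whose `bake` arrays may already cache some of these values. It shows that W16 and W17 never depend on the nonce and that W18/W19 split into a cached constant plus a cheap nonce term; every later word still depends on the nonce and has to be expanded per hash, so the real saving needs to be measured.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Schedule = std::array<uint32_t, 64>;

inline uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }
inline uint32_t sig0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
inline uint32_t sig1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }

// Reference: plain SHA-256 message expansion of 16 input words.
inline Schedule expand_full(const uint32_t in[16]) {
    Schedule w{};
    for (int t = 0; t < 16; ++t) w[t] = in[t];
    for (int t = 16; t < 64; ++t)
        w[t] = sig1(w[t - 2]) + w[t - 7] + sig0(w[t - 15]) + w[t - 16];
    return w;
}

// Hypothetical per-job cache for the second header block.
struct NonceInvariant {
    uint32_t in[16];    // input words; word 3 (the nonce) is a placeholder
    uint32_t w16, w17;  // fully nonce-independent
    uint32_t w18_base;  // w18 = w18_base + sig0(nonce)
    uint32_t w19_base;  // w19 = w19_base + nonce
};

// Arguments are the big-endian words SHA-256 reads from header bytes 64-75.
inline NonceInvariant precompute(uint32_t merkle_tail, uint32_t ntime, uint32_t nbits) {
    NonceInvariant p{};
    p.in[0] = merkle_tail;
    p.in[1] = ntime;
    p.in[2] = nbits;
    p.in[4] = 0x80000000u;  // padding bit after the 80-byte message
    p.in[15] = 640;         // message length in bits; words 5-14 stay zero
    p.w16 = sig1(p.in[14]) + p.in[9] + sig0(p.in[1]) + p.in[0];
    p.w17 = sig1(p.in[15]) + p.in[10] + sig0(p.in[2]) + p.in[1];
    p.w18_base = sig1(p.w16) + p.in[11] + p.in[2];
    p.w19_base = sig1(p.w17) + p.in[12] + sig0(p.in[4]);
    return p;
}

// Per-nonce expansion that reuses the cached words.
inline Schedule expand_with_nonce(const NonceInvariant& p, uint32_t nonce) {
    Schedule w{};
    for (int t = 0; t < 16; ++t) w[t] = p.in[t];
    w[3] = nonce;
    w[16] = p.w16;
    w[17] = p.w17;
    w[18] = p.w18_base + sig0(nonce);
    w[19] = p.w19_base + nonce;
    for (int t = 20; t < 64; ++t)  // every remaining word depends on the nonce
        w[t] = sig1(w[t - 2]) + w[t - 7] + sig0(w[t - 15]) + w[t - 16];
    return w;
}

// Debug helper: the cached path must agree with the reference expansion.
inline bool matches_reference(const NonceInvariant& p, uint32_t nonce) {
    uint32_t in[16];
    for (int t = 0; t < 16; ++t) in[t] = p.in[t];
    in[3] = nonce;
    return expand_full(in) == expand_with_nonce(p, nonce);
}
```

The same reasoning applies to the compression function: rounds 0-2 consume only W0-W2, so the working state after round 3 can also be computed once per job.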

### 3. Pipeline the Hardware SHA Engine (1.5‑2× on S2/S3/C3)
- Alternate two nonce slots: load digest once, then overlap register writes with accelerator execution.
- Use DMA (`sha_hal_hash_dma`) for 64-byte transfers and trigger continue/start without waiting after each register write.
- Add early hash byte checks before calling `diff_from_target` to avoid needless float math.

### 4. Keep Hot Buffers in Fast Memory & Add Fast Rejects (≈15‑20%)
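- Keep `miner_data.bytearray_blockheader`, the job `midstate`, and the `bake` arrays in internal DRAM (static buffers tagged `DRAM_ATTR`, or heap buffers allocated with `heap_caps_malloc(size, MALLOC_CAP_INTERNAL)`), and mark the hot hashing functions `IRAM_ATTR`.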
- Extend the existing 16-bit share precheck to 24–32 bits so most hashes exit before the expensive difficulty calculation.
- Replace floating-point `diff_from_target` with fixed-point math derived from `nbits`.
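A sketch of integer-only share checking, assuming hashes and targets are handled as 32-byte little-endian values (byte 31 most significant), which is how Bitcoin stores them. `passes_precheck`, `target_from_nbits`, `meets_target` and `is_candidate` are hypothetical names, not functions from `src/utils.cpp`, and `target_from_nbits` ignores the sign bit and overflow cases of the compact encoding. A pool's share target is normally derived from the pool difficulty rather than from `nbits`, but the same byte-wise comparison applies once that target is in the same 32-byte form.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 256-bit value stored little-endian: byte 31 is the most significant.
using U256LE = std::array<uint8_t, 32>;

// Cheap first filter: the top `zero_bytes` bytes must all be zero.
// 2 bytes corresponds to a 16-bit precheck; 3 or 4 bytes to 24 or 32 bits.
inline bool passes_precheck(const U256LE& hash, int zero_bytes) {
    for (int i = 0; i < zero_bytes; ++i)
        if (hash[31 - i] != 0) return false;
    return true;
}

// Expand a compact nbits value: target = mantissa * 256^(exponent - 3).
// Sketch only: the sign bit and overflowing exponents are not handled.
inline U256LE target_from_nbits(uint32_t nbits) {
    U256LE target{};
    const int exponent = static_cast<int>(nbits >> 24);
    const uint32_t mantissa = nbits & 0x007fffffu;
    for (int i = 0; i < 3; ++i) {
        const int pos = exponent - 3 + i;  // where mantissa byte i lands
        if (pos >= 0 && pos < 32)
            target[pos] = static_cast<uint8_t>(mantissa >> (8 * i));
    }
    return target;
}

// Integer comparison from the most significant byte down; no floats.
inline bool meets_target(const U256LE& hash, const U256LE& target) {
    for (int i = 31; i >= 0; --i)
        if (hash[i] != target[i]) return hash[i] < target[i];
    return true;  // hash == target also counts as valid
}

// Fast path: most hashes are rejected by the precheck before the full compare.
inline bool is_candidate(const U256LE& hash, const U256LE& target, int precheck_bytes) {
    // The precheck is only safe if the target itself starts with that many
    // zero bytes; otherwise it would discard valid shares.
    assert(passes_precheck(target, precheck_bytes));
    return passes_precheck(hash, precheck_bytes) && meets_target(hash, target);
}
```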

### 5. Fully Utilize Both Cores / Workers (≈5‑10%)
- Run two software workers plus one hardware worker even when hardware SHA is enabled; pin tasks per core to avoid contention.
- Expose a `workers_per_core` setting so high-clock boards can scale threads within WDT limits.

## Validation Checklist
- Instrument nonce batches with `esp_timer_get_time()` (which returns microseconds) to measure time per hash before and after each change; multiply by the CPU clock in MHz to get cycles/hash.
- Monitor heap usage once ring buffers replace dynamic lists to ensure no regressions.
- Run 30‑minute pool sessions after every major change to confirm the expected hashrate uplift and stability.
- Add regression tests that replay known block headers through the new batched scheduler to verify correctness.
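For the regression-test item, one option is a small, deliberately unoptimized SHA-256d reference that batched code can be compared against. The sketch below uses hypothetical names and portable C++ (it is not code from the repository); it replays the Bitcoin genesis block header and checks the well-known block hash. An optimized path would then be tested by feeding it the same header or nonce range and comparing its digests with `sha256d`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Digest = std::array<uint8_t, 32>;

static const uint32_t kK[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

inline uint32_t rotr32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

// One SHA-256 compression of a 64-byte block into state `st`.
void sha256_compress(uint32_t st[8], const uint8_t* blk) {
    uint32_t w[64];
    for (int t = 0; t < 16; ++t)
        w[t] = uint32_t(blk[4 * t]) << 24 | uint32_t(blk[4 * t + 1]) << 16 |
               uint32_t(blk[4 * t + 2]) << 8 | uint32_t(blk[4 * t + 3]);
    for (int t = 16; t < 64; ++t)
        w[t] = (rotr32(w[t - 2], 17) ^ rotr32(w[t - 2], 19) ^ (w[t - 2] >> 10)) + w[t - 7] +
               (rotr32(w[t - 15], 7) ^ rotr32(w[t - 15], 18) ^ (w[t - 15] >> 3)) + w[t - 16];
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3], e = st[4], f = st[5], g = st[6], h = st[7];
    for (int t = 0; t < 64; ++t) {
        const uint32_t t1 = h + (rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25)) +
                            ((e & f) ^ (~e & g)) + kK[t] + w[t];
        const uint32_t t2 = (rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22)) +
                            ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d; st[4] += e; st[5] += f; st[6] += g; st[7] += h;
}

// Single SHA-256 with standard padding (0x80, zeros, 64-bit big-endian bit length).
Digest sha256(const uint8_t* msg, std::size_t len) {
    uint32_t st[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    std::vector<uint8_t> buf(msg, msg + len);
    buf.push_back(0x80);
    while (buf.size() % 64 != 56) buf.push_back(0);
    const uint64_t bits = uint64_t(len) * 8;
    for (int i = 7; i >= 0; --i) buf.push_back(uint8_t(bits >> (8 * i)));
    for (std::size_t off = 0; off < buf.size(); off += 64) sha256_compress(st, buf.data() + off);
    Digest out{};
    for (int i = 0; i < 32; ++i) out[i] = uint8_t(st[i / 4] >> (24 - 8 * (i % 4)));
    return out;
}

Digest sha256d(const uint8_t* msg, std::size_t len) {
    const Digest first = sha256(msg, len);
    return sha256(first.data(), first.size());
}

std::vector<uint8_t> from_hex(const std::string& hex) {
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        out.push_back(uint8_t(std::stoul(hex.substr(i, 2), nullptr, 16)));
    return out;
}

std::string to_hex(const Digest& d) {
    static const char kDigits[] = "0123456789abcdef";
    std::string s;
    for (uint8_t byte : d) { s += kDigits[byte >> 4]; s += kDigits[byte & 0x0f]; }
    return s;
}

// Replays the genesis block header; block hashes are displayed byte-reversed.
bool replay_genesis_header() {
    const std::vector<uint8_t> header = from_hex(
        "01000000"
        "0000000000000000" "0000000000000000" "0000000000000000" "0000000000000000"
        "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
        "29ab5f49" "ffff001d" "1dac2b7c");
    assert(header.size() == 80);
    Digest hash = sha256d(header.data(), header.size());
    std::reverse(hash.begin(), hash.end());
    return to_hex(hash) == "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f";
}
```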

