fix: segment GRO/GSO-coalesced packets in PCAP receive path #1780
Merged
midwan merged 1 commit into BlitterStudio:master on Feb 12, 2026
Hosts with GRO, GSO, or virtio NICs deliver TCP segments coalesced into packets far larger than the Ethernet MTU (up to 64KB). The PCAP backend silently dropped these because of a 1600-byte hard limit, forcing TCP into RTO-driven single-segment retransmission and reducing throughput to ~3-7 KB/s — roughly 100x slower than real hardware. Add TCP segmentation that splits oversized IPv4/TCP packets back into MSS-sized Ethernet frames with correct IP/TCP headers and checksums before queuing them for the A2065 emulation. Raise the receive queue depth from 10 to 50 to accommodate segmentation bursts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
midwan approved these changes on Feb 12, 2026
Summary
PCAP bridged networking delivers ~3-7 KB/s throughput on hosts with GRO, GSO, or virtio NICs — roughly 100x slower than real A2065 hardware. This PR fixes the root cause and restores expected throughput.
The Problem
Linux GRO (Generic Receive Offload), GSO, and virtio mergeable receive buffers coalesce multiple TCP segments into single oversized packets before delivering them to pcap sockets. A burst of 10 standard 1460-byte TCP segments arrives as a single 14,600-byte packet.
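The size arithmetic behind that example can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the header sizes assume Ethernet II framing with IPv4 and TCP headers that carry no options, which the PR text does not state explicitly.

```cpp
#include <cstddef>

// Assumed header sizes: Ethernet II, IPv4 without options, TCP without options.
constexpr std::size_t kEthHeader = 14;
constexpr std::size_t kIpHeader  = 20;
constexpr std::size_t kTcpHeader = 20;

// Length of the captured frame when `segments` full-MSS TCP segments
// have been merged into one packet by the host or hypervisor.
constexpr std::size_t coalesced_frame_len(std::size_t segments, std::size_t mss) {
    return kEthHeader + kIpHeader + kTcpHeader + segments * mss;
}

// TCP payload carried by such a packet (the 14,600-byte figure in the text
// corresponds to this value for 10 segments of 1460 bytes).
constexpr std::size_t coalesced_payload_len(std::size_t segments, std::size_t mss) {
    return segments * mss;
}
```

Under these assumptions a single 1460-byte segment yields a 1514-byte frame, while ten coalesced segments yield 14,600 bytes of TCP payload in a 14,654-byte frame.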
The PCAP backend had a hard-coded 1600-byte receive limit (MAX_PSIZE), so every coalesced packet was silently dropped. The only packets that survived were single-segment retransmissions (1460 bytes) triggered after the TCP sender's retransmission timeout expired, typically 300ms on LAN and 500ms or more for remote servers. This forced TCP into a degenerate mode of one segment per RTO cycle.
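A back-of-the-envelope estimate shows why one segment per RTO cycle lands in the single-digit KB/s range. The function below is a hypothetical helper, not code from the PR, and it ignores everything except the segment size and the timeout.

```cpp
// Rough upper bound on goodput when exactly one segment is delivered
// per retransmission timeout. Result is in bytes per second.
constexpr double rto_bound_bytes_per_sec(double segment_bytes, double rto_seconds) {
    return segment_bytes / rto_seconds;
}
```

With the figures quoted above, `rto_bound_bytes_per_sec(1460, 0.3)` is about 4867 bytes/s and `rto_bound_bytes_per_sec(1460, 0.5)` is 2920 bytes/s, which is in line with the ~3-7 KB/s reported in the summary.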
Disabling GRO/GSO/TSO via ethtool does not help on virtualized hosts (Proxmox/KVM, Hyper-V, etc.) because the hypervisor coalesces packets in the virtual NIC before the guest kernel sees them.
The Fix
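Replace the hard drop with TCP segmentation. When the PCAP backend receives an IPv4/TCP packet larger than a standard Ethernet frame (1514 bytes), it splits the packet back into MSS-sized Ethernet frames before queuing them for the A2065 emulation.
Per-segment details: each frame is given correct IP and TCP headers and checksums.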
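A minimal sketch of this kind of software segmentation is shown below. It is not the code from this PR: `segment_tcp` and its helpers are hypothetical names, and the choices to increment the IP ID per segment and to clear FIN/PSH on all but the last segment are assumptions borrowed from how GSO-style segmentation is commonly done. The caller is assumed to invoke it only for oversized frames and to pass everything else through unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;

static std::uint16_t rd16(const std::uint8_t* p) { return std::uint16_t((p[0] << 8) | p[1]); }
static std::uint32_t rd32(const std::uint8_t* p) { return (std::uint32_t(rd16(p)) << 16) | rd16(p + 2); }
static void wr16(std::uint8_t* p, std::uint16_t v) { p[0] = std::uint8_t(v >> 8); p[1] = std::uint8_t(v); }
static void wr32(std::uint8_t* p, std::uint32_t v) { wr16(p, std::uint16_t(v >> 16)); wr16(p + 2, std::uint16_t(v)); }

// Internet checksum (RFC 1071): accumulate 16-bit big-endian words, then fold.
static std::uint32_t csum_add(std::uint32_t sum, const std::uint8_t* p, std::size_t len) {
    for (; len > 1; p += 2, len -= 2) sum += rd16(p);
    if (len) sum += std::uint32_t(p[0]) << 8;  // odd trailing byte, zero-padded
    return sum;
}
static std::uint16_t csum_fold(std::uint32_t sum) {
    while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);
    return std::uint16_t(~sum);
}

// Split one Ethernet/IPv4/TCP frame into frames carrying at most `mss` payload bytes.
// Returns an empty vector if the input is not a well-formed IPv4/TCP frame with payload.
std::vector<Frame> segment_tcp(const Frame& in, std::size_t mss) {
    std::vector<Frame> out;
    const std::size_t eth = 14;
    if (mss == 0 || in.size() < eth + 40 || rd16(&in[12]) != 0x0800) return out;
    const std::uint8_t* ip = &in[eth];
    const std::size_t ihl = (ip[0] & 0x0F) * 4u;
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != 6) return out;  // IPv4 + TCP only
    const std::size_t total = rd16(ip + 2);
    if (total > in.size() - eth || total < ihl + 20) return out;
    const std::uint8_t* tcp = ip + ihl;
    const std::size_t thl = (tcp[12] >> 4) * 4u;
    if (thl < 20 || ihl + thl > total) return out;
    const std::size_t hdr = eth + ihl + thl;
    const std::size_t payload = total - ihl - thl;
    const std::uint32_t seq0 = rd32(tcp + 4);
    const std::uint16_t id0 = rd16(ip + 4);

    for (std::size_t off = 0, n = 0; off < payload; off += mss, ++n) {
        const std::size_t chunk = (payload - off < mss) ? payload - off : mss;
        const bool last = (off + chunk == payload);
        Frame f(in.begin(), in.begin() + hdr);  // copy Ethernet + IP + TCP headers
        f.insert(f.end(), in.begin() + hdr + off, in.begin() + hdr + off + chunk);
        std::uint8_t* fip = &f[eth];
        std::uint8_t* ftcp = fip + ihl;
        wr16(fip + 2, std::uint16_t(ihl + thl + chunk));    // IP total length
        wr16(fip + 4, std::uint16_t(id0 + n));              // assumed: ID increments per segment
        wr16(fip + 10, 0);
        wr16(fip + 10, csum_fold(csum_add(0, fip, ihl)));   // IP header checksum
        wr32(ftcp + 4, seq0 + std::uint32_t(off));          // TCP sequence number
        if (!last) ftcp[13] &= std::uint8_t(~0x09);         // assumed: FIN/PSH only on last segment
        wr16(ftcp + 16, 0);
        std::uint32_t sum = csum_add(0, fip + 12, 8);       // pseudo-header: src + dst address
        sum += 6 + std::uint32_t(thl + chunk);              // pseudo-header: protocol + TCP length
        wr16(ftcp + 16, csum_fold(csum_add(sum, ftcp, thl + chunk)));
        out.push_back(std::move(f));
    }
    return out;
}
```

Each output frame reuses the original Ethernet header, so MAC addresses and EtherType are preserved; only the length, ID, sequence number, flags, and checksums change per segment.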
The receive queue depth is also raised from 10 to 50. A single GRO-coalesced packet segments into 10+ frames, so the old limit would drop most segments from a single burst.
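The queue-depth reasoning can be checked with simple arithmetic. The helpers below are hypothetical and assume the worst case: the whole burst is enqueued into an empty queue before the emulated NIC drains any frame, and both the IPv4 and TCP headers are 20 bytes.

```cpp
#include <cstddef>

// Number of MSS-sized frames produced from `tcp_payload` bytes (rounded up).
constexpr std::size_t frames_after_segmentation(std::size_t tcp_payload, std::size_t mss) {
    return (tcp_payload + mss - 1) / mss;
}

// Frames that do not fit when `frames` are pushed into an empty queue of size `depth`.
constexpr std::size_t frames_dropped(std::size_t frames, std::size_t depth) {
    return frames > depth ? frames - depth : 0;
}
```

Under these assumptions, a maximum-size IPv4 packet (65,535 bytes total, so 65,495 bytes of TCP payload) segments into 45 frames at an MSS of 1460. A queue of depth 10 would lose 35 of them, while a depth of 50 holds the whole burst.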
Testing
Tested on Debian 13 (kernel 6.12) running as a Proxmox VM with a virtio NIC. Emulated A4000/040 with A2065 NIC, Roadshow TCP/IP stack. GRO/GSO/TSO left enabled at default settings.
LAN FTP (200KB file): 885 KB/s with the fix.
Remote FTP (Aminet, transatlantic): 60 KB/s with the fix.
The LAN result (885 KB/s) exceeds real A4000/040 + A2065 benchmarks (400-600 KB/s), which is expected since the emulated 68040 runs at an effective clock speed higher than 25 MHz. The remote result (60 KB/s) is RTT-bound, consistent with Roadshow's 33KB TCP window over a ~170ms path.
Generated with Claude Code