perf: cache framer extension values on Conn and pool framer structs #800
Description
Problem
Every call to Conn.exec() and Conn.recv() invokes newFramerWithExts(), which:
- Allocates a new framer struct on the heap
- Allocates a 128-byte buffer (make([]byte, defaultBufSize)) that grows dynamically per frame
- Scans cqlProtoExts three times via findCQLProtoExtByName() to extract flagLWT, rateLimitingErrorCode, and tabletsRoutingV1 -- values that are constant for the lifetime of a connection
- Recomputes flags and proto from compressor and version -- also constant per connection
There are 4 call sites in conn.go (lines 849, 859, 890, 1219), all passing identical arguments: c.compressor, c.version, c.cqlProtoExts, c.logger. This means every query round-trip does 2 allocations and 3 linear scans that produce the same results every time.
Additionally, setTabletSupported(framer.tabletsRoutingV1) is called after every newFramerWithExts -- an atomic.StoreInt32 that stores the same value each time.
Proposed Fix
- Cache extension-derived values on Conn: Compute flagLWT, rateLimitingErrorCode, tabletsRoutingV1, flags, and proto once during connection setup and store them on the Conn struct. Eliminate the 3x findCQLProtoExtByName scans and redundant setTabletSupported calls per query.
- Pool framer structs: Use sync.Pool to reuse framer structs and their backing buffers across queries, eliminating 2 heap allocations per query.
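A rough sketch of how the two parts of the fix could fit together. This is not the driver's actual code: framerConfig, conn, getFramer, putFramer, and maxPooledBufSize are hypothetical names used only for illustration, and the buffer is created with length 0 and capacity defaultBufSize as a simplification. Only the field names (flagLWT, rateLimitingErrorCode, tabletsRoutingV1, flags, proto) and defaultBufSize come from the description above.

```go
package main

import "sync"

const (
	defaultBufSize = 128
	// Hypothetical cap: buffers that grew beyond this are not returned
	// to the pool, so one huge frame does not pin memory indefinitely.
	maxPooledBufSize = 64 * 1024
)

// framerConfig is a hypothetical container for the values that are
// constant for the lifetime of a connection.
type framerConfig struct {
	proto                 byte
	flags                 byte
	flagLWT               int
	rateLimitingErrorCode int
	tabletsRoutingV1      bool
}

// framer is a simplified stand-in for the real framer struct.
type framer struct {
	cfg framerConfig
	buf []byte
}

// conn is a simplified stand-in for Conn. framerCfg would be filled in
// once during connection setup, after the extensions are negotiated.
type conn struct {
	framerCfg framerConfig
}

var framerPool = sync.Pool{
	New: func() any {
		return &framer{buf: make([]byte, 0, defaultBufSize)}
	},
}

// getFramer takes a framer from the pool and copies the cached
// per-connection values into it instead of rescanning the extensions.
func (c *conn) getFramer() *framer {
	f := framerPool.Get().(*framer)
	f.cfg = c.framerCfg
	f.buf = f.buf[:0] // keep capacity, drop old contents
	return f
}

// putFramer returns a framer to the pool once the caller is done with it.
func putFramer(f *framer) {
	if cap(f.buf) > maxPooledBufSize {
		return // let the GC reclaim oversized buffers
	}
	f.cfg = framerConfig{}
	framerPool.Put(f)
}
```

Each call site would pair getFramer with putFramer once the frame has been fully written or parsed. A framer must not be returned to the pool while anything still holds a slice into its buffer, which is probably the main thing to audit at the existing call sites.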
Expected Impact
Based on CPU profiling of the benchmark suite, newFramerWithExts and runtime.mallocgc are significant contributors to per-query CPU time. This optimization should reduce allocations by 2 per query and eliminate redundant computation on every frame operation.
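One way the allocation claim could be checked in isolation is with testing.AllocsPerRun from the standard library. The sketch below uses a simplified stand-in framer rather than the real one, and allocsFresh, allocsPooled, and sink are hypothetical names; it only compares a fresh allocation per call against a sync.Pool round-trip.

```go
package main

import (
	"sync"
	"testing"
)

const defaultBufSize = 128

// framer is a simplified stand-in for the real framer struct.
type framer struct {
	buf []byte
}

// sink keeps the freshly built framer reachable so the compiler cannot
// place it on the stack, which would hide the allocations being measured.
var sink *framer

func newFramer() *framer {
	return &framer{buf: make([]byte, 0, defaultBufSize)}
}

var pool = sync.Pool{
	New: func() any { return newFramer() },
}

// allocsFresh reports the average allocations per call when a new
// framer and buffer are built every time.
func allocsFresh() float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = newFramer()
	})
}

// allocsPooled reports the average allocations per call when the
// framer is borrowed from and returned to a sync.Pool.
func allocsPooled() float64 {
	return testing.AllocsPerRun(1000, func() {
		f := pool.Get().(*framer)
		f.buf = f.buf[:0]
		pool.Put(f)
	})
}
```

Comparing the two results should show the pooled path allocating less in steady state. The real saving in the driver would still need to be confirmed with the existing benchmark suite (for example with -benchmem), since sync.Pool may drop its contents on garbage collection.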