Cache Tags objects to eliminate per-call allocations on metric hot paths #74
sakrah wants to merge 5 commits into opensearch-project:main
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #74 +/- ##
============================================
+ Coverage 88.98% 89.01% +0.03%
- Complexity 4574 4581 +7
============================================
Files 302 302
Lines 13903 13912 +9
Branches 2093 2093
============================================
+ Hits 12371 12384 +13
+ Misses 928 924 -4
Partials 604 604
- SourceBuilderVisitor: replace `Tags.create().addTag()` calls with `static final Tags` fields for the two pushdown mode values.
- PipelineStageExecutor: cache `Tags` per `(stageName, stageType, location)` in a static `ConcurrentHashMap` so pipeline stage metrics reuse the same `Tags` instance across executions.
- RestPromQLAction / RestM3QLAction: cache `Tags` by request tag map content (`HashMap.equals`/`hashCode`) in a static `ConcurrentHashMap`, avoiding `Tags.create()` on every request for repeated tag combinations.

These changes ensure the same `Tags` instance flows through to M3MetricsRegistry's tag cache, which in turn feeds the same `Map` object to M3MetricsCollector's identity-based metric cache, achieving zero-allocation metric recording on cache hits.

Signed-off-by: Sam Akrah <sakrah@uber.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
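The content-keyed cache described above can be sketched in a few lines. This is a minimal runnable illustration of the pattern, not the PR's code: the `Tags` class here is a hypothetical stand-in (the real OpenSearch `Tags` API differs), and `getOrCreateTags` mirrors the shape described for `RestPromQLAction`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TagsCacheSketch {
    // Stand-in for OpenSearch's Tags: an append-only tag container that does
    // not override equals/hashCode, so downstream caches can key on identity.
    static final class Tags {
        private final Map<String, String> values = new HashMap<>();
        Tags addTag(String k, String v) { values.put(k, v); return this; }
    }

    // Keyed by tag-map content (HashMap.equals/hashCode), so equal maps
    // built by different requests resolve to the same cached Tags instance.
    private static final ConcurrentHashMap<Map<String, String>, Tags> requestTagsCache =
            new ConcurrentHashMap<>();

    static Tags getOrCreateTags(Map<String, String> tags) {
        return requestTagsCache.computeIfAbsent(tags, m -> {
            Tags t = new Tags();       // allocated only on a cache miss
            m.forEach(t::addTag);
            return t;
        });
    }

    public static void main(String[] args) {
        Map<String, String> req1 = new HashMap<>();
        req1.put("query_type", "instant");
        Map<String, String> req2 = new HashMap<>(req1); // distinct map, equal content
        // Cache hit: the same Tags instance, no allocation on the hot path.
        System.out.println(getOrCreateTags(req1) == getOrCreateTags(req2));
    }
}
```

Because the second lookup hits the cache, the identity comparison is true, which is exactly the property the downstream identity-based metric cache relies on.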
```java
    }

    static Tags getOrCreateTags(Map<String, String> tags) {
        return requestTagsCache.computeIfAbsent(tags, m -> {
```
A mutable HashMap is stored as the ConcurrentHashMap key. If the map is ever mutated after this call, the cached entry becomes unreachable (its hashCode drifts). Consider Map.copyOf(tags) as the key.
The same applies to RestM3QLAction.incrementMetrics.
tags is a local variable in prepareRequest and becomes unreachable once prepareRequest returns; nothing else holds a reference to it. And your recommended fix, Map.copyOf(tags), would create a new immutable map on every single call, including cache hits. That's an allocation on the hot path, which is exactly what this PR is trying to eliminate.
You can wrap the tags in Collections.unmodifiableMap, which should be cheap: it's just an interface wrapper that rejects .put and similar calls. It doesn't protect against the data changing underneath it if someone holds a reference to the underlying map, but as you said, here it's a local variable, so that should be fine.
And if for whatever reason someone retrieves the key from the map and tries to mutate it later, the wrapper won't allow the mutation.
Same comment as on the other PR, though: I think this will slowly leak memory over time.
Since this pattern recurs in so many places, we should consider building the cache utility directly into the TSDBMetrics class so it's managed in a central place. Not necessarily to cache everything automatically, but to initialize the cache and provide methods to access it and manage its lifecycle.
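A hedged sketch of what that centralized utility could look like. Everything here is illustrative, not the project's API: the `TSDBMetrics` host class, the `MAX_ENTRIES` bound (a crude guard against the leak mentioned above), and the stand-in `Tags` class are all assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CentralTagsCacheSketch {
    // Stand-in for the real Tags class.
    static final class Tags {
        private final Map<String, String> values = new HashMap<>();
        Tags addTag(String k, String v) { values.put(k, v); return this; }
    }

    // Hypothetical central owner of the cache, in the spirit of the
    // "build it into TSDBMetrics" suggestion above.
    static final class TSDBMetrics {
        private static final int MAX_ENTRIES = 10_000; // crude leak guard
        private static final ConcurrentHashMap<Map<String, String>, Tags> cache =
                new ConcurrentHashMap<>();

        static Tags cachedTags(Map<String, String> raw) {
            if (cache.size() >= MAX_ENTRIES) {
                // Unbounded growth is the leak risk; a real implementation
                // would prefer LRU eviction or weak references over clear().
                cache.clear();
            }
            return cache.computeIfAbsent(
                    Collections.unmodifiableMap(raw), // key rejects later mutation
                    m -> {
                        Tags t = new Tags();
                        m.forEach(t::addTag);
                        return t;
                    });
        }
    }

    public static void main(String[] args) {
        Map<String, String> tags = new HashMap<>();
        tags.put("stage", "parse");
        // Repeated lookups with equal content return the same cached instance.
        System.out.println(TSDBMetrics.cachedTags(tags) == TSDBMetrics.cachedTags(tags));
    }
}
```

One central class owning the map makes it possible to bound its size and instrument hit rates in one place, instead of auditing each action class separately.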
```diff
 private void incrementMetrics(Map<String, String> tags) {
-    Tags finalTags = Tags.create();
-    tags.forEach(finalTags::addTag);
+    Tags finalTags = requestTagsCache.computeIfAbsent(tags, m -> {
```
This duplicates the caching pattern from RestPromQLAction.getOrCreateTags. Consider extracting a shared utility (e.g. on TSDBMetrics) so the logic lives in one place.
Both classes have their own cache on purpose — PromQL and M3QL are separate actions with separate tag sets. Merging them into one shared cache mixes unrelated data. And if we just share the conversion logic but keep separate caches, we're extracting a 3-line lambda. Not worth it for two call sites.
No, I didn't mean on every call. Map.copyOf only needs to happen on the cache-miss path, not on cache hits. But I guess it's fine.
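A sketch of that miss-path-only copy. One subtlety worth noting: computeIfAbsent stores whatever key object the caller passes, so keeping the caller's mutable map out of the cache requires a get() before copying. The `Tags` class here is a hypothetical stand-in for the real one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MissPathCopySketch {
    static final class Tags { // stand-in for the real Tags class
        private final Map<String, String> values = new HashMap<>();
        Tags addTag(String k, String v) { values.put(k, v); return this; }
    }

    private static final ConcurrentHashMap<Map<String, String>, Tags> cache =
            new ConcurrentHashMap<>();

    static Tags getOrCreateTags(Map<String, String> tags) {
        Tags hit = cache.get(tags);                   // hot path: no allocation
        if (hit != null) return hit;
        Map<String, String> key = Map.copyOf(tags);   // immutable key, miss path only
        return cache.computeIfAbsent(key, m -> {
            Tags t = new Tags();
            m.forEach(t::addTag);
            return t;
        });
    }

    public static void main(String[] args) {
        Map<String, String> a = new HashMap<>();
        a.put("status", "200");
        Tags first = getOrCreateTags(a);
        a.put("mutated", "later");  // caller mutates its map afterwards...
        a.remove("mutated");        // ...the cached immutable key is unaffected
        // An equal-content map from a later request still hits the cache.
        System.out.println(first == getOrCreateTags(new HashMap<>(Map.of("status", "200"))));
    }
}
```

This keeps the hit path a single ConcurrentHashMap.get while still guaranteeing the stored key can never drift.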
```java
tags.put("query_type", "instant");
tags.put("status", "200");

Tags result1 = RestPromQLAction.getOrCreateTags(tags);
```
This passes the same HashMap instance twice. Production creates a new HashMap per request. Use two separate maps with equal content to actually exercise the content-based cache lookup:

```java
Map<String, String> tags2 = new HashMap<>(tags);
Tags result2 = RestPromQLAction.getOrCreateTags(tags2);
assertSame(result1, result2);
```
```java
 * freshly-created {@link Tags} object. The downstream M3 bridge caches
 * the {@link Tags} → {@code Map<String,String>} conversion
 * and the resolved Tally metric handle using <b>object identity</b>
 * (OpenSearch's {@link Tags} does not override {@code equals}/{@code hashCode}).
 * Passing the same instance on every call keeps the hot path allocation-free;
 * a new instance with identical content will still work correctly but will
 * bypass both caches, re-allocating a {@code HashMap} and re-resolving the
 * Tally scope on every call.</p>
```
downstream M3 bridge
This implementation is telemetry-implementation agnostic, so this doc should be reworded and all "M3" and "Tally" references removed. The optimizations should be applicable to all implementations.
Agreed here. Maybe we don't even need this, now that equals/hashCode have been added to the Tags class.
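To make concrete what value-based equals/hashCode on Tags buys: two independently built instances with the same content compare equal, so a content-keyed cache works even when callers don't reuse the exact instance. The `Tags` class below is a stand-in sketch, not the real OpenSearch implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TagsEqualitySketch {
    static final class Tags { // stand-in; real Tags API differs
        private final Map<String, String> values = new HashMap<>();
        Tags addTag(String k, String v) { values.put(k, v); return this; }

        // Value-based equality: content decides, not object identity.
        @Override public boolean equals(Object o) {
            return o instanceof Tags t && values.equals(t.values);
        }
        @Override public int hashCode() { return Objects.hash(values); }
    }

    public static void main(String[] args) {
        Tags a = new Tags().addTag("mode", "pushdown");
        Tags b = new Tags().addTag("mode", "pushdown");
        // Distinct instances, equal content: usable as hash-map keys.
        System.out.println(a != b && a.equals(b));
    }
}
```

With this in place, downstream caches no longer need the "same instance on every call" contract the javadoc describes; equal-content instances hit the same cache entry.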
Description
Profiling under production-like bulk indexing load revealed that a large share of CPU was spent in the metrics hot path, primarily due to repeated `Tags.create().addTag(...)` allocations on every request. This PR caches `Tags` instances so that repeated metric calls with the same tag values reuse the same object, achieving zero-allocation metric recording on cache hits.

Changes

- SourceBuilderVisitor: replace inline `Tags.create().addTag("mode", ...)` with `static final Tags` fields for the two pushdown mode values
- PipelineStageExecutor: cache `Tags` per `(stageName, stageType, location)` in a static `ConcurrentHashMap` so pipeline stage metrics reuse the same instance across executions
- RestPromQLAction / RestM3QLAction: cache `Tags` by request tag map content (`HashMap.equals`/`hashCode`) in a static `ConcurrentHashMap`, avoiding `Tags.create()` on every request for repeated tag combinations
- RestPromQLActionTests: add a test for `getOrCreateTags` caching behavior

Performance Impact
Flame graph analysis (30K-sample profiles under bulk indexing):
85% reduction in hot-path metric CPU overhead.
Test Plan
- `testGetOrCreateTagsCachesAndReturns` for `RestPromQLAction` tag caching

Issues Resolved
List any issues this PR will resolve, e.g. Closes [...].
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.