Skip to content

feat: Adds statistics to debug/vars for tag value cache size#27259

Open
devanbenz wants to merge 9 commits intomaster-1.xfrom
db/tag-v-cache-metrics
Open

feat: Adds statistics to debug/vars for tag value cache size#27259
devanbenz wants to merge 9 commits intomaster-1.xfrom
db/tag-v-cache-metrics

Conversation

@devanbenz
Copy link

@devanbenz devanbenz commented Feb 27, 2026

This commit adds metrics to track how many bytes the tag value cache is using

Example output for 3 databases:

  "database:_internal": {
    "name": "database",
    "tags": {
      "database": "_internal"
    },
    "values": {
      "numMeasurements": 13,
      "numSeries": 40,
      "tagValueCacheBytes": 144
    }
  },
  "database:foo": {
    "name": "database",
    "tags": {
      "database": "foo"
    },
    "values": {
      "numMeasurements": 5,
      "numSeries": 100000,
      "tagValueCacheBytes": 48
    }
  },
  "database:stress": {
    "name": "database",
    "tags": {
      "database": "stress"
    },
    "values": {
      "numMeasurements": 2,
      "numSeries": 12519999,
      "tagValueCacheBytes": 650654
    }
  },

// take an initial sample
atomic.StoreInt64(&i.cacheBytes, int64(i.tagValueCache.HeapSize()))

const cacheTrigger = 10 * time.Second
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 second updates should be fine 🤔

Copy link
Contributor

@davidby-influx davidby-influx Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why collect on a schedule other than Monitor? Run the code when there is a call to Shard.Statistics like we do for SeriesN. This lets the Monitor service control frequency of collection, and reduces complexity on start and stop.

@devanbenz
Copy link
Author

devanbenz commented Feb 27, 2026

There is a race condition for TestServer_BackupAndRestore caused by these changes. I'm currently investigating.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds visibility into TSI tag value cache memory usage by sampling cache heap size and exporting it via debug/vars database statistics.

Changes:

  • Add tagValueCacheBytes statistic to per-database store statistics.
  • Implement tag value cache heap-size estimation and periodic sampling in the TSI index.
  • Add tests validating TagValueSeriesIDCache.HeapSize() behavior (growth/eviction/nil set).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tsdb/store.go Adds tagValueCacheBytes into database statistics by aggregating index-reported cache bytes.
tsdb/shard.go Modifies shard close behavior when nil-ing the index reference.
tsdb/index/tsi1/index.go Adds sampled cache-bytes metric, a closing signal, and a background sampler goroutine.
tsdb/index/tsi1/cache.go Implements HeapSize() estimation for the tag value series ID cache.
tsdb/index/tsi1/cache_test.go Adds tests for HeapSize() growth/eviction/nil handling.
tsdb/index/inmem/inmem.go Implements TagValueCacheBytes() for in-memory index as always zero.
tsdb/index.go Extends the Index interface with TagValueCacheBytes().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@devanbenz devanbenz requested a review from davidby-influx March 2, 2026 17:07
@devanbenz devanbenz marked this pull request as ready for review March 2, 2026 17:07
Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be possible to simplify this.

for name, mmap := range c.cache {
size += len(name) + int(unsafe.Sizeof(mmap))
for key, tkmap := range mmap {
size += len(key) + int(unsafe.Sizeof(tkmap))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For strings like key, is len(key) the size used, or is it unsafe.Sizeof(key) + len(key)

// take an initial sample
atomic.StoreInt64(&i.cacheBytes, int64(i.tagValueCache.HeapSize()))

const cacheTrigger = 10 * time.Second
Copy link
Contributor

@davidby-influx davidby-influx Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why collect on a schedule other than Monitor? Run the code when there is a call to Shard.Statistics like we do for SeriesN. This lets the Monitor service control frequency of collection, and reduces complexity on start and stop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants