Add IndexedByteLabels with embedded offset header for O(log n) label … by ZiwenWan · Pull Request #44 · opensearch-project/time-series-db

ZiwenWan · 2026-02-08T05:58:32Z

…lookups

Introduce IndexedByteLabels, a new Labels implementation that embeds an offset header directly in the byte array for O(log n) binary search on get() and has() with zero initialization cost at query time.

Byte format:
HEADER: [label_count:2][offset_0:2]...[offset_n-1:2]
DATA: [name_len][name][value_len][value]... (same as ByteLabels)

The offset header is written at index time and read directly during binary search — no lazy index building, no separate allocation.
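
The layout above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class `IndexedLabelsSketch` and its methods are hypothetical, and it assumes big-endian 2-byte header fields, offsets relative to the start of the DATA section, single-byte length prefixes, and labels sorted by the unsigned byte order of their names. The description does not confirm any of these details.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the header + data layout; not the PR's actual class.
final class IndexedLabelsSketch {

    // Writes [label_count:2][offset_i:2]... followed by [name_len][name][value_len][value]...
    static byte[] encode(Map<String, String> labels) {
        // Sort by unsigned byte order so the lookup can compare raw bytes.
        TreeMap<byte[], byte[]> sorted =
            new TreeMap<byte[], byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
        int dataLen = 0;
        for (Map.Entry<String, String> e : labels.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] value = e.getValue().getBytes(StandardCharsets.UTF_8);
            if (name.length > 255 || value.length > 255) {
                throw new IllegalArgumentException("sketch assumes 1-byte length prefixes");
            }
            sorted.put(name, value);
            dataLen += 2 + name.length + value.length;
        }
        int headerLen = 2 + 2 * sorted.size();
        if (headerLen + dataLen > 0xFFFF) {
            throw new IllegalArgumentException("offsets must fit in 2 bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(headerLen + dataLen); // big-endian by default
        buf.putShort((short) sorted.size());
        int offset = 0; // assumption: relative to the start of the DATA section
        for (Map.Entry<byte[], byte[]> e : sorted.entrySet()) {
            buf.putShort((short) offset);
            offset += 2 + e.getKey().length + e.getValue().length;
        }
        for (Map.Entry<byte[], byte[]> e : sorted.entrySet()) {
            buf.put((byte) e.getKey().length).put(e.getKey());
            buf.put((byte) e.getValue().length).put(e.getValue());
        }
        return buf.array();
    }

    // Binary search over the offset header; no index is built at read time.
    static String get(byte[] bytes, String name) {
        byte[] target = name.getBytes(StandardCharsets.UTF_8);
        int count = readU16(bytes, 0);
        int dataStart = 2 + 2 * count;
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int pos = dataStart + readU16(bytes, 2 + 2 * mid);
            int nameLen = bytes[pos] & 0xFF;
            int cmp = Arrays.compareUnsigned(
                bytes, pos + 1, pos + 1 + nameLen, target, 0, target.length);
            if (cmp == 0) {
                int valuePos = pos + 1 + nameLen;
                int valueLen = bytes[valuePos] & 0xFF;
                return new String(bytes, valuePos + 1, valueLen, StandardCharsets.UTF_8);
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null; // label not present
    }

    static boolean has(byte[] bytes, String name) {
        return get(bytes, name) != null;
    }

    private static int readU16(byte[] b, int i) {
        return ((b[i] & 0xFF) << 8) | (b[i + 1] & 0xFF);
    }
}
```

Under these assumptions, `get()` touches only the header entries and the names it probes, so each lookup costs O(log n) name comparisons and allocates nothing until a match is found.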

Add BINARY_INDEXED_BYTESLABEL as a new LabelStorageType index setting (alongside BINARY and SORTED_SET) with its own storage reader.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov · 2026-02-08T06:31:25Z

Codecov Report

❌ Patch coverage is 77.15232% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.44%. Comparing base (07f8f12) to head (7090f56).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
.../opensearch/tsdb/core/model/IndexedByteLabels.java	86.25%	17 Missing and 19 partials ⚠️
...opensearch/tsdb/core/mapping/LabelStorageType.java	17.39%	18 Missing and 1 partial ⚠️
...h/tsdb/core/reader/IndexedBinaryLabelsStorage.java	0.00%	11 Missing ⚠️
...ava/org/opensearch/tsdb/core/model/ByteLabels.java	60.00%	1 Missing and 1 partial ⚠️
...b/query/aggregator/TimeSeriesUnfoldAggregator.java	0.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main      #44      +/-   ##
============================================
- Coverage     88.61%   88.44%   -0.18%     
- Complexity     4083     4167      +84     
============================================
  Files           281      283       +2     
  Lines         12724    13018     +294     
  Branches       1879     1927      +48     
============================================
+ Hits          11276    11514     +238     
- Misses          902      938      +36     
- Partials        546      566      +20

Flag	Coverage Δ
unittests	`88.44% <77.15%> (-0.18%)`	⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
...b/query/aggregator/TimeSeriesUnfoldAggregator.java	`80.93% <0.00%> (ø)`
...ava/org/opensearch/tsdb/core/model/ByteLabels.java	`92.23% <60.00%> (+2.86%)`	⬆️
...h/tsdb/core/reader/IndexedBinaryLabelsStorage.java	`0.00% <0.00%> (ø)`
...opensearch/tsdb/core/mapping/LabelStorageType.java	`64.81% <17.39%> (-29.48%)`	⬇️
.../opensearch/tsdb/core/model/IndexedByteLabels.java	`86.25% <86.25%> (ø)`

... and 1 file with indirect coverage changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ziwen Wan <ziwen.wan@uber.com>

Jinny-Wang · 2026-02-09T22:22:12Z

src/main/java/org/opensearch/tsdb/core/model/ByteLabels.java

-        ByteLabels other = (ByteLabels) o;
-        return Arrays.equals(this.data, other.data);
+        if (o instanceof ByteLabels other) {
+            return Arrays.equals(this.data, other.data);


Instead of jumping straight to Arrays.equals, which can be O(n), shall we compare the hash codes of the two ByteLabels first?
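
A minimal sketch of what that suggestion could look like, using a hypothetical stand-in class `HashFirstLabels` rather than the real ByteLabels. It assumes the hash is cached lazily; whether ByteLabels already caches its hash is not shown in this thread. Equal hashes do not prove equality, so the byte comparison is still required on a hash match.

```java
import java.util.Arrays;

// Hypothetical stand-in for ByteLabels; only equals/hashCode are sketched.
final class HashFirstLabels {
    private final byte[] data;
    private int hash; // 0 means "not computed yet" (same idiom as java.lang.String)

    HashFirstLabels(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Arrays.hashCode(data); // O(n) once, then cached
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o instanceof HashFirstLabels other) {
            // Cheap reject: different hashes imply different contents.
            if (hashCode() != other.hashCode()) {
                return false;
            }
            // Hashes can collide, so the full comparison is still needed.
            return Arrays.equals(this.data, other.data);
        }
        return false;
    }
}
```

The pre-check only pays off when both hashes are already cached; otherwise computing them costs O(n) on its own. Arrays.equals also returns false immediately when the array lengths differ, so the gain is mostly for unequal arrays of the same length.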

ZiwenWan requested review from itschrispeck, philiplhchan and yupeng9 as code owners February 8, 2026 05:58

ZiwenWan force-pushed the label-optimizations branch from b19eeed to 702aa12 Compare February 8, 2026 06:20

ZiwenWan marked this pull request as draft February 8, 2026 06:54

ZiwenWan force-pushed the label-optimizations branch 2 times, most recently from 54e11f9 to 551976c Compare February 9, 2026 07:39

Jinny-Wang reviewed Feb 9, 2026

ZiwenWan force-pushed the label-optimizations branch from 551976c to 7090f56 Compare February 9, 2026 22:39

ZiwenWan wants to merge 1 commit into opensearch-project:main from ZiwenWan:label-optimizations

2 participants
