Add IndexedByteLabels with embedded offset header for O(log n) label …#44

Draft
ZiwenWan wants to merge 1 commit into opensearch-project:main from ZiwenWan:label-optimizations

Conversation

@ZiwenWan ZiwenWan commented Feb 8, 2026

…lookups

Introduce IndexedByteLabels, a new Labels implementation that embeds an offset header directly in the byte array for O(log n) binary search on get() and has() with zero initialization cost at query time.

Byte format:
HEADER: [label_count:2][offset_0:2]...[offset_n-1:2]
DATA: [name_len][name][value_len][value]... (same as ByteLabels)

The offset header is written at index time and read directly during binary search — no lazy index building, no separate allocation.
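The layout above can be illustrated with a minimal sketch. It is not the PR's `IndexedByteLabels` class: the class name `IndexedLabelsSketch`, the big-endian 2-byte header fields, the 1-byte length prefixes, offsets measured from the start of the DATA section, and sorting names by unsigned byte order are all assumptions made for illustration, since the description does not specify them.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the HEADER + DATA layout; not the PR's implementation.
final class IndexedLabelsSketch {
    private IndexedLabelsSketch() {}

    // Assumed encoding: [count:2][offset_i:2]... then [name_len][name][value_len][value]...
    static byte[] encode(Map<String, String> labels) {
        List<byte[][]> entries = new ArrayList<>();
        for (Map.Entry<String, String> e : labels.entrySet()) {
            entries.add(new byte[][] {
                e.getKey().getBytes(StandardCharsets.UTF_8),
                e.getValue().getBytes(StandardCharsets.UTF_8)
            });
        }
        // Sort names by unsigned byte order so a binary search over raw bytes is valid.
        entries.sort((a, b) -> Arrays.compareUnsigned(a[0], b[0]));

        int n = entries.size();
        int headerLen = 2 + 2 * n;
        byte[] header = new byte[headerLen];
        putU16(header, 0, n);
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        for (int i = 0; i < n; i++) {
            byte[] name = entries.get(i)[0];
            byte[] value = entries.get(i)[1];
            if (name.length > 255 || value.length > 255 || data.size() > 0xFFFF) {
                throw new IllegalArgumentException("label does not fit the assumed field widths");
            }
            putU16(header, 2 + 2 * i, data.size()); // offset relative to DATA start (assumption)
            data.write(name.length);
            data.write(name, 0, name.length);
            data.write(value.length);
            data.write(value, 0, value.length);
        }
        byte[] out = Arrays.copyOf(header, headerLen + data.size());
        System.arraycopy(data.toByteArray(), 0, out, headerLen, data.size());
        return out;
    }

    // Binary search over the offset header; returns null when the name is absent.
    static String get(byte[] buf, String name) {
        byte[] key = name.getBytes(StandardCharsets.UTF_8);
        int n = getU16(buf, 0);
        int dataStart = 2 + 2 * n;
        int lo = 0;
        int hi = n - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int pos = dataStart + getU16(buf, 2 + 2 * mid);
            int nameLen = buf[pos] & 0xFF;
            int cmp = Arrays.compareUnsigned(buf, pos + 1, pos + 1 + nameLen, key, 0, key.length);
            if (cmp == 0) {
                int valPos = pos + 1 + nameLen;
                int valLen = buf[valPos] & 0xFF;
                return new String(buf, valPos + 1, valLen, StandardCharsets.UTF_8);
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }

    static boolean has(byte[] buf, String name) {
        return get(buf, name) != null;
    }

    private static void putU16(byte[] b, int at, int v) {
        b[at] = (byte) (v >>> 8);
        b[at + 1] = (byte) v;
    }

    private static int getU16(byte[] b, int at) {
        return ((b[at] & 0xFF) << 8) | (b[at + 1] & 0xFF);
    }
}
```

In this sketch a lookup reads only the header entries it probes and the names at those offsets, so nothing has to be built or allocated before the first `get()` beyond the key bytes. The cost is 2 bytes per label plus 2 bytes for the count.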

Add BINARY_INDEXED_BYTESLABEL as a new LabelStorageType index setting (alongside BINARY and SORTED_SET) with its own storage reader.

Description

Describe what this change achieves.

Issues Resolved

List any issues this PR will resolve, e.g. Closes [...].

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Feb 8, 2026

Codecov Report

❌ Patch coverage is 77.15232% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.44%. Comparing base (07f8f12) to head (7090f56).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../opensearch/tsdb/core/model/IndexedByteLabels.java | 86.25% | 17 Missing and 19 partials ⚠️ |
| ...opensearch/tsdb/core/mapping/LabelStorageType.java | 17.39% | 18 Missing and 1 partial ⚠️ |
| ...h/tsdb/core/reader/IndexedBinaryLabelsStorage.java | 0.00% | 11 Missing ⚠️ |
| ...ava/org/opensearch/tsdb/core/model/ByteLabels.java | 60.00% | 1 Missing and 1 partial ⚠️ |
| ...b/query/aggregator/TimeSeriesUnfoldAggregator.java | 0.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main      #44      +/-   ##
============================================
- Coverage     88.61%   88.44%   -0.18%     
- Complexity     4083     4167      +84     
============================================
  Files           281      283       +2     
  Lines         12724    13018     +294     
  Branches       1879     1927      +48     
============================================
+ Hits          11276    11514     +238     
- Misses          902      938      +36     
- Partials        546      566      +20     

| Flag | Coverage Δ |
|---|---|
| unittests | 88.44% <77.15%> (-0.18%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...b/query/aggregator/TimeSeriesUnfoldAggregator.java | 80.93% <0.00%> (ø) |
| ...ava/org/opensearch/tsdb/core/model/ByteLabels.java | 92.23% <60.00%> (+2.86%) ⬆️ |
| ...h/tsdb/core/reader/IndexedBinaryLabelsStorage.java | 0.00% <0.00%> (ø) |
| ...opensearch/tsdb/core/mapping/LabelStorageType.java | 64.81% <17.39%> (-29.48%) ⬇️ |
| .../opensearch/tsdb/core/model/IndexedByteLabels.java | 86.25% <86.25%> (ø) |

... and 1 file with indirect coverage changes

@ZiwenWan ZiwenWan marked this pull request as draft February 8, 2026 06:54
@ZiwenWan ZiwenWan force-pushed the label-optimizations branch 2 times, most recently from 54e11f9 to 551976c Compare February 9, 2026 07:39
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Ziwen Wan <ziwen.wan@uber.com>

    - ByteLabels other = (ByteLabels) o;
    - return Arrays.equals(this.data, other.data);
    + if (o instanceof ByteLabels other) {
    +     return Arrays.equals(this.data, other.data);

Instead of jumping straight to Arrays.equals, which can be O(n), shall we compare the hash codes of the two ByteLabels first?
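A minimal sketch of what this suggestion might look like follows. The class `HashedBytesSketch` is a hypothetical stand-in, not the PR's `ByteLabels`, and it assumes the hash is computed once and cached. Without a cached hash the pre-check would itself scan both arrays, and `Arrays.equals` already stops at a length mismatch or at the first differing byte, so the gain depends on how the real class stores its hash.

```java
import java.util.Arrays;

// Hypothetical stand-in for a byte-backed labels class; not the PR's ByteLabels.
final class HashedBytesSketch {
    private final byte[] data;
    private final int hash; // computed once at construction (assumption: data is immutable)

    HashedBytesSketch(byte[] data) {
        this.data = data.clone();
        this.hash = Arrays.hashCode(this.data);
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o instanceof HashedBytesSketch other) {
            // Different cached hashes prove inequality in O(1).
            if (hash != other.hash) {
                return false;
            }
            // Equal hashes can still be a collision, so the full comparison stays.
            return Arrays.equals(data, other.data);
        }
        return false;
    }
}
```

The pre-check only ever short-circuits the unequal case. Two equal instances still pay for the full `Arrays.equals` scan.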

@ZiwenWan ZiwenWan force-pushed the label-optimizations branch from 551976c to 7090f56 Compare February 9, 2026 22:39

2 participants