[Indexing] Add support of "extra fields" outside _source indexing by laminelam · Pull Request #20635 · opensearch-project/OpenSearch

laminelam · 2026-02-15T22:36:08Z

Resolves #20765

This PR introduces ExtraFieldValues, a side-channel mechanism for supplying field values outside of _source during indexing.

This change is part of a broader effort to enhance gRPC indexing, enabling primitive vector values to be indexed as non-_source data for improved ingestion performance.

Key points:

Adds ExtraFieldValues support to IndexRequest and UpdateRequest (doc and upsert paths).
Enables mappers to consume values directly, bypassing JSON parsing.
Currently supports float arrays only (primitive and packed little-endian representations).
Transport serialization is version-gated for backward compatibility.
For now, only doc-based updates and upserts are supported (script not supported).

Benchmark

Benchmarked ingest throughput across increasing vector dimensions using four representations:

JSON (float array)
JSON (binary/base64)
SMILE
ExtraFieldValues (non _source side-channel)

Across all tested dimensions:

JSON float arrays show the lowest throughput and degrade rapidly as vector size increases.

JSON (binary/base64) performs significantly better than float arrays.

SMILE and JSON(base64) are generally comparable, with SMILE often providing significant improvements at higher dimensions.

ExtraFieldValues consistently provides the highest throughput and scales best with increasing vector size.

At higher dimensions (e.g., 1024+), ExtraFieldValues shows a clear and consistent advantage over the _source-based approaches, achieving up to 60× higher throughput compared to JSON arrays.

These results indicate that bypassing _source parsing for large vector payloads can substantially improve ingest throughput.

Note: ExtraFieldValues was tested with the PackedFloatArray option (packed little-endian floats)

Ingest Throughput (baseline = JSON float array)

docs/sec (X× faster with json array as base)

Dim	JSON (array)	JSON (base64/binary)	SMILE (binary)	ExtraFieldValues
8	19,619 (1.0×)	30,817 (1.57×)	27,405 (1.40×)	26,793 (1.37×)
64	6,572 (1.0×)	23,637 (3.60×)	28,010 (4.26×)	26,284 (4.00×)
128	3,649 (1.0×)	19,428 (5.32×)	14,375 (3.94×)	25,899 (7.10×)
512	1,132 (1.0×)	9,723 (8.59×)	12,532 (11.07×)	20,830 (18.40×)
768	778 (1.0×)	6,883 (8.85×)	10,424 (13.40×)	18,443 (23.70×)
1024	591 (1.0×)	5,630 (9.53×)	8,453 (14.31×)	17,663 (29.90×)
1536	399 (1.0×)	3,951 (9.89×)	6,061 (15.18×)	15,398 (38.56×)
2048	299 (1.0×)	3,034 (10.14×)	4,659 (15.56×)	14,887 (49.74×)
3072	196 (1.0×)	2,060 (10.50×)	3,058 (15.59×)	11,840 (60.34×)

Related Issues

#19638

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

coderabbitai · 2026-02-15T22:36:43Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR introduces comprehensive support for storing extra field values (fields outside _source) across OpenSearch's indexing and update paths. It adds typed value containers (BYTES, FLOAT_ARRAY), integrates them into request objects, and implements parsing logic to validate and store these values via the mapper framework.

Changes

Cohort / File(s)	Summary
Extra Field Value Type System `server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java`, `BytesValue.java`, `FloatArrayValue.java`, `PackedFloatArray.java`, `PrimitiveFloatArray.java`	Introduces sealed interface with nested Type enum (BYTES, FLOAT_ARRAY) and streaming protocol. Provides implementations for bytes data and float array storage (packed and primitive), with lazy decoding and caching for packed arrays.
Extra Field Values Container `server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java`	Immutable map-based container for field path to ExtraFieldValue mapping; supports serialization/deserialization and provides query accessors.
Request Path Integration `server/src/main/java/org/opensearch/action/index/IndexRequest.java`, `update/UpdateRequest.java`, `update/UpdateHelper.java`, `bulk/TransportShardBulkAction.java`, `SourceToParse.java`	Adds `extraFieldValues` field and accessors to IndexRequest; adds `docExtraFieldValues` and `upsertExtraFieldValues` setters to UpdateRequest with validation against scripted updates; propagates values through UpdateHelper and bulk processing; extends SourceToParse with version-aware serialization.
Parsing and Mapper Framework `server/src/main/java/org/opensearch/index/mapper/DocumentParser.java`, `FieldMapper.java`	Adds `applyExtraFieldValues` processing logic to DocumentParser with path validation and mapper support checks; introduces `supportsExtraFieldValues()` hook in FieldMapper for declarative support.
Unit Tests - Value Types `server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java`, `ExtraFieldValueTests.java`, `ExtraFieldValuesTests.java`, `PackedFloatArrayTests.java`, `PrimitiveFloatArrayTests.java`	Validates serialization round-trips, immutability, lazy decoding with caching, bounds checking, and type dispatch for all value implementations.
Unit Tests - Ingestion & Validation `server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java`, `index/mapper/extrasource/ExtraFieldValuesIngestTests.java`	Tests parsing integration, rejection of extra field values with scripted updates, handling of unmapped fields, and root/nested field paths.
Integration Tests `server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java`, `update/UpdateIT.java`	End-to-end validation of index, update, upsert, and bulk operations carrying extra field values; verifies stored field metadata preservation.
Test Infrastructure `server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java`, `ExtraTestFieldMapper.java`	Custom mapper plugin and field type for test environments; ExtraTestFieldMapper parses ExtraFieldValue implementations and stores type/dimension metadata.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Request as Index/Update/Bulk<br/>Request
    participant Transport as Transport<br/>ShardBulkAction
    participant Parser as Document<br/>Parser
    participant FieldMapper as Field<br/>Mapper
    participant Storage as Lucene<br/>Document

    Client->>Request: create with<br/>extraFieldValues()
    Request->>Transport: process (primary)
    Transport->>Parser: call with<br/>SourceToParse
    Parser->>Parser: applyExtraFieldValues()
    loop for each extra field
        Parser->>Parser: validate path &<br/>nested parents
        Parser->>FieldMapper: resolve mapper
        FieldMapper->>FieldMapper: supportsExtraFieldValues()?
        alt supported
            FieldMapper->>FieldMapper: parseCreateField()
            FieldMapper->>Storage: store field +<br/>metadata (_type, _dim)
        else not supported
            FieldMapper->>Parser: throw validation error
        end
    end
    Parser->>Storage: finalize document
    Transport->>Transport: replicate (replica)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

opensearch-project/OpenSearch#19958: Modifies DocumentParser's internalParseDocument flow for disabling nested objects/flattening, similar interaction pattern with core parsing logic.

Suggested labels

enhancement, Indexing & Search, Search:Performance

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 17.42% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly and concisely describes the main change: adding support for 'extra fields' outside _source during indexing, which aligns with the core purpose of the changeset.
Description check	✅ Passed	The PR description comprehensively covers the change objectives, includes detailed context about ExtraFieldValues functionality, provides benchmark results showing performance improvements, and completes all checklist items appropriately.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `main`

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@server/src/main/java/org/opensearch/action/update/UpdateRequest.java`:
- Around line 724-733: The Javadoc for UpdateRequest.docExtraFieldValues has a
typo ("extral") and should be corrected to "extra"; update the method comment
block above public UpdateRequest docExtraFieldValues(ExtraFieldValues values) to
read "Sets extra field values for the partial document update" and adjust any
related wording (e.g., "These values...") while keeping the rest of the
documentation and references to ExtraFieldValues,
safeDoc().extraFieldValues(values), and ExtraFieldValues#EMPTY unchanged.

🧹 Nitpick comments (5)

server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java (1)
54-57: Minor: Duplicate assertion.

Line 56 duplicates the assertion at line 54 (read.dimension() is checked twice for vals.length).
🔧 Remove duplicate assertion
         assertThat(read.dimension(), is(vals.length));
         assertThat(read.isPackedLE(), is(false));
-        assertThat(read.dimension(), is(vals.length));
         assertThat(read.asFloatArray(), equalTo(vals));
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java (1)

153-189: Resolve or track the EFV/JSON exclusivity TODO. The TODO hints at a missing validation rule; leaving both JSON and EFV populated could create ambiguous precedence. Consider enforcing it here or opening a follow‑up.

If you want, I can draft the validation or open a tracking issue.
server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java (2)
60-65: Add braces around the for-loop body for consistency.

The loop body at lines 62-63 lacks braces, which is inconsistent with the style in writePayloadTo (lines 55-57).
🔧 Suggested fix
     static PrimitiveFloatArray readBodyFrom(StreamInput in, int dim) throws IOException {
         final float[] v = new float[dim];
-        for (int j = 0; j < dim; j++)
-            v[j] = in.readFloat();
+        for (int j = 0; j < dim; j++) {
+            v[j] = in.readFloat();
+        }
         return new PrimitiveFloatArray(v);
     }
24-26: Consider null validation in constructor.

The constructor accepts the array without null-checking. If a null array is passed, subsequent calls to dimension() or get() will throw NPE. Since this is a public class, defensive validation may be warranted.
🛡️ Suggested fix
     public PrimitiveFloatArray(float[] v) {
+        java.util.Objects.requireNonNull(v, "v must not be null");
         this.v = v;
     }
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java (1)
159-163: Remove unused helper method setPrivateField.

This helper method is defined but never used in any test. Consider removing it to avoid dead code, or add a test that requires it.
🧹 Suggested fix
-    private static void setPrivateField(Object target, String fieldName, Object value) throws Exception {
-        Field f = target.getClass().getDeclaredField(fieldName);
-        f.setAccessible(true);
-        f.set(target, value);
-    }

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c244c0 and 5d4649e.

📒 Files selected for processing (24)

server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java
server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java
server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
server/src/main/java/org/opensearch/action/index/IndexRequest.java
server/src/main/java/org/opensearch/action/update/UpdateHelper.java
server/src/main/java/org/opensearch/action/update/UpdateRequest.java
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java
server/src/main/java/org/opensearch/index/mapper/FieldMapper.java
server/src/main/java/org/opensearch/index/mapper/SourceToParse.java
server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java
server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java
server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java
server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java
server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java
server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.java
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java
server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2026-01-13T17:40:27.167Z

Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

server/src/main/java/org/opensearch/action/index/IndexRequest.java
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java
server/src/main/java/org/opensearch/action/update/UpdateRequest.java
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java
server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java
server/src/main/java/org/opensearch/action/update/UpdateHelper.java
server/src/main/java/org/opensearch/index/mapper/SourceToParse.java
server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java
server/src/main/java/org/opensearch/index/mapper/FieldMapper.java
server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java
server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java
server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java

🧬 Code graph analysis (8)

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java (1)

libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)

BytesArray (48-151)

server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java (1)

libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)

BytesArray (48-151)

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java (1)

libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)

BytesArray (48-151)

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)

server/src/main/java/org/opensearch/index/mapper/FieldMapper.java (3)

Builder (85-214)

Builder (740-793)

Builder (894-913)

server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java (3)

libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)

BytesArray (48-151)

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)

ExtraFieldValuesMapperPlugin (19-28)

server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java (1)

PrimitiveFloatArray (21-66)

server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java (1)

server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java (1)

ExperimentalApi (20-53)

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)

ExtraTestFieldMapper (26-106)

server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java (1)

libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)

BytesArray (48-151)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)

GitHub Check: gradle-check
GitHub Check: detect-breaking-change
GitHub Check: precommit (25, windows-latest)
GitHub Check: precommit (21, macos-15)
GitHub Check: precommit (25, macos-15-intel)
GitHub Check: precommit (21, windows-2025, true)
GitHub Check: precommit (25, ubuntu-24.04-arm)
GitHub Check: precommit (25, macos-15)
GitHub Check: precommit (21, ubuntu-24.04-arm)
GitHub Check: Analyze (java)
GitHub Check: precommit (25, ubuntu-latest)
GitHub Check: precommit (21, windows-latest)
GitHub Check: precommit (21, ubuntu-latest)
GitHub Check: precommit (21, macos-15-intel)
GitHub Check: assemble (21, ubuntu-latest)
GitHub Check: assemble (21, ubuntu-24.04-arm)
GitHub Check: assemble (21, windows-latest)
GitHub Check: assemble (25, ubuntu-latest)
GitHub Check: assemble (25, windows-latest)
GitHub Check: assemble (25, ubuntu-24.04-arm)
GitHub Check: Mend Security Check

🔇 Additional comments (35)

server/src/main/java/org/opensearch/action/update/UpdateHelper.java (1)

254-254: LGTM - Correct propagation of ExtraFieldValues in doc-based update path.

This correctly forwards extraFieldValues from the incoming doc request to the final IndexRequest, ensuring EFV data flows through the update pipeline for non-script updates.

server/src/main/java/org/opensearch/index/mapper/FieldMapper.java (2)

336-341: LGTM - Clean opt-in hook for ExtraFieldValues support.

The method follows the established pattern in FieldMapper (similar to parsesArrayValue()), providing a sensible default of false that requires mappers to explicitly opt-in for EFV support.

50-50: The ExtraFieldValues import is used in the Javadoc comment at line 337 ({@link ExtraFieldValues}). The import is necessary for proper Javadoc link resolution and should be retained.

Likely an incorrect or invalid review comment.

server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java (1)

653-666: LGTM - Good test coverage for script + ExtraFieldValues validation.

The test correctly validates that combining a script with ExtraFieldValues on the upsert path triggers the expected validation error. The test structure is consistent with other validation tests in this file.

server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (2)

665-672: LGTM - Correct EFV propagation on primary path.

The SourceToParse construction now includes extraFieldValues() from the index request, enabling the parsing layer to consume extra field values during primary shard indexing.

918-919: LGTM - Consistent EFV propagation on replica path.

The replica path mirrors the primary path change, ensuring extraFieldValues are consistently available during replica shard indexing. This symmetry is important for correct replication behavior.

server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java (1)

550-564: LGTM - Good integration test for script + ExtraFieldValues rejection.

This integration test complements the unit test in UpdateRequestTests by validating the same behavior through the full client API stack. The test follows the established patterns in this file (using indexOrAlias(), expectThrows, etc.).

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java (1)

21-69: LGTM - Comprehensive serialization tests for ExtraFieldValue types.

The tests cover essential scenarios:

BytesValue round-trip with content verification

FloatArrayValue round-trip with dimension and value verification

Error handling for unknown type IDs

Good coverage of the serialization contract.

server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java (1)

17-35: LGTM - Clean record implementation for BytesValue.

The record pattern is appropriate for this immutable value type. The implementation is minimal and focused with proper visibility for readBodyFrom.

One consideration: the record accepts a potentially null BytesReference without validation. If null safety is a concern, consider adding a compact constructor with a null check. However, this may not be necessary if callers are expected to validate inputs.

server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java (1)

20-75: LGTM - Thorough test coverage for PrimitiveFloatArray.

The tests comprehensively cover:

Basic accessors (dimension, get, asFloatArray, isPackedLE)

Error handling (packedBytes() throws on unpacked instance)

Serialization round-trip

Empty array edge case

Good alignment with the PrimitiveFloatArray implementation.

server/src/main/java/org/opensearch/action/update/UpdateRequest.java (3)

63-63: LGTM. ExtraFieldValues import aligns with the new API usage.

239-245: Good guard against scripted EFV usage. Prevents unsupported script paths from accepting extra field values.

821-829: LGTM. Upsert EFV setter mirrors the doc path cleanly.

server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java (1)

1-82: Solid round‑trip coverage for BytesValue. Exercises BytesArray, composite, and dispatch paths.

server/src/main/java/org/opensearch/index/mapper/DocumentParser.java (3)

50-51: LGTM. EFV imports are consistent with the new parsing flow.

146-146: LGTM. Applying EFV before metadata post-parse keeps the pipeline ordering consistent.

191-205: LGTM. Nested‑parent guard is clear and appropriately scoped.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java (1)

1-89: Nice coverage of immutability and stream round‑trip.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)

1-27: LGTM. Clear and minimal mapper plugin registration for tests.

server/src/main/java/org/opensearch/action/index/IndexRequest.java (5)

65-65: LGTM. EFV import matches the new request field.

116-116: LGTM. Defaulting to ExtraFieldValues.EMPTY avoids null checks.

158-160: LGTM. Version‑gated EFV deserialization looks correct.

504-519: LGTM. Setter null‑resets to EMPTY and the getter keeps the API clean.

702-704: LGTM. Version‑gated EFV serialization is consistent with the read path.

server/src/main/java/org/opensearch/index/mapper/SourceToParse.java (3)

40-40: LGTM. EFV import is consistent with new constructor overloads.

62-84: LGTM. Constructor chaining keeps backward compatibility and defaults EFV to EMPTY.

110-112: LGTM. Simple, clear EFV accessor.

server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java (1)

1-51: LGTM. Clean immutable container with streamable support.

server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java (1)

17-68: LGTM!

The interface design is clean with a well-defined contract:

Proper sealed interface extending ExtraFieldValue

Symmetric read/write protocol with writeBodyTo/readBodyFrom

Clear separation between header metadata (packedLE flag, dimension) and payload serialization via writePayloadTo

server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java (1)

38-202: LGTM!

Comprehensive integration tests covering:

Basic index operations with BytesValue

Update operations with PrimitiveFloatArray

Upsert operations with BytesValue

Bulk operations with mixed EFV types

The test structure is clean with appropriate assertions on stored field values.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.java (1)

29-135: LGTM!

Well-structured unit tests for EFV ingestion covering:

Root field with BytesValue

Nested object path with PrimitiveFloatArray

Error handling when no mapper exists for the extra field

The tests properly exercise the DocumentMapper.parse() path with SourceToParse containing ExtraFieldValues.

server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java (1)

17-90: LGTM!

Clean sealed interface design with:

Appropriate @ExperimentalApi annotation for the new feature

Efficient type lookup via byte id with O(1) array access

Well-defined wire encoding protocol: [typeId][body]

Good error handling for unknown type ids

The TODO for future types (double, int, long) is noted and the design accommodates extensibility.

server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java (1)

35-165: LGTM!

Well-implemented packed float array with:

Efficient lazy decoding with proper volatile fields for thread-safety

Zero-copy optimization for BytesArray backed references

Correct little-endian decoding in decodeAt()

Robust validation with overflow protection using Math.multiplyExact

Proper bounds checking in get()

The documented concurrency behavior (multiple threads may transiently materialize but results are equivalent) is correctly implemented with Java's volatile memory model guarantees.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)

26-106: LGTM!

Appropriate test-only mapper implementation that:

Properly implements supportsExtraFieldValues() returning true

Consumes ExtraFieldValue via context.parseExternalValue()

Handles both BytesValue and FloatArrayValue types

Stores metadata fields for test verification (type, length/dimension, first float)

This provides the necessary test infrastructure for validating the EFV feature end-to-end.

server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java (1)

28-157: LGTM!

Comprehensive test coverage for PackedFloatArray including:

Basic construction and value retrieval

Zero-copy optimization verification for BytesArray slices

Composite BytesReference materialization behavior

Cache identity verification for asFloatArray()

Serialization round-trip

Validation and bounds checking error cases

The use of reflection to verify internal state is appropriate for testing optimization guarantees.

coderabbitai · 2026-02-15T22:44:49Z

server/src/main/java/org/opensearch/action/update/UpdateRequest.java

+    /**
+     * Sets extral field values for the partial document update ({@code doc}).
+     * <p>
+     * These values are applied only when the update is executed using {@code doc} (i.e., no script).
+     * {@code null} clears the values and resets to {@link ExtraFieldValues#EMPTY}.
+     */
+    public UpdateRequest docExtraFieldValues(ExtraFieldValues values) {
+        safeDoc().extraFieldValues(values);
+        return this;
+    }


⚠️ Potential issue | 🟡 Minor

Fix Javadoc typo ("extral").

📝 Proposed fix

- * Sets extral field values for the partial document update ({`@code` doc}). + * Sets extra field values for the partial document update ({`@code` doc}).

🤖 Prompt for AI Agents

In `@server/src/main/java/org/opensearch/action/update/UpdateRequest.java` around lines 724 - 733, The Javadoc for UpdateRequest.docExtraFieldValues has a typo ("extral") and should be corrected to "extra"; update the method comment block above public UpdateRequest docExtraFieldValues(ExtraFieldValues values) to read "Sets extra field values for the partial document update" and adjust any related wording (e.g., "These values...") while keeping the rest of the documentation and references to ExtraFieldValues, safeDoc().extraFieldValues(values), and ExtraFieldValues#EMPTY unchanged.

github-actions · 2026-02-15T22:51:44Z

❌ Gradle check result for 5d4649e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

use writeFloatArray/readFloatArray Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

github-actions · 2026-02-16T00:59:37Z

❌ Gradle check result for 573da92: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…rialization (version mismatch) Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

github-actions · 2026-02-16T03:18:57Z

❌ Gradle check result for 0f11c94: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

laminelam · 2026-02-16T14:44:58Z

Hi @msfroh

As per our conversation, this is the non grpc part of the contribution.

The contribution is split across multiple PRs to simplify review and maintain separation of concerns.

msfroh

Thanks @laminelam ! The change overall looks good to me. I left a couple of questions/suggestions.

msfroh · 2026-03-11T18:01:08Z

server/src/main/java/org/opensearch/action/index/IndexRequest.java

+        extraFieldValues = in.getVersion().onOrAfter(Version.V_3_6_0)
+            ? Objects.requireNonNullElse(in.readOptionalWriteable(ExtraFieldValues::new), ExtraFieldValues.EMPTY)
+            : ExtraFieldValues.EMPTY;


Can you replace the ternary operator with an if/else? It's mostly so that when we remove the version check in a future version (either 4.0 or 5.0), the resulting diff is a little cleaner.

msfroh · 2026-03-11T18:20:05Z

server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java

+
+import java.io.IOException;
+
+public non-sealed interface FloatArrayValue extends ExtraFieldValue {


I'm wondering if it makes sense to add both PackedFloatArray and PrimitiveFloatArray?

Is the idea that clients can choose which kind of encoding they want to use when they send a float array? If that's the case, I'm wondering if we can convert to whichever one is better. That conversion can probably happen at the gRPC layer. I'm just thinking that we can remove some code if we only support one float array representation. Also, we could pare down some of the interface methods like isPackedLE.

Thanks @msfroh for the review.

The two representations move the cost to different parts of the indexing pipeline: client --> coordinator --> shard --> mapper.

With PrimitiveFloatArray, floats are serialized one by one during transport (client to coordinator and coordinator to shard). The mapper then receives an already decoded float array and does no additional work. Still, the elements are serialized twice during the full cycle.

With PackedFloatArray, the vector is transported as a packed byte sequence. This avoids the float serialization loops during transport, but the mapper needs to decode the floats when they are accessed.

So the trade off is where the CPU cost happens: PrimitiveFloatArray makes transport (including serialization/deserialization) heavier but the mapper cheaper, while PackedFloatArray makes transport cheaper but adds a small decode cost at the mapper.

For large vectors the transport serialization tends to dominate, which is why the packed representation performs much better in the benchmarks.

So, we want to allow clients to choose the trade off that fits their needs: simplicity (float[]) vs optimized transport (packed)

See the benchmark here
(vector = PackedFloatArray VS float = PrimitiveFloatArray)

msfroh · 2026-03-11T18:31:22Z

Looks like we're getting some errors for missing JavaDoc on public classes:

/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java:17: error: BytesValue (record): javadocs are missing
public record BytesValue(BytesReference bytes) implements ExtraFieldValue {
       ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java:18: error: ExtraFieldValue (interface): javadocs are missing
public sealed interface ExtraFieldValue permits FloatArrayValue, BytesValue {
              ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java:21: error: Type (enum): javadocs are missing
    enum Type {
    ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java:17: error: FloatArrayValue (interface): javadocs are missing
public non-sealed interface FloatArrayValue extends ExtraFieldValue {
                  ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java:21: error: ExtraFieldValues (class): javadocs are missing
public final class ExtraFieldValues implements Writeable {

Do these all need to be public? I guess we the ones above probably do, since ExtraFieldValues and ExtraFieldValue are both referenced from DocumentParser. Then the field mappers that use them will need BytesValue or FloatArrayValue to be able to extract the actual values. We might be able to make the FloatArrayValue implementations package-private, though (if we want to keep both of them).

laminelam · 2026-03-11T21:02:22Z

Do these all need to be public? I guess we the ones above probably do, since ExtraFieldValues and ExtraFieldValue are both referenced from DocumentParser. Then the field mappers that use them will need BytesValue or FloatArrayValue to be able to extract the actual values. We might be able to make the FloatArrayValue implementations package-private, though (if we want to keep both of them).

Yes, was planning to add some javadoc before the PR is merged.
You're right FloatArrayValue implementations could be made private (with some public factory methods). Will do.

replace the ternary operator with if/else make FloatArrayValue implementations package private Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

github-actions · 2026-03-17T02:45:16Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 988c50d.

Path	Line	Severity	Description
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java	156	low	Uses Java reflection with setAccessible(true) to access private fields, bypassing access controls. Annotated @SuppressForbidden with a TODO comment to remove it. Scoped to test code only and clearly labeled as temporary, so likely not malicious but warrants review.
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java	172	low	A TODO comment explicitly acknowledges missing validation: 'EFV is only allowed if JSON is absent (validation)'. This incomplete guard means a field could simultaneously receive values from both the JSON _source and ExtraFieldValues, potentially causing unexpected data behavior. Not malicious, but the acknowledged gap could be exploited in certain flows.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-17T02:46:27Z

PR Reviewer Guide 🔍

(Review updated until commit `362b03d`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java [33]: add double, int and long server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [164]: EFV is only allowed if JSON is absent (validation) server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java [156]: remove reflection-based private field access
🔀 Multiple PR themes Sub-PR theme: Add ExtraFieldValue data model and serialization (extrasource package) Relevant files: server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java server/src/main/java/org/opensearch/index/mapper/extrasource/package-info.java server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java Sub-PR theme: Wire ExtraFieldValues through IndexRequest, UpdateRequest, and DocumentParser Relevant files: server/src/main/java/org/opensearch/action/index/IndexRequest.java server/src/main/java/org/opensearch/action/update/UpdateRequest.java server/src/main/java/org/opensearch/action/update/UpdateHelper.java server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java server/src/main/java/org/opensearch/index/mapper/SourceToParse.java server/src/main/java/org/opensearch/index/mapper/FieldMapper.java server/src/main/java/org/opensearch/index/mapper/DocumentParser.java server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.java server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java
⚡ Recommended focus areas for review TODO Validation Gap The TODO comment "EFV is only allowed if JSON is absent (validation)" indicates that there is no validation preventing a field from being supplied both in _source JSON and via ExtraFieldValues simultaneously. This could lead to ambiguous or conflicting indexing behavior and should be addressed before the feature is considered complete. // TODO: EFV is only allowed if JSON is absent (validation) Thread Safety The `bytes`, `bytesOffset`, and `cached` fields are declared `volatile`, but `ensureBytes()` and `asFloatArray()` perform multi-step non-atomic updates (e.g., setting `bytesOffset` then `bytes`). Under concurrent access, a thread could observe `bytes != null` but a stale `bytesOffset`, leading to incorrect decoding. Consider using a single volatile reference to an immutable holder object, or synchronization. private void ensureBytes() { if (bytes != null) { return; } byte[] arr; int off; // Avoid compaction/copy for the common array-backed case // Otherwise, compact once if needed (composite/paged/etc). if (packed instanceof BytesArray ba) { arr = ba.array(); off = ba.offset(); } else { arr = BytesReference.toBytes(packed); off = 0; } bytesOffset = off; bytes = arr; Reflection in Tests The test uses reflection with `@SuppressForbidden` to access private fields `bytes` and `bytesOffset` of `PackedFloatArray`, with a TODO to remove it. This is a test smell and the TODO should be resolved — either by exposing a package-private accessor or redesigning the test to avoid internal state inspection. // TODO remove this @SuppressForbidden(reason = "todo, remove this") private static Object getPrivateField(Object target, String fieldName) throws Exception { Field f = target.getClass().getDeclaredField(fieldName); f.setAccessible(true); return f.get(target); } Missing Upsert EFV In `prepareUpdateIndexRequest`, only `currentRequest.extraFieldValues()` (the doc's EFV) is propagated to the final index request. The upsert path's ExtraFieldValues (set via `upsertExtraFieldValues`) may not be correctly applied when the update results in an upsert. Verify that the upsert EFV is properly routed during the upsert execution path. .extraFieldValues(currentRequest.extraFieldValues()) Null packedBytes Contract The `packedBytes()` method on `PrimitiveFloatArray` throws `IllegalStateException` when called, but the interface contract does not clearly document when it is safe to call this method. Callers checking `isPackedLE()` before calling `packedBytes()` is an implicit contract that should be explicitly documented, and any code path that calls `packedBytes()` without checking `isPackedLE()` first is a potential runtime error. BytesReference packedBytes();

github-actions · 2026-03-17T02:47:14Z

PR Code Suggestions ✨

Latest suggestions up to 362b03d

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix off-by-one in nested parent validation loop The loop iterates from `i=1` to `parts.length - 1` (due to the early break), but the `parent` path is built using `parts[i]` which is the next segment, not the current one. This means the last intermediate path segment is never checked. The `parent` variable should be updated at the end of each iteration using `parts[i]` to correctly build the cumulative path for the next check. server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [194-208] private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) { if (parts.length <= 1) return; String parent = parts[0]; - for (int i = 1; i < parts.length; i++) { + for (int i = 1; i < parts.length - 1; i++) { final ObjectMapper om = context.docMapper().objectMappers().get(parent); if (om != null && om.nested().isNested()) { throw new MapperParsingException( "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]" ); } - if (i == parts.length - 1) break; parent = parent + "." + parts[i]; + } + // check the last intermediate segment + final ObjectMapper om = context.docMapper().objectMappers().get(parent); + if (om != null && om.nested().isNested()) { + throw new MapperParsingException( + "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]" + ); } } Suggestion importance[1-10]: 7 __ Why: The loop logic is indeed confusing: the `parent` variable is updated with `parts[i]` at the end of the iteration but the loop breaks early at `i == parts.length - 1`, meaning the last intermediate path segment is never validated against nested mappers. The suggested fix correctly restructures the loop to check all intermediate segments.	Medium
Possible issue	Fix data race in lazy byte materialization The `ensureBytes()` method has a data race: `bytesOffset` is written before `bytes`, so another thread could see a non-null `bytes` with a stale `bytesOffset = 0` (the default). Since `bytes` is the visibility guard, `bytesOffset` must be written first, or both fields should be updated atomically. Consider writing `bytesOffset` before `bytes` to ensure the offset is visible before the guard field is set. server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114] private void ensureBytes() { if (bytes != null) { return; } - ... + + byte[] arr; + int off; + + if (packed instanceof BytesArray ba) { + arr = ba.array(); + off = ba.offset(); + } else { + arr = BytesReference.toBytes(packed); + off = 0; + } + + // Write offset before bytes to ensure visibility ordering bytesOffset = off; bytes = arr; } Suggestion importance[1-10]: 6 __ Why: There is a real data race where `bytesOffset` could be read as 0 by another thread after `bytes` is set but before `bytesOffset` is written. However, the `improved_code` is essentially the same as the `existing_code` (just reordering the two assignments), and the fix is already present in the existing code since `bytesOffset = off` is written before `bytes = arr`. The suggestion is valid but the existing code already has the correct ordering.	Low
General	Ensure correct extra field values used in update vs upsert When `prepareUpdateIndexRequest` is called for an upsert (document does not exist), `currentRequest` may be the `upsert` IndexRequest rather than the `doc` IndexRequest. Propagating `currentRequest.extraFieldValues()` without distinguishing between the two cases could apply the wrong extra field values. Ensure the correct source of `extraFieldValues` is selected based on whether this is an update or an upsert operation. server/src/main/java/org/opensearch/action/update/UpdateHelper.java [250-258] final IndexRequest finalIndexRequest = Requests.indexRequest(request.index()) .id(request.id()) .routing(routing) .source(updatedSourceAsMap, updateSourceContentType) - .extraFieldValues(currentRequest.extraFieldValues()) + .extraFieldValues(request.docExtraFieldValues() != null ? request.docExtraFieldValues() : currentRequest.extraFieldValues()) .setIfSeqNo(getResult.getSeqNo()) .setIfPrimaryTerm(getResult.getPrimaryTerm()) .waitForActiveShards(request.waitForActiveShards()) .timeout(request.timeout()) Suggestion importance[1-10]: 3 __ Why: The suggestion misunderstands the code context. `prepareUpdateIndexRequest` is specifically for the update (non-upsert) path, and `currentRequest` is the `doc` IndexRequest in this context. The `improved_code` references a non-existent `request.docExtraFieldValues()` method, making it incorrect.	Low
General	Remove reflection-based internal state inspection from tests Using reflection to access private fields in tests creates a tight coupling to implementation details and is fragile. The `TODO` comment acknowledges this should be removed. Consider adding package-private or test-visible accessors, or restructuring the test to verify observable behavior (e.g., correctness of decoded values) rather than internal state. server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java [156-163] -// TODO remove this -@SuppressForbidden(reason = "todo, remove this") -private static Object getPrivateField(Object target, String fieldName) throws Exception { - Field f = target.getClass().getDeclaredField(fieldName); - f.setAccessible(true); - return f.get(target); -} +// Verify observable behavior instead of internal state: +// The correctness of the no-copy optimization is implicitly verified by +// asserting that decoded values match expected floats after get()/asFloatArray(). Suggestion importance[1-10]: 2 __ Why: While the suggestion to avoid reflection-based testing is valid as a code quality concern, the `improved_code` removes the tests entirely rather than refactoring them, which reduces test coverage. The existing TODO comment already acknowledges this technical debt.	Low

Previous suggestions

Suggestions up to commit 72047a8

Category	Suggestion	Impact
Possible issue	Fix incomplete nested parent validation loop The loop condition `if (i == parts.length - 1) break;` causes the last intermediate segment to be skipped when building the `parent` path, meaning the deepest intermediate object is never checked for nesting. The `break` should be removed so all intermediate path segments are validated, and the loop should naturally stop before the leaf field name. server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [194-208] private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) { if (parts.length <= 1) return; String parent = parts[0]; - for (int i = 1; i < parts.length; i++) { + for (int i = 1; i < parts.length - 1; i++) { final ObjectMapper om = context.docMapper().objectMappers().get(parent); if (om != null && om.nested().isNested()) { throw new MapperParsingException( "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]" ); } - if (i == parts.length - 1) break; parent = parent + "." + parts[i]; + } + // Check the last intermediate segment too + final ObjectMapper om = context.docMapper().objectMappers().get(parent); + if (om != null && om.nested().isNested()) { + throw new MapperParsingException( + "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]" + ); } } Suggestion importance[1-10]: 7 __ Why: The original loop has a logic bug: the `break` at `i == parts.length - 1` exits before updating `parent` to the last intermediate segment, so the deepest intermediate object is never checked for nesting. The suggested fix correctly iterates through all intermediate segments. This is a real correctness issue in the nested parent validation.	Medium
Possible issue	Fix volatile field write ordering to prevent data race The `ensureBytes()` method has a data race: `bytes` is checked without synchronization, and `bytesOffset` is written before `bytes`, so another thread could observe a non-null `bytes` with a stale `bytesOffset` of 0. Since `bytes` is `volatile`, writing `bytes` last (after `bytesOffset`) ensures the happens-before relationship is correct. server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114] private void ensureBytes() { if (bytes != null) { return; } - ... + + byte[] arr; + int off; + + if (packed instanceof BytesArray ba) { + arr = ba.array(); + off = ba.offset(); + } else { + arr = BytesReference.toBytes(packed); + off = 0; + } + + // Write bytesOffset before bytes to ensure visibility via volatile write on bytes bytesOffset = off; - bytes = arr; + bytes = arr; // volatile write; establishes happens-before for bytesOffset } Suggestion importance[1-10]: 6 __ Why: The `ensureBytes()` method writes `bytesOffset` before `bytes` (the volatile field), which is actually the correct ordering for establishing happens-before via the volatile write on `bytes`. The suggestion's improved code is essentially the same as the existing code with only a comment added, so the actual fix is minimal. The concern about the data race is valid but the existing code already has the correct write order.	Low
General	Remove reflection-based internal state access in tests Using reflection to access private fields in tests is fragile and couples the test to implementation details. The tests that verify `bytes` and `bytesOffset` internal state should instead be validated through the public API (e.g., `get()` and `asFloatArray()` return correct values), or the internal state should be exposed via package-private accessors. The `TODO` comment indicates this is known; it should be resolved before merging. server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java [156-163] -// TODO remove this -@SuppressForbidden(reason = "todo, remove this") -private static Object getPrivateField(Object target, String fieldName) throws Exception { - Field f = target.getClass().getDeclaredField(fieldName); - f.setAccessible(true); - return f.get(target); -} +// Removed reflection-based internal state checks; correctness is verified via public API: +// pfa.get(i) and pfa.asFloatArray() already assert correct decoding behavior. Suggestion importance[1-10]: 3 __ Why: The suggestion correctly identifies that using reflection to access private fields couples tests to implementation details and is fragile. However, the `improved_code` simply removes the helper without showing how the tests that use it would be refactored, making it incomplete as a standalone fix.	Low
General	Document ExtraFieldValues behavior on NOOP updates The validation only rejects `ExtraFieldValues` when a `script` is present, but does not guard against `ExtraFieldValues` being set on the `doc` request when `docAsUpsert` is true and a script is absent. More importantly, there is no validation preventing `ExtraFieldValues` from being silently ignored when the update falls into a NOOP path. Consider also validating that `ExtraFieldValues` are not set on `doc` when `docAsUpsert` is enabled without a script, or at minimum add a comment explaining the intended behavior. server/src/main/java/org/opensearch/action/update/UpdateRequest.java [240-245] if (script != null) { if ((doc != null && !doc.extraFieldValues().isEmpty()) \|\| (upsertRequest != null && !upsertRequest.extraFieldValues().isEmpty())) { validationException = addValidationError("ExtraFieldValues are not supported with scripted updates", validationException); } } +// Note: ExtraFieldValues on doc are propagated via UpdateHelper.prepareUpdateIndexRequest; +// they are intentionally ignored on NOOP (no-change) updates. Suggestion importance[1-10]: 1 __ Why: The suggestion only adds a comment to the existing code without any functional change. The `existing_code` and `improved_code` are functionally identical, making this a documentation-only suggestion with minimal impact.	Low

Suggestions up to commit 2e90030

Category	Suggestion	Impact
Possible issue	Fix incorrect path accumulation in nested parent validation The loop iterates from `i=1` to `parts.length - 1` (due to the early break), but `parent` is only updated with `parts[i]` when `i < parts.length - 1`. This means the last intermediate segment is never appended, so the path built for checking is incorrect for deeper nesting. The `parent` accumulation should happen at the start of each iteration before the break check. server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [191-205] private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) { if (parts.length <= 1) return; String parent = parts[0]; - for (int i = 1; i < parts.length; i++) { + for (int i = 1; i < parts.length - 1; i++) { final ObjectMapper om = context.docMapper().objectMappers().get(parent); if (om != null && om.nested().isNested()) { throw new MapperParsingException( "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]" ); } - if (i == parts.length - 1) break; parent = parent + "." + parts[i]; } } Suggestion importance[1-10]: 7 __ Why: The loop logic is indeed buggy: the `parent` path is only updated when `i < parts.length - 1`, but the nested check for the last intermediate segment is skipped due to the early `break`. The suggested fix simplifies the loop to `i < parts.length - 1` and removes the break, which correctly checks all intermediate path segments. This is a real logic bug that could cause nested parent validation to be bypassed for deeper paths.	Medium
Possible issue	Fix potential data race in lazy byte materialization The `ensureBytes()` method uses non-atomic writes to `bytesOffset` and `bytes` (both `volatile`), which can cause a data race: another thread may see the updated `bytes` but the stale `bytesOffset = 0` (default), leading to incorrect decoding. The assignment to `bytesOffset` must happen before `bytes` is published, or both fields should be set atomically. server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114] private void ensureBytes() { if (bytes != null) { return; } - ... + + byte[] arr; + int off; + + if (packed instanceof BytesArray ba) { + arr = ba.array(); + off = ba.offset(); + } else { + arr = BytesReference.toBytes(packed); + off = 0; + } + + // Write offset before bytes to ensure correct visibility ordering bytesOffset = off; + // Volatile write to bytes acts as a publication barrier bytes = arr; } Suggestion importance[1-10]: 6 __ Why: There is a real potential data race: `bytesOffset` is written before `bytes`, but since both are `volatile`, the JMM guarantees that the volatile write to `bytes` happens-after the write to `bytesOffset`, so any thread that reads `bytes != null` will also see the correct `bytesOffset`. However, the improved code is essentially identical to the existing code (same ordering), so the suggestion doesn't actually change behavior. The comment in the PR already acknowledges that multiple threads may transiently allocate, but results are equivalent.	Low
General	Defensive copy to prevent external array mutation The constructor stores the provided `float[]` array directly without copying, so external mutations to the array after construction will silently corrupt the stored value. Since `asFloatArray()` also returns the same reference, callers can mutate the internal state. A defensive copy should be made in the constructor. server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java [24-26] public PrimitiveFloatArray(float[] v) { - this.v = v; + this.v = v.clone(); } Suggestion importance[1-10]: 4 __ Why: This is a valid defensive programming concern — storing the array directly allows external mutation. However, the PR's design intent (as noted in the comment "This can be used by clients that already have float[]") suggests performance is a priority, and `asFloatArray()` intentionally returns the same reference. Adding a defensive copy would change the documented behavior and add overhead.	Low
General	Ensure correct extra field values source during doc update The `extraFieldValues` are propagated from `currentRequest` (the doc update request) to the final index request during a doc-based update. However, if the document already existed and is being updated (not upserted), the extra field values from the previous version are not considered — this may be intentional, but it should be verified that `currentRequest` here always refers to the doc request and not the upsert request, since `prepareUpdateIndexRequest` is called for doc-based updates where `currentRequest` should be `request.doc()`. server/src/main/java/org/opensearch/action/update/UpdateHelper.java [250-259] final IndexRequest finalIndexRequest = Requests.indexRequest(request.index()) .id(request.id()) .routing(routing) .source(updatedSourceAsMap, updateSourceContentType) - .extraFieldValues(currentRequest.extraFieldValues()) + .extraFieldValues(request.doc() != null ? request.doc().extraFieldValues() : ExtraFieldValues.EMPTY) .setIfSeqNo(getResult.getSeqNo()) .setIfPrimaryTerm(getResult.getPrimaryTerm()) .waitForActiveShards(request.waitForActiveShards()) .timeout(request.timeout()) Suggestion importance[1-10]: 3 __ Why: The suggestion asks to verify that `currentRequest` refers to `request.doc()`, but without seeing the full context of `prepareUpdateIndexRequest`, this is speculative. The `improved_code` introduces `ExtraFieldValues.EMPTY` which requires an additional import and changes behavior in a way that may not be correct if `currentRequest` is already `request.doc()`. This is more of a verification concern than a clear bug.	Low

Suggestions up to commit 988c50d

Category	Suggestion	Impact
Possible issue	Fix incorrect intermediate path construction in nested validation The loop iterates from `i=1` to `parts.length - 1` (due to the early break), but the parent path is built using `parts[i]` instead of `parts[i-1]`, causing incorrect intermediate path construction. For example, with parts `["a","b","c"]`, after checking `"a"` the next parent becomes `"a.b"` but the code appends `parts[1]="b"` correctly only by coincidence; however the last segment `parts[parts.length-2]` is never checked. The parent accumulation should use `parts[i]` before the check, not after. server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [191-205] private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) { if (parts.length <= 1) return; - String parent = parts[0]; - for (int i = 1; i < parts.length; i++) { - final ObjectMapper om = context.docMapper().objectMappers().get(parent); + StringBuilder parent = new StringBuilder(parts[0]); + for (int i = 0; i < parts.length - 1; i++) { + if (i > 0) { + parent.append('.').append(parts[i]); + } + final ObjectMapper om = context.docMapper().objectMappers().get(parent.toString()); if (om != null && om.nested().isNested()) { throw new MapperParsingException( "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]" ); } - if (i == parts.length - 1) break; - parent = parent + "." + parts[i]; } } Suggestion importance[1-10]: 7 __ Why: The original loop logic has a subtle bug: it checks `parent` starting at `parts[0]`, then builds the next parent using `parts[i]` where `i` starts at 1, but the early `break` at `i == parts.length - 1` means the second-to-last intermediate path (e.g., `"a.b"` for `["a","b","c"]`) is never checked. The suggested fix correctly iterates over all intermediate paths from index 0 to `parts.length - 2`.	Medium
Possible issue	Fix data race in lazy byte materialization The `ensureBytes()` method has a data race: `bytesOffset` is written before `bytes`, so another thread could see a non-null `bytes` with a stale `bytesOffset = 0` (the initial value) before the assignment completes. Since `bytes` is the visibility guard (checked first), `bytesOffset` must be written before `bytes` is published, but the JMM doesn't guarantee ordering without synchronization. Swap the assignment order so `bytesOffset` is always set before `bytes` becomes visible. server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114] private void ensureBytes() { if (bytes != null) { return; } - ... + + byte[] arr; + int off; + + if (packed instanceof BytesArray ba) { + arr = ba.array(); + off = ba.offset(); + } else { + arr = BytesReference.toBytes(packed); + off = 0; + } + + // Write offset before bytes to ensure correct visibility ordering bytesOffset = off; - bytes = arr; + bytes = arr; // volatile write; publishes bytesOffset } Suggestion importance[1-10]: 6 __ Why: There is a legitimate JMM concern: `bytesOffset` is a non-final volatile field written before `bytes`, but since `bytes` is the null-check guard, another thread could observe `bytes != null` with a stale `bytesOffset`. The fix (writing `bytesOffset` before `bytes`) is already what the existing code does, so the `improved_code` is essentially the same as `existing_code`, making this suggestion marginally impactful.	Low
General	Defensive copy to prevent external array mutation The constructor stores the provided `float[]` directly without copying, so external mutation of the array after construction will silently corrupt the stored value. Since `asFloatArray()` also returns the internal array directly, callers can mutate it. A defensive copy should be made in the constructor to ensure immutability of the stored value. server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java [24-26] public PrimitiveFloatArray(float[] v) { - this.v = v; + this.v = v.clone(); } Suggestion importance[1-10]: 3 __ Why: While defensive copying is a valid concern for immutability, `PrimitiveFloatArray` is a package-private class used internally, and `asFloatArray()` already returns the internal array directly (so immutability is not fully enforced regardless). The performance cost of cloning may outweigh the benefit in this context.	Low
General	Ensure ExtraFieldValues validation covers docAsUpsert path The validation only rejects `ExtraFieldValues` when a script is present, but does not guard against the case where `docAsUpsert` is true and the `doc` request carries `ExtraFieldValues` while a script is also set (which is already invalid for other reasons). More importantly, there is no validation preventing `ExtraFieldValues` on the `doc` request when `docAsUpsert` is enabled alongside a script — the existing check covers this, but the condition should also handle the `docAsUpsert` path where `doc` doubles as the upsert. Consider also validating that `ExtraFieldValues` on `doc` are rejected when `script != null && docAsUpsert` to be explicit. server/src/main/java/org/opensearch/action/update/UpdateRequest.java [240-245] if (script != null) { if ((doc != null && !doc.extraFieldValues().isEmpty()) - \|\| (upsertRequest != null && !upsertRequest.extraFieldValues().isEmpty())) { + \|\| (upsertRequest != null && !upsertRequest.extraFieldValues().isEmpty()) + \|\| (docAsUpsert && doc != null && !doc.extraFieldValues().isEmpty())) { validationException = addValidationError("ExtraFieldValues are not supported with scripted updates", validationException); } } Suggestion importance[1-10]: 2 __ Why: The `improved_code` adds a redundant condition `(docAsUpsert && doc != null && !doc.extraFieldValues().isEmpty())` which is already covered by the existing `(doc != null && !doc.extraFieldValues().isEmpty())` check. The suggestion does not add any new coverage and the `improved_code` is logically equivalent to the existing code.	Low

github-actions · 2026-03-17T03:11:47Z

❌ Gradle check result for 988c50d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

github-actions · 2026-03-17T14:24:49Z

Persistent review updated to latest commit 2e90030

github-actions · 2026-03-17T14:33:03Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 72047a8.

Path	Line	Severity	Description
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java	163	low	TODO comment explicitly acknowledges missing validation: 'EFV is only allowed if JSON is absent (validation)'. This means a document can simultaneously supply a field value via both JSON _source and ExtraFieldValues with no guard against conflicts or override, which could lead to unpredictable indexing behavior. Not malicious, but an acknowledged incomplete security/correctness control.
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java	154	low	A @SuppressForbidden-annotated method uses reflection (Field.setAccessible(true)) to access private fields of PackedFloatArray. The annotation reason says 'todo, remove this', indicating this is acknowledged technical debt. Confined to test code and has no runtime security impact, but suppressing access controls without a strong justification is anomalous.
server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java	85	low	Non-atomic lazy initialization of volatile fields 'bytes' and 'bytesOffset' in ensureBytes() creates a benign but acknowledged data race. The class comment explicitly notes multiple threads may transiently allocate. This is a correctness/safety concern (not malicious), but the intentional acceptance of a data race in a shared mutable object without synchronization is a minor anomaly worth noting.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 0 | Low: 3

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

github-actions · 2026-03-17T14:34:23Z

Persistent review updated to latest commit 72047a8

github-actions · 2026-03-17T16:35:21Z

❌ Gradle check result for 72047a8: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-18T02:05:08Z

❌ Gradle check result for 72047a8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Michael Froh <msfroh@apache.org>

github-actions · 2026-03-18T16:17:00Z

Persistent review updated to latest commit 362b03d

github-actions · 2026-03-18T18:17:06Z

❌ Gradle check result for 362b03d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-24T23:16:30Z

❌ Gradle check result for 362b03d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

add support of "extra fields" indexing for non _source data

5d4649e

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

laminelam requested review from a team, Bukhtawar, CEHENKLE, Rishikesh1159, anasalkouz, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners February 15, 2026 22:36

coderabbitai bot reviewed Feb 15, 2026

View reviewed changes

fix typo

573da92

use writeFloatArray/readFloatArray Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

fix underlining bug in TransportBulkActionTests.testSerializationDese…

0f11c94

…rialization (version mismatch) Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

opensearch-ci-bot mentioned this pull request Feb 18, 2026

[AUTOCUT] Gradle Check Flaky Test Report for FullClusterRestartIT #15813

Open

github-actions bot added enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing labels Mar 2, 2026

msfroh reviewed Mar 11, 2026

View reviewed changes

add javadocs

988c50d

replace the ternary operator with if/else make FloatArrayValue implementations package private Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

add package-info.java and changelog

2e90030

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>

Merge remote-tracking branch 'origin/main' into feature/extra_fields_36

72047a8

Merge branch 'main' into feature/extra_fields_36

362b03d

Signed-off-by: Michael Froh <msfroh@apache.org>

opensearch-ci-bot mentioned this pull request Mar 25, 2026

[AUTOCUT] Gradle Check Flaky Test Report for IngestFromKinesisIT #17678

Open


		import java.io.IOException;

		public non-sealed interface FloatArrayValue extends ExtraFieldValue {

Conversation

laminelam commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark

Ingest Throughput (baseline = JSON float array)

Related Issues

Check List

Uh oh!

coderabbitai bot commented Feb 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Feb 15, 2026

Uh oh!

github-actions bot commented Feb 16, 2026

Uh oh!

github-actions bot commented Feb 16, 2026

Uh oh!

laminelam commented Feb 16, 2026

Uh oh!

msfroh left a comment

Choose a reason for hiding this comment

Uh oh!

msfroh Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

msfroh Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

laminelam Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

msfroh commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

laminelam commented Mar 11, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 362b03d)

Uh oh!

github-actions bot commented Mar 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

PR Code Analyzer ❗

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

github-actions bot commented Mar 18, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

laminelam commented Feb 15, 2026 •

edited

Loading

coderabbitai bot commented Feb 15, 2026 •

edited

Loading

msfroh commented Mar 11, 2026 •

edited

Loading

github-actions bot commented Mar 17, 2026 •

edited

Loading

(Review updated until commit `362b03d`)

github-actions bot commented Mar 17, 2026 •

edited

Loading