Skip to content

[Indexing] Add support of "extra fields" outside _source indexing#20635

Open
laminelam wants to merge 7 commits intoopensearch-project:mainfrom
laminelam:feature/extra_fields_36
Open

[Indexing] Add support of "extra fields" outside _source indexing#20635
laminelam wants to merge 7 commits intoopensearch-project:mainfrom
laminelam:feature/extra_fields_36

Conversation

@laminelam
Copy link
Copy Markdown
Contributor

@laminelam laminelam commented Feb 15, 2026

Resolves #20765

This PR introduces ExtraFieldValues, a side-channel mechanism for supplying field values outside of _source during indexing.

This change is part of a broader effort to enhance gRPC indexing, enabling primitive vector values to be indexed as non-_source data for improved ingestion performance.

Key points:

  • Adds ExtraFieldValues support to IndexRequest and UpdateRequest (doc and upsert paths).

  • Enables mappers to consume values directly, bypassing JSON parsing.

  • Currently supports float arrays only (primitive and packed little-endian representations).

  • Transport serialization is version-gated for backward compatibility.

  • For now, only doc-based updates and upserts are supported (script not supported).

Benchmark

Benchmarked ingest throughput across increasing vector dimensions using four representations:

  • JSON (float array)
  • JSON (binary/base64)
  • SMILE
  • ExtraFieldValues (non _source side-channel)

Across all tested dimensions:

JSON float arrays show the lowest throughput and degrade rapidly as vector size increases.

JSON (binary/base64) performs significantly better than float arrays.

SMILE and JSON(base64) are generally comparable, with SMILE often providing significant improvements at higher dimensions.

ExtraFieldValues consistently provides the highest throughput and scales best with increasing vector size.

At higher dimensions (e.g., 1024+), ExtraFieldValues shows a clear and consistent advantage over the _source-based approaches, achieving up to 60× higher throughput compared to JSON arrays.

These results indicate that bypassing _source parsing for large vector payloads can substantially improve ingest throughput.

Note: ExtraFieldValues was tested with the PackedFloatArray option (packed little-endian floats)

extra_fields

Ingest Throughput (baseline = JSON float array)

docs/sec (X× faster with json array as base)

Dim JSON (array) JSON (base64/binary) SMILE (binary) ExtraFieldValues
8 19,619 (1.0×) 30,817 (1.57×) 27,405 (1.40×) 26,793 (1.37×)
64 6,572 (1.0×) 23,637 (3.60×) 28,010 (4.26×) 26,284 (4.00×)
128 3,649 (1.0×) 19,428 (5.32×) 14,375 (3.94×) 25,899 (7.10×)
512 1,132 (1.0×) 9,723 (8.59×) 12,532 (11.07×) 20,830 (18.40×)
768 778 (1.0×) 6,883 (8.85×) 10,424 (13.40×) 18,443 (23.70×)
1024 591 (1.0×) 5,630 (9.53×) 8,453 (14.31×) 17,663 (29.90×)
1536 399 (1.0×) 3,951 (9.89×) 6,061 (15.18×) 15,398 (38.56×)
2048 299 (1.0×) 3,034 (10.14×) 4,659 (15.56×) 14,887 (49.74×)
3072 196 (1.0×) 2,060 (10.50×) 3,058 (15.59×) 11,840 (60.34×)

Related Issues

#19638

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 15, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces comprehensive support for storing extra field values (fields outside _source) across OpenSearch's indexing and update paths. It adds typed value containers (BYTES, FLOAT_ARRAY), integrates them into request objects, and implements parsing logic to validate and store these values via the mapper framework.

Changes

Cohort / File(s) Summary
Extra Field Value Type System
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java, BytesValue.java, FloatArrayValue.java, PackedFloatArray.java, PrimitiveFloatArray.java
Introduces sealed interface with nested Type enum (BYTES, FLOAT_ARRAY) and streaming protocol. Provides implementations for bytes data and float array storage (packed and primitive), with lazy decoding and caching for packed arrays.
Extra Field Values Container
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java
Immutable map-based container for field path to ExtraFieldValue mapping; supports serialization/deserialization and provides query accessors.
Request Path Integration
server/src/main/java/org/opensearch/action/index/IndexRequest.java, update/UpdateRequest.java, update/UpdateHelper.java, bulk/TransportShardBulkAction.java, SourceToParse.java
Adds extraFieldValues field and accessors to IndexRequest; adds docExtraFieldValues and upsertExtraFieldValues setters to UpdateRequest with validation against scripted updates; propagates values through UpdateHelper and bulk processing; extends SourceToParse with version-aware serialization.
Parsing and Mapper Framework
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java, FieldMapper.java
Adds applyExtraFieldValues processing logic to DocumentParser with path validation and mapper support checks; introduces supportsExtraFieldValues() hook in FieldMapper for declarative support.
Unit Tests - Value Types
server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java, ExtraFieldValueTests.java, ExtraFieldValuesTests.java, PackedFloatArrayTests.java, PrimitiveFloatArrayTests.java
Validates serialization round-trips, immutability, lazy decoding with caching, bounds checking, and type dispatch for all value implementations.
Unit Tests - Ingestion & Validation
server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java, index/mapper/extrasource/ExtraFieldValuesIngestTests.java
Tests parsing integration, rejection of extra field values with scripted updates, handling of unmapped fields, and root/nested field paths.
Integration Tests
server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java, update/UpdateIT.java
End-to-end validation of index, update, upsert, and bulk operations carrying extra field values; verifies stored field metadata preservation.
Test Infrastructure
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java, ExtraTestFieldMapper.java
Custom mapper plugin and field type for test environments; ExtraTestFieldMapper parses ExtraFieldValue implementations and stores type/dimension metadata.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Request as Index/Update/Bulk<br/>Request
    participant Transport as Transport<br/>ShardBulkAction
    participant Parser as Document<br/>Parser
    participant FieldMapper as Field<br/>Mapper
    participant Storage as Lucene<br/>Document

    Client->>Request: create with<br/>extraFieldValues()
    Request->>Transport: process (primary)
    Transport->>Parser: call with<br/>SourceToParse
    Parser->>Parser: applyExtraFieldValues()
    loop for each extra field
        Parser->>Parser: validate path &<br/>nested parents
        Parser->>FieldMapper: resolve mapper
        FieldMapper->>FieldMapper: supportsExtraFieldValues()?
        alt supported
            FieldMapper->>FieldMapper: parseCreateField()
            FieldMapper->>Storage: store field +<br/>metadata (_type, _dim)
        else not supported
            FieldMapper->>Parser: throw validation error
        end
    end
    Parser->>Storage: finalize document
    Transport->>Transport: replicate (replica)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • opensearch-project/OpenSearch#19958: Modifies DocumentParser's internalParseDocument flow for disabling nested objects/flattening, similar interaction pattern with core parsing logic.

Suggested labels

enhancement, Indexing & Search, Search:Performance

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.42% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely describes the main change: adding support for 'extra fields' outside _source during indexing, which aligns with the core purpose of the changeset.
Description check ✅ Passed The PR description comprehensively covers the change objectives, includes detailed context about ExtraFieldValues functionality, provides benchmark results showing performance improvements, and completes all checklist items appropriately.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@server/src/main/java/org/opensearch/action/update/UpdateRequest.java`:
- Around line 724-733: The Javadoc for UpdateRequest.docExtraFieldValues has a
typo ("extral") and should be corrected to "extra"; update the method comment
block above public UpdateRequest docExtraFieldValues(ExtraFieldValues values) to
read "Sets extra field values for the partial document update" and adjust any
related wording (e.g., "These values...") while keeping the rest of the
documentation and references to ExtraFieldValues,
safeDoc().extraFieldValues(values), and ExtraFieldValues#EMPTY unchanged.
🧹 Nitpick comments (5)
server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java (1)

54-57: Minor: Duplicate assertion.

Line 56 duplicates the assertion at line 54 (read.dimension() is checked twice for vals.length).

🔧 Remove duplicate assertion
         assertThat(read.dimension(), is(vals.length));
         assertThat(read.isPackedLE(), is(false));
-        assertThat(read.dimension(), is(vals.length));
         assertThat(read.asFloatArray(), equalTo(vals));
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java (1)

153-189: Resolve or track the EFV/JSON exclusivity TODO. The TODO hints at a missing validation rule; leaving both JSON and EFV populated could create ambiguous precedence. Consider enforcing it here or opening a follow‑up.

If you want, I can draft the validation or open a tracking issue.

server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java (2)

60-65: Add braces around the for-loop body for consistency.

The loop body at lines 62-63 lacks braces, which is inconsistent with the style in writePayloadTo (lines 55-57).

🔧 Suggested fix
     static PrimitiveFloatArray readBodyFrom(StreamInput in, int dim) throws IOException {
         final float[] v = new float[dim];
-        for (int j = 0; j < dim; j++)
-            v[j] = in.readFloat();
+        for (int j = 0; j < dim; j++) {
+            v[j] = in.readFloat();
+        }
         return new PrimitiveFloatArray(v);
     }

24-26: Consider null validation in constructor.

The constructor accepts the array without null-checking. If a null array is passed, subsequent calls to dimension() or get() will throw NPE. Since this is a public class, defensive validation may be warranted.

🛡️ Suggested fix
     public PrimitiveFloatArray(float[] v) {
+        java.util.Objects.requireNonNull(v, "v must not be null");
         this.v = v;
     }
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java (1)

159-163: Remove unused helper method setPrivateField.

This helper method is defined but never used in any test. Consider removing it to avoid dead code, or add a test that requires it.

🧹 Suggested fix
-    private static void setPrivateField(Object target, String fieldName, Object value) throws Exception {
-        Field f = target.getClass().getDeclaredField(fieldName);
-        f.setAccessible(true);
-        f.set(target, value);
-    }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c244c0 and 5d4649e.

📒 Files selected for processing (24)
  • server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java
  • server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
  • server/src/main/java/org/opensearch/action/index/IndexRequest.java
  • server/src/main/java/org/opensearch/action/update/UpdateHelper.java
  • server/src/main/java/org/opensearch/action/update/UpdateRequest.java
  • server/src/main/java/org/opensearch/index/mapper/DocumentParser.java
  • server/src/main/java/org/opensearch/index/mapper/FieldMapper.java
  • server/src/main/java/org/opensearch/index/mapper/SourceToParse.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java
  • server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/action/index/IndexRequest.java
  • server/src/main/java/org/opensearch/index/mapper/DocumentParser.java
  • server/src/main/java/org/opensearch/action/update/UpdateRequest.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java
  • server/src/main/java/org/opensearch/action/update/UpdateHelper.java
  • server/src/main/java/org/opensearch/index/mapper/SourceToParse.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java
  • server/src/main/java/org/opensearch/index/mapper/FieldMapper.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java
🧬 Code graph analysis (8)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
  • BytesArray (48-151)
server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
  • BytesArray (48-151)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
  • BytesArray (48-151)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)
server/src/main/java/org/opensearch/index/mapper/FieldMapper.java (3)
  • Builder (85-214)
  • Builder (740-793)
  • Builder (894-913)
server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java (3)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
  • BytesArray (48-151)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)
  • ExtraFieldValuesMapperPlugin (19-28)
server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java (1)
  • PrimitiveFloatArray (21-66)
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java (1)
server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java (1)
  • ExperimentalApi (20-53)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)
server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)
  • ExtraTestFieldMapper (26-106)
server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java (1)
libs/core/src/main/java/org/opensearch/core/common/bytes/BytesArray.java (1)
  • BytesArray (48-151)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (21)
  • GitHub Check: gradle-check
  • GitHub Check: detect-breaking-change
  • GitHub Check: precommit (25, windows-latest)
  • GitHub Check: precommit (21, macos-15)
  • GitHub Check: precommit (25, macos-15-intel)
  • GitHub Check: precommit (21, windows-2025, true)
  • GitHub Check: precommit (25, ubuntu-24.04-arm)
  • GitHub Check: precommit (25, macos-15)
  • GitHub Check: precommit (21, ubuntu-24.04-arm)
  • GitHub Check: Analyze (java)
  • GitHub Check: precommit (25, ubuntu-latest)
  • GitHub Check: precommit (21, windows-latest)
  • GitHub Check: precommit (21, ubuntu-latest)
  • GitHub Check: precommit (21, macos-15-intel)
  • GitHub Check: assemble (21, ubuntu-latest)
  • GitHub Check: assemble (21, ubuntu-24.04-arm)
  • GitHub Check: assemble (21, windows-latest)
  • GitHub Check: assemble (25, ubuntu-latest)
  • GitHub Check: assemble (25, windows-latest)
  • GitHub Check: assemble (25, ubuntu-24.04-arm)
  • GitHub Check: Mend Security Check
🔇 Additional comments (35)
server/src/main/java/org/opensearch/action/update/UpdateHelper.java (1)

254-254: LGTM - Correct propagation of ExtraFieldValues in doc-based update path.

This correctly forwards extraFieldValues from the incoming doc request to the final IndexRequest, ensuring EFV data flows through the update pipeline for non-script updates.

server/src/main/java/org/opensearch/index/mapper/FieldMapper.java (2)

336-341: LGTM - Clean opt-in hook for ExtraFieldValues support.

The method follows the established pattern in FieldMapper (similar to parsesArrayValue()), providing a sensible default of false that requires mappers to explicitly opt-in for EFV support.


50-50: The ExtraFieldValues import is used in the Javadoc comment at line 337 ({@link ExtraFieldValues}). The import is necessary for proper Javadoc link resolution and should be retained.

Likely an incorrect or invalid review comment.

server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java (1)

653-666: LGTM - Good test coverage for script + ExtraFieldValues validation.

The test correctly validates that combining a script with ExtraFieldValues on the upsert path triggers the expected validation error. The test structure is consistent with other validation tests in this file.

server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java (2)

665-672: LGTM - Correct EFV propagation on primary path.

The SourceToParse construction now includes extraFieldValues() from the index request, enabling the parsing layer to consume extra field values during primary shard indexing.


918-919: LGTM - Consistent EFV propagation on replica path.

The replica path mirrors the primary path change, ensuring extraFieldValues are consistently available during replica shard indexing. This symmetry is important for correct replication behavior.

server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java (1)

550-564: LGTM - Good integration test for script + ExtraFieldValues rejection.

This integration test complements the unit test in UpdateRequestTests by validating the same behavior through the full client API stack. The test follows the established patterns in this file (using indexOrAlias(), expectThrows, etc.).

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java (1)

21-69: LGTM - Comprehensive serialization tests for ExtraFieldValue types.

The tests cover essential scenarios:

  • BytesValue round-trip with content verification
  • FloatArrayValue round-trip with dimension and value verification
  • Error handling for unknown type IDs

Good coverage of the serialization contract.

server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java (1)

17-35: LGTM - Clean record implementation for BytesValue.

The record pattern is appropriate for this immutable value type. The implementation is minimal and focused with proper visibility for readBodyFrom.

One consideration: the record accepts a potentially null BytesReference without validation. If null safety is a concern, consider adding a compact constructor with a null check. However, this may not be necessary if callers are expected to validate inputs.

server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java (1)

20-75: LGTM - Thorough test coverage for PrimitiveFloatArray.

The tests comprehensively cover:

  • Basic accessors (dimension, get, asFloatArray, isPackedLE)
  • Error handling (packedBytes() throws on unpacked instance)
  • Serialization round-trip
  • Empty array edge case

Good alignment with the PrimitiveFloatArray implementation.

server/src/main/java/org/opensearch/action/update/UpdateRequest.java (3)

63-63: LGTM. ExtraFieldValues import aligns with the new API usage.


239-245: Good guard against scripted EFV usage. Prevents unsupported script paths from accepting extra field values.


821-829: LGTM. Upsert EFV setter mirrors the doc path cleanly.

server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java (1)

1-82: Solid round‑trip coverage for BytesValue. Exercises BytesArray, composite, and dispatch paths.

server/src/main/java/org/opensearch/index/mapper/DocumentParser.java (3)

50-51: LGTM. EFV imports are consistent with the new parsing flow.


146-146: LGTM. Applying EFV before metadata post-parse keeps the pipeline ordering consistent.


191-205: LGTM. Nested‑parent guard is clear and appropriately scoped.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java (1)

1-89: Nice coverage of immutability and stream round‑trip.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java (1)

1-27: LGTM. Clear and minimal mapper plugin registration for tests.

server/src/main/java/org/opensearch/action/index/IndexRequest.java (5)

65-65: LGTM. EFV import matches the new request field.


116-116: LGTM. Defaulting to ExtraFieldValues.EMPTY avoids null checks.


158-160: LGTM. Version‑gated EFV deserialization looks correct.


504-519: LGTM. Setter null‑resets to EMPTY and the getter keeps the API clean.


702-704: LGTM. Version‑gated EFV serialization is consistent with the read path.

server/src/main/java/org/opensearch/index/mapper/SourceToParse.java (3)

40-40: LGTM. EFV import is consistent with new constructor overloads.


62-84: LGTM. Constructor chaining keeps backward compatibility and defaults EFV to EMPTY.


110-112: LGTM. Simple, clear EFV accessor.

server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java (1)

1-51: LGTM. Clean immutable container with streamable support.

server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java (1)

17-68: LGTM!

The interface design is clean with a well-defined contract:

  • Proper sealed interface extending ExtraFieldValue
  • Symmetric read/write protocol with writeBodyTo/readBodyFrom
  • Clear separation between header metadata (packedLE flag, dimension) and payload serialization via writePayloadTo
server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java (1)

38-202: LGTM!

Comprehensive integration tests covering:

  • Basic index operations with BytesValue
  • Update operations with PrimitiveFloatArray
  • Upsert operations with BytesValue
  • Bulk operations with mixed EFV types

The test structure is clean with appropriate assertions on stored field values.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.java (1)

29-135: LGTM!

Well-structured unit tests for EFV ingestion covering:

  • Root field with BytesValue
  • Nested object path with PrimitiveFloatArray
  • Error handling when no mapper exists for the extra field

The tests properly exercise the DocumentMapper.parse() path with SourceToParse containing ExtraFieldValues.

server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java (1)

17-90: LGTM!

Clean sealed interface design with:

  • Appropriate @ExperimentalApi annotation for the new feature
  • Efficient type lookup via byte id with O(1) array access
  • Well-defined wire encoding protocol: [typeId][body]
  • Good error handling for unknown type ids

The TODO for future types (double, int, long) is noted and the design accommodates extensibility.

server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java (1)

35-165: LGTM!

Well-implemented packed float array with:

  • Efficient lazy decoding with proper volatile fields for thread-safety
  • Zero-copy optimization for BytesArray backed references
  • Correct little-endian decoding in decodeAt()
  • Robust validation with overflow protection using Math.multiplyExact
  • Proper bounds checking in get()

The documented concurrency behavior (multiple threads may transiently materialize but results are equivalent) is correctly implemented with Java's volatile memory model guarantees.

server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java (1)

26-106: LGTM!

Appropriate test-only mapper implementation that:

  • Properly implements supportsExtraFieldValues() returning true
  • Consumes ExtraFieldValue via context.parseExternalValue()
  • Handles both BytesValue and FloatArrayValue types
  • Stores metadata fields for test verification (type, length/dimension, first float)

This provides the necessary test infrastructure for validating the EFV feature end-to-end.

server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java (1)

28-157: LGTM!

Comprehensive test coverage for PackedFloatArray including:

  • Basic construction and value retrieval
  • Zero-copy optimization verification for BytesArray slices
  • Composite BytesReference materialization behavior
  • Cache identity verification for asFloatArray()
  • Serialization round-trip
  • Validation and bounds checking error cases

The use of reflection to verify internal state is appropriate for testing optimization guarantees.

Comment on lines +724 to +733
/**
* Sets extral field values for the partial document update ({@code doc}).
* <p>
* These values are applied only when the update is executed using {@code doc} (i.e., no script).
* {@code null} clears the values and resets to {@link ExtraFieldValues#EMPTY}.
*/
public UpdateRequest docExtraFieldValues(ExtraFieldValues values) {
safeDoc().extraFieldValues(values);
return this;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix Javadoc typo ("extral").

📝 Proposed fix
-     * Sets extral field values for the partial document update ({`@code` doc}).
+     * Sets extra field values for the partial document update ({`@code` doc}).
🤖 Prompt for AI Agents
In `@server/src/main/java/org/opensearch/action/update/UpdateRequest.java` around
lines 724 - 733, The Javadoc for UpdateRequest.docExtraFieldValues has a typo
("extral") and should be corrected to "extra"; update the method comment block
above public UpdateRequest docExtraFieldValues(ExtraFieldValues values) to read
"Sets extra field values for the partial document update" and adjust any related
wording (e.g., "These values...") while keeping the rest of the documentation
and references to ExtraFieldValues, safeDoc().extraFieldValues(values), and
ExtraFieldValues#EMPTY unchanged.

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 5d4649e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

use writeFloatArray/readFloatArray

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 573da92: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…rialization (version mismatch)

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 0f11c94: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@laminelam
Copy link
Copy Markdown
Contributor Author

Hi @msfroh

As per our conversation, this is the non grpc part of the contribution.

The contribution is split across multiple PRs to simplify review and maintain separation of concerns.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing labels Mar 2, 2026
Copy link
Copy Markdown
Contributor

@msfroh msfroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @laminelam ! The change overall looks good to me. I left a couple of questions/suggestions.

Comment on lines +158 to +160
extraFieldValues = in.getVersion().onOrAfter(Version.V_3_6_0)
? Objects.requireNonNullElse(in.readOptionalWriteable(ExtraFieldValues::new), ExtraFieldValues.EMPTY)
: ExtraFieldValues.EMPTY;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace the ternary operator with an if/else? It's mostly so that when we remove the version check in a future version (either 4.0 or 5.0), the resulting diff is a little cleaner.


import java.io.IOException;

public non-sealed interface FloatArrayValue extends ExtraFieldValue {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it makes sense to add both PackedFloatArray and PrimitiveFloatArray?

Is the idea that clients can choose which kind of encoding they want to use when they send a float array? If that's the case, I'm wondering if we can convert to whichever one is better. That conversion can probably happen at the gRPC layer. I'm just thinking that we can remove some code if we only support one float array representation. Also, we could pare down some of the interface methods like isPackedLE.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @msfroh for the review.

The two representations move the cost to different parts of the indexing pipeline: client --> coordinator --> shard --> mapper.

With PrimitiveFloatArray, floats are serialized one by one during transport (client to coordinator and coordinator to shard). The mapper then receives an already decoded float array and does no additional work. Still, the elements are serialized twice during the full cycle.

With PackedFloatArray, the vector is transported as a packed byte sequence. This avoids the float serialization loops during transport, but the mapper needs to decode the floats when they are accessed.

So the trade off is where the CPU cost happens: PrimitiveFloatArray makes transport (including serialization/deserialization) heavier but the mapper cheaper, while PackedFloatArray makes transport cheaper but adds a small decode cost at the mapper.

For large vectors the transport serialization tends to dominate, which is why the packed representation performs much better in the benchmarks.

So, we want to allow clients to choose the trade off that fits their needs: simplicity (float[]) vs optimized transport (packed)

See the benchmark here
(vector = PackedFloatArray VS float = PrimitiveFloatArray)

@msfroh
Copy link
Copy Markdown
Contributor

msfroh commented Mar 11, 2026

Looks like we're getting some errors for missing JavaDoc on public classes:

/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java:17: error: BytesValue (record): javadocs are missing
public record BytesValue(BytesReference bytes) implements ExtraFieldValue {
       ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java:18: error: ExtraFieldValue (interface): javadocs are missing
public sealed interface ExtraFieldValue permits FloatArrayValue, BytesValue {
              ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java:21: error: Type (enum): javadocs are missing
    enum Type {
    ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java:17: error: FloatArrayValue (interface): javadocs are missing
public non-sealed interface FloatArrayValue extends ExtraFieldValue {
                  ^
/home/runner/work/OpenSearch/OpenSearch/server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java:21: error: ExtraFieldValues (class): javadocs are missing
public final class ExtraFieldValues implements Writeable {

Do these all need to be public? I guess we the ones above probably do, since ExtraFieldValues and ExtraFieldValue are both referenced from DocumentParser. Then the field mappers that use them will need BytesValue or FloatArrayValue to be able to extract the actual values. We might be able to make the FloatArrayValue implementations package-private, though (if we want to keep both of them).

@laminelam
Copy link
Copy Markdown
Contributor Author

Do these all need to be public? I guess we the ones above probably do, since ExtraFieldValues and ExtraFieldValue are both referenced from DocumentParser. Then the field mappers that use them will need BytesValue or FloatArrayValue to be able to extract the actual values. We might be able to make the FloatArrayValue implementations package-private, though (if we want to keep both of them).

Yes, was planning to add some javadoc before the PR is merged.
You're right FloatArrayValue implementations could be made private (with some public factory methods). Will do.

replace the ternary operator with if/else
make FloatArrayValue implementations package private

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 988c50d.

PathLineSeverityDescription
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java156lowUses Java reflection with setAccessible(true) to access private fields, bypassing access controls. Annotated @SuppressForbidden with a TODO comment to remove it. Scoped to test code only and clearly labeled as temporary, so likely not malicious but warrants review.
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java172lowA TODO comment explicitly acknowledges missing validation: 'EFV is only allowed if JSON is absent (validation)'. This incomplete guard means a field could simultaneously receive values from both the JSON _source and ExtraFieldValues, potentially causing unexpected data behavior. Not malicious, but the acknowledged gap could be exploited in certain flows.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 17, 2026

PR Reviewer Guide 🔍

(Review updated until commit 362b03d)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add ExtraFieldValue data model and serialization (extrasource package)

Relevant files:

  • server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValue.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/ExtraFieldValues.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/BytesValue.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/FloatArrayValue.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java
  • server/src/main/java/org/opensearch/index/mapper/extrasource/package-info.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/BytesValueTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValueTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArrayTests.java

Sub-PR theme: Wire ExtraFieldValues through IndexRequest, UpdateRequest, and DocumentParser

Relevant files:

  • server/src/main/java/org/opensearch/action/index/IndexRequest.java
  • server/src/main/java/org/opensearch/action/update/UpdateRequest.java
  • server/src/main/java/org/opensearch/action/update/UpdateHelper.java
  • server/src/main/java/org/opensearch/action/bulk/TransportShardBulkAction.java
  • server/src/main/java/org/opensearch/index/mapper/SourceToParse.java
  • server/src/main/java/org/opensearch/index/mapper/FieldMapper.java
  • server/src/main/java/org/opensearch/index/mapper/DocumentParser.java
  • server/src/test/java/org/opensearch/action/bulk/TransportBulkActionTests.java
  • server/src/test/java/org/opensearch/action/update/UpdateRequestTests.java
  • server/src/internalClusterTest/java/org/opensearch/index/mapper/ExtraFieldValuesRequestIT.java
  • server/src/internalClusterTest/java/org/opensearch/update/UpdateIT.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesIngestTests.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraFieldValuesMapperPlugin.java
  • server/src/test/java/org/opensearch/index/mapper/extrasource/ExtraTestFieldMapper.java

⚡ Recommended focus areas for review

TODO Validation Gap

The TODO comment "EFV is only allowed if JSON is absent (validation)" indicates that there is no validation preventing a field from being supplied both in _source JSON and via ExtraFieldValues simultaneously. This could lead to ambiguous or conflicting indexing behavior and should be addressed before the feature is considered complete.

// TODO: EFV is only allowed if JSON is absent (validation)
Thread Safety

The bytes, bytesOffset, and cached fields are declared volatile, but ensureBytes() and asFloatArray() perform multi-step non-atomic updates (e.g., setting bytesOffset then bytes). Under concurrent access, a thread could observe bytes != null but a stale bytesOffset, leading to incorrect decoding. Consider using a single volatile reference to an immutable holder object, or synchronization.

private void ensureBytes() {
    if (bytes != null) {
        return;
    }

    byte[] arr;
    int off;

    // Avoid compaction/copy for the common array-backed case
    // Otherwise, compact once if needed (composite/paged/etc).
    if (packed instanceof BytesArray ba) {
        arr = ba.array();
        off = ba.offset();
    } else {
        arr = BytesReference.toBytes(packed);
        off = 0;
    }

    bytesOffset = off;
    bytes = arr;
Reflection in Tests

The test uses reflection with @SuppressForbidden to access private fields bytes and bytesOffset of PackedFloatArray, with a TODO to remove it. This is a test smell and the TODO should be resolved — either by exposing a package-private accessor or redesigning the test to avoid internal state inspection.

// TODO remove this
@SuppressForbidden(reason = "todo, remove this")
private static Object getPrivateField(Object target, String fieldName) throws Exception {
    Field f = target.getClass().getDeclaredField(fieldName);
    f.setAccessible(true);
    return f.get(target);
}
Missing Upsert EFV

In prepareUpdateIndexRequest, only currentRequest.extraFieldValues() (the doc's EFV) is propagated to the final index request. The upsert path's ExtraFieldValues (set via upsertExtraFieldValues) may not be correctly applied when the update results in an upsert. Verify that the upsert EFV is properly routed during the upsert execution path.

.extraFieldValues(currentRequest.extraFieldValues())
Null packedBytes Contract

The packedBytes() method on PrimitiveFloatArray throws IllegalStateException when called, but the interface contract does not clearly document when it is safe to call this method. Callers checking isPackedLE() before calling packedBytes() is an implicit contract that should be explicitly documented, and any code path that calls packedBytes() without checking isPackedLE() first is a potential runtime error.

BytesReference packedBytes();

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 17, 2026

PR Code Suggestions ✨

Latest suggestions up to 362b03d

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix off-by-one in nested parent validation loop

The loop iterates from i=1 to parts.length - 1 (due to the early break), but the
parent path is built using parts[i] which is the next segment, not the current
one. This means the last intermediate path segment is never checked. The parent
variable should be updated at the end of each iteration using parts[i] to correctly
build the cumulative path for the next check.

server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [194-208]

 private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) {
     if (parts.length <= 1) return;
 
     String parent = parts[0];
-    for (int i = 1; i < parts.length; i++) {
+    for (int i = 1; i < parts.length - 1; i++) {
         final ObjectMapper om = context.docMapper().objectMappers().get(parent);
         if (om != null && om.nested().isNested()) {
             throw new MapperParsingException(
                 "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]"
             );
         }
-        if (i == parts.length - 1) break;
         parent = parent + "." + parts[i];
+    }
+    // check the last intermediate segment
+    final ObjectMapper om = context.docMapper().objectMappers().get(parent);
+    if (om != null && om.nested().isNested()) {
+        throw new MapperParsingException(
+            "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]"
+        );
     }
 }
Suggestion importance[1-10]: 7

__

Why: The loop logic is indeed confusing: the parent variable is updated with parts[i] at the end of the iteration but the loop breaks early at i == parts.length - 1, meaning the last intermediate path segment is never validated against nested mappers. The suggested fix correctly restructures the loop to check all intermediate segments.

Medium
Fix data race in lazy byte materialization

The ensureBytes() method has a data race: bytesOffset is written before bytes, so
another thread could see a non-null bytes with a stale bytesOffset = 0 (the
default). Since bytes is the visibility guard, bytesOffset must be written first, or
both fields should be updated atomically. Consider writing bytesOffset before bytes
to ensure the offset is visible before the guard field is set.

server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114]

 private void ensureBytes() {
     if (bytes != null) {
         return;
     }
-    ...
+
+    byte[] arr;
+    int off;
+
+    if (packed instanceof BytesArray ba) {
+        arr = ba.array();
+        off = ba.offset();
+    } else {
+        arr = BytesReference.toBytes(packed);
+        off = 0;
+    }
+
+    // Write offset before bytes to ensure visibility ordering
     bytesOffset = off;
     bytes = arr;
 }
Suggestion importance[1-10]: 6

__

Why: There is a real data race where bytesOffset could be read as 0 by another thread after bytes is set but before bytesOffset is written. However, the improved_code is essentially the same as the existing_code (just reordering the two assignments), and the fix is already present in the existing code since bytesOffset = off is written before bytes = arr. The suggestion is valid but the existing code already has the correct ordering.

Low
General
Ensure correct extra field values used in update vs upsert

When prepareUpdateIndexRequest is called for an upsert (document does not exist),
currentRequest may be the upsert IndexRequest rather than the doc IndexRequest.
Propagating currentRequest.extraFieldValues() without distinguishing between the two
cases could apply the wrong extra field values. Ensure the correct source of
extraFieldValues is selected based on whether this is an update or an upsert
operation.

server/src/main/java/org/opensearch/action/update/UpdateHelper.java [250-258]

 final IndexRequest finalIndexRequest = Requests.indexRequest(request.index())
     .id(request.id())
     .routing(routing)
     .source(updatedSourceAsMap, updateSourceContentType)
-    .extraFieldValues(currentRequest.extraFieldValues())
+    .extraFieldValues(request.docExtraFieldValues() != null ? request.docExtraFieldValues() : currentRequest.extraFieldValues())
     .setIfSeqNo(getResult.getSeqNo())
     .setIfPrimaryTerm(getResult.getPrimaryTerm())
     .waitForActiveShards(request.waitForActiveShards())
     .timeout(request.timeout())
Suggestion importance[1-10]: 3

__

Why: The suggestion misunderstands the code context. prepareUpdateIndexRequest is specifically for the update (non-upsert) path, and currentRequest is the doc IndexRequest in this context. The improved_code references a non-existent request.docExtraFieldValues() method, making it incorrect.

Low
Remove reflection-based internal state inspection from tests

Using reflection to access private fields in tests creates a tight coupling to
implementation details and is fragile. The TODO comment acknowledges this should be
removed. Consider adding package-private or test-visible accessors, or restructuring
the test to verify observable behavior (e.g., correctness of decoded values) rather
than internal state.

server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java [156-163]

-// TODO remove this
-@SuppressForbidden(reason = "todo, remove this")
-private static Object getPrivateField(Object target, String fieldName) throws Exception {
-    Field f = target.getClass().getDeclaredField(fieldName);
-    f.setAccessible(true);
-    return f.get(target);
-}
+// Verify observable behavior instead of internal state:
+// The correctness of the no-copy optimization is implicitly verified by
+// asserting that decoded values match expected floats after get()/asFloatArray().
Suggestion importance[1-10]: 2

__

Why: While the suggestion to avoid reflection-based testing is valid as a code quality concern, the improved_code removes the tests entirely rather than refactoring them, which reduces test coverage. The existing TODO comment already acknowledges this technical debt.

Low

Previous suggestions

Suggestions up to commit 72047a8
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incomplete nested parent validation loop

The loop condition if (i == parts.length - 1) break; causes the last intermediate
segment to be skipped when building the parent path, meaning the deepest
intermediate object is never checked for nesting. The break should be removed so all
intermediate path segments are validated, and the loop should naturally stop before
the leaf field name.

server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [194-208]

 private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) {
     if (parts.length <= 1) return;
 
     String parent = parts[0];
-    for (int i = 1; i < parts.length; i++) {
+    for (int i = 1; i < parts.length - 1; i++) {
         final ObjectMapper om = context.docMapper().objectMappers().get(parent);
         if (om != null && om.nested().isNested()) {
             throw new MapperParsingException(
                 "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]"
             );
         }
-        if (i == parts.length - 1) break;
         parent = parent + "." + parts[i];
+    }
+    // Check the last intermediate segment too
+    final ObjectMapper om = context.docMapper().objectMappers().get(parent);
+    if (om != null && om.nested().isNested()) {
+        throw new MapperParsingException(
+            "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]"
+        );
     }
 }
Suggestion importance[1-10]: 7

__

Why: The original loop has a logic bug: the break at i == parts.length - 1 exits before updating parent to the last intermediate segment, so the deepest intermediate object is never checked for nesting. The suggested fix correctly iterates through all intermediate segments. This is a real correctness issue in the nested parent validation.

Medium
Fix volatile field write ordering to prevent data race

The ensureBytes() method has a data race: bytes is checked without synchronization,
and bytesOffset is written before bytes, so another thread could observe a non-null
bytes with a stale bytesOffset of 0. Since bytes is volatile, writing bytes last
(after bytesOffset) ensures the happens-before relationship is correct.

server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114]

 private void ensureBytes() {
     if (bytes != null) {
         return;
     }
-    ...
+
+    byte[] arr;
+    int off;
+
+    if (packed instanceof BytesArray ba) {
+        arr = ba.array();
+        off = ba.offset();
+    } else {
+        arr = BytesReference.toBytes(packed);
+        off = 0;
+    }
+
+    // Write bytesOffset before bytes to ensure visibility via volatile write on bytes
     bytesOffset = off;
-    bytes = arr;
+    bytes = arr; // volatile write; establishes happens-before for bytesOffset
 }
Suggestion importance[1-10]: 6

__

Why: The ensureBytes() method writes bytesOffset before bytes (the volatile field), which is actually the correct ordering for establishing happens-before via the volatile write on bytes. The suggestion's improved code is essentially the same as the existing code with only a comment added, so the actual fix is minimal. The concern about the data race is valid but the existing code already has the correct write order.

Low
General
Remove reflection-based internal state access in tests

Using reflection to access private fields in tests is fragile and couples the test
to implementation details. The tests that verify bytes and bytesOffset internal
state should instead be validated through the public API (e.g., get() and
asFloatArray() return correct values), or the internal state should be exposed via
package-private accessors. The TODO comment indicates this is known; it should be
resolved before merging.

server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java [156-163]

-// TODO remove this
-@SuppressForbidden(reason = "todo, remove this")
-private static Object getPrivateField(Object target, String fieldName) throws Exception {
-    Field f = target.getClass().getDeclaredField(fieldName);
-    f.setAccessible(true);
-    return f.get(target);
-}
+// Removed reflection-based internal state checks; correctness is verified via public API:
+// pfa.get(i) and pfa.asFloatArray() already assert correct decoding behavior.
Suggestion importance[1-10]: 3

__

Why: The suggestion correctly identifies that using reflection to access private fields couples tests to implementation details and is fragile. However, the improved_code simply removes the helper without showing how the tests that use it would be refactored, making it incomplete as a standalone fix.

Low
Document ExtraFieldValues behavior on NOOP updates

The validation only rejects ExtraFieldValues when a script is present, but does not
guard against ExtraFieldValues being set on the doc request when docAsUpsert is true
and a script is absent. More importantly, there is no validation preventing
ExtraFieldValues from being silently ignored when the update falls into a NOOP path.
Consider also validating that ExtraFieldValues are not set on doc when docAsUpsert
is enabled without a script, or at minimum add a comment explaining the intended
behavior.

server/src/main/java/org/opensearch/action/update/UpdateRequest.java [240-245]

 if (script != null) {
     if ((doc != null && !doc.extraFieldValues().isEmpty())
         || (upsertRequest != null && !upsertRequest.extraFieldValues().isEmpty())) {
         validationException = addValidationError("ExtraFieldValues are not supported with scripted updates", validationException);
     }
 }
+// Note: ExtraFieldValues on doc are propagated via UpdateHelper.prepareUpdateIndexRequest;
+// they are intentionally ignored on NOOP (no-change) updates.
Suggestion importance[1-10]: 1

__

Why: The suggestion only adds a comment to the existing code without any functional change. The existing_code and improved_code are functionally identical, making this a documentation-only suggestion with minimal impact.

Low
Suggestions up to commit 2e90030
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect path accumulation in nested parent validation

The loop iterates from i=1 to parts.length - 1 (due to the early break), but parent
is only updated with parts[i] when i < parts.length - 1. This means the last
intermediate segment is never appended, so the path built for checking is incorrect
for deeper nesting. The parent accumulation should happen at the start of each
iteration before the break check.

server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [191-205]

 private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) {
     if (parts.length <= 1) return;
 
     String parent = parts[0];
-    for (int i = 1; i < parts.length; i++) {
+    for (int i = 1; i < parts.length - 1; i++) {
         final ObjectMapper om = context.docMapper().objectMappers().get(parent);
         if (om != null && om.nested().isNested()) {
             throw new MapperParsingException(
                 "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]"
             );
         }
-        if (i == parts.length - 1) break;
         parent = parent + "." + parts[i];
     }
 }
Suggestion importance[1-10]: 7

__

Why: The loop logic is indeed buggy: the parent path is only updated when i < parts.length - 1, but the nested check for the last intermediate segment is skipped due to the early break. The suggested fix simplifies the loop to i < parts.length - 1 and removes the break, which correctly checks all intermediate path segments. This is a real logic bug that could cause nested parent validation to be bypassed for deeper paths.

Medium
Fix potential data race in lazy byte materialization

The ensureBytes() method uses non-atomic writes to bytesOffset and bytes (both
volatile), which can cause a data race: another thread may see the updated bytes but
the stale bytesOffset = 0 (default), leading to incorrect decoding. The assignment
to bytesOffset must happen before bytes is published, or both fields should be set
atomically.

server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114]

 private void ensureBytes() {
     if (bytes != null) {
         return;
     }
-    ...
+
+    byte[] arr;
+    int off;
+
+    if (packed instanceof BytesArray ba) {
+        arr = ba.array();
+        off = ba.offset();
+    } else {
+        arr = BytesReference.toBytes(packed);
+        off = 0;
+    }
+
+    // Write offset before bytes to ensure correct visibility ordering
     bytesOffset = off;
+    // Volatile write to bytes acts as a publication barrier
     bytes = arr;
 }
Suggestion importance[1-10]: 6

__

Why: There is a real potential data race: bytesOffset is written before bytes, but since both are volatile, the JMM guarantees that the volatile write to bytes happens-after the write to bytesOffset, so any thread that reads bytes != null will also see the correct bytesOffset. However, the improved code is essentially identical to the existing code (same ordering), so the suggestion doesn't actually change behavior. The comment in the PR already acknowledges that multiple threads may transiently allocate, but results are equivalent.

Low
General
Defensive copy to prevent external array mutation

The constructor stores the provided float[] array directly without copying, so
external mutations to the array after construction will silently corrupt the stored
value. Since asFloatArray() also returns the same reference, callers can mutate the
internal state. A defensive copy should be made in the constructor.

server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java [24-26]

 public PrimitiveFloatArray(float[] v) {
-    this.v = v;
+    this.v = v.clone();
 }
Suggestion importance[1-10]: 4

__

Why: This is a valid defensive programming concern — storing the array directly allows external mutation. However, the PR's design intent (as noted in the comment "This can be used by clients that already have float[]") suggests performance is a priority, and asFloatArray() intentionally returns the same reference. Adding a defensive copy would change the documented behavior and add overhead.

Low
Ensure correct extra field values source during doc update

The extraFieldValues are propagated from currentRequest (the doc update request) to
the final index request during a doc-based update. However, if the document already
existed and is being updated (not upserted), the extra field values from the
previous version are not considered — this may be intentional, but it should be
verified that currentRequest here always refers to the doc request and not the
upsert request, since prepareUpdateIndexRequest is called for doc-based updates
where currentRequest should be request.doc().

server/src/main/java/org/opensearch/action/update/UpdateHelper.java [250-259]

 final IndexRequest finalIndexRequest = Requests.indexRequest(request.index())
     .id(request.id())
     .routing(routing)
     .source(updatedSourceAsMap, updateSourceContentType)
-    .extraFieldValues(currentRequest.extraFieldValues())
+    .extraFieldValues(request.doc() != null ? request.doc().extraFieldValues() : ExtraFieldValues.EMPTY)
     .setIfSeqNo(getResult.getSeqNo())
     .setIfPrimaryTerm(getResult.getPrimaryTerm())
     .waitForActiveShards(request.waitForActiveShards())
     .timeout(request.timeout())
Suggestion importance[1-10]: 3

__

Why: The suggestion asks to verify that currentRequest refers to request.doc(), but without seeing the full context of prepareUpdateIndexRequest, this is speculative. The improved_code introduces ExtraFieldValues.EMPTY which requires an additional import and changes behavior in a way that may not be correct if currentRequest is already request.doc(). This is more of a verification concern than a clear bug.

Low
Suggestions up to commit 988c50d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix incorrect intermediate path construction in nested validation

The loop iterates from i=1 to parts.length - 1 (due to the early break), but the
parent path is built using parts[i] instead of parts[i-1], causing incorrect
intermediate path construction. For example, with parts ["a","b","c"], after
checking "a" the next parent becomes "a.b" but the code appends parts[1]="b"
correctly only by coincidence; however the last segment parts[parts.length-2] is
never checked. The parent accumulation should use parts[i] before the check, not
after.

server/src/main/java/org/opensearch/index/mapper/DocumentParser.java [191-205]

 private static void validateNoNestedParents(ParseContext context, String fullPath, String[] parts) {
     if (parts.length <= 1) return;
 
-    String parent = parts[0];
-    for (int i = 1; i < parts.length; i++) {
-        final ObjectMapper om = context.docMapper().objectMappers().get(parent);
+    StringBuilder parent = new StringBuilder(parts[0]);
+    for (int i = 0; i < parts.length - 1; i++) {
+        if (i > 0) {
+            parent.append('.').append(parts[i]);
+        }
+        final ObjectMapper om = context.docMapper().objectMappers().get(parent.toString());
         if (om != null && om.nested().isNested()) {
             throw new MapperParsingException(
                 "Cannot add a value for field [" + fullPath + "] since one of the intermediate objects is nested: [" + parent + "]"
             );
         }
-        if (i == parts.length - 1) break;
-        parent = parent + "." + parts[i];
     }
 }
Suggestion importance[1-10]: 7

__

Why: The original loop logic has a subtle bug: it checks parent starting at parts[0], then builds the next parent using parts[i] where i starts at 1, but the early break at i == parts.length - 1 means the second-to-last intermediate path (e.g., "a.b" for ["a","b","c"]) is never checked. The suggested fix correctly iterates over all intermediate paths from index 0 to parts.length - 2.

Medium
Fix data race in lazy byte materialization

The ensureBytes() method has a data race: bytesOffset is written before bytes, so
another thread could see a non-null bytes with a stale bytesOffset = 0 (the initial
value) before the assignment completes. Since bytes is the visibility guard (checked
first), bytesOffset must be written before bytes is published, but the JMM doesn't
guarantee ordering without synchronization. Swap the assignment order so bytesOffset
is always set before bytes becomes visible.

server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java [94-114]

 private void ensureBytes() {
     if (bytes != null) {
         return;
     }
-    ...
+
+    byte[] arr;
+    int off;
+
+    if (packed instanceof BytesArray ba) {
+        arr = ba.array();
+        off = ba.offset();
+    } else {
+        arr = BytesReference.toBytes(packed);
+        off = 0;
+    }
+
+    // Write offset before bytes to ensure correct visibility ordering
     bytesOffset = off;
-    bytes = arr;
+    bytes = arr;  // volatile write; publishes bytesOffset
 }
Suggestion importance[1-10]: 6

__

Why: There is a legitimate JMM concern: bytesOffset is a non-final volatile field written before bytes, but since bytes is the null-check guard, another thread could observe bytes != null with a stale bytesOffset. The fix (writing bytesOffset before bytes) is already what the existing code does, so the improved_code is essentially the same as existing_code, making this suggestion marginally impactful.

Low
General
Defensive copy to prevent external array mutation

The constructor stores the provided float[] directly without copying, so external
mutation of the array after construction will silently corrupt the stored value.
Since asFloatArray() also returns the internal array directly, callers can mutate
it. A defensive copy should be made in the constructor to ensure immutability of the
stored value.

server/src/main/java/org/opensearch/index/mapper/extrasource/PrimitiveFloatArray.java [24-26]

 public PrimitiveFloatArray(float[] v) {
-    this.v = v;
+    this.v = v.clone();
 }
Suggestion importance[1-10]: 3

__

Why: While defensive copying is a valid concern for immutability, PrimitiveFloatArray is a package-private class used internally, and asFloatArray() already returns the internal array directly (so immutability is not fully enforced regardless). The performance cost of cloning may outweigh the benefit in this context.

Low
Ensure ExtraFieldValues validation covers docAsUpsert path

The validation only rejects ExtraFieldValues when a script is present, but does not
guard against the case where docAsUpsert is true and the doc request carries
ExtraFieldValues while a script is also set (which is already invalid for other
reasons). More importantly, there is no validation preventing ExtraFieldValues on
the doc request when docAsUpsert is enabled alongside a script — the existing check
covers this, but the condition should also handle the docAsUpsert path where doc
doubles as the upsert. Consider also validating that ExtraFieldValues on doc are
rejected when script != null && docAsUpsert to be explicit.

server/src/main/java/org/opensearch/action/update/UpdateRequest.java [240-245]

 if (script != null) {
     if ((doc != null && !doc.extraFieldValues().isEmpty())
-        || (upsertRequest != null && !upsertRequest.extraFieldValues().isEmpty())) {
+        || (upsertRequest != null && !upsertRequest.extraFieldValues().isEmpty())
+        || (docAsUpsert && doc != null && !doc.extraFieldValues().isEmpty())) {
         validationException = addValidationError("ExtraFieldValues are not supported with scripted updates", validationException);
     }
 }
Suggestion importance[1-10]: 2

__

Why: The improved_code adds a redundant condition (docAsUpsert && doc != null && !doc.extraFieldValues().isEmpty()) which is already covered by the existing (doc != null && !doc.extraFieldValues().isEmpty()) check. The suggestion does not add any new coverage and the improved_code is logically equivalent to the existing code.

Low

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 988c50d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Lamine Idjeraoui <lidjeraoui@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 2e90030

@github-actions
Copy link
Copy Markdown
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 72047a8.

PathLineSeverityDescription
server/src/main/java/org/opensearch/index/mapper/DocumentParser.java163lowTODO comment explicitly acknowledges missing validation: 'EFV is only allowed if JSON is absent (validation)'. This means a document can simultaneously supply a field value via both JSON _source and ExtraFieldValues with no guard against conflicts or override, which could lead to unpredictable indexing behavior. Not malicious, but an acknowledged incomplete security/correctness control.
server/src/test/java/org/opensearch/index/mapper/extrasource/PackedFloatArrayTests.java154lowA @SuppressForbidden-annotated method uses reflection (Field.setAccessible(true)) to access private fields of PackedFloatArray. The annotation reason says 'todo, remove this', indicating this is acknowledged technical debt. Confined to test code and has no runtime security impact, but suppressing access controls without a strong justification is anomalous.
server/src/main/java/org/opensearch/index/mapper/extrasource/PackedFloatArray.java85lowNon-atomic lazy initialization of volatile fields 'bytes' and 'bytesOffset' in ensureBytes() creates a benign but acknowledged data race. The class comment explicitly notes multiple threads may transiently allocate. This is a correctness/safety concern (not malicious), but the intentional acceptance of a data race in a shared mutable object without synchronization is a minor anomaly worth noting.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 0 | Low: 3


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 72047a8

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 72047a8: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 72047a8: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Michael Froh <msfroh@apache.org>
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 362b03d

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 362b03d: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 362b03d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request Indexing Indexing, Bulk Indexing and anything related to indexing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Add support of "extra fields" outside _source indexing

2 participants