
Address review nits on NONE and LAST_TOKEN pooling PRs #4744

Open
aneesh-db wants to merge 2 commits into opensearch-project:main from aneesh-db:fix/pooling-review-nits

Conversation

@aneesh-db
Contributor

Description

Follow-up to address review comments from #4710 and #4711:

  1. Early return for NONE pooling (comment): HuggingfaceTextEmbeddingTranslator now returns immediately when pooling is none, skipping unnecessary attention mask creation and switch statement.

  2. Extract shared lastTokenPool (comment): Moved duplicated lastTokenPool method from both translators into a shared TextEmbeddingPoolingUtils class. Also normalizes to use toLongArray() consistently, avoiding float-to-long precision concerns.
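The extracted helper described in item 2 can be sketched with plain arrays in place of DJL's NDArray so the example is self-contained; the class name LastTokenPoolSketch and the array-based signature are stand-ins for illustration, not the actual shared utility:

```java
// Hypothetical stand-in for the shared lastTokenPool utility: plain arrays
// replace DJL NDArray so the sketch runs without the DJL dependency.
public class LastTokenPoolSketch {

    // attentionMask: 1 for real tokens, 0 for padding (1-D, single sequence).
    // Summing as long mirrors the toLongArray() normalization in the PR.
    static float[] lastTokenPool(float[][] embeddings, long[] attentionMask) {
        long tokenCount = 0;
        for (long m : attentionMask) {
            tokenCount += m;
        }
        int lastTokenIdx = (int) Math.max(tokenCount - 1, 0);
        return embeddings[lastTokenIdx];
    }

    public static void main(String[] args) {
        float[][] embeddings = {
            {0.1f, 0.2f}, {0.3f, 0.4f}, {0.5f, 0.6f}, {0.0f, 0.0f}
        };
        long[] mask = {1, 1, 1, 0}; // 3 real tokens, so last real index is 2
        float[] pooled = lastTokenPool(embeddings, mask);
        System.out.println(pooled[0] + ", " + pooled[1]); // row at index 2
    }
}
```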

Related Issues

Follow-up to #4710 and #4711

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions

Failed to generate code suggestions for PR

@aneesh-db aneesh-db had a problem deploying to ml-commons-cicd-env-require-approval March 27, 2026 10:26 — with GitHub Actions Error
@aneesh-db aneesh-db had a problem deploying to ml-commons-cicd-env-require-approval March 27, 2026 10:26 — with GitHub Actions Failure
@aneesh-db aneesh-db force-pushed the fix/pooling-review-nits branch from ee5558e to a00348e Compare March 27, 2026 10:28
@github-actions

Failed to generate code suggestions for PR

@aneesh-db aneesh-db requested a deployment to ml-commons-cicd-env-require-approval March 27, 2026 10:30 — with GitHub Actions Waiting
@pyek-bot
Collaborator

@aneesh-db Thanks for the follow-up PRs. Could you also remove the feature details from the older release notes? This was added in your previous PRs.

- Early return for NONE pooling in HuggingfaceTextEmbeddingTranslator
  to skip unnecessary attention mask creation and switch statement
- Extract lastTokenPool to shared TextEmbeddingPoolingUtils class,
  eliminating duplication between ONNX and Huggingface translators
- Use toLongArray() consistently for attention mask sum, avoiding
  float-to-long precision concerns

Signed-off-by: Aneesh Nema <aneesh.nema@databricks.com>
@github-actions

github-actions bot commented Mar 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit 74bc0d2)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Extract shared lastTokenPool utility and update translators

Relevant files:

  • ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/TextEmbeddingPoolingUtils.java
  • ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/HuggingfaceTextEmbeddingTranslator.java
  • ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/ONNXSentenceTransformerTextEmbeddingTranslator.java

Sub-PR theme: Early return for NONE pooling in HuggingfaceTextEmbeddingTranslator

Relevant files:

  • ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/HuggingfaceTextEmbeddingTranslator.java

⚡ Recommended focus areas for review

Missing Normalize

In the refactored code, after the early return for NONE pooling, the non-NONE pooling path no longer applies normalization. The original code had a shared normalization step after the if/else block. Verify that normalization is still applied for pooling modes like MEAN, CLS, LASTTOKEN, etc., and that it hasn't been accidentally dropped.

// For other pooling modes, use last_hidden_state or first output
embeddings = list.get("last_hidden_state");
if (embeddings == null) {
    embeddings = list.get(0);
}
Encoding encoding = (Encoding) ctx.getAttachment("encoding");
long[] attentionMask = encoding.getAttentionMask();
NDManager manager = ctx.getNDManager();
NDArray inputAttentionMask = manager.create(attentionMask).toType(DataType.FLOAT32, true);
switch (pooling) {
    case "mean":
        embeddings = meanPool(embeddings, inputAttentionMask, false);
        break;
    case "mean_sqrt_len":
        embeddings = meanPool(embeddings, inputAttentionMask, true);
        break;
    case "max":
        embeddings = maxPool(embeddings, inputAttentionMask);
        break;
    case "weightedmean":
        embeddings = weightedMeanPool(embeddings, inputAttentionMask);
        break;
    case "cls":
        embeddings = embeddings.get(0);
        break;
    case "lasttoken":
        embeddings = TextEmbeddingPoolingUtils.lastTokenPool(embeddings, inputAttentionMask);
        break;
    default:
        throw new AssertionError("Unexpected pooling model: " + pooling);
}
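The "Missing Normalize" concern above is about structure: with the early return for "none", the remaining pooling modes must still pass through the shared normalization step after the switch. A minimal sketch of that shape, using plain arrays instead of DJL's NDList/NDArray (the class name and simplified signature are illustrative assumptions, not the translator's actual API):

```java
import java.util.List;

// Sketch of the control flow under review: "none" returns early, every
// other pooling mode falls through to the shared normalization step.
public class NonePoolingEarlyReturn {

    static float[] process(List<float[]> outputs, String pooling, boolean normalize) {
        float[] embeddings = outputs.get(0);
        if ("none".equals(pooling)) {
            // Early return: skip attention-mask setup and the switch entirely,
            // but still honor the normalize flag.
            return normalize ? l2Normalize(embeddings) : embeddings;
        }
        // Non-"none" modes would build the attention mask and run the
        // pooling switch here (elided in this sketch) ...
        // ... and normalization must still be applied afterwards:
        return normalize ? l2Normalize(embeddings) : embeddings;
    }

    static float[] l2Normalize(float[] v) {
        double norm = 0;
        for (float x : v) norm += x * x;
        norm = Math.sqrt(norm);
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) out[i] = (float) (v[i] / norm);
        return out;
    }
}
```

The point of the sketch is that the normalize check appears on both exit paths; dropping it from the post-switch path would silently change results for mean, cls, lasttoken, and the other modes.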
Incorrect Pooling

The lastTokenPool method sums the entire attention mask to find the last token index. For batched inputs (batch_size > 1), this would sum across all sequences, producing an incorrect index. Confirm that this method is only ever called with a single-sequence attention mask, or add a guard/reshape to handle the correct dimension.

static NDArray lastTokenPool(NDArray embeddings, NDArray attentionMask) {
    long tokenCount = attentionMask.sum().toLongArray()[0];
    long lastTokenIdx = Math.max(tokenCount - 1, 0);
    return embeddings.get(lastTokenIdx);
}
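The batching pitfall the reviewer flags can be demonstrated with plain arrays (a self-contained stand-in for NDArray; the class and method names here are illustrative): summing a 2-D mask collapses all rows into one count, yielding an index past the end of any single sequence, whereas summing per row gives the correct last-token index for each sequence.

```java
// Demonstrates why summing a batched (2-D) attention mask breaks
// last-token indexing, while a per-row sum stays correct.
public class BatchedMaskPitfall {

    // Sums every element across all rows, as NDArray.sum() would on a 2-D mask.
    static long sumAll(long[][] mask) {
        long total = 0;
        for (long[] row : mask) {
            for (long m : row) total += m;
        }
        return total;
    }

    // Correct per-sequence variant: sum only one row's mask.
    static long lastIdxPerRow(long[] row) {
        long count = 0;
        for (long m : row) count += m;
        return Math.max(count - 1, 0);
    }

    public static void main(String[] args) {
        long[][] batched = {{1, 1, 1, 0}, {1, 1, 0, 0}}; // batch of 2 sequences
        System.out.println(sumAll(batched) - 1);        // 4: past the end of a 4-token row
        System.out.println(lastIdxPerRow(batched[0]));  // 2: correct for row 0
        System.out.println(lastIdxPerRow(batched[1]));  // 1: correct for row 1
    }
}
```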
Missing Release Notes

The release notes entries for LAST_TOKEN pooling support and NONE pooling mode bug fix were removed from the release notes file. These features were introduced in PRs #4710 and #4711 and should still be documented. Confirm whether these entries should be preserved or moved to a different section.

 * OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions
 * and limitations under the License.
 */
package org.opensearch.ml.engine.algorithms.text_embedding;

@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval March 30, 2026 23:08 — with GitHub Actions Failure
@rithin-pullela-aws rithin-pullela-aws had a problem deploying to ml-commons-cicd-env-require-approval March 30, 2026 23:08 — with GitHub Actions Error
@github-actions

github-actions bot commented Mar 30, 2026

PR Code Suggestions ✨

Latest suggestions up to 74bc0d2
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Add null guard before using embeddings

If list is completely empty, list.get(0) throws an IndexOutOfBoundsException rather than returning null, so embeddings can only end up null if list.get(0) itself returns null. A null check or size check before calling embeddings.normalize and embeddings.toFloatArray() would prevent a potential NullPointerException in that case.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/HuggingfaceTextEmbeddingTranslator.java [93-99]

 if (embeddings == null) {
     embeddings = list.get(0);
+}
+if (embeddings == null) {
+    throw new IllegalStateException("No embeddings found in model output for NONE pooling mode");
 }
 if (normalize) {
     embeddings = embeddings.normalize(2, 0);
 }
 return embeddings.toFloatArray();
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that if list is empty, list.get(0) could throw an IndexOutOfBoundsException, and adding an explicit null/state check with a meaningful error message improves robustness and debuggability. However, this is a defensive edge case that may rarely occur in practice.

Low
Ensure attention mask summed over correct dimension

The attentionMask passed here may be a multi-dimensional array (e.g., batch x
sequence_length), so calling .sum() on it would sum all elements across all
dimensions, not just the sequence dimension. This could produce an incorrect
tokenCount if the mask is not already 1-D. Consider flattening or explicitly summing
along the correct axis before calling toLongArray()[0].

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/TextEmbeddingPoolingUtils.java [25-29]

 static NDArray lastTokenPool(NDArray embeddings, NDArray attentionMask) {
-    long tokenCount = attentionMask.sum().toLongArray()[0];
+    // Sum along the sequence dimension (axis 0 for 1-D, or last axis for batched)
+    long tokenCount = attentionMask.flatten().sum().toLongArray()[0];
     long lastTokenIdx = Math.max(tokenCount - 1, 0);
     return embeddings.get(lastTokenIdx);
 }
Suggestion importance[1-10]: 4


Why: The concern about multi-dimensional masks is valid in theory, but looking at the existing code in both translators, the attentionMask passed to lastTokenPool is already a 1-D array derived from encoding.getAttentionMask(). The suggestion to use .flatten() is a minor defensive improvement but not critical given the actual usage context.

Low

Previous suggestions

Suggestions up to commit 3214126
Category | Suggestion | Impact
Possible issue
Add null check before using embeddings

The early return for the "none" pooling path skips the normalization logic that exists at the end of the method for other pooling modes; the normalize check was added inline here, which is correct. However, if list.get(0) returns null (e.g., from an empty list), calling embeddings.normalize(2, 0) will throw a NullPointerException. A null check should be added before using embeddings.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/HuggingfaceTextEmbeddingTranslator.java [93-99]

 if (embeddings == null) {
     embeddings = list.get(0);
+}
+if (embeddings == null) {
+    throw new IllegalArgumentException("No embeddings found in model output for NONE pooling mode");
 }
 if (normalize) {
     embeddings = embeddings.normalize(2, 0);
 }
 return embeddings.toFloatArray();
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that list.get(0) could theoretically return null, leading to a NullPointerException when calling embeddings.normalize() or toFloatArray(). This is a valid defensive programming concern, though in practice an empty NDList would be unusual.

Low
General
Validate attention mask dimensions before pooling

The attentionMask is expected to be a 1D array (sequence_length), but the callers in
both translators pass inputAttentionMask which may have been reshaped or broadcast
to a higher-dimensional tensor before this call. If attentionMask is
multi-dimensional, sum() will return the total sum across all dimensions, giving an
incorrect tokenCount. A comment or a shape assertion should be added to document and
enforce the expected shape.

ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/text_embedding/TextEmbeddingPoolingUtils.java [25-29]

 static NDArray lastTokenPool(NDArray embeddings, NDArray attentionMask) {
+    // attentionMask must be 1-D (sequence_length): 1 for real tokens, 0 for padding
+    if (attentionMask.getShape().dimension() != 1) {
+        throw new IllegalArgumentException("attentionMask must be 1-D, got shape: " + attentionMask.getShape());
+    }
     long tokenCount = attentionMask.sum().toLongArray()[0];
     long lastTokenIdx = Math.max(tokenCount - 1, 0);
     return embeddings.get(lastTokenIdx);
 }
Suggestion importance[1-10]: 3


Why: The suggestion adds a dimension check for attentionMask, but looking at the callers, inputAttentionMask is derived from encoding.getAttentionMask() which returns a 1D array, so this validation is unlikely to catch real issues. The suggestion is more of a defensive measure than addressing an actual bug.

Low

Release notes are auto-generated, removing entries added in opensearch-project#4710 and opensearch-project#4711.

Signed-off-by: Aneesh Nema <aneesh.nema@databricks.com>
@github-actions

Persistent review updated to latest commit 74bc0d2

@aneesh-db
Contributor Author

@pyek-bot done, removed the feature details from the older release notes

@aneesh-db aneesh-db requested a deployment to ml-commons-cicd-env-require-approval March 31, 2026 05:53 — with GitHub Actions Waiting


3 participants