Skip to content

Add created_by field to ML resources for adoption metrics attribution#4754

Open
dbwiddis wants to merge 10 commits into
opensearch-project:mainfrom
dbwiddis:created-by
Open

Add created_by field to ML resources for adoption metrics attribution#4754
dbwiddis wants to merge 10 commits into
opensearch-project:mainfrom
dbwiddis:created-by

Conversation

@dbwiddis
Copy link
Copy Markdown
Member

@dbwiddis dbwiddis commented Mar 31, 2026

Description

Adds a created_by field to ML agents, connectors, and models to enable
attribution of adoption metrics by the client that created the resource
(e.g. flow-framework, opensearch-dashboards).

This is distinct from the authenticated user (User/FGAC) — created_by
identifies the calling client, which may differ from the user when a
workflow engine like Flow Framework creates resources on behalf of a user.

Data model

  • MLAgent: new created_by field, version-gated serialization at
    VERSION_3_7_0, emitted in getTags() (defaults to "unknown")
  • Connector / AbstractConnector / HttpConnector:
    new created_by field, version-gated serialization at VERSION_3_7_0
  • MLCreateConnectorInput: new created_by field, flows through to
    connector via XContent round-trip in TransportCreateConnectorAction
  • MLModel: new created_by field, version-gated serialization at
    VERSION_3_7_0, emitted in all three getTags() methods
    (getRemoteModelTags, getPreTrainedModelTags, getCustomModelTags),
    defaults to "unknown"
  • MLRegisterModelInput: new created_by field, wired through to
    MLModel in MLModelManager (all three builder sites, including
    the previously missing tenantId fix in registerModelFromUrl)

Validation

  • MLRegisterAgentRequest: validates created_by as optional safe-text
    field; restructured validate() to seed exception from validateFields
    and eliminate manual error-merging loop
  • MLCreateConnectorRequest: validates created_by as optional safe-text field
  • MLRegisterModelRequest: validates created_by as optional safe-text field
  • MLUpdateConnectorRequest: created_by is intentionally not validated
    here — it is set at creation time only and is not merged by
    HttpConnector.update(); also fixed a pre-existing bug where
    validateFields result overwrote any prior connectorId null error

Metrics

  • MLStatsJobProcessor requires no changes — created_by flows through
    automatically via model.getTags() and agent.getTags()

Versioning

  • CommonValue: added VERSION_3_7_0 constant; VERSION_3_6_0 was
    considered but removed as unused in production code — tests use the
    existing VERSION_3_5_0 constant to represent a pre-created_by node
  • All new fields are version-gated: nodes on older versions will
    deserialize created_by as null
  • CREATED_BY_FIELD constant centralized in CommonValue; duplicate
    local definitions removed from MLAgent, MLModel, HttpConnector,
    MLCreateConnectorInput, and MLRegisterModelInput

Testing

  • Unit tests added for serialization (stream in/out at VERSION_3_7_0
    and VERSION_3_5_0), XContent parse/write, and getTags() emission
    for all three resource types
  • Happy-path createdBy tests added to MachineLearningNodeClientTest
    verifying the field is passed through for registerAgent, register (Model),
    and createConnector
  • Validation tests added to MLRegisterAgentRequestTest,
    MLCreateConnectorRequestTests, and MLRegisterModelRequestTest
  • All existing tests pass

Related Issues

Resolves #4752

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:36 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:36 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:36 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:36 — with GitHub Actions Failure
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 31, 2026

PR Reviewer Guide 🔍

(Review updated until commit 8c1a652)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add created_by field to MLAgent and agent registration

Relevant files:

  • common/src/main/java/org/opensearch/ml/common/agent/MLAgent.java
  • common/src/main/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequest.java
  • common/src/test/java/org/opensearch/ml/common/agent/MLAgentTest.java
  • common/src/test/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequestTest.java
  • client/src/test/java/org/opensearch/ml/client/MachineLearningNodeClientTest.java
  • plugin/src/test/java/org/opensearch/ml/action/agents/DeleteAgentTransportActionTests.java
  • plugin/src/test/java/org/opensearch/ml/action/agents/GetAgentTransportActionTests.java
  • common/src/test/java/org/opensearch/ml/common/transport/agent/MLAgentGetResponseTest.java

Sub-PR theme: Add created_by field to Connector and connector requests

Relevant files:

  • common/src/main/java/org/opensearch/ml/common/connector/AbstractConnector.java
  • common/src/main/java/org/opensearch/ml/common/connector/AwsConnector.java
  • common/src/main/java/org/opensearch/ml/common/connector/Connector.java
  • common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java
  • common/src/main/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInput.java
  • common/src/main/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorRequest.java
  • common/src/main/java/org/opensearch/ml/common/transport/connector/MLUpdateConnectorRequest.java
  • common/src/test/java/org/opensearch/ml/common/connector/HttpConnectorTest.java
  • common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorInputTests.java
  • common/src/test/java/org/opensearch/ml/common/transport/connector/MLCreateConnectorRequestTests.java
  • common/src/test/java/org/opensearch/ml/common/transport/connector/MLUpdateConnectorRequestTests.java

Sub-PR theme: Add created_by field to MLModel and model registration

Relevant files:

  • common/src/main/java/org/opensearch/ml/common/MLModel.java
  • common/src/main/java/org/opensearch/ml/common/transport/register/MLRegisterModelInput.java
  • common/src/main/java/org/opensearch/ml/common/transport/register/MLRegisterModelRequest.java
  • plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java
  • common/src/test/java/org/opensearch/ml/common/MLModelTests.java
  • common/src/test/java/org/opensearch/ml/common/transport/register/MLRegisterModelInputTest.java
  • common/src/test/java/org/opensearch/ml/common/transport/register/MLRegisterModelRequestTest.java

⚡ Recommended focus areas for review

Version Gap

VERSION_3_6_0 is skipped entirely — the version sequence jumps from VERSION_3_5_0 to VERSION_3_7_0. If VERSION_3_6_0 is ever defined later and used for other features, the ordering of stream read/write checks could behave unexpectedly. Verify this is intentional and that no 3.6.0 release is planned between these two constants.

public static final Version VERSION_3_7_0 = Version.fromString("3.7.0");
Error Merging Regression

The previous code assigned exception = validateFields(fieldsToValidate) directly, which would overwrite any earlier null-connector-id error with only the field-validation result. The new code correctly merges errors, but the old behavior was a latent bug that is now fixed. Confirm the new merged behavior (both connector-id error and field errors together) is the intended contract and that callers handle multiple validation errors properly.

ActionRequestValidationException fieldException = validateFields(fieldsToValidate);
if (fieldException != null) {
    for (String error : fieldException.validationErrors()) {
        exception = addValidationError(error, exception);
    }
}
Null Exception Seed

validateFields is called and its result is stored in exception, but if mlAgent is null the method returns early before reaching validateFields. If mlAgent is non-null but validateFields returns null, subsequent addValidationError calls correctly chain onto null. However, if validateFields returns a non-null exception and none of the context-management conditions are true, the exception from validateFields is returned correctly. Verify that the exception variable seeded from validateFields is always the starting point for subsequent addValidationError calls (it is, based on the diff), but double-check there is no path where a prior exception is silently dropped.

Map<String, FieldDescriptor> fieldsToValidate = new HashMap<>();
fieldsToValidate.put("Agent created_by field", new FieldDescriptor(mlAgent.getCreatedBy(), false));
ActionRequestValidationException exception = validateFields(fieldsToValidate);

if (mlAgent.getContextManagementName() != null && mlAgent.getContextManagement() != null) {
    exception = addValidationError("Cannot specify both context_management_name and context_management", exception);
}
if (mlAgent.getContextManagementName() != null) {
    exception = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
}
if (mlAgent.getContextManagement() != null) {
    exception = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
}

return exception;
Missing createdBy in prebuilt model

The registerPrebuiltModel path (if it exists separately from registerModelFromUrl) should also propagate createdBy to the built MLModel. Confirm all model registration code paths in MLModelManager now set createdBy, not just the three sites visible in this diff.

.tenantId(registerModelInput.getTenantId())
.createdBy(registerModelInput.getCreatedBy())

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Mar 31, 2026

PR Code Suggestions ✨

Latest suggestions up to 8c1a652

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validation errors not properly accumulated

The validateFields method returns a new exception object, but subsequent
addValidationError calls may not properly chain onto it if validateFields returns a
non-null exception. The existing code in MLUpdateConnectorRequest shows the correct
pattern: iterate over fieldException.validationErrors() and add each to the
accumulating exception. Apply the same pattern here to ensure all validation errors
are properly accumulated.

common/src/main/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequest.java [56-68]

 Map<String, FieldDescriptor> fieldsToValidate = new HashMap<>();
 fieldsToValidate.put("Agent created_by field", new FieldDescriptor(mlAgent.getCreatedBy(), false));
-ActionRequestValidationException exception = validateFields(fieldsToValidate);
+ActionRequestValidationException fieldException = validateFields(fieldsToValidate);
+ActionRequestValidationException exception = null;
+if (fieldException != null) {
+    for (String error : fieldException.validationErrors()) {
+        exception = addValidationError(error, exception);
+    }
+}
 
 if (mlAgent.getContextManagementName() != null && mlAgent.getContextManagement() != null) {
     exception = addValidationError("Cannot specify both context_management_name and context_management", exception);
 }
 if (mlAgent.getContextManagementName() != null) {
     exception = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
 }
 if (mlAgent.getContextManagement() != null) {
     exception = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid concern - the current code assigns the result of validateFields to exception and then passes it to addValidationError, but addValidationError creates a new exception object that may not include the errors from validateFields. The pattern used in MLUpdateConnectorRequest (iterating over fieldException.validationErrors()) is the correct approach for properly accumulating all validation errors.

Medium
General
Missing version constant for 3.6.0

The PR introduces VERSION_3_7_0 but skips VERSION_3_6_0. If 3.6.0 is a valid release
version in the project's versioning scheme, this gap could cause issues with
version-based compatibility checks. Verify whether VERSION_3_6_0 should also be
defined, or confirm that 3.6.0 is intentionally skipped.

common/src/main/java/org/opensearch/ml/common/CommonValue.java [113]

+public static final Version VERSION_3_6_0 = Version.fromString("3.6.0");
 public static final Version VERSION_3_7_0 = Version.fromString("3.7.0");
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid concern about a gap in version constants (skipping VERSION_3_6_0), but this may be intentional if 3.6.0 is not a release version in this project. The suggestion is speculative and asks for verification rather than identifying a definite bug.

Low
Verify tenantId addition doesn't break existing behavior

The registerModelFromUrl method now sets both tenantId and createdBy, but looking at
the diff, tenantId was also added here (not just createdBy). Verify that tenantId
was previously missing from this code path and that adding it now doesn't introduce
unintended behavior for existing callers that didn't previously propagate tenant
context through this path.

plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java [847-857]

+.modelState(MLModelState.REGISTERING)
+.modelConfig(registerModelInput.getModelConfig())
+.deploySetting(registerModelInput.getDeploySetting())
+.createdTime(now)
+.lastUpdateTime(now)
+.isHidden(registerModelInput.getIsHidden())
+.guardrails(registerModelInput.getGuardrails())
+.modelInterface(registerModelInput.getModelInterface())
+.tenantId(registerModelInput.getTenantId())
+.createdBy(registerModelInput.getCreatedBy())
+.build();
 
-
Suggestion importance[1-10]: 3

__

Why: The suggestion only asks for verification and the existing_code is identical to the improved_code, making it a low-value suggestion. The tenantId addition to registerModelFromUrl is a legitimate fix for a previously missing field propagation, consistent with the PR's intent.

Low

Previous suggestions

Suggestions up to commit e85bb6f
CategorySuggestion                                                                                                                                    Impact
Possible issue
Ensure version sequence has no gaps

The createdBy field is gated behind VERSION_3_7_0, but the version sequence jumps
from VERSION_3_5_0 to VERSION_3_7_0, skipping VERSION_3_6_0. If there is an
intermediate 3.6.0 release, the serialization/deserialization logic will be
incorrect for nodes running that version. Ensure the version constant matches the
actual release where this feature is introduced, or add the missing intermediate
version constant.

common/src/main/java/org/opensearch/ml/common/CommonValue.java [107]

+public static final Version VERSION_3_6_0 = Version.fromString("3.6.0");
 public static final Version VERSION_3_7_0 = Version.fromString("3.7.0");
Suggestion importance[1-10]: 5

__

Why: The version sequence jumps from VERSION_3_5_0 to VERSION_3_7_0, skipping VERSION_3_6_0. If a 3.6.0 release exists, serialization logic gated on VERSION_3_7_0 could be incorrect for nodes on that version. However, this may be intentional if 3.6.0 is not a planned release.

Low
General
Verify all model registration paths propagate createdBy

The registerModelFromUrl path now correctly sets both tenantId and createdBy, but
it's worth verifying that the pre-trained model registration path (e.g.,
registerPrebuiltModel or similar methods) also propagates createdBy from
registerModelInput to the MLModel being indexed. If any registration path is missed,
models registered through that path will always have a null createdBy, breaking
attribution metrics.

plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java [847-857]

-                .modelState(MLModelState.REGISTERING)
-                .modelConfig(registerModelInput.getModelConfig())
-                .deploySetting(registerModelInput.getDeploySetting())
-                .createdTime(now)
-                .lastUpdateTime(now)
-                .isHidden(registerModelInput.getIsHidden())
-                .guardrails(registerModelInput.getGuardrails())
-                .modelInterface(registerModelInput.getModelInterface())
-                .tenantId(registerModelInput.getTenantId())
-                .createdBy(registerModelInput.getCreatedBy())
-                .build();
-            IndexRequest indexModelMetaRequest = new IndexRequest(ML_MODEL_INDEX);
-            if (functionName == FunctionName.METRICS_CORRELATION) {
+.modelState(MLModelState.REGISTERING)
+.modelConfig(registerModelInput.getModelConfig())
+.deploySetting(registerModelInput.getDeploySetting())
+.createdTime(now)
+.lastUpdateTime(now)
+.isHidden(registerModelInput.getIsHidden())
+.guardrails(registerModelInput.getGuardrails())
+.modelInterface(registerModelInput.getModelInterface())
+.tenantId(registerModelInput.getTenantId())
+.createdBy(registerModelInput.getCreatedBy())
+.build();
Suggestion importance[1-10]: 4

__

Why: This is a valid concern about completeness — other registration paths may not propagate createdBy. However, the suggestion only asks to verify rather than providing a concrete fix, and the existing_code matches the improved_code, limiting its actionable value.

Low
Ensure all validation errors are properly accumulated

The validateFields method may return a new ActionRequestValidationException
instance, but subsequent calls to addValidationError and
validateContextManagementTemplateName/validateInlineContextManagement may not
accumulate errors into the same exception object if those methods create a new
exception internally. Verify that all validation errors are properly accumulated
into a single exception, similar to the pattern used in MLUpdateConnectorRequest
where errors from validateFields are explicitly merged.

common/src/main/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequest.java [56-68]

 Map<String, FieldDescriptor> fieldsToValidate = new HashMap<>();
 fieldsToValidate.put("Agent created_by field", new FieldDescriptor(mlAgent.getCreatedBy(), false));
 ActionRequestValidationException exception = validateFields(fieldsToValidate);
 
 if (mlAgent.getContextManagementName() != null && mlAgent.getContextManagement() != null) {
     exception = addValidationError("Cannot specify both context_management_name and context_management", exception);
 }
 if (mlAgent.getContextManagementName() != null) {
-    exception = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
+    ActionRequestValidationException ctxException = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
+    if (ctxException != null) {
+        for (String error : ctxException.validationErrors()) {
+            exception = addValidationError(error, exception);
+        }
+    }
 }
 if (mlAgent.getContextManagement() != null) {
-    exception = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
+    ActionRequestValidationException inlineException = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
+    if (inlineException != null) {
+        for (String error : inlineException.validationErrors()) {
+            exception = addValidationError(error, exception);
+        }
+    }
 }
Suggestion importance[1-10]: 3

__

Why: The validateContextManagementTemplateName and validateInlineContextManagement methods already accept and return the accumulated exception object (following the addValidationError pattern), so errors are already properly chained. The suggestion's concern about error accumulation is not actually a problem in this code.

Low
Suggestions up to commit 31fb021
CategorySuggestion                                                                                                                                    Impact
Possible issue
Missing intermediate version constant in sequence

The PR introduces VERSION_3_7_0 but skips VERSION_3_6_0. If the versioning sequence
is intentional, it should be documented. Otherwise, if VERSION_3_6_0 is expected to
exist (e.g., for other features), this gap could cause serialization compatibility
issues when nodes running 3.6.x communicate with nodes running 3.7.x, as the
created_by field would be missing from the stream.

common/src/main/java/org/opensearch/ml/common/CommonValue.java [107]

+public static final Version VERSION_3_6_0 = Version.fromString("3.6.0");
 public static final Version VERSION_3_7_0 = Version.fromString("3.7.0");
Suggestion importance[1-10]: 3

__

Why: The suggestion raises a valid concern about the version gap between VERSION_3_5_0 and VERSION_3_7_0, but adding VERSION_3_6_0 may not be necessary if that version doesn't exist in the project's release plan. This is more of a documentation/verification concern than a code bug.

Low
Validation errors may be lost due to incorrect chaining

The refactored validate() method no longer accumulates errors from validateFields
into the same exception variable used by subsequent addValidationError calls. If
validateFields returns a non-null exception and subsequent validations also fail,
the errors from validateFields will be lost because addValidationError creates a new
exception chain starting from the passed-in exception. The exception variable
returned by validateFields should be passed into subsequent addValidationError
calls.

common/src/main/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequest.java [56-68]

+Map<String, FieldDescriptor> fieldsToValidate = new HashMap<>();
+fieldsToValidate.put("Agent created_by field", new FieldDescriptor(mlAgent.getCreatedBy(), false));
+ActionRequestValidationException exception = validateFields(fieldsToValidate);
 
+if (mlAgent.getContextManagementName() != null && mlAgent.getContextManagement() != null) {
+    exception = addValidationError("Cannot specify both context_management_name and context_management", exception);
+}
+if (mlAgent.getContextManagementName() != null) {
+    exception = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
+}
+if (mlAgent.getContextManagement() != null) {
+    exception = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
+}
Suggestion importance[1-10]: 2

__

Why: The 'existing_code' and 'improved_code' are identical, meaning the suggestion doesn't actually propose any code change. Looking at the actual code, the exception variable from validateFields is correctly passed into subsequent addValidationError calls, so the described bug doesn't exist in the PR code.

Low
General
Inconsistent field population across registration paths

The registerModelFromUrl method now sets tenantId and createdBy, but looking at the
diff, the modelName, modelGroupId, algorithm, and other fields that are set in the
other two performIndexRemoteModel methods appear to be missing from this builder
call. While this may be pre-existing, the addition of tenantId here (not present
before) suggests this was a bug fix bundled in. Verify that all required fields are
consistently set across all three model registration paths.

plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java [847-857]

+// Ensure all required fields are set consistently with other registration paths
 .modelState(MLModelState.REGISTERING)
 .modelConfig(registerModelInput.getModelConfig())
 .deploySetting(registerModelInput.getDeploySetting())
 .createdTime(now)
 .lastUpdateTime(now)
 .isHidden(registerModelInput.getIsHidden())
 .guardrails(registerModelInput.getGuardrails())
 .modelInterface(registerModelInput.getModelInterface())
 .tenantId(registerModelInput.getTenantId())
 .createdBy(registerModelInput.getCreatedBy())
 .build();
Suggestion importance[1-10]: 3

__

Why: The suggestion asks to verify consistency across registration paths but the 'improved_code' is identical to the 'existing_code', providing no concrete fix. It's a valid concern to verify but doesn't constitute an actionable code change.

Low
Suggestions up to commit 6b67eec
CategorySuggestion                                                                                                                                    Impact
Possible issue
Accumulate all validation errors into single exception

The validateFields method returns a new ActionRequestValidationException instance,
but subsequent calls to addValidationError and
validateContextManagementTemplateName/validateInlineContextManagement may create
separate exception instances rather than accumulating errors into the same one. This
could result in only the last validation error being returned. Ensure all validation
errors are accumulated into a single exception instance.

common/src/main/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequest.java [56-68]

+ActionRequestValidationException exception = null;
 Map<String, FieldDescriptor> fieldsToValidate = new HashMap<>();
 fieldsToValidate.put("Agent created_by field", new FieldDescriptor(mlAgent.getCreatedBy(), false));
-ActionRequestValidationException exception = validateFields(fieldsToValidate);
+ActionRequestValidationException fieldException = validateFields(fieldsToValidate);
+if (fieldException != null) {
+    for (String error : fieldException.validationErrors()) {
+        exception = addValidationError(error, exception);
+    }
+}
 
 if (mlAgent.getContextManagementName() != null && mlAgent.getContextManagement() != null) {
     exception = addValidationError("Cannot specify both context_management_name and context_management", exception);
 }
 if (mlAgent.getContextManagementName() != null) {
     exception = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
 }
 if (mlAgent.getContextManagement() != null) {
     exception = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
 }
Suggestion importance[1-10]: 7

__

Why: This is a real bug: validateFields returns a new exception instance, and subsequent addValidationError calls pass exception (which may be null) rather than the field exception, potentially losing field validation errors. The improved code correctly merges all errors into a single exception instance, similar to the pattern used in MLUpdateConnectorRequest.

Medium
Verify version constant doesn't skip a release

The VERSION_3_7_0 constant is used as the version gate for the new created_by field
in serialization/deserialization. However, the versions jump from VERSION_3_5_0 to
VERSION_3_7_0, skipping VERSION_3_6_0. If 3.6.0 is a valid release version, nodes
running that version will incorrectly skip reading/writing the created_by field,
causing deserialization errors in mixed-version clusters. Verify whether
VERSION_3_6_0 should be defined and used instead, or confirm that 3.6.0 is not a
planned release.

common/src/main/java/org/opensearch/ml/common/CommonValue.java [107]

+public static final Version VERSION_3_6_0 = Version.fromString("3.6.0");
 public static final Version VERSION_3_7_0 = Version.fromString("3.7.0");
Suggestion importance[1-10]: 5

__

Why: The concern about skipping VERSION_3_6_0 is valid if that version is a planned release, as it could cause deserialization issues in mixed-version clusters. However, this is more of a verification/confirmation request rather than a definitive bug, and the improved code only adds a new constant without changing the existing one.

Low
General
Ensure createdBy is set in all registration paths

The createdBy field is now propagated from registerModelInput to MLModel in
performIndexRemoteModel, but it appears the pre-trained model registration path
(e.g., registerModelFromUrl) also needs to propagate createdBy to the MLModel being
indexed. Check all model registration paths to ensure createdBy is consistently set,
otherwise some registration flows will silently drop the attribution data.

plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java [644-653]

+.modelConfig(registerModelInput.getModelConfig())
+.deploySetting(registerModelInput.getDeploySetting())
+.createdTime(now)
+.lastUpdateTime(now)
+.isHidden(registerModelInput.getIsHidden())
+.guardrails(registerModelInput.getGuardrails())
+.modelInterface(registerModelInput.getModelInterface())
+.tenantId(registerModelInput.getTenantId())
+.createdBy(registerModelInput.getCreatedBy())
+.build();
 
-
Suggestion importance[1-10]: 3

__

Why: The suggestion asks to verify that createdBy is propagated in all registration paths, but looking at the PR diff, registerModelFromUrl at line 856 already includes .createdBy(registerModelInput.getCreatedBy()). The existing_code and improved_code are identical, making this a verification request with no actual code change needed.

Low
Suggestions up to commit ae4ea18
CategorySuggestion                                                                                                                                    Impact
General
Ensure version constants are sequentially consistent

The createdBy field is gated behind VERSION_3_7_0, but the version sequence jumps
from VERSION_3_5_0 to VERSION_3_7_0, skipping VERSION_3_6_0. If VERSION_3_6_0 is
ever introduced later, the ordering could cause subtle serialization bugs. Ensure
the version used for this feature matches the actual planned release version.

common/src/main/java/org/opensearch/ml/common/CommonValue.java [107]

+public static final Version VERSION_3_6_0 = Version.fromString("3.6.0");
 public static final Version VERSION_3_7_0 = Version.fromString("3.7.0");
Suggestion importance[1-10]: 4

__

Why: The suggestion raises a valid concern about skipping VERSION_3_6_0, but this is a design/planning issue rather than a code bug. The improved_code just adds a new constant without removing or changing the existing one, which doesn't directly fix the stated concern about serialization ordering.

Low
Verify intentional addition of tenantId in URL model path

The registerModelFromUrl method now correctly propagates both tenantId and
createdBy, but the two other performIndexRemoteModel overloads already had tenantId
set before this PR. Verify that the registerModelFromUrl path previously missing
tenantId was intentional or a pre-existing bug, and confirm that adding it here
doesn't break existing behavior for URL-based model registration.

plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java [847-857]

+.modelState(MLModelState.REGISTERING)
+.modelConfig(registerModelInput.getModelConfig())
+.deploySetting(registerModelInput.getDeploySetting())
+.createdTime(now)
+.lastUpdateTime(now)
+.isHidden(registerModelInput.getIsHidden())
+.guardrails(registerModelInput.getGuardrails())
+.modelInterface(registerModelInput.getModelInterface())
+.tenantId(registerModelInput.getTenantId())
+.createdBy(registerModelInput.getCreatedBy())
+.build();
 
-
Suggestion importance[1-10]: 3

__

Why: The existing_code and improved_code are identical, so no actual change is proposed. The suggestion only asks the reviewer to verify behavior, which is a valid observation about a pre-existing missing tenantId in registerModelFromUrl, but it's just a verification request.

Low
Possible issue
Ensure validation errors are properly chained

The refactored validation no longer accumulates errors from
validateContextManagementTemplateName and validateInlineContextManagement into the
same exception returned by validateFields. The result of validateFields is
overwritten by addValidationError calls, but validateContextManagementTemplateName
and validateInlineContextManagement may return a new exception object that doesn't
include the createdBy validation errors. Ensure all validation errors are properly
chained by passing the existing exception into each subsequent validation call.

common/src/main/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequest.java [56-68]

+Map<String, FieldDescriptor> fieldsToValidate = new HashMap<>();
+fieldsToValidate.put("Agent created_by field", new FieldDescriptor(mlAgent.getCreatedBy(), false));
+ActionRequestValidationException exception = validateFields(fieldsToValidate);
 
+if (mlAgent.getContextManagementName() != null && mlAgent.getContextManagement() != null) {
+    exception = addValidationError("Cannot specify both context_management_name and context_management", exception);
+}
+if (mlAgent.getContextManagementName() != null) {
+    exception = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
+}
+if (mlAgent.getContextManagement() != null) {
+    exception = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
+}
Suggestion importance[1-10]: 1

__

Why: The existing_code and improved_code are identical, meaning no actual change is proposed. The suggestion text describes a concern about error chaining, but looking at the code, validateContextManagementTemplateName and validateInlineContextManagement already receive the exception parameter and chain errors properly.

Low
Suggestions up to commit a6d4005
CategorySuggestion                                                                                                                                    Impact
Possible issue
Validation regression removes required field checks

The refactored validate() method no longer validates the agent's name and type
fields, which were previously validated in the original code. This is a regression —
the agent name and type should still be validated to ensure they are not null or
empty before proceeding.

common/src/main/java/org/opensearch/ml/common/transport/agent/MLRegisterAgentRequest.java [56-68]

 Map<String, FieldDescriptor> fieldsToValidate = new HashMap<>();
+fieldsToValidate.put("Agent name", new FieldDescriptor(mlAgent.getName(), true));
+fieldsToValidate.put("Agent type", new FieldDescriptor(mlAgent.getType(), true));
 fieldsToValidate.put("Agent created_by field", new FieldDescriptor(mlAgent.getCreatedBy(), false));
 ActionRequestValidationException exception = validateFields(fieldsToValidate);
 
 if (mlAgent.getContextManagementName() != null && mlAgent.getContextManagement() != null) {
     exception = addValidationError("Cannot specify both context_management_name and context_management", exception);
 }
 if (mlAgent.getContextManagementName() != null) {
     exception = validateContextManagementTemplateName(mlAgent.getContextManagementName(), exception);
 }
 if (mlAgent.getContextManagement() != null) {
     exception = validateInlineContextManagement(mlAgent.getContextManagement(), exception);
 }
Suggestion importance[1-10]: 7

__

Why: The refactored validate() method in the PR only validates createdBy but the old code also validated other fields. However, looking at the old hunk, the original code didn't explicitly validate name or type via validateFields either — it only checked context management. The suggestion may be pointing to a real gap but the original code didn't validate name/type in this method either, so the regression claim needs verification.

Medium
General
Missing intermediate version constant definition

The VERSION_3_7_0 constant is being used as the version gate for the new created_by
field, but the versions jump from VERSION_3_5_0 to VERSION_3_7_0, skipping
VERSION_3_6_0. If 3.6.0 is a valid release version, this gap could cause issues with
backward compatibility for nodes running that version. Verify that VERSION_3_6_0 is
not needed or add it if required.

common/src/main/java/org/opensearch/ml/common/CommonValue.java [107]

+public static final Version VERSION_3_6_0 = Version.fromString("3.6.0");
 public static final Version VERSION_3_7_0 = Version.fromString("3.7.0");
Suggestion importance[1-10]: 4

__

Why: The suggestion points out a potential gap between VERSION_3_5_0 and VERSION_3_7_0, but this is a design decision that may be intentional if 3.6.0 is not a planned release. The suggestion asks to verify rather than fix a confirmed bug, and the improved code just adds a constant that may not be needed.

Low
Verify complete model metadata propagation in new code path

The registerModelFromUrl method now correctly propagates tenantId and createdBy, but
looking at the diff, tenantId was not previously set in this code path. This is a
good fix, but it should be verified that the modelGroupId and other fields like
modelName are also being set in this builder call, as they appear to be missing from
the visible diff hunk, which could result in incomplete model metadata being
indexed.

plugin/src/main/java/org/opensearch/ml/model/MLModelManager.java [847-857]

+// Verify all required fields are set in the builder, including modelName, modelGroupId, etc.
 .modelState(MLModelState.REGISTERING)
 .modelConfig(registerModelInput.getModelConfig())
 .deploySetting(registerModelInput.getDeploySetting())
 .createdTime(now)
 .lastUpdateTime(now)
 .isHidden(registerModelInput.getIsHidden())
 .guardrails(registerModelInput.getGuardrails())
 .modelInterface(registerModelInput.getModelInterface())
 .tenantId(registerModelInput.getTenantId())
 .createdBy(registerModelInput.getCreatedBy())
 .build();
Suggestion importance[1-10]: 2

__

Why: This suggestion asks to verify that other fields are set in the builder, but the improved_code is essentially the same as existing_code with just a comment added. It doesn't provide a concrete fix and the concern about missing fields is speculative without seeing the full builder chain.

Low

@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:49 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:49 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:49 — with GitHub Actions Failure
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 05:49 — with GitHub Actions Failure
@github-actions
Copy link
Copy Markdown

Persistent review updated to latest commit 7bf11f7

@dbwiddis dbwiddis marked this pull request as draft March 31, 2026 15:51
@github-actions
Copy link
Copy Markdown

Persistent review updated to latest commit a6d4005

@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 17:18 — with GitHub Actions Failure
Adds an optional `created_by` field to MLModel and MLRegisterModelInput
to allow plugins using the ML Client API to attribute provisioned models
for adoption metrics tracking.

- Add CREATED_BY_FIELD constant and field to MLModel with toXContent,
  parse, and version-gated stream serialization (VERSION_3_7_0)
- Add CREATED_BY_FIELD constant and field to MLRegisterModelInput with
  toXContent, both parse overloads, and version-gated stream
  serialization (VERSION_3_7_0)

Signed-off-by: Daniel Widdis <widdis@gmail.com>
Add created_by to the Tags returned by getRemoteModelTags,
getPreTrainedModelTags, and getCustomModelTags. Defaults to
'unknown' when createdBy is null, consistent with MLAgent.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
Consistent with MLRegisterAgentRequest, validate created_by as an
optional safe-text field in MLCreateConnectorRequest and
MLRegisterModelRequest.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
…n and tenantId

- Centralize CREATED_BY_FIELD constant in CommonValue; remove duplicate
  definitions from MLAgent, MLModel, HttpConnector, MLCreateConnectorInput,
  and MLRegisterModelInput
- Add created_by validation to MLUpdateConnectorRequest (was missing
  alongside MLCreateConnectorRequest)
- Use human-readable labels for created_by validation across all four
  request classes
- Fix missing tenantId in registerModelFromUrl MLModel builder in
  MLModelManager

Signed-off-by: Daniel Widdis <widdis@gmail.com>
- MachineLearningNodeClientTest: add happy path tests verifying
  createdBy is passed through for registerAgent, register, and
  createConnector
- MLRegisterAgentRequestTest: add validate_ValidCreatedBy and
  validate_InvalidCreatedBy
- MLCreateConnectorRequestTests: add validateWithValidCreatedBy and
  validateWithInvalidCreatedBy
- MLRegisterModelRequestTest: add validate_ValidCreatedBy and
  validate_InvalidCreatedBy

Signed-off-by: Daniel Widdis <widdis@gmail.com>
…ror merge bug

created_by is set at creation time only and is not merged by
HttpConnector.update(), so validating it on update was misleading.

Also fix a pre-existing bug where exception = validateFields(...) would
silently discard any prior connectorId null error; now merges field
errors into the existing exception via addValidationError.

Signed-off-by: Daniel Widdis <widdis@gmail.com>
…d field invalid

Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis marked this pull request as ready for review March 31, 2026 18:47
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 18:49 — with GitHub Actions Error
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env-require-approval March 31, 2026 18:49 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval March 31, 2026 18:49 — with GitHub Actions Failure
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env-require-approval March 31, 2026 18:49 — with GitHub Actions Inactive
@github-actions
Copy link
Copy Markdown

Persistent review updated to latest commit e85bb6f

Signed-off-by: Daniel Widdis <widdis@gmail.com>
@dbwiddis dbwiddis requested a review from akolarkunnu as a code owner May 4, 2026 21:28
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env-require-approval May 4, 2026 21:30 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env-require-approval May 4, 2026 21:30 — with GitHub Actions Inactive
@dbwiddis dbwiddis temporarily deployed to ml-commons-cicd-env-require-approval May 4, 2026 21:30 — with GitHub Actions Inactive
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval May 4, 2026 21:30 — with GitHub Actions Failure
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 4, 2026

Persistent review updated to latest commit 8c1a652

@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval May 4, 2026 21:54 — with GitHub Actions Error
@dbwiddis dbwiddis had a problem deploying to ml-commons-cicd-env-require-approval May 4, 2026 21:54 — with GitHub Actions Failure
Copy link
Copy Markdown
Collaborator

@pyek-bot pyek-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Solid implementation that follows existing patterns (mirrors tenantId version-gating). Well-tested, well-scoped. A few observations/questions inline.

Main open items:

  1. Connector-level metrics emission (Connector.getTags() / MLStatsJobProcessor connector collection) is called out in the issue but not implemented here. Is a follow-up tracked?

Let's track connector level metrics so that once they are added these details are also available.

  1. Are the ML indices using dynamic mapping, or does a mapping update need to accompany this for the new created_by field to be indexed properly?

Recommend to update the json mapping files to include this new field and bump the schema version there.

.addTag(IS_HIDDEN_FIELD, isHidden != null ? isHidden : false)
.addTag(AGENT_TYPE_FIELD, type != null ? type : TAG_VALUE_UNKNOWN);
.addTag(AGENT_TYPE_FIELD, type != null ? type : TAG_VALUE_UNKNOWN)
.addTag(CREATED_BY_FIELD, createdBy != null ? createdBy : TAG_VALUE_UNKNOWN);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting in docs: every existing resource (which will have createdBy == null) will now emit created_by=unknown in metrics. Depending on the ratio of legacy vs. newly-attributed resources, unknown could dominate the metric cardinality initially. This is consistent with how other tags handle null here, so not a blocker — just something to call out for metric consumers.

}
}
return exception;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch on the pre-existing bug. The old code was:

exception = validateFields(fieldsToValidate);

which silently discarded any prior connectorId null error. This merge-based approach is correct.

@pyek-bot
Copy link
Copy Markdown
Collaborator

pyek-bot commented May 6, 2026

Additional questions

  1. Naming concern — could created_by be confused with the authenticated user?
    The existing User / FGAC owner field already tracks who (the human/role) created a resource. Adding a separate created_by that tracks the calling client (e.g. flow-framework, opensearch-dashboards) could be confusing to end users seeing this in GET responses. They might reasonably assume created_by refers to a person. Has an alternative name been considered (e.g. created_by_client, provisioned_by, source_client) to make the distinction clearer at the API surface?

  2. What is the default value for this field?
    In getTags(), null is mapped to "unknown" for metrics purposes. But what about the stored document and API responses . It looks like it stays null in the index and only becomes "unknown" at metrics emission time — can you confirm that's the intended behavior?

Should the default be "ml" (or "ml-commons") rather than "unknown" when no caller sets the field?

The rationale being: if a resource is created directly through the ML Commons REST API (i.e., not via Flow Framework or Dashboards), then the "client" is ML Commons itself. Defaulting to "unknown" implies the provenance is unclear, when in reality it's just the standard ML API path. A default like "ml" would make the metrics more meaningful — you'd be able to distinguish "created via ML API directly" vs. "created by flow-framework" vs. truly unknown (legacy resources migrated from before this field existed).

@pyek-bot pyek-bot self-assigned this May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] created_by Provenance Tag Support in ML Commons

2 participants