Add created_by field to ML resources for adoption metrics attribution#4754
Add created_by field to ML resources for adoption metrics attribution#4754dbwiddis wants to merge 10 commits into
Conversation
PR Reviewer Guide 🔍(Review updated until commit 8c1a652)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to 8c1a652 Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit e85bb6f
Suggestions up to commit 31fb021
Suggestions up to commit 6b67eec
Suggestions up to commit ae4ea18
Suggestions up to commit a6d4005
|
|
Persistent review updated to latest commit 7bf11f7 |
|
Persistent review updated to latest commit a6d4005 |
Adds an optional `created_by` field to MLModel and MLRegisterModelInput to allow plugins using the ML Client API to attribute provisioned models for adoption metrics tracking. - Add CREATED_BY_FIELD constant and field to MLModel with toXContent, parse, and version-gated stream serialization (VERSION_3_7_0) - Add CREATED_BY_FIELD constant and field to MLRegisterModelInput with toXContent, both parse overloads, and version-gated stream serialization (VERSION_3_7_0) Signed-off-by: Daniel Widdis <widdis@gmail.com>
Add created_by to the Tags returned by getRemoteModelTags, getPreTrainedModelTags, and getCustomModelTags. Defaults to 'unknown' when createdBy is null, consistent with MLAgent. Signed-off-by: Daniel Widdis <widdis@gmail.com>
Consistent with MLRegisterAgentRequest, validate created_by as an optional safe-text field in MLCreateConnectorRequest and MLRegisterModelRequest. Signed-off-by: Daniel Widdis <widdis@gmail.com>
…n and tenantId - Centralize CREATED_BY_FIELD constant in CommonValue; remove duplicate definitions from MLAgent, MLModel, HttpConnector, MLCreateConnectorInput, and MLRegisterModelInput - Add created_by validation to MLUpdateConnectorRequest (was missing alongside MLCreateConnectorRequest) - Use human-readable labels for created_by validation across all four request classes - Fix missing tenantId in registerModelFromUrl MLModel builder in MLModelManager Signed-off-by: Daniel Widdis <widdis@gmail.com>
- MachineLearningNodeClientTest: add happy path tests verifying createdBy is passed through for registerAgent, register, and createConnector - MLRegisterAgentRequestTest: add validate_ValidCreatedBy and validate_InvalidCreatedBy - MLCreateConnectorRequestTests: add validateWithValidCreatedBy and validateWithInvalidCreatedBy - MLRegisterModelRequestTest: add validate_ValidCreatedBy and validate_InvalidCreatedBy Signed-off-by: Daniel Widdis <widdis@gmail.com>
…ror merge bug created_by is set at creation time only and is not merged by HttpConnector.update(), so validating it on update was misleading. Also fix a pre-existing bug where exception = validateFields(...) would silently discard any prior connectorId null error; now merges field errors into the existing exception via addValidationError. Signed-off-by: Daniel Widdis <widdis@gmail.com>
…d field invalid Signed-off-by: Daniel Widdis <widdis@gmail.com>
|
Persistent review updated to latest commit e85bb6f |
Signed-off-by: Daniel Widdis <widdis@gmail.com>
|
Persistent review updated to latest commit 8c1a652 |
There was a problem hiding this comment.
Review Summary
Solid implementation that follows existing patterns (mirrors tenantId version-gating). Well-tested, well-scoped. A few observations/questions inline.
Main open items:
- Connector-level metrics emission (
Connector.getTags()/MLStatsJobProcessorconnector collection) is called out in the issue but not implemented here. Is a follow-up tracked?
Let's track connector level metrics so that once they are added these details are also available.
- Are the ML indices using dynamic mapping, or does a mapping update need to accompany this for the new
created_byfield to be indexed properly?
Recommend to update the json mapping files to include this new field and bump the schema version there.
| .addTag(IS_HIDDEN_FIELD, isHidden != null ? isHidden : false) | ||
| .addTag(AGENT_TYPE_FIELD, type != null ? type : TAG_VALUE_UNKNOWN); | ||
| .addTag(AGENT_TYPE_FIELD, type != null ? type : TAG_VALUE_UNKNOWN) | ||
| .addTag(CREATED_BY_FIELD, createdBy != null ? createdBy : TAG_VALUE_UNKNOWN); |
There was a problem hiding this comment.
Worth noting in docs: every existing resource (which will have createdBy == null) will now emit created_by=unknown in metrics. Depending on the ratio of legacy vs. newly-attributed resources, unknown could dominate the metric cardinality initially. This is consistent with how other tags handle null here, so not a blocker — just something to call out for metric consumers.
| } | ||
| } | ||
| return exception; | ||
| } |
There was a problem hiding this comment.
Good catch on the pre-existing bug. The old code was:
exception = validateFields(fieldsToValidate);which silently discarded any prior connectorId null error. This merge-based approach is correct.
Additional questions
Should the default be The rationale being: if a resource is created directly through the ML Commons REST API (i.e., not via Flow Framework or Dashboards), then the "client" is ML Commons itself. Defaulting to |
Description
Adds a
created_byfield to ML agents, connectors, and models to enableattribution of adoption metrics by the client that created the resource
(e.g.
flow-framework,opensearch-dashboards).This is distinct from the authenticated user (
User/FGAC) —created_byidentifies the calling client, which may differ from the user when a
workflow engine like Flow Framework creates resources on behalf of a user.
Data model
MLAgent: newcreated_byfield, version-gated serialization atVERSION_3_7_0, emitted ingetTags()(defaults to"unknown")Connector/AbstractConnector/HttpConnector:new
created_byfield, version-gated serialization atVERSION_3_7_0MLCreateConnectorInput: newcreated_byfield, flows through toconnector via XContent round-trip in
TransportCreateConnectorActionMLModel: newcreated_byfield, version-gated serialization atVERSION_3_7_0, emitted in all threegetTags()methods(
getRemoteModelTags,getPreTrainedModelTags,getCustomModelTags),defaults to
"unknown"MLRegisterModelInput: newcreated_byfield, wired through toMLModelinMLModelManager(all three builder sites, includingthe previously missing
tenantIdfix inregisterModelFromUrl)Validation
MLRegisterAgentRequest: validatescreated_byas optional safe-textfield; restructured
validate()to seed exception fromvalidateFieldsand eliminate manual error-merging loop
MLCreateConnectorRequest: validatescreated_byas optional safe-text fieldMLRegisterModelRequest: validatescreated_byas optional safe-text fieldMLUpdateConnectorRequest:created_byis intentionally not validatedhere — it is set at creation time only and is not merged by
HttpConnector.update(); also fixed a pre-existing bug wherevalidateFieldsresult overwrote any priorconnectorIdnull errorMetrics
MLStatsJobProcessorrequires no changes —created_byflows throughautomatically via
model.getTags()andagent.getTags()Versioning
CommonValue: addedVERSION_3_7_0constant;VERSION_3_6_0wasconsidered but removed as unused in production code — tests use the
existing
VERSION_3_5_0constant to represent a pre-created_bynodedeserialize
created_byasnullCREATED_BY_FIELDconstant centralized inCommonValue; duplicatelocal definitions removed from
MLAgent,MLModel,HttpConnector,MLCreateConnectorInput, andMLRegisterModelInputTesting
VERSION_3_7_0and
VERSION_3_5_0), XContent parse/write, andgetTags()emissionfor all three resource types
createdBytests added toMachineLearningNodeClientTestverifying the field is passed through for
registerAgent,register(Model),and
createConnectorMLRegisterAgentRequestTest,MLCreateConnectorRequestTests, andMLRegisterModelRequestTestRelated Issues
Resolves #4752
Check List
--signoff.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.