[PNE-7018] Add new evaluations API and deprecate PEWs.#997

Merged
anoto-moniz merged 3 commits into main from feature/pne-7018-deprecate-pews
Sep 22, 2025

Conversation

@anoto-moniz anoto-moniz commented Jul 16, 2025

PEWs no longer need to be a customer-facing asset type. Their only function is to allow reusing the same set of evaluators with different predictors, which A) often doesn't work due to mismatched responses, and B) isn't particularly helpful. As such, we can move users to a cleaner, easier-to-use workflow, where predictor evaluation is kicked off directly against a predictor, and you can start one using the default evaluators with a single call.

The PredictorEvaluation API and associated objects are similar to the existing PEW and PEW execution APIs/objects, but are stand-alone. This ensures we can easily make a clean break when the deprecation cycle completes, and also provides an opportunity to ensure the interfaces are as clean and simple as possible.

This also attaches a deprecation warning to the PEW and PEW Execution collections and their associated operations. The associated response objects (PredictorEvaluationWorkflow and PredictorEvaluationExecution) are also deprecated, but since they're only produced by the collections, their docstrings note the deprecation rather than emitting a message, to avoid inundating the user.

PR Type:

  • Breaking change (fix or feature that would cause existing functionality to change)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Maintenance (non-breaking change to assist developers)

Adherence to team decisions

  • I have added tests for 100% coverage
  • I have written Numpy-style docstrings for every method and class.
  • I have communicated the downstream consequences of the PR to others.
  • I have bumped the version in __version__.py

@anoto-moniz anoto-moniz force-pushed the feature/pne-7018-deprecate-pews branch 3 times, most recently from 19220c1 to d074bba Compare September 9, 2025 18:18
@anoto-moniz anoto-moniz changed the title [PNE-7018] Deprecate PEWs. [PNE-7018] [PNE-7018] Add new evaluations API and deprecate PEWs. Sep 9, 2025
@anoto-moniz anoto-moniz changed the title [PNE-7018] [PNE-7018] Add new evaluations API and deprecate PEWs. [PNE-7018] Add new evaluations API and deprecate PEWs. Sep 9, 2025

class PredictorEvaluationExecution(Resource['PredictorEvaluationExecution'], Execution):
-    """The execution of a PredictorEvaluationWorkflow.
+    """[DEPRECATED] The execution of a PredictorEvaluationWorkflow.

Collaborator Author

Since the PredictorEvaluationExecutionCollection is already going to emit a deprecation warning on any operation, and it's the only way a PredictorEvaluationExecution is created, it feels like overkill to decorate this class with @deprecated as well. Hence the docstring update without the decorator.
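A minimal sketch of the pattern described here, with hypothetical stand-in bodies (the real collection methods call the platform API): the collection emits a DeprecationWarning on every operation, while the response object only notes the deprecation in its docstring.

```python
import warnings


class PredictorEvaluationExecutionCollection:
    """Collection of predictor evaluation executions (illustrative stand-in)."""

    def list(self):
        # Every operation on the deprecated collection warns the caller.
        warnings.warn(
            "PredictorEvaluationExecutionCollection is deprecated; "
            "use the PredictorEvaluation API instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return []


class PredictorEvaluationExecution:
    """[DEPRECATED] The execution of a PredictorEvaluationWorkflow.

    Only created by the (already-warning) collection, so the class itself
    does not emit a second warning.
    """
```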

class PredictorEvaluationWorkflow(Resource['PredictorEvaluationWorkflow'],
                                  Workflow, AIResourceMetadata):
-    """A workflow that evaluations a predictor.
+    """[DEPRECATED] A workflow that evaluates a predictor.

Collaborator Author

See PredictorEvaluationExecution.


def list_all(self, *, per_page: int = 20) -> Iterable[DesignSpace]:
-    """List the most recent version of all design spaces."""
+    """List all design spaces."""

Collaborator Author

Just fixing some unclear wording that slipped by for a while.

predictor_id=predictor_id,
predictor_version=predictor_version)

def list(self,

Collaborator Author


Our pattern has generally been that list returns all non-archived assets, and we use more specific names for other filters.
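As a rough illustration of that convention (a toy stand-in, not the real SDK collection, which pages results from the API):

```python
class AssetCollection:
    """Toy collection illustrating the naming convention (not the real SDK class)."""

    def __init__(self, assets):
        self._assets = assets  # each asset is a dict with an "archived" flag

    def list(self):
        # Convention: `list` returns all non-archived assets.
        return [a for a in self._assets if not a["archived"]]

    def list_archived(self):
        # Filters beyond the default get a more specific method name.
        return [a for a in self._assets if a["archived"]]
```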

def default(self,
*,
predictor_id: Union[UUID, str],
predictor_version: Union[int, str] = LATEST_PRED_VER

Collaborator Author


The predictor version defaults to latest for consistency with trigger, which requires a trained predictor.

@anoto-moniz anoto-moniz force-pushed the feature/pne-7018-deprecate-pews branch from d074bba to 70dde8f Compare September 12, 2025 19:37

Collaborator Author


I debated whether to change this filename to predictor_evaluations.rst, but left it for now to avoid any broken links coming into it.

@anoto-moniz anoto-moniz marked this pull request as ready for review September 12, 2025 19:42
@anoto-moniz anoto-moniz requested a review from a team as a code owner September 12, 2025 19:42
Collaborator

@kroenlein kroenlein left a comment


Confused about one thing, but nothing blocking here. Good work.

evaluation_id=str(self.uid)
)

@lru_cache()

Collaborator


Was this exceptionally slow or called a large number of times? Seems like maybe an unnecessary optimization.

Collaborator Author


Fair question, but I'm not sure. This is just copied from the PredictorEvaluationExecution class, so I decided to keep it as is.
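For context on the caching question: `functools.lru_cache` on an instance method caches per `(instance, args)` key, and the class-level cache holds a strong reference to each instance, so it trades memory (and instance lifetime) for repeat-call speed. A small self-contained illustration (not the SDK code):

```python
from functools import lru_cache


class CachedLookup:
    """Toy class showing @lru_cache() on a method (illustration only)."""

    def __init__(self):
        self.calls = 0

    @lru_cache()
    def fetch(self):
        # Counted so we can observe that repeat calls hit the cache.
        self.calls += 1
        return {"status": "done"}


obj = CachedLookup()
first = obj.fetch()
second = obj.fetch()  # served from the cache; the method body runs once
```

Whether that's worth it depends on how often the decorated method is called per instance; for a one-shot lookup it is indeed an unnecessary optimization.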

Collaborator

@jspeerless jspeerless left a comment


All in all looks great! Added a couple points about documentation, but one issue has been around for a while.

Workflows employ reusable modules in order to execute.
-There are three different types of modules, and these are discussed in greater detail below.
+Currently, there are two workflows on the AI Engine: the :doc:`DesignWorkflow <design_workflows>` and the :doc:`PredictorEvaluation <predictor_evaluation_workflows>`.
+There are two different types of modules, and these are discussed in greater detail below.

Collaborator


We might want to rethink this page at a later date: introducing the idea of workflows seems odd given that there really is only one. But, given this is the deprecation PR and we still have PEW collateral in the "workflows" directory under informatics, I think we should look at refactoring this part of the documentation for v4.0.

result = self.session.post_resource(path, json=payload, version=self._api_version)
return PredictorEvaluatorsResponse.build(result).evaluators

def default(self,

Collaborator


Do we have any more documentation on this method elsewhere? I think we have methods for inferring the ignore_when_grouping field from the model config that are not 100% clear to me. For example, we implemented this change where we populate Latent Variables automatically.

Collaborator


To be clear, if we can confirm the behavior, we should document it here since users don't have access to our internal documentation.

Collaborator Author


Not that I'm aware of. It sounds like something we should pull together for the 4.0 release, though. Especially since ignore_when_grouping gets so many questions.

@anoto-moniz anoto-moniz merged commit 9a3d89d into main Sep 22, 2025
45 checks passed
@anoto-moniz anoto-moniz deleted the feature/pne-7018-deprecate-pews branch September 22, 2025 15:49