Add recommendation score#4156

Merged
chukarsten merged 34 commits into main from 4149_recommendation_score
May 9, 2023
Conversation

@eccabay (Contributor) commented Apr 21, 2023

Closes #4149

codecov bot commented Apr 21, 2023

Codecov Report

Merging #4156 (9b67d47) into main (b530abd) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #4156     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        349     349             
  Lines      37809   38094    +285     
=======================================
+ Hits       37692   37977    +285     
  Misses       117     117             
Impacted Files Coverage Δ
evalml/objectives/__init__.py 100.0% <ø> (ø)
evalml/automl/automl_search.py 99.6% <100.0%> (+0.1%) ⬆️
evalml/objectives/standard_metrics.py 100.0% <100.0%> (ø)
evalml/objectives/utils.py 100.0% <100.0%> (ø)
evalml/tests/automl_tests/test_automl.py 99.6% <100.0%> (+0.1%) ⬆️
evalml/tests/conftest.py 98.3% <100.0%> (ø)
evalml/tests/objective_tests/test_objectives.py 100.0% <100.0%> (ø)

@eccabay eccabay marked this pull request as ready for review April 24, 2023 16:36
self.data_splitter = data_splitter
self.optimize_thresholds = optimize_thresholds
self.ensembling = ensembling
if objective == "auto":
@eccabay (Contributor, Author):

This section was not deleted, just moved lower to join the rest of the objective handling. It had to move down (rather than moving the new handling up here) because the new recommendation_objective logic requires several other validation checks to have already run and self.X_train and self.y_train to be set.
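The ordering constraint described here can be sketched roughly as follows (a hypothetical simplification; the class, method names, and defaults are illustrative, not EvalML's actual internals):

```python
class AutoMLSearchSketch:
    """Illustrates why 'auto' objective resolution must come last."""

    def __init__(self, X_train, y_train, problem_type, objective="auto"):
        # 1. Run validation checks first.
        if X_train is None or y_train is None:
            raise ValueError("Must provide training data.")
        # 2. Assign training data: the recommendation-objective logic
        #    may need to inspect it (e.g. to detect class imbalance).
        self.X_train = X_train
        self.y_train = y_train
        self.problem_type = problem_type
        # 3. Only now resolve the 'auto' objective, since this step
        #    depends on the checks and attributes established above.
        if objective == "auto":
            objective = self._default_objective()
        self.objective = objective

    def _default_objective(self):
        # Placeholder stand-in for the real default-objective logic.
        return "Log Loss Binary" if self.problem_type == "binary" else "R2"
```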

@jeremyliweishih (Collaborator) left a comment:

Overall LGTM - great work! Just have some nits and you may need to take a look at the docs for the new rankings!

]

if problem_type == ProblemTypes.MULTICLASS and imbalanced:
objective_list.remove(objectives.AUCMicro.name)
Collaborator:

Do you think it would be clearer to define lists for every problem type instead of encoding it in logic? I'm not too sure about this, but just a thought.

@eccabay (Contributor, Author):

Yeah, I went back and forth on this a lot. I figured this was more concise, but it may also be less clear. I'm happy to change this around if need be!
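For comparison, the two styles under discussion might look like this (a hypothetical sketch, not the actual EvalML code; the objective names and the exact removal condition simply mirror the snippet quoted above):

```python
# Style A: one shared list adjusted with conditional logic
# (the approach taken in this PR).
def objectives_via_logic(problem_type, imbalanced):
    objective_list = ["Log Loss Multiclass", "AUC Micro", "MCC Multiclass"]
    if problem_type == "multiclass" and imbalanced:
        # Mirrors the snippet above, where AUCMicro is removed.
        objective_list.remove("AUC Micro")
    return objective_list

# Style B: an explicit list per (problem type, condition), as the
# reviewer suggests -- more verbose, but every case is spelled out.
OBJECTIVES_BY_CASE = {
    ("multiclass", False): ["Log Loss Multiclass", "AUC Micro", "MCC Multiclass"],
    ("multiclass", True): ["Log Loss Multiclass", "MCC Multiclass"],
}

def objectives_via_lookup(problem_type, imbalanced):
    return list(OBJECTIVES_BY_CASE[(problem_type, imbalanced)])
```

Both return the same lists; the trade-off is purely concision versus explicitness.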

")\n",
"automl_recommendation.search(interactive_plot=False)\n",
"\n",
"automl_recommendation.rankings"
Collaborator:

Spacing is off here for some reason
[Screenshot 2023-04-25 at 11:14:53 AM: rendered rankings table with squished column spacing]

@eccabay (Contributor, Author):

This is a weird one. It doesn't happen when running the jupyter notebooks directly, and it looks like the same issue comes up with other rankings dataframes as well - this one looks the worst since it has the most columns. I think I'll drop some unnecessary columns so that they aren't so squished, but I'm also open to other suggestions, as Google was unhelpful here.
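The column-dropping workaround could be sketched like this (hypothetical column names; the real rankings DataFrame may differ):

```python
import pandas as pd

# Stand-in for automl_recommendation.rankings in the docs notebook.
rankings = pd.DataFrame({
    "id": [0, 1],
    "pipeline_name": ["Baseline", "Random Forest"],
    "search_order": [0, 1],
    "recommendation_score": [50.0, 92.3],
    "mean_cv_score": [0.66, 0.21],
    "parameters": [{}, {}],
})

# Drop wide, low-value columns so the rendered docs table is narrower.
display_columns = rankings.drop(columns=["search_order", "parameters"])
```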

Comment on lines +367 to +368
if prioritized_objective is not None:
if prioritized_objective not in objectives:
Contributor:

prioritized_objective allows us to give more (or, I guess, less) weight to a single objective. Is it worth designing this function to instead accept an objectives_weighting (dict[str, float]) mapping objectives to their relative weights in the recommendation score?

I was kind of thinking that there would be existing presets functioning as data-scientist-selected "blends" of metrics, based on what our user is looking for. This would require weighting multiple metrics at the same time.
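The objectives_weighting idea could be sketched like this (purely illustrative; neither this parameter nor these preset blends exist in the PR as merged):

```python
# Hypothetical presets: curated "blends" of metrics for common goals.
PRESET_BLENDS = {
    "fraud_detection": {"Recall": 0.5, "Precision": 0.3, "AUC": 0.2},
    "balanced": {"Log Loss Binary": 0.34, "F1": 0.33, "AUC": 0.33},
}

def recommendation_score(normalized_scores, objectives_weighting):
    """Weighted sum of already-normalized (0-1) objective scores,
    rescaled to 0-100."""
    total_weight = sum(objectives_weighting.values())
    return 100 * sum(
        weight * normalized_scores[name]
        for name, weight in objectives_weighting.items()
    ) / total_weight
```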

@eccabay (Contributor, Author):

That was my original design - amended after discussion here

assert "already one of the default objectives" in caplog.text


def test_recommendation_include_non_optimization(X_y_binary):
Contributor:

What's this test about?

@eccabay (Contributor, Author) commented Apr 26, 2023:

We get the additional (non-main-objective) scores through the additional_objectives mechanism. By default, this calculates the objectives for all of the optimization metrics, excluding any that are classified as ranking only (and we have a check against it). However, we want to make sure users have full control over which objectives are included in the recommendation score, including ones that are ranking only (for example, recall). This test ensures that the logic permitting this works.
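The behavior under test might be sketched as follows (illustrative names, not the actual EvalML implementation):

```python
# Hypothetical set of objectives classified as ranking-only.
RANKING_ONLY = {"Recall", "Recall Macro"}

def objectives_for_recommendation(defaults, include_objectives=None):
    """Ranking-only objectives are excluded by default, but a user who
    names one explicitly gets it included anyway."""
    selected = [obj for obj in defaults if obj not in RANKING_ONLY]
    for obj in include_objectives or []:
        if obj not in selected:
            selected.append(obj)
    return selected
```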



@pytest.mark.parametrize("imbalanced_data", [True, False])
def test_use_recommendation_score_imbalanced(
Collaborator:

I reviewed the code...why are we treating imbalanced data differently?

@eccabay (Contributor, Author):

This was discussed in the design doc - it's used exclusively in the multiclass classification case, when selecting whether to use AUC Micro or AUC Weighted, because micro averages are better for imbalanced datasets.
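The imbalance check that drives this choice might look something like the following (a sketch only; the threshold, function name, and the exact detection rule are assumptions, not the design doc's actual criteria):

```python
from collections import Counter

def is_imbalanced(y_train, threshold=0.2):
    """Flag a target as imbalanced when the rarest class falls below a
    minimum fraction of all samples (threshold is illustrative).  For
    multiclass problems, this flag would then decide between the
    AUC Micro and AUC Weighted variants."""
    counts = Counter(y_train)
    return min(counts.values()) / len(y_train) < threshold
```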

Comment on lines +1704 to +1705
ranking_column = "ranking_score"
if self.use_recommendation:
Contributor:

Is there any benefit to modifying the API of full_rankings() so the ranking column is selectable via the function call rather than the AutoML init setting? Probably not, I guess - we wouldn't want the user to be able to select "recommendation_score" when self.use_recommendation hasn't been set...
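The guard being discussed can be sketched as a free function (illustrative; the real logic lives inside AutoMLSearch and uses a pandas DataFrame rather than a list of dicts):

```python
def full_rankings(results, use_recommendation):
    """Sort results by the active ranking column.  Exposing the column
    as a call-time argument instead would let callers request
    'recommendation_score' even when use_recommendation is False --
    the inconsistency the review comment warns about."""
    ranking_column = "ranking_score"
    if use_recommendation:
        ranking_column = "recommendation_score"
    return sorted(results, key=lambda row: row[ranking_column], reverse=True)
```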

@eccabay eccabay requested a review from chukarsten May 1, 2023 12:37
Comment on lines +358 to +366
prioritized_objective (str): An optional name of a priority objective that should be given heavier weight
than the other objectives contributing to the score. Defaults to None, where all objectives are
weighted equally.
prioritized_weight (float): The weight (maximum of 1) to attribute to the prioritized objective, if it exists.
Should be between 0 and 1. Defaults to 0.5.
custom_weights (dict[str,float]): A dictionary mapping objective names to corresponding weights between 0 and 1.
If all objectives are listed, should add up to 1. If a subset of objectives are listed, should add up to less
than 1, and remaining weight will be evenly distributed between the remaining objectives. Should not be used
at the same time as prioritized_objective.
Contributor:

It's my fault for doing this piecemeal - I blame being distracted - but perhaps we should consider an API where these two are put together and the input is a dict of str:float. If there is only one entry, perhaps that should be interpreted as the prioritized objective and weight.

@eccabay (Contributor, Author):

As we've discussed previously, I'm hesitant to completely remove the prioritized objective/weight API. I think the benefit of the simplicity outweighs the slightly larger API. That being said, after thinking about it more I would be open to removing the prioritized_weight argument, and just keeping prioritized_objective. That allows users to have an easy way to say "I care about this one more", and if they want to be more specific about the weights, there is still custom_weights to open up that opportunity. Thoughts?
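The weighting semantics documented in the docstring above could be sketched as a standalone helper (hypothetical code, not the PR's actual implementation):

```python
def compute_weights(objectives, prioritized_objective=None,
                    prioritized_weight=0.5, custom_weights=None):
    """Resolve per-objective weights per the documented semantics:
    custom_weights wins if given, with leftover weight split evenly
    among unlisted objectives; otherwise a prioritized objective gets
    prioritized_weight and the rest split the remainder evenly."""
    if custom_weights is not None:
        weights = dict(custom_weights)
        remaining = [o for o in objectives if o not in custom_weights]
        leftover = 1.0 - sum(custom_weights.values())
        for obj in remaining:
            weights[obj] = leftover / len(remaining)
        return weights
    if prioritized_objective is not None:
        others = [o for o in objectives if o != prioritized_objective]
        weights = {o: (1.0 - prioritized_weight) / len(others) for o in others}
        weights[prioritized_objective] = prioritized_weight
        return weights
    # Default: all objectives weighted equally.
    return {o: 1.0 / len(objectives) for o in objectives}
```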

@chukarsten chukarsten merged commit d777c6c into main May 9, 2023
@chukarsten chukarsten deleted the 4149_recommendation_score branch May 9, 2023 17:48
@chukarsten chukarsten mentioned this pull request May 10, 2023


Development

Successfully merging this pull request may close these issues.

Implement .recommendation_score() for Pipelines

3 participants