Add new argument for limiting the maximum epsilon by prodrigues-tdx · Pull Request #529 · scikit-learn-contrib/hdbscan

prodrigues-tdx · 2022-02-22T13:14:48Z

This PR adds an argument to HDBSCAN that sets a maximum threshold on the epsilon used when picking the best clusters. The new argument, cluster_selection_epsilon_max, applies to the EOM search method.

This is useful when domain knowledge tells you from the outset that the samples in a cluster should not be very far from each other.

The implementation uses cluster_selection_epsilon_max in much the same way as max_cluster_size. As a result, clusters with an epsilon bigger than cluster_selection_epsilon_max can still appear if there are no valid clusters below that epsilon. This is exactly the behavior of max_cluster_size.
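To make the fallback behavior described above concrete, here is a minimal sketch of an excess-of-mass style selection with an epsilon cap on a toy cluster tree. It is an illustration under stated assumptions, not the code from this PR: the function name `select_clusters_eom`, the dictionary-based tree, and the per-node `epsilon` values are all hypothetical, and the single-cluster case is assumed to be disallowed.

```python
import math


def select_clusters_eom(children, stability, epsilon, root, epsilon_max=math.inf):
    """Toy EOM selection with an epsilon cap (hypothetical helper, not the hdbscan API).

    children:  dict mapping each node to a list of its child nodes (empty for leaves)
    stability: dict mapping each non-root node to its stability score
    epsilon:   dict mapping each non-root node to an assumed distance scale for that node
    """
    stability = dict(stability)  # copy, because propagated scores are written back
    selected = {node: True for node in children}  # every node starts as a candidate
    selected[root] = False  # assumption: the root is never returned as a cluster

    def visit(node):
        # Children are processed before their parent (bottom-up).
        for child in children[node]:
            visit(child)
        if node == root or not children[node]:
            return  # leaves stay selected unless an ancestor wins later
        subtree_stability = sum(stability[child] for child in children[node])
        if subtree_stability > stability[node] or epsilon[node] > epsilon_max:
            # The node loses: either its children are more stable, or it is over the cap.
            selected[node] = False
            stability[node] = subtree_stability
        else:
            # The node wins: drop every descendant from the selection.
            stack = list(children[node])
            while stack:
                descendant = stack.pop()
                selected[descendant] = False
                stack.extend(children[descendant])

    visit(root)
    return sorted(node for node, keep in selected.items() if keep)


# Toy tree: root 0 splits into 1 and 2; node 1 splits into leaves 3 and 4.
toy_children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
toy_stability = {1: 5.0, 2: 2.0, 3: 1.0, 4: 1.0}
toy_epsilon = {1: 3.0, 2: 1.0, 3: 0.5, 4: 0.5}

no_cap = select_clusters_eom(toy_children, toy_stability, toy_epsilon, root=0)
capped = select_clusters_eom(toy_children, toy_stability, toy_epsilon, root=0, epsilon_max=2.0)
tiny_cap = select_clusters_eom(toy_children, toy_stability, toy_epsilon, root=0, epsilon_max=0.1)
```

In this sketch, node 1 is selected when there is no cap, is replaced by its leaves 3 and 4 when the cap is 2.0, and the leaves are still returned when the cap is 0.1 even though every node exceeds it. That last case mirrors the fallback described above: the cap only rejects a node in favor of its children, so leaves are never removed by it.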

lmcinnes · 2022-04-26T14:53:23Z

Sorry for taking so long to get to this. It looks like a useful addition. Any chance you could add a test to the test suite to check that it works as intended?

prodrigues-tdx · 2022-05-09T22:09:45Z

> Sorry for taking so long to get to this. It looks like a useful addition. Any chance you could add a test to the test suite to check that it works as intended?

I totally missed your comment :s I'll do that, yes.


joaopmatias · 2024-10-13T18:41:06Z

Hi @lmcinnes! :D It has been a while since the last update in this PR.

Could you take another look?

Thanks!

joaopmatias · 2024-10-27T15:50:56Z

Gentle reminder to revisit this PR @lmcinnes
Thanks!

lmcinnes · 2024-10-27T17:09:28Z

Thanks.

added cluster_selection_epsilon_max arg

7998ea8

prodrigues-tdx force-pushed the master branch from d65fc26 to 7998ea8 on March 2, 2022 00:12

joaopmatias added 9 commits June 18, 2024 23:25

Merge remote-tracking branch 'origin/master' into maxepsilon

e77fb26

Merge remote-tracking branch 'origin/master' into maxepsilon

81fea36

# Conflicts: # hdbscan/hdbscan_.py

add assert for cluster_selection_epsilon_max. tests fail.

b6b8e79

add validation for cluster_selection_epsilon_max

6bd3b72

Change cluster_selection_max logic a bit. Prevent KeyError evaluating cluster_label_map

7813b90

add some tests

757d4d5

add test checking that no errors are thrown

df777f7

update epsilon of root node

7a7fba6

Merge branch 'master' into feat_cluster_selection_epsilon_max

86c33f2

one comment

c101732

lmcinnes merged commit 5dab8e3 into scikit-learn-contrib:master Oct 27, 2024

lmcinnes merged 11 commits into scikit-learn-contrib:master from prodrigues-tdx:master
