
fix: suppress SMOTE for multiclass targets (issue #36) #115

Merged
kimusaku merged 2 commits into main from fix/smote-multiclass-issue-36
Mar 20, 2026


@kimusaku
Contributor

Summary

Fixes #36 — SMOTE was being recommended for multiclass classification targets.

Root Cause

_get_target_imbalance_score() in meta_features.py correctly returns 0 when a target has more than 2 classes, but the imbalance score is computed only on the training split. When a rare third class is absent from that split, the method sees only 2 classes and returns a high score, causing SMOTE to be emitted even though the overall task is multiclass (task.is_multiclass = True).

The guard in template_based_adaptation.py (line 307) only blocked multi-target columns (len(task.target_columns) > 1) and did not check task.is_multiclass.
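The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code in meta_features.py: imbalance_score is a hypothetical stand-in, its formula (1 minus the minority/majority ratio) is assumed purely for illustration, and the label lists are made up.

```python
from collections import Counter


def imbalance_score(labels):
    """Hypothetical stand-in for _get_target_imbalance_score().

    Returns 0.0 unless exactly two classes are present; otherwise returns
    1 - minority/majority. The real formula may differ.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        return 0.0
    minority, majority = sorted(counts.values())
    return 1.0 - minority / majority


# Full dataset: three classes, one of them very rare.
full_labels = ["a"] * 90 + ["b"] * 9 + ["c"] * 1

# A training split in which the single "c" row happened to land elsewhere.
train_labels = ["a"] * 72 + ["b"] * 7

full_score = imbalance_score(full_labels)    # three classes, so 0.0
train_score = imbalance_score(train_labels)  # looks binary, so above 0.9

print(full_score, train_score)
```

Scored on the training split alone, the target looks like an imbalanced binary problem, which is why a check on the task itself is needed in addition to the score.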

Fix

sapientml_core/adaptation/generation/template_based_adaptation.py

Extend the SMOTE guard to also skip SMOTE when task.is_multiclass is True:

# Before
if "PREPROCESS:Balancing:SMOTE:imblearn" == component.label_name and len(self.task.target_columns) > 1:
    continue

# After
if "PREPROCESS:Balancing:SMOTE:imblearn" == component.label_name and (
    len(self.task.target_columns) > 1 or self.task.is_multiclass
):
    continue
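To make the behavior of the extended guard explicit, the condition can be sketched as a standalone predicate. This is an illustration only: the Task dataclass and should_skip_smote are hypothetical names that mirror just the two attributes used above, not the actual sapientml_core classes.

```python
from dataclasses import dataclass, field
from typing import List

SMOTE_LABEL = "PREPROCESS:Balancing:SMOTE:imblearn"


@dataclass
class Task:
    """Hypothetical stand-in exposing only the attributes the guard reads."""
    target_columns: List[str] = field(default_factory=list)
    is_multiclass: bool = False


def should_skip_smote(label_name: str, task: Task) -> bool:
    """Return True when the SMOTE component must be skipped for this task."""
    if label_name != SMOTE_LABEL:
        return False  # the guard only applies to the SMOTE component
    return len(task.target_columns) > 1 or task.is_multiclass


binary = Task(target_columns=["y"], is_multiclass=False)
multiclass = Task(target_columns=["y"], is_multiclass=True)
multi_target = Task(target_columns=["y1", "y2"], is_multiclass=False)

print(should_skip_smote(SMOTE_LABEL, binary))        # False: SMOTE still allowed
print(should_skip_smote(SMOTE_LABEL, multiclass))    # True: newly blocked
print(should_skip_smote(SMOTE_LABEL, multi_target))  # True: blocked before and after
```

Only the multiclass case changes outcome relative to the old guard; single-target binary tasks and multi-target tasks behave as before.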

Tests

Added test_smote_not_recommended_for_multiclass to tests/sapientml/test_generatedcode_additional_patterns.py.

The test uses the target_category_binary_imbalance column (imbalance score ≈ 0.913, which would normally trigger SMOTE) but forces task.is_multiclass = True to simulate the edge case where the full dataset has a 3rd rare class absent from the training split. It asserts that "SMOTE" does not appear in the generated code.

The existing binary SMOTE test (test_additional_classifier_works_with_preprocess) continues to pass, confirming that the binary case has not regressed.

@kimusaku kimusaku requested a review from a team as a code owner March 19, 2026 05:57
@kimusaku kimusaku requested review from AkiraUra and fukuta-flab and removed request for a team March 19, 2026 05:57
The SMOTE guard in template_based_adaptation.py only blocked multi-target
cases (len(target_columns) > 1) but allowed SMOTE through for multiclass
tasks (task.is_multiclass=True). SMOTE is only valid for binary
classification, so extend the guard to also skip it when the task is
multiclass.

Also add test_smote_not_recommended_for_multiclass to cover the edge case
where a binary-imbalanced training split would trigger SMOTE but the full
dataset has more than 2 classes (is_multiclass=True).

Co-authored-by: openhands <openhands@all-hands.dev>
Signed-off-by: openhands <openhands@all-hands.dev>
@kimusaku kimusaku force-pushed the fix/smote-multiclass-issue-36 branch from 3063404 to c876ea3 Compare March 19, 2026 06:44
@kimusaku kimusaku merged commit 21c828e into main Mar 20, 2026
57 checks passed
@kimusaku kimusaku deleted the fix/smote-multiclass-issue-36 branch March 20, 2026 13:46

Development

Successfully merging this pull request may close these issues.

For multi-targets, SMOTE is not recommended even if there is a bias

2 participants