Allow label list conversions #2094

Open
AlbertvanHouten wants to merge 1 commit into develop from albert/label-list-conversion

@AlbertvanHouten AlbertvanHouten commented Apr 2, 2026

Allow label list conversions once again. This is needed when a classification dataset is exported to a format such as VOC, whose label field has is_list=True. Re-importing that VOC dataset as a classification dataset, whose label field has is_list=False, only works if the list-to-scalar conversion is allowed.

Checklist

  • I have added tests to cover my changes or documented any manual tests.
  • I have updated the documentation accordingly

Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

@AlbertvanHouten AlbertvanHouten marked this pull request as ready for review April 2, 2026 13:24
@AlbertvanHouten AlbertvanHouten requested a review from a team as a code owner April 2, 2026 13:24
Copilot AI review requested due to automatic review settings April 2, 2026 13:24

Copilot AI left a comment

Pull request overview

This PR re-enables previously rejected “list → scalar” (and related) lossy conversions in the experimental converter stack to support workflows like re-importing VOC-style exports (where label is stored with is_list=True) into classification schemas expecting scalar labels.

Changes:

  • Update NumericFieldShapeConverter, BoolFieldShapeConverter, StringFieldShapeConverter, and LabelShapeConverter to allow lossy reductions by taking the first element and emitting a warning instead of raising ConversionError.
  • Add/adjust unit tests to assert lossy conversions succeed and warnings are logged.
  • Refactor LabelShapeConverter internals into helper methods for multi-label and is_list conversions.
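
The "take the first element and warn" behaviour listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the converters' actual code: `reduce_list_to_scalar` and the logger name are hypothetical, and a Python list of lists stands in for whatever column type the real converters operate on.

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("lossy_conversion_sketch")


def reduce_list_to_scalar(values, field_name):
    """Reduce each list-valued cell to its first element (hypothetical helper).

    Empty or missing cells become None. A warning is logged once per call
    because any elements after the first are dropped.
    """
    log.warning(
        "Converting list to non-list for field '%s': keeping the first element only. "
        "Data may be lost.",
        field_name,
    )
    return [cell[0] if cell else None for cell in values]


# Example: a list-valued label column reduced to scalar labels.
labels = reduce_list_to_scalar([[3, 7], [1], []], "label")
```

The design trade-off is that the conversion no longer fails loudly: a caller who needs every element has to watch the log rather than catch an exception.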

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/datumaro/experimental/converters/type_converters.py | Allows list→scalar reductions for numeric/bool/string fields via “first element” selection with warnings. |
| src/datumaro/experimental/converters/annotation_converters.py | Allows lossy LabelShapeConverter reductions (multi-label and/or is_list) with warnings; adds helper methods. |
| tests/unit/experimental/converters/test_type_converters.py | Updates tests to expect successful lossy list→scalar conversions and warning logs. |
| tests/unit/experimental/converters/test_annotation_converters.py | Updates label shape tests to validate lossy reductions’ outputs/dtypes and warning logs. |
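
A test that checks both the converted value and the logged warning could look roughly like the sketch below. It uses only the standard library's `unittest.TestCase.assertLogs`; the function `first_element`, the logger name, and the test class are hypothetical stand-ins, and the PR's real tests may use a different framework or fixtures.

```python
import logging
import unittest

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("lossy_conversion_sketch")


def first_element(cell, field_name):
    """Stand-in for a lossy list-to-scalar conversion (hypothetical)."""
    log.warning(
        "Converting list to non-list for field '%s': keeping the first element only.",
        field_name,
    )
    return cell[0]


class LossyConversionTest(unittest.TestCase):
    def test_list_to_scalar_succeeds_and_warns(self):
        # assertLogs fails the test if no WARNING is emitted on this logger.
        with self.assertLogs("lossy_conversion_sketch", level="WARNING") as captured:
            result = first_element([4, 9], "label")
        self.assertEqual(result, 4)
        self.assertIn("label", captured.output[0])
```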

if input_is_list:
    log.warning(
        "Converting list to non-list for field '%s': keeping the first element only. Data may be lost.",
        self.input_label.name,

Copilot AI Apr 2, 2026

In _convert_is_list, the warning logs self.input_label.name rather than the column actually being reduced (src_col / step2_col). When multi_label conversion has already written into output_col and step 2 reads from that, the warning can reference the wrong field/column name, which is confusing for users debugging data loss. Consider logging src_col (or the original input_col passed into convert) for consistency with _convert_multi_label.

Suggested change
- self.input_label.name,
+ src_col,
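
To illustrate why the logged name matters, here is a minimal sketch of a second conversion step that reads from the column written by an earlier step. Everything in it is an assumption for illustration: `convert_is_list` is a simplified stand-in for the helper discussed in the comment, a dict of lists stands in for the real table type, and the column names are made up.

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("label_shape_sketch")


def convert_is_list(table, src_col, output_col):
    """Reduce a list column to scalars, logging the column actually read.

    Logging src_col (rather than the original input field name) keeps the
    warning accurate when an earlier step already wrote into output_col.
    """
    log.warning(
        "Converting list to non-list for field '%s': keeping the first element only. "
        "Data may be lost.",
        src_col,
    )
    table[output_col] = [cell[0] if cell else None for cell in table[src_col]]
    return table


# Step 1 (not shown) is assumed to have written its result into "label_out",
# so step 2 reads from "label_out", not from the original "label" column.
table = {"label": [[[2], [5]], [[1]]], "label_out": [[2, 5], [1]]}
convert_is_list(table, src_col="label_out", output_col="label_out")
```

With this arrangement the warning names "label_out", which is the column whose extra elements were actually dropped.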
