Conversation
Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR re-enables previously rejected “list → scalar” (and related) lossy conversions in the experimental converter stack to support workflows like re-importing VOC-style exports (where label is stored with is_list=True) into classification schemas expecting scalar labels.
Changes:
- Update
NumericFieldShapeConverter,BoolFieldShapeConverter,StringFieldShapeConverter, andLabelShapeConverterto allow lossy reductions by taking the first element and emitting a warning instead of raisingConversionError. - Add/adjust unit tests to assert lossy conversions succeed and warnings are logged.
- Refactor
LabelShapeConverterinternals into helper methods for multi-label andis_listconversions.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/datumaro/experimental/converters/type_converters.py |
Allows list→scalar reductions for numeric/bool/string fields via “first element” selection with warnings. |
src/datumaro/experimental/converters/annotation_converters.py |
Allows lossy LabelShapeConverter reductions (multi-label and/or is_list) with warnings; adds helper methods. |
tests/unit/experimental/converters/test_type_converters.py |
Updates tests to expect successful lossy list→scalar conversions and warning logs. |
tests/unit/experimental/converters/test_annotation_converters.py |
Updates label shape tests to validate lossy reductions’ outputs/dtypes and warning logs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if input_is_list: | ||
| log.warning( | ||
| "Converting list to non-list for field '%s': keeping the first element only. Data may be lost.", | ||
| self.input_label.name, |
There was a problem hiding this comment.
In _convert_is_list, the warning logs self.input_label.name rather than the column actually being reduced (src_col / step2_col). When multi_label conversion has already written into output_col and step 2 reads from that, the warning can reference the wrong field/column name, which is confusing for users debugging data loss. Consider logging src_col (or the original input_col passed into convert) for consistency with _convert_multi_label.
| self.input_label.name, | |
| src_col, |
Allow label list conversions once again. This is necessary for when classification datasets are exported in a certain dataset format, for example VOC, which has is_list=True for label field. To be able to reimport this VOC dataset as a classification dataset where the label_field has is_list=False, this conversion must be allowed.
Checklist