Implement conversion to the experimental Dataset format. #1810
Merged
Conversation
* Also implements __len__ and __iter__ in the Dataset class.
* Fix bug when fetching ann_types() before the cache is initialised.
AlbertvanHouten approved these changes Aug 6, 2025
Comment on lines +358 to +365
# Add third sample
sample3 = TestSample(
    image=np.array([[[128, 64, 192]], [[96, 160, 32]]], dtype=np.uint8),
    bbox=np.array([[0.9, 0.8, 0.7, 0.6]], dtype=np.float32),
    image_info=ImageInfo(width=1, height=2),
)
dataset.append(sample3)
assert len(dataset) == 3
Contributor
Adding this third sample seems redundant after having already tested two appends. It would make more sense to remove one here and test if the len still works properly.
Contributor
Author
Done, I have also implemented __delitem__ for that.
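For illustration only, the append-then-delete check suggested in the review could look like the sketch below. `TinyDataset` is a hypothetical list-backed stand-in, not the actual Dataset class, and its internals are assumptions; it only shows how `__len__`, `__iter__` and `__delitem__` interact in such a test.

```python
class TinyDataset:
    """Hypothetical stand-in for illustration; not the real Dataset class."""

    def __init__(self):
        self._samples = []

    def append(self, sample):
        self._samples.append(sample)

    def __len__(self):
        return len(self._samples)

    def __iter__(self):
        return iter(self._samples)

    def __delitem__(self, index):
        # Delegate to the backing list; an out-of-range index raises IndexError.
        del self._samples[index]


dataset = TinyDataset()
dataset.append({"id": 0})
dataset.append({"id": 1})
assert len(dataset) == 2

# Remove one sample and check that the length and iteration follow.
del dataset[0]
assert len(dataset) == 1
assert [sample["id"] for sample in dataset] == [1]
```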
AlbertvanHouten approved these changes Aug 7, 2025
gdlg added a commit that referenced this pull request Aug 7, 2025
The approach is similar to #1810. The conversion works in two steps: the first step analyses the existing dataset and generates the media type, annotation type and categories for the new dataset. The second step actually converts the data.

I have defined the BackwardMediaConverter and BackwardAnnotationConverter base classes, which can be extended to support new media/annotation types. This PR implements the conversion logic, but the conversion for specific media/annotation types will be implemented later.

Follow-up from #1810. Fixes #1789

### Checklist
- [x] I have added unit tests to cover my changes.
- [x] I have added integration tests to cover my changes.
- [x] I have added the description of my changes into [CHANGELOG](https://github.com/open-edge-platform/datumaro/blob/develop/CHANGELOG.md).
- [ ] I have updated the [documentation](https://github.com/open-edge-platform/datumaro/tree/develop/docs) accordingly

### License
- [ ] I submit _my code changes_ under the same [MIT License](https://github.com/open-edge-platform/datumaro/blob/develop/LICENSE) that covers the project. Feel free to contact the maintainers if that's a concern.
- [ ] I have updated the license header for each file (see an example below).

```python
# Copyright (C) 2025 Intel Corporation
#
# SPDX-License-Identifier: MIT
```
Summary
This PR implements the conversion from the legacy to the experimental Dataset class. I will implement the conversion back to the legacy class in a separate PR.
Misc fixes:
* Implement __len__, __delitem__ and __iter__ in the Dataset class.

The conversion works in two steps: the first step analyses the existing dataset and generates the schema for the new dataset. The second step actually converts the data.
I have defined the MediaConverter and AnnotationConverter base classes, which can be extended to support new media/annotation types. This PR implements the conversion logic, but the conversion for specific media/annotation types will be implemented later.
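As a rough illustration of the two-step idea described above, here is a minimal sketch. Only the class names MediaConverter and AnnotationConverter come from this PR; the method names (`analyse`, `convert`), the `Schema` dataclass, the dict-based items and the two concrete converters are all assumptions made for the example and do not reflect the actual interfaces.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Schema:
    """Hypothetical schema: maps field names of the new dataset to types."""
    fields: Dict[str, type] = field(default_factory=dict)


class MediaConverter(ABC):
    """Illustrative base class; the method names are assumptions."""

    @abstractmethod
    def analyse(self, items: List[Dict[str, Any]], schema: Schema) -> None: ...

    @abstractmethod
    def convert(self, item: Dict[str, Any], sample: Dict[str, Any]) -> None: ...


class AnnotationConverter(ABC):
    """Illustrative base class; the method names are assumptions."""

    @abstractmethod
    def analyse(self, items: List[Dict[str, Any]], schema: Schema) -> None: ...

    @abstractmethod
    def convert(self, item: Dict[str, Any], sample: Dict[str, Any]) -> None: ...


class ImagePathConverter(MediaConverter):
    """Hypothetical converter for a media field."""

    def analyse(self, items, schema):
        schema.fields["image_path"] = str

    def convert(self, item, sample):
        sample["image_path"] = item["media"]


class LabelConverter(AnnotationConverter):
    """Hypothetical converter that only adds a field if labels are present."""

    def analyse(self, items, schema):
        if any("label" in item for item in items):
            schema.fields["label"] = int

    def convert(self, item, sample):
        if "label" in item:
            sample["label"] = item["label"]


def convert_dataset(items, converters):
    # Step 1: analyse the existing dataset and build the schema.
    schema = Schema()
    for converter in converters:
        converter.analyse(items, schema)

    # Step 2: convert the data item by item.
    samples = []
    for item in items:
        sample = {}
        for converter in converters:
            converter.convert(item, sample)
        samples.append(sample)
    return schema, samples


legacy_items = [{"media": "a.png", "label": 1}, {"media": "b.png", "label": 0}]
schema, samples = convert_dataset(
    legacy_items, [ImagePathConverter(), LabelConverter()]
)
```

Separating analysis from conversion means the schema is fixed before any sample is produced, so support for a new media or annotation type can be added as one more converter subclass without touching the driver loop.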
Part of #1789