Add support for importing legacy datumaro datasets with videos#2093
Add support for importing legacy datumaro datasets with videos#2093AlbertvanHouten wants to merge 2 commits intodevelopfrom
Conversation
Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR adds support for importing legacy Datumaro datasets containing videos (via VideoFrame) into the experimental dataset format, including mixed image+video datasets, and extends the media converter bridge to support unified media-path workflows.
Changes:
- Added forward/backward legacy media converters for
VideoFrameand mixed (image + video frame) datasets, and updated forward-converter selection strategy accordingly. - Introduced converter utilities to promote
VideoFramePathField→MediaPathFieldand to extractMediaInfoFieldfromMediaPathField. - Added extensive unit tests covering converter behavior, dataset analysis, and round-trip conversions for video/mixed datasets.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/experimental/legacy/test_legacy_media_converters.py | Adds unit tests for new legacy video + mixed media converters. |
| tests/unit/experimental/legacy/test_legacy_dataset_converters.py | Adds tests for forward media converter strategy, analysis, convert_from/to_legacy, and round-trips for video/mixed datasets. |
| tests/unit/experimental/converters/test_media_bridge_converters.py | Adds tests for new media bridge converters (VideoFramePath→MediaPath, MediaPath→MediaInfo). |
| src/datumaro/experimental/legacy/register_legacy_converters.py | Changes forward media converter selection strategy; registers new backward converters. |
| src/datumaro/experimental/legacy/media_converters.py | Implements forward/backward converters for VideoFrame and mixed datasets. |
| src/datumaro/experimental/legacy/init.py | Exports newly added legacy media converters. |
| src/datumaro/experimental/converters/media_converters.py | Adds VideoFramePathToMediaPathConverter and MediaPathToMediaInfoConverter. |
| src/datumaro/experimental/converters/init.py | Exposes newly added experimental converters. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Strategy: | ||
| - If the dataset contains *any* video frames (mixed or video-only), | ||
| the :class:`ForwardMixedMediaConverter` is used with a unified | ||
| ``media_path_field``. | ||
| - If the dataset contains *only* images (no video frames at all), | ||
| the :class:`ForwardImageMediaConverter` is used with an | ||
| ``image_path_field``. | ||
|
|
||
| Args: | ||
| dataset: Legacy dataset to create converter from | ||
| semantic: The semantic type for the converted fields | ||
| name_prefix: Prefix to prepend to all field names | ||
| """ | ||
| # Get the dataset's media type | ||
| media_type = cast("type[MediaElement[Any]]", dataset.media_type()) | ||
|
|
||
| # Try converter registered for this specific media type | ||
| if media_type in _media_converter_classes: | ||
| converter_class = _media_converter_classes[media_type] | ||
| return converter_class.create(dataset, semantic, name_prefix) | ||
|
|
||
| return None | ||
| # Try mixed-media converter first — it activates when any VideoFrame | ||
| # items are present (mixed or video-only datasets). | ||
| mixed = ForwardMixedMediaConverter.create(dataset, semantic, name_prefix) | ||
| if mixed is not None: | ||
| return mixed | ||
|
|
||
| # No video frames found — try the image-only converter. | ||
| return ForwardImageMediaConverter.create(dataset, semantic, name_prefix) |
There was a problem hiding this comment.
This change makes get_forward_media_converter ignore the forward media-converter registry entirely, which is a behavioral/API regression for legacy datasets whose media type is neither image-only nor (mixed) video frames (e.g., any custom or future registered media converter). Consider preserving the registry-based lookup as a final fallback: try mixed-first, then image-only, then fall back to the registered converter by dataset.media_type() (or an iteration of registered converters) so existing extensibility continues to work.
| elif isinstance(item.media, Image) and isinstance(item.media, FromFileMixin): | ||
| result[key] = LazyImage(path=item.media.path) |
There was a problem hiding this comment.
In mixed datasets, images that are not FromFileMixin (e.g., ImageFromData / byte-backed images) will silently produce no media value, which can lead to null/missing media for those samples after conversion. Since this converter is selected for any dataset containing at least one VideoFrame, this can break mixed datasets that include non-file images. Consider either (a) supporting the same image variants as ForwardImageMediaConverter (bytes/callables/info), or (b) raising a clear exception when encountering unsupported image media in mixed mode so the failure is explicit.
| elif isinstance(item.media, Image) and isinstance(item.media, FromFileMixin): | |
| result[key] = LazyImage(path=item.media.path) | |
| elif isinstance(item.media, Image): | |
| if isinstance(item.media, FromFileMixin): | |
| result[key] = LazyImage(path=item.media.path) | |
| else: | |
| raise TypeError( | |
| "ForwardMixedMediaConverter does not support non-file-based " | |
| "images (e.g., byte/numpy-backed Image instances) in mixed " | |
| "datasets. Use a converter that supports in-memory images " | |
| "or ensure all images are file-based when mixing with video frames." | |
| ) |
| def convert_to_legacy_media(self, sample: Sample) -> MediaElement[Any]: | ||
| """Convert a LazyImage or LazyVideoFrame back to the appropriate legacy media. | ||
|
|
||
| Returns Image for LazyImage values and VideoFrame for LazyVideoFrame values. | ||
| """ | ||
| value = getattr(sample, self.media_attr) | ||
|
|
||
| if isinstance(value, LazyVideoFrame): | ||
| video_path = str(value.video_path) | ||
| if video_path not in self._video_cache: | ||
| self._video_cache[video_path] = Video(path=video_path) | ||
| video = self._video_cache[video_path] | ||
| return VideoFrame(video=video, index=value.frame_index) | ||
|
|
||
| if isinstance(value, LazyImage): | ||
| return Image.from_file(path=str(value.path)) # pyright: ignore[reportUnknownMemberType] | ||
|
|
||
| raise TypeError(f"Expected LazyImage or LazyVideoFrame, got {type(value)}") |
There was a problem hiding this comment.
BackwardMixedMediaConverter will raise a TypeError when the unified media value is None (or otherwise missing), but legacy DatasetItem media can be None (and the forward converters already treat media=None as a valid case by producing {}). To avoid breaking round-trips or conversions with missing media, handle None explicitly (e.g., return None so the legacy DatasetItem ends up with no media) and reserve TypeError for truly unexpected value types.
| if frame_idx is not None: | ||
| # Video frame — extract video metadata | ||
| video_info = extract_video_info(path_str) | ||
| rows.append( | ||
| { | ||
| "width": video_info.width, | ||
| "height": video_info.height, | ||
| "fps": float(video_info.fps) if video_info.fps else None, | ||
| "total_frames": video_info.total_frames, | ||
| "duration": float(video_info.duration) if video_info.duration else None, | ||
| "codec": video_info.codec, | ||
| "frame_index": int(frame_idx), | ||
| } | ||
| ) | ||
| else: | ||
| # Standalone image — read dimensions only | ||
| p = _Path(path_str) | ||
| if not p.exists(): | ||
| rows.append(None) | ||
| continue |
There was a problem hiding this comment.
For the video-frame path branch, the converter doesn’t check file existence and doesn’t handle failures from extract_video_info(path_str) (e.g., missing/unreadable video). This can raise exceptions during conversion, whereas the image branch explicitly returns None for missing files. Consider aligning behavior by checking existence for video paths too and/or wrapping extract_video_info in error handling that returns None when metadata cannot be extracted.
| rows: list[dict[str, Any] | None] = [] | ||
|
|
||
| for i in range(len(df)): | ||
| path = df[input_col][i] | ||
| frame_idx = df[frame_idx_col][i] if frame_idx_col in df.columns else None |
There was a problem hiding this comment.
This converter iterates row-by-row in Python and performs per-row file I/O (PIL open / video metadata extraction), which will be a bottleneck on large datasets. If this converter is expected to run on sizeable framesets, consider using a Polars-native approach (pl.map_elements / pl.struct(...).map_elements) to reduce Python overhead, and add caching for repeated paths (common for video frames) so the same video metadata isn’t recomputed for every frame.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
Checklist