Skip to content

Add support for importing legacy datumaro datasets with videos#2093

Draft
AlbertvanHouten wants to merge 2 commits intodevelopfrom
albert/import-legacy-videos
Draft

Add support for importing legacy datumaro datasets with videos#2093
AlbertvanHouten wants to merge 2 commits intodevelopfrom
albert/import-legacy-videos

Conversation

@AlbertvanHouten
Copy link
Copy Markdown
Contributor

Checklist

  • I have added tests to cover my changes or documented any manual tests.
  • I have updated the documentation accordingly

Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
@codecov-commenter
Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 96.75325% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...tumaro/experimental/converters/media_converters.py 93.75% 2 Missing and 1 partial ⚠️
...c/datumaro/experimental/legacy/media_converters.py 97.97% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@AlbertvanHouten AlbertvanHouten requested a review from Copilot April 1, 2026 08:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for importing legacy Datumaro datasets containing videos (via VideoFrame) into the experimental dataset format, including mixed image+video datasets, and extends the media converter bridge to support unified media-path workflows.

Changes:

  • Added forward/backward legacy media converters for VideoFrame and mixed (image + video frame) datasets, and updated forward-converter selection strategy accordingly.
  • Introduced converter utilities to promote VideoFramePathFieldMediaPathField and to extract MediaInfoField from MediaPathField.
  • Added extensive unit tests covering converter behavior, dataset analysis, and round-trip conversions for video/mixed datasets.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tests/unit/experimental/legacy/test_legacy_media_converters.py Adds unit tests for new legacy video + mixed media converters.
tests/unit/experimental/legacy/test_legacy_dataset_converters.py Adds tests for forward media converter strategy, analysis, convert_from/to_legacy, and round-trips for video/mixed datasets.
tests/unit/experimental/converters/test_media_bridge_converters.py Adds tests for new media bridge converters (VideoFramePathMediaPath, MediaPathMediaInfo).
src/datumaro/experimental/legacy/register_legacy_converters.py Changes forward media converter selection strategy; registers new backward converters.
src/datumaro/experimental/legacy/media_converters.py Implements forward/backward converters for VideoFrame and mixed datasets.
src/datumaro/experimental/legacy/init.py Exports newly added legacy media converters.
src/datumaro/experimental/converters/media_converters.py Adds VideoFramePathToMediaPathConverter and MediaPathToMediaInfoConverter.
src/datumaro/experimental/converters/init.py Exposes newly added experimental converters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +117 to +137
Strategy:
- If the dataset contains *any* video frames (mixed or video-only),
the :class:`ForwardMixedMediaConverter` is used with a unified
``media_path_field``.
- If the dataset contains *only* images (no video frames at all),
the :class:`ForwardImageMediaConverter` is used with an
``image_path_field``.

Args:
dataset: Legacy dataset to create converter from
semantic: The semantic type for the converted fields
name_prefix: Prefix to prepend to all field names
"""
# Get the dataset's media type
media_type = cast("type[MediaElement[Any]]", dataset.media_type())

# Try converter registered for this specific media type
if media_type in _media_converter_classes:
converter_class = _media_converter_classes[media_type]
return converter_class.create(dataset, semantic, name_prefix)

return None
# Try mixed-media converter first — it activates when any VideoFrame
# items are present (mixed or video-only datasets).
mixed = ForwardMixedMediaConverter.create(dataset, semantic, name_prefix)
if mixed is not None:
return mixed

# No video frames found — try the image-only converter.
return ForwardImageMediaConverter.create(dataset, semantic, name_prefix)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes get_forward_media_converter ignore the forward media-converter registry entirely, which is a behavioral/API regression for legacy datasets whose media type is neither image-only nor (mixed) video frames (e.g., any custom or future registered media converter). Consider preserving the registry-based lookup as a final fallback: try mixed-first, then image-only, then fall back to the registered converter by dataset.media_type() (or an iteration of registered converters) so existing extensibility continues to work.

Copilot uses AI. Check for mistakes.
Comment on lines +361 to +362
elif isinstance(item.media, Image) and isinstance(item.media, FromFileMixin):
result[key] = LazyImage(path=item.media.path)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In mixed datasets, images that are not FromFileMixin (e.g., ImageFromData / byte-backed images) will silently produce no media value, which can lead to null/missing media for those samples after conversion. Since this converter is selected for any dataset containing at least one VideoFrame, this can break mixed datasets that include non-file images. Consider either (a) supporting the same image variants as ForwardImageMediaConverter (bytes/callables/info), or (b) raising a clear exception when encountering unsupported image media in mixed mode so the failure is explicit.

Suggested change
elif isinstance(item.media, Image) and isinstance(item.media, FromFileMixin):
result[key] = LazyImage(path=item.media.path)
elif isinstance(item.media, Image):
if isinstance(item.media, FromFileMixin):
result[key] = LazyImage(path=item.media.path)
else:
raise TypeError(
"ForwardMixedMediaConverter does not support non-file-based "
"images (e.g., byte/numpy-backed Image instances) in mixed "
"datasets. Use a converter that supports in-memory images "
"or ensure all images are file-based when mixing with video frames."
)

Copilot uses AI. Check for mistakes.
Comment on lines +489 to +506
def convert_to_legacy_media(self, sample: Sample) -> MediaElement[Any]:
"""Convert a LazyImage or LazyVideoFrame back to the appropriate legacy media.

Returns Image for LazyImage values and VideoFrame for LazyVideoFrame values.
"""
value = getattr(sample, self.media_attr)

if isinstance(value, LazyVideoFrame):
video_path = str(value.video_path)
if video_path not in self._video_cache:
self._video_cache[video_path] = Video(path=video_path)
video = self._video_cache[video_path]
return VideoFrame(video=video, index=value.frame_index)

if isinstance(value, LazyImage):
return Image.from_file(path=str(value.path)) # pyright: ignore[reportUnknownMemberType]

raise TypeError(f"Expected LazyImage or LazyVideoFrame, got {type(value)}")
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BackwardMixedMediaConverter will raise a TypeError when the unified media value is None (or otherwise missing), but legacy DatasetItem media can be None (and the forward converters already treat media=None as a valid case by producing {}). To avoid breaking round-trips or conversions with missing media, handle None explicitly (e.g., return None so the legacy DatasetItem ends up with no media) and reserve TypeError for truly unexpected value types.

Copilot uses AI. Check for mistakes.
Comment on lines +705 to +724
if frame_idx is not None:
# Video frame — extract video metadata
video_info = extract_video_info(path_str)
rows.append(
{
"width": video_info.width,
"height": video_info.height,
"fps": float(video_info.fps) if video_info.fps else None,
"total_frames": video_info.total_frames,
"duration": float(video_info.duration) if video_info.duration else None,
"codec": video_info.codec,
"frame_index": int(frame_idx),
}
)
else:
# Standalone image — read dimensions only
p = _Path(path_str)
if not p.exists():
rows.append(None)
continue
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the video-frame path branch, the converter doesn’t check file existence and doesn’t handle failures from extract_video_info(path_str) (e.g., missing/unreadable video). This can raise exceptions during conversion, whereas the image branch explicitly returns None for missing files. Consider aligning behavior by checking existence for video paths too and/or wrapping extract_video_info in error handling that returns None when metadata cannot be extracted.

Copilot uses AI. Check for mistakes.
Comment on lines +693 to +697
rows: list[dict[str, Any] | None] = []

for i in range(len(df)):
path = df[input_col][i]
frame_idx = df[frame_idx_col][i] if frame_idx_col in df.columns else None
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This converter iterates row-by-row in Python and performs per-row file I/O (PIL open / video metadata extraction), which will be a bottleneck on large datasets. If this converter is expected to run on sizeable framesets, consider using a Polars-native approach (pl.map_elements / pl.struct(...).map_elements) to reduce Python overhead, and add caching for repeated paths (common for video frames) so the same video metadata isn’t recomputed for every frame.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Albert van Houten <albert.van.houten@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants