Kate/splitter cli by jihyeonyi · Pull Request #81 · open-edge-platform/datumaro

jihyeonyi · 2021-01-12T04:13:30Z

Summary

This PR:

adds CLI support for task-specific splits
revises the re-identification split
updates the documentation for task-specific splits

How to test

Unittest

$ python -m unittest -v tests/test_splitter.py

Testing classification split with imagenet dataset.

Note: the ImageNet format doesn't support subsets, but checking subsets at the project level is enough here.

$ pip install .
$ datum project create -o imagenet
$ datum source add path <path-to-source> -f imagenet -p imagenet/
$ datum project transform -t classification_split -p imagenet/ -- --subset train:.5 --subset val:.2 --subset test:.3
$ datum project info -p imagenet-classification_split
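
For readers unfamiliar with the idea behind a classification split, here is a minimal sketch of a label-stratified split in plain Python. It is not the code from datumaro/plugins/splitter.py: the function name `stratified_split` and its arguments are hypothetical, and the real splitter may also take attributes into account.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios, seed=0):
    """Hypothetical helper: assign item indices to subsets per label.

    labels: one label per item.
    ratios: list of (subset_name, ratio) pairs whose ratios sum to 1.
    Returns a dict mapping subset_name -> sorted list of item indices.
    """
    total = sum(ratio for _, ratio in ratios)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("subset ratios must sum to 1, got %s" % total)

    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    result = {name: [] for name, _ in ratios}
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        start = 0
        cumulative = 0.0
        for i, (name, ratio) in enumerate(ratios):
            cumulative += ratio
            if i == len(ratios) - 1:
                # The last subset takes the remainder so no item is dropped.
                end = len(indices)
            else:
                end = round(cumulative * len(indices))
            result[name].extend(indices[start:end])
            start = end
    return {name: sorted(items) for name, items in result.items()}


labels = ["cat"] * 10 + ["dog"] * 10
subsets = stratified_split(labels, [("train", 0.5), ("val", 0.2), ("test", 0.3)])
sizes = {name: len(items) for name, items in subsets.items()}
print(sizes)
```

Splitting each label separately keeps the label distribution roughly the same in every subset, which is the property the `train:.5`, `val:.2`, `test:.3` command above is meant to exercise.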

Testing detection split with voc dataset

$ pip install .
$ datum project import -i <path-to-voc> -f voc
$ cd voc/
$ datum project transform -t detection_split -- --subset train:.5 --subset val:.2 --subset test:.3
$ datum project info -p voc-detection_split
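
The `--subset <name>:<ratio>` arguments used in these commands follow a simple format. As an illustration only, the sketch below parses such arguments with `argparse`; the helper `parse_subset` is hypothetical and is not taken from the Datumaro CLI code.

```python
import argparse


def parse_subset(spec):
    """Hypothetical parser for a 'name:ratio' string such as 'train:.5'."""
    name, sep, ratio = spec.partition(":")
    if not sep or not name:
        raise argparse.ArgumentTypeError("expected NAME:RATIO, got %r" % spec)
    try:
        value = float(ratio)
    except ValueError:
        raise argparse.ArgumentTypeError("invalid ratio in %r" % spec)
    return name, value


parser = argparse.ArgumentParser()
# Each occurrence of --subset appends one (name, ratio) pair.
parser.add_argument("--subset", action="append", type=parse_subset, default=[])

args = parser.parse_args(
    ["--subset", "train:.5", "--subset", "val:.2", "--subset", "test:.3"]
)
print(args.subset)
```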

Testing re-identification split with imagenet dataset.

Note: Datumaro doesn't support re-id datasets yet, so a classification dataset is used instead.

$ pip install .
$ datum project create -o imagenet
$ datum source add path <path-to-imagenet> -f imagenet -p imagenet/
$ datum project transform -t reidentification_split -p imagenet/ -- --subset train:.5 --subset val:.2 --subset test:.3 --query .5
$ datum project info -p imagenet-reidentification_split

Checklist

I submit my changes into the develop branch
I have added description of my changes into CHANGELOG
I have updated the documentation accordingly
I have added tests to cover my changes
I have linked related issues

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below)

# Copyright (C) 2020 Intel Corporation
#
# SPDX-License-Identifier: MIT

datumaro/plugins/splitter.py

README.md

tests/test_splitter.py

zhiltsov-max

Please check the updated class descriptions for correctness.

Future updates could include:

ignoring attributes in classification split (for captions, descriptions and other technical attributes)
splitting using an attribute as label in classification split
using polygons and masks in detection split

README.md

jihyeonyi · 2021-01-14T10:24:23Z

datumaro/plugins/splitter.py

+    Produces a split with a specified ratio of images, avoiding having same
+    labels in different subsets.|n


Here, we avoid having the same person ID or object ID in different subsets. The ID can be a label, or an attribute if attr_for_id is specified.

One more thing: the train and val sets actually share person IDs or object IDs (although most person re-identification datasets have no val set), but they do not share IDs with the test set.
I'm not sure how precise the explanation should be.
If you feel the current explanation is sufficient, please leave it as it is.
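
To make the ID constraint described in this comment concrete, here is a rough sketch in plain Python. It is an assumption-laden illustration, not the plugin's implementation: the function `reid_split` is hypothetical, and it applies `test_ratio` to identities rather than to images, whereas the docstring under review speaks of a ratio of images.

```python
import random
from collections import defaultdict


def reid_split(ids, test_ratio, val_ratio, seed=0):
    """Hypothetical helper: split items so test shares no IDs with train/val.

    ids: one identity (person or object ID) per item.
    test_ratio: share of identities sent entirely to the test subset.
    val_ratio: share of each remaining identity's items sent to val.
    Returns a dict mapping subset name -> list of item indices.
    """
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for index, identity in enumerate(ids):
        by_id[identity].append(index)

    identities = sorted(by_id)
    rng.shuffle(identities)
    n_test = round(test_ratio * len(identities))
    test_ids = identities[:n_test]
    trainval_ids = identities[n_test:]

    subsets = {"train": [], "val": [], "test": []}
    for identity in test_ids:
        # Every item of a test identity goes to test.
        subsets["test"].extend(by_id[identity])
    for identity in trainval_ids:
        # Train and val share identities: items of one ID go to both.
        items = by_id[identity]
        rng.shuffle(items)
        n_val = round(val_ratio * len(items))
        subsets["val"].extend(items[:n_val])
        subsets["train"].extend(items[n_val:])
    return subsets


ids = ["p%d" % (i // 4) for i in range(40)]  # 10 identities, 4 items each
subsets = reid_split(ids, test_ratio=0.3, val_ratio=0.25)
train_ids = {ids[i] for i in subsets["train"]}
val_ids = {ids[i] for i in subsets["val"]}
test_ids = {ids[i] for i in subsets["test"]}
print(len(subsets["train"]), len(subsets["val"]), len(subsets["test"]))
```

The `--query` option from the test commands is deliberately left out of this sketch.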

jihyeonyi · 2021-01-14T10:49:02Z

Please check the updated class descriptions for correctness.

Future updates could include:

ignoring attributes in classification split (for captions, descriptions and other technical attributes)

splitting using an attribute as label in classification split

using polygons and masks in detection split

Thank you for revising the descriptions.
As for the future updates:

Would you like to remove the attribute-based splitting or just make it optional?
I think the latter is better.
When you say 'splitting using an attribute as label', do you mean splitting using only attributes, regardless of labels?
Does the detection task have polygons or masks? I thought they were for the segmentation task. Maybe I'm wrong.
For your information, I'll add a splitter for the segmentation task, so why not add polygons and masks later?

zhiltsov-max · 2021-01-14T14:19:39Z

Would you like to remove the attribute-based splitting or just make it optional?

Optional, enabled by default.

When you say 'splitting using an attribute as label', do you mean splitting using only attributes, regardless of labels?

I mean using a single attribute, as in re-id. Maybe also using some subset of the attributes, or ignoring some of them.

Does the detection task have polygons or masks?

In Mask R-CNN, they are intermixed with the segmentation task. Personally, I consider these annotation types more or less interchangeable, because all of them can be used to train both segmentation and detection algorithms.
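
One way to read the interchangeability point above is that a polygon can always be reduced to a bounding box for detection purposes. The sketch below shows that reduction in plain Python; `polygon_to_bbox` is a hypothetical helper written for this illustration and is not a Datumaro function.

```python
def polygon_to_bbox(points):
    """Hypothetical helper: enclosing box of a polygon.

    points: flat list [x0, y0, x1, y1, ...] with at least three vertices.
    Returns (x, y, width, height) of the axis-aligned enclosing box.
    """
    if len(points) < 6 or len(points) % 2 != 0:
        raise ValueError("expected a flat list of at least 3 (x, y) pairs")
    xs = points[0::2]
    ys = points[1::2]
    x_min, y_min = min(xs), min(ys)
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


bbox = polygon_to_bbox([1, 2, 5, 2, 5, 6, 1, 6])
print(bbox)
```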

jihyeonyi force-pushed the kate/splitter-cli branch from 84b16c6 to ff4cd80 Compare January 12, 2021 04:36

zhiltsov-max previously approved these changes Jan 13, 2021

jihyeonyi dismissed zhiltsov-max’s stale review via dc85888 January 14, 2021 03:15

jihyeonyi force-pushed the kate/splitter-cli branch from ff4cd80 to dc85888 Compare January 14, 2021 03:15

Update changelog

bec7467

zhiltsov-max approved these changes Jan 14, 2021

zhiltsov-max merged commit 1ee908f into develop Jan 14, 2021

zhiltsov-max deleted the kate/splitter-cli branch February 16, 2021 10:55

Yi, Jihyeon and others added 6 commits March 13, 2021 15:34

add cli support for classification/detection splitter

e9c896e

revisit re-id splitter and implement cli for re-id

011b852

update documentation for task-specific split

875e385

add changelog and revert toc part of user_manual and README

d3231a4

1. add more description regarding split, 2. move to base class (_TaskSpecificSplit), 3. revise test code

dc85888

Update docs

55e0718

zhiltsov-max merged 7 commits into develop from kate/splitter-cli
