Add task type information when importing#1422
wonjuleee merged 26 commits into open-edge-platform:develop from
caption = 11
super_resolution = 12
depth_estimation = 13
mixed = 14
Is there a way for users to know what tasks are possible when TaskType is mixed?
A mixed task can be transformed into any task type. We provide mixed because a dataset in the Datumaro format can contain any AnnotationType when it is imported.
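To make the fallback concrete, here is a minimal sketch of how a set of annotation types could be resolved to a single task, with mixed as the catch-all when no single task covers the combination. The enums `AnnType` and `Task`, the `TASK_ANN_TYPES` table, and `infer_task` are hypothetical stand-ins, not Datumaro's actual `AnnotationType`, `TaskType`, or mapping.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class AnnType(Enum):
    # Hypothetical stand-in for a few of Datumaro's annotation types
    label = auto()
    bbox = auto()
    mask = auto()
    caption = auto()


class Task(Enum):
    # Hypothetical stand-in for the TaskType enum in this PR
    classification = auto()
    detection = auto()
    segmentation = auto()
    caption = auto()
    mixed = auto()


# Hypothetical table: the annotation types each task may contain
TASK_ANN_TYPES: Dict[Task, FrozenSet[AnnType]] = {
    Task.classification: frozenset({AnnType.label}),
    Task.detection: frozenset({AnnType.label, AnnType.bbox}),
    Task.segmentation: frozenset({AnnType.label, AnnType.bbox, AnnType.mask}),
    Task.caption: frozenset({AnnType.caption}),
}


def infer_task(ann_types: FrozenSet[AnnType]) -> Task:
    """Pick the narrowest task whose allowed types cover ann_types, else mixed."""
    candidates = [t for t, allowed in TASK_ANN_TYPES.items() if ann_types <= allowed]
    if not candidates:
        # No single task covers this combination (e.g. caption + bbox)
        return Task.mixed
    return min(candidates, key=lambda t: len(TASK_ANN_TYPES[t]))
```

Under this assumed table, label + bbox resolves to detection, while caption + bbox falls back to mixed.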
I see many changes in plugins/data_formats. However, I'd rather revert them and let the set of annotation types present in the dataset be managed by DatasetStorage (the Dataset's item container) or StreamDatasetStorage (its counterpart for StreamDataset). This is because:
- This implementation creates a hidden constraint: every dataset extractor (DatasetBase) must implement an annotation type gatherer such as https://github.com/openvinotoolkit/datumaro/blob/f97820b6e6d6de107af8561c829db4321e7b7f70/src/datumaro/plugins/data_formats/ade20k2017.py#L72
- This implementation is not aligned with our dataset transform logic. It currently computes task_type at DatasetBase. Suppose some DatasetBase decides that a given dataset has two annotation types, Label and Bbox. If an arbitrary dataset transform is then applied on top of it and drops every Bbox, we must re-compute the set of annotation types present in the dataset after the transformation. This should be done by DatasetStorage or StreamDatasetStorage.
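The second point can be illustrated with a toy example, assuming simplified `Item`/`Ann` classes and a hypothetical `drop_bboxes` transform (none of these are Datumaro APIs): the types recorded at import time go stale as soon as a transform removes every bbox, so they have to be recomputed from the transformed items.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Ann:
    type: str  # hypothetical string tag, e.g. "label" or "bbox"


@dataclass
class Item:
    annotations: List[Ann] = field(default_factory=list)


def collect_ann_types(items: Iterable[Item]) -> Set[str]:
    """Gather annotation types from whatever items are currently visible."""
    return {ann.type for item in items for ann in item.annotations}


def drop_bboxes(items: Iterable[Item]) -> List[Item]:
    """Toy transform that removes every bbox annotation."""
    return [Item([a for a in item.annotations if a.type != "bbox"]) for item in items]


source = [Item([Ann("label"), Ann("bbox")]), Item([Ann("label")])]
at_import = collect_ann_types(source)  # label and bbox
after_transform = collect_ann_types(drop_bboxes(source))  # label only
```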
Following this idea, it would look like this:
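from typing import Iterable, Optional, Set

from datumaro.components.annotation import AnnotationType
from datumaro.components.dataset_base import DatasetItem


class DatasetStorage:
    def __init__(self):
        ...
        # Annotation types present in the dataset; None means
        # "not computed yet" or "invalidated by a possible change".
        self._set_of_ann_types: Optional[Set[AnnotationType]] = None
        ...

    @property
    def set_of_ann_types(self) -> Set[AnnotationType]:
        if self._set_of_ann_types is None:
            # If reset or not computed, run the iterator once to compute it.
            # Collect into a local set first: iterating may (re)initialize the
            # cache, which resets self._set_of_ann_types to None.
            ann_types = set()
            for item in self:
                for ann in item.annotations:
                    ann_types.add(ann.type)
            self._set_of_ann_types = ann_types
        return self._set_of_ann_types

    @property
    def task_type(self):
        # infer_task_type_from_set_of_ann_types is the helper proposed here
        return infer_task_type_from_set_of_ann_types(self.set_of_ann_types)

    ...

    def _iter_init_cache_unchecked(self) -> Iterable[DatasetItem]:
        # Merges the source, source transforms and patch, caches the result
        # and provides an iterator for the resulting item sequence.
        ...
        # Reset if there is a possible change
        self._set_of_ann_types = None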
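A self-contained sketch of the same lazy-cache-plus-invalidation pattern, runnable without Datumaro. `ToyStorage`, its `put` method, and the `scans` counter are hypothetical; the counter only shows that the full pass runs once per invalidation rather than on every access.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Item:
    # Hypothetical simplification: an item just lists its annotation types
    ann_types: List[str] = field(default_factory=list)


class ToyStorage:
    """Hypothetical stand-in for DatasetStorage with a lazily cached type set."""

    def __init__(self) -> None:
        self._items: List[Item] = []
        self._set_of_ann_types: Optional[Set[str]] = None
        self.scans = 0  # number of full passes over the items

    def __iter__(self) -> Iterator[Item]:
        return iter(self._items)

    def put(self, item: Item) -> None:
        self._items.append(item)
        self._set_of_ann_types = None  # contents changed: invalidate the cache

    @property
    def set_of_ann_types(self) -> Set[str]:
        if self._set_of_ann_types is None:
            self.scans += 1
            self._set_of_ann_types = {t for item in self for t in item.ann_types}
        return self._set_of_ann_types


storage = ToyStorage()
storage.put(Item(["label"]))
first = storage.set_of_ann_types
second = storage.set_of_ann_types  # served from the cache, no new scan
storage.put(Item(["bbox"]))
third = storage.set_of_ann_types  # cache was reset by put(), so one more scan
```

The cost the reply below points out is visible here: the first access after any change triggers a full pass over the items.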
Thank you for the good idea. When I tried this approach, obtaining the available task information required a full iteration over the whole dataset. So I switched to collecting the available task types during import instead. What do you think?
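The import-time alternative can be sketched as a single pass that records annotation types while items are parsed, so no extra iteration is needed. `import_items` and the dict-based record layout are hypothetical, not an actual Datumaro importer.

```python
from typing import Dict, Iterable, List, Set, Tuple


def import_items(records: Iterable[Dict]) -> Tuple[List[Dict], Set[str]]:
    """Hypothetical importer: parse records and gather annotation types in one pass."""
    items: List[Dict] = []
    ann_types: Set[str] = set()
    for record in records:
        anns = record.get("annotations", [])
        for ann in anns:
            ann_types.add(ann["type"])  # recorded while parsing, no second scan
        items.append({"id": record["id"], "annotations": anns})
    return items, ann_types


records = [
    {"id": "a", "annotations": [{"type": "label"}]},
    {"id": "b", "annotations": [{"type": "bbox"}, {"type": "label"}]},
]
items, ann_types = import_items(records)
```

The trade-off is the one raised in the review: the recorded set describes the dataset as imported, so it no longer holds once a transform changes the annotations.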
I have a question. I'm curious why datasets are designed to have a single task_type. For example, if a dataset has both label and bbox annotations, it can be used for both classification and detection tasks (and even for anomaly classification/detection tasks if the labels are anomalous and normal). However, with your implementation, it seems the task_type becomes detection.
Hi @jihyeonyi, thank you for the question. We are able to identify the mapping between annotation types and tasks in
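One possible answer to the question above is to report every task the annotations could support instead of a single task_type. A sketch, assuming a hypothetical `REQUIRED` table of minimum annotation types per task and a hypothetical `compatible_tasks` helper; this is not what the PR implements.

```python
from typing import Dict, FrozenSet, List

# Hypothetical requirements: the annotation types a task needs at minimum
REQUIRED: Dict[str, FrozenSet[str]] = {
    "classification": frozenset({"label"}),
    "detection": frozenset({"bbox"}),
    "segmentation": frozenset({"mask"}),
    "caption": frozenset({"caption"}),
}


def compatible_tasks(ann_types: FrozenSet[str]) -> List[str]:
    """Return every task whose required annotation types are all present."""
    return sorted(task for task, need in REQUIRED.items() if need <= ann_types)


print(compatible_tasks(frozenset({"label", "bbox"})))  # ['classification', 'detection']
```

With label and bbox annotations present, this returns both classification and detection rather than collapsing to one.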
The branch was updated from 1d05fe9 to 3d30c1d.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1422      +/-   ##
===========================================
+ Coverage    80.85%   80.98%   +0.12%
===========================================
  Files          271      272       +1
  Lines        30689    31137     +448
  Branches      6197     6279      +82
===========================================
+ Hits         24815    25216     +401
- Misses        4489     4505      +16
- Partials      1385     1416      +31
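The summary figures are consistent with Codecov's coverage ratio, hits / (hits + misses + partials). A quick check, assuming Codecov's default of two-decimal precision rounded down; `coverage_pct` and `coverage_delta` are illustrative helpers, not part of Codecov's tooling.

```python
import math


def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent, rounded down to two decimals."""
    lines = hits + misses + partials
    return (hits * 10000 // lines) / 100


def coverage_delta(before: tuple, after: tuple) -> float:
    """Change in coverage computed from the unrounded ratios, then rounded down."""
    def ratio(h: int, m: int, p: int) -> float:
        return h / (h + m + p) * 100
    return math.floor((ratio(*after) - ratio(*before)) * 100) / 100


develop = (24815, 4489, 1385)  # hits, misses, partials on develop
pr_1422 = (25216, 4505, 1416)  # hits, misses, partials with #1422

print(coverage_pct(*develop), coverage_pct(*pr_1422), coverage_delta(develop, pr_1422))
# → 80.85 80.98 0.12
```

Computing the delta from the unrounded ratios is why it reads +0.12 rather than the 0.13 you would get by subtracting the two rounded percentages.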
# # when adding a new item, task_type will be updated automatically
# for ann in item.annotations:
#     self._set_of_ann_types.add(ann.type)
# self._task_type = TaskAnnotationMapping().get_task(self._set_of_ann_types)
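The commented-out lines above update the set as each item is added. A plain set can only grow, though, so removing the last bbox would leave a stale entry. A minimal sketch that keeps per-type counts instead; `AnnTypeTracker`, `on_put`, and `on_remove` are hypothetical names, not Datumaro APIs.

```python
from collections import Counter
from typing import Iterable, Set


class AnnTypeTracker:
    """Hypothetical helper that keeps the set of annotation types current."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def on_put(self, ann_types: Iterable[str]) -> None:
        # Count one occurrence per annotation of the newly added item
        self._counts.update(ann_types)

    def on_remove(self, ann_types: Iterable[str]) -> None:
        # Un-count the removed item's annotations
        self._counts.subtract(ann_types)
        # Forget types that no longer occur anywhere in the dataset
        for ann_type in [t for t, n in self._counts.items() if n <= 0]:
            del self._counts[ann_type]

    @property
    def ann_types(self) -> Set[str]:
        return set(self._counts)


tracker = AnnTypeTracker()
tracker.on_put(["label", "bbox"])
tracker.on_put(["label"])
tracker.on_remove(["label", "bbox"])
print(tracker.ann_types)  # {'label'}
```

Counting costs one dictionary update per annotation but avoids a full rescan when items are deleted.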
def __iter__(self) -> Iterator[DatasetItem]:
    yield from self.stacked_transform
    # yield from self.stacked_transform