
Save and load hashkey for explorer #981

Merged
sooahleex merged 20 commits into open-edge-platform:develop from sooahleex:feature/save_hashkey
May 11, 2023

Conversation

@sooahleex
Contributor

@sooahleex sooahleex commented Apr 28, 2023

Summary

  • Ticket no. 107264
  • Save and load the HashKey of a dataset after the explore command
  • Get the list of datasets in the explorer
  • Export HashKey annotations in Datumaro format
  • Use cases for the explorer
    • explore command with and without a target
    • explore -> add -> explore
    • explore -> merge -> explore
    • Support versioning
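Keeping hash keys matters most for the explore -> add -> explore and explore -> merge -> explore use cases: keys already computed for existing items can be reused, so only newly added items need hashing before the next query. A minimal sketch of that caching idea follows. It is not Datumaro's implementation: `HashKeyCache` and `toy_hasher` are hypothetical, ranking by Hamming distance over binary keys is an assumption, and the SHA-256 stand-in has no notion of visual similarity (the real hash key comes from a model).

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def hamming_distance(a: bytes, b: bytes) -> int:
    # Count differing bits between two equal-length binary hash keys
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def toy_hasher(item_id: str) -> bytes:
    # Hypothetical stand-in for a model-based hash: 8 bytes of SHA-256 (not perceptual)
    return hashlib.sha256(item_id.encode("utf-8")).digest()[:8]


class HashKeyCache:
    """Hypothetical cache: keeps computed hash keys so repeated explores skip recomputation."""

    def __init__(self, hasher: Callable[[str], bytes]):
        self._hasher = hasher
        self.keys: Dict[str, bytes] = {}
        self.num_computed = 0  # how many times the (expensive) hasher actually ran

    def update(self, item_ids: Iterable[str]) -> None:
        # Hash only items that have no saved key yet
        for item_id in item_ids:
            if item_id not in self.keys:
                self.keys[item_id] = self._hasher(item_id)
                self.num_computed += 1

    def explore(self, query_key: bytes, topk: int) -> List[str]:
        # Rank items by Hamming distance to the query; ties are broken by item id
        ranked = sorted(
            self.keys,
            key=lambda item_id: (hamming_distance(query_key, self.keys[item_id]), item_id),
        )
        return ranked[:topk]


# explore -> add -> explore
cache = HashKeyCache(toy_hasher)
cache.update(["a", "b", "c"])        # first explore: 3 keys computed
first = cache.explore(toy_hasher("b"), topk=1)
cache.update(["a", "b", "c", "d"])   # after "add": only "d" is hashed
second = cache.explore(toy_hasher("d"), topk=1)
```

After the add step the hasher has run four times in total rather than seven, which is the saving that persisting keys to disk extends across separate CLI invocations.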

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly.

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@sooahleex sooahleex added the enhancement Enhancement of existing features label Apr 28, 2023
@sooahleex sooahleex added this to the 1.3.0 milestone Apr 28, 2023
@sooahleex sooahleex marked this pull request as ready for review May 2, 2023 06:45
@sooahleex sooahleex requested review from a team as code owners May 2, 2023 06:45
@codecov-commenter

codecov-commenter commented May 2, 2023

Codecov Report

Patch coverage: 40.36% and project coverage change: -0.23% ⚠️

Comparison is base (38bbf0c) 78.75% compared to head (d8e65d4) 78.53%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #981      +/-   ##
===========================================
- Coverage    78.75%   78.53%   -0.23%     
===========================================
  Files          233      233              
  Lines        26626    26749     +123     
  Branches      5283     5320      +37     
===========================================
+ Hits         20969    21007      +38     
- Misses        4424     4497      +73     
- Partials      1233     1245      +12     
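The project coverage figures in the diff can be recomputed from its line counts. A small sketch, assuming Codecov's usual definition of coverage as hits / (hits + misses + partials), with partially covered lines counting against coverage; `coverage_pct` is an illustrative helper, not part of Codecov or Datumaro.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    # Assumed Codecov definition: partials are not counted as covered
    return 100.0 * hits / (hits + misses + partials)


base = {"hits": 20969, "misses": 4424, "partials": 1233}  # develop (38bbf0c)
head = {"hits": 21007, "misses": 4497, "partials": 1245}  # PR head (d8e65d4)

for name, counts in (("base", base), ("head", head)):
    lines = sum(counts.values())
    print(f"{name}: {lines} lines, {coverage_pct(**counts):.2f}%")
# base: 26626 lines, 78.75%
# head: 26749 lines, 78.53%
```

The 40.36% patch coverage cannot be derived from these totals, since it is measured only over the lines changed by the PR.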
Flag Coverage Δ
macos-11_Python-3.8 77.53% <24.09%> (-0.25%) ⬇️
ubuntu-20.04_Python-3.8 78.51% <40.36%> (-0.22%) ⬇️
windows-2019_Python-3.8 78.41% <40.36%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
datumaro/plugins/explorer.py 81.81% <0.00%> (-3.64%) ⬇️
datumaro/cli/commands/explore.py 19.48% <7.14%> (-7.80%) ⬇️
datumaro/plugins/data_formats/datumaro/exporter.py 91.21% <25.00%> (-3.95%) ⬇️
...vino_plugin/samples/clip_visual_ViT-B_32_interp.py 29.16% <25.00%> (-2.09%) ⬇️
datumaro/util/meta_file_util.py 64.63% <31.57%> (-28.70%) ⬇️
datumaro/plugins/data_formats/datumaro/base.py 89.03% <42.85%> (-5.34%) ⬇️
datumaro/components/project.py 79.16% <66.66%> (-0.03%) ⬇️
datumaro/components/exporter.py 86.89% <75.00%> (-0.30%) ⬇️
datumaro/components/explorer.py 58.40% <83.72%> (+2.72%) ⬆️


Comment thread datumaro/cli/commands/explore.py Outdated
Comment thread datumaro/cli/commands/explore.py Outdated
Comment thread datumaro/components/explorer.py Outdated
Comment thread datumaro/components/explorer.py
Comment thread datumaro/components/explorer.py Outdated
Comment thread datumaro/components/launcher.py Outdated
@sooahleex sooahleex marked this pull request as draft May 2, 2023 08:18
@sooahleex sooahleex force-pushed the feature/save_hashkey branch from d063c2b to 019e552 May 8, 2023 07:03
@sooahleex sooahleex marked this pull request as ready for review May 8, 2023 07:19
Comment thread datumaro/components/project.py Outdated
Comment thread datumaro/components/project.py Outdated
Comment thread datumaro/components/explorer.py Outdated
Comment thread datumaro/components/explorer.py Outdated
Contributor

Please move the hash key saving functionality to components/exporter.py::Exporter so that it can be used for all data formats. The saving function should store the following files in this directory structure:

dataset_directory/
     - hash_key_meta/
            - hash_keys.json # Hash key data (you can use binary file also)
            - index.bin # FAISS index (future)
     ...
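The layout above can be sketched with the standard library alone. This is a minimal sketch, assuming hash keys are stored as a JSON object that maps a hypothetical `subset/item_id` string to the key's bytes as a list of integers; `save_hash_keys` and `load_hash_keys` are illustrative names, not functions in Datumaro, and the future FAISS `index.bin` is left out.

```python
import json
import os
from typing import Dict, List

HASH_KEY_DIR = "hash_key_meta"  # directory name from the review comment
HASH_KEY_FILE = "hash_keys.json"  # file name from the review comment


def save_hash_keys(dataset_dir: str, hash_keys: Dict[str, List[int]]) -> str:
    # Write <dataset_dir>/hash_key_meta/hash_keys.json and return its path
    meta_dir = os.path.join(dataset_dir, HASH_KEY_DIR)
    os.makedirs(meta_dir, exist_ok=True)
    path = os.path.join(meta_dir, HASH_KEY_FILE)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(hash_keys, f)
    return path


def load_hash_keys(dataset_dir: str) -> Dict[str, List[int]]:
    # Return saved keys, or an empty dict if the dataset was never explored
    path = os.path.join(dataset_dir, HASH_KEY_DIR, HASH_KEY_FILE)
    if not os.path.isfile(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Putting this in a format-independent exporter, as requested, means every format writes the same side directory instead of embedding keys in its own annotation files.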

Contributor Author

I will cover this in a future PR.

Contributor

@vinnamkim vinnamkim May 8, 2023

Please make the hash key loading functionality global (for all data formats). I think you need to implement a class DatasetExtractor that extends DatasetBase and takes a path from which to load the hash key checkpoint:

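from typing import Optional, Sequence, Type

from datumaro.components.contexts.importer import ImportContext
from datumaro.components.dataset_base import DatasetBase
from datumaro.components.media import Image, MediaElement


class DatasetExtractor(DatasetBase):
    def __init__(
        self,
        path: str,
        *,
        length: Optional[int] = None,
        subsets: Optional[Sequence[str]] = None,
        media_type: Type[MediaElement] = Image,
        ctx: Optional[ImportContext] = None,
    ):
        ...
        # Load the hash key checkpoint saved with the dataset at `path`
        self._load_hash_key(path)

    def _load_hash_key(self, path):
        ...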

Subsequently, make all data format plugins inherit from DatasetExtractor.

However, I think that would make this PR too large. Please do that in a separate PR, and in this PR implement the self._load_hash_key(path) function for DatasetExtractor and the Datumaro format.
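The point of the suggested base class is that a format plugin gets hash key loading just by inheriting from it. A self-contained sketch of that pattern follows, assuming the `hash_key_meta/hash_keys.json` layout proposed earlier in the review; `DatasetBaseStub`, `DatasetExtractorSketch`, and `MyFormatExtractor` are hypothetical stand-ins, not Datumaro classes, and the real `DatasetBase` takes more arguments.

```python
import json
import os
from typing import Dict, List, Optional


class DatasetBaseStub:
    """Hypothetical stand-in for Datumaro's DatasetBase (only what the sketch needs)."""

    def __init__(self, *, length: Optional[int] = None):
        self._length = length


class DatasetExtractorSketch(DatasetBaseStub):
    """Hypothetical base that loads saved hash keys for any subclass."""

    def __init__(self, path: str, *, length: Optional[int] = None):
        super().__init__(length=length)
        self.hash_keys: Dict[str, List[int]] = {}
        self._load_hash_key(path)

    def _load_hash_key(self, path: str) -> None:
        # Assumed layout: <path>/hash_key_meta/hash_keys.json; missing file means no keys
        meta_file = os.path.join(path, "hash_key_meta", "hash_keys.json")
        if os.path.isfile(meta_file):
            with open(meta_file, encoding="utf-8") as f:
                self.hash_keys = json.load(f)


class MyFormatExtractor(DatasetExtractorSketch):
    """Hypothetical format plugin: inherits hash key loading with no extra code."""

    def __init__(self, path: str):
        super().__init__(path)
        # Format-specific annotation parsing would go here
```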

Contributor Author

I will cover this in a future PR, too.

Comment thread datumaro/components/explorer.py Outdated
Comment thread datumaro/components/launcher.py Outdated
@sooahleex sooahleex force-pushed the feature/save_hashkey branch from 67a838e to fd6e6af May 9, 2023 12:41
Comment thread datumaro/components/explorer.py Outdated
Comment thread datumaro/components/explorer.py Outdated
Comment thread datumaro/plugins/data_formats/datumaro/exporter.py Outdated
Comment thread datumaro/plugins/data_formats/datumaro/base.py Outdated
JihwanEom previously approved these changes May 10, 2023
Comment thread datumaro/components/config_model.py Outdated
Comment thread datumaro/plugins/data_formats/datumaro/base.py Outdated
Comment thread datumaro/components/explorer.py Outdated
Comment thread datumaro/util/meta_file_util.py Outdated
@sooahleex sooahleex requested a review from JihwanEom May 10, 2023 08:04
Comment thread datumaro/plugins/data_formats/datumaro/base.py Outdated
Comment thread datumaro/plugins/data_formats/datumaro/exporter.py Outdated
Comment thread datumaro/plugins/explorer.py Outdated
Contributor

@vinnamkim vinnamkim left a comment

LGTM.

@sooahleex sooahleex merged commit 9ab0954 into open-edge-platform:develop May 11, 2023

Labels

enhancement Enhancement of existing features

4 participants