Add notebook for Kaggle importers #1254

Merged
wonjuleee merged 4 commits into open-edge-platform:develop from wonjuleee:nb_kaggle
Feb 1, 2024

Contributor

@wonjuleee wonjuleee commented Jan 31, 2024

Summary

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have added the description of my changes into CHANGELOG.
  • I have updated the documentation accordingly.

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

@wonjuleee wonjuleee requested review from a team as code owners January 31, 2024 05:46
@wonjuleee wonjuleee requested review from itrushkin and removed request for a team January 31, 2024 05:46
@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@wonjuleee wonjuleee requested a review from vinnamkim January 31, 2024 05:47

codecov bot commented Jan 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (acc491d) 80.55% compared to head (2e68f4a) 80.55%.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #1254   +/-   ##
========================================
  Coverage    80.55%   80.55%           
========================================
  Files          271      271           
  Lines        30442    30442           
  Branches      5930     5930           
========================================
  Hits         24524    24524           
  Misses        4530     4530           
  Partials      1388     1388           
Flag Coverage Δ
ubuntu-20.04_Python-3.8 80.55% <ø> (ø)

review-notebook-app bot commented Feb 1, 2024

vinnamkim commented on 2024-02-01T06:57:22Z
----------------------------------------------------------------

according to the tags -> according to the tag


wonjuleee commented on 2024-02-01T07:08:40Z
----------------------------------------------------------------

fixed at 2e68f4a

review-notebook-app bot commented Feb 1, 2024

vinnamkim commented on 2024-02-01T06:57:22Z
----------------------------------------------------------------

I would like to suggest a few wording changes that read more clearly to me:

  1. by loosening a strictly given limitation -> by loosening strict rules of the original format
  2. while some of them require to feed rule for reading an annotation file -> while some of them require user assistance to feed rules for reading an annotation file

wonjuleee commented on 2024-02-01T07:08:44Z
----------------------------------------------------------------

fixed at 2e68f4a

review-notebook-app bot commented Feb 1, 2024

vinnamkim commented on 2024-02-01T06:57:23Z
----------------------------------------------------------------

Could this be added to the Datumaro download feature in the future? Currently, our entire downloadable dataset catalog comes from tensorflow-datasets. We might expand it to Kaggle datasets.


wonjuleee commented on 2024-02-01T07:10:48Z
----------------------------------------------------------------

It sounds great. Let me create a new requirement for this.

All datasets are available for download [here](https://www.kaggle.com/datasets?tags=13207-Computer+Vision).
However, since Kaggle doesn't require the community to follow specific rules for dataset uploads, it is more natural to explore a dataset's directory structure manually.
So it eventually takes some time to import those datasets into machine learning code.
Therefore, Datumaro is providing ability to import them through Datumaro Python APIs.

Contributor

Suggested change
Therefore, Datumaro is providing ability to import them through Datumaro Python APIs.
Therefore, Datumaro is providing an ability to import them through Datumaro Python APIs.

Contributor Author

fixed at 2e68f4a


## Import Kaggle Image CSV dataset

Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more arguments for importing.

Contributor

Suggested change
Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more arguments for importing.
Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more user-assisted arguments for importing.

Contributor Author

fixed at 2e68f4a


```
dataset = dm.Dataset.import_from(
    '<path_to_image_directory>',
    format='kaggle_image_csv',
    ann_file='<path_to_csv_file>',
    columns={"media": "column_name_of_media", "label": "column_name_of_label"},
)
```
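
To make the role of the `columns` argument concrete, here is a minimal, standard-library-only sketch of what such a mapping tells an importer. It is not Datumaro's implementation: the function name `read_image_csv`, the sample CSV, and its column names `image_id` and `breed` are all hypothetical and exist only for illustration.

```python
import csv
import io


def read_image_csv(csv_text, columns):
    """Return (media, label) pairs using a user-supplied column mapping.

    `columns` maps the roles "media" and "label" to the header names
    actually used in the CSV, since the file itself does not say which
    column plays which role.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # Fail early if the mapping names a column that is not in the header.
    missing = [name for name in columns.values() if name not in header]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    return [(row[columns["media"]], row[columns["label"]]) for row in reader]


# Hypothetical annotation file: nothing in the header marks the media column.
sample = "image_id,breed\ncat_001.jpg,siamese\ndog_002.jpg,beagle\n"
pairs = read_image_csv(sample, {"media": "image_id", "label": "breed"})
print(pairs)
```

Because column names differ from one Kaggle dataset to the next, an explicit mapping like this is what lets one reader handle many annotation files.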

Contributor

Suggested change
At this time, it's essential to specify the column names for media and label such as `dm.Dataset.import_from(..., columns={"media": "column_name_of_media", "label": "column_name_of_label"})`.

Contributor Author

fixed at 2e68f4a

@vinnamkim vinnamkim left a comment

LGTM.

@wonjuleee wonjuleee merged commit 76769b5 into open-edge-platform:develop Feb 1, 2024