Add notebook for Kaggle importers by wonjuleee · Pull Request #1254 · open-edge-platform/datumaro

wonjuleee · 2024-01-31T05:46:47Z

Summary

How to test

Checklist

I have added unit tests to cover my changes.
I have added integration tests to cover my changes.
I have added the description of my changes into CHANGELOG.
I have updated the documentation accordingly

License

I submit my code changes under the same MIT License that covers the project.
Feel free to contact the maintainers if that's a concern.
I have updated the license header for each file (see an example below).

# Copyright (C) 2023 Intel Corporation
#
# SPDX-License-Identifier: MIT

review-notebook-app · 2024-01-31T05:46:53Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

codecov · 2024-01-31T05:56:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (acc491d) 80.55% compared to head (2e68f4a) 80.55%.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #1254   +/-   ##
========================================
  Coverage    80.55%   80.55%           
========================================
  Files          271      271           
  Lines        30442    30442           
  Branches      5930     5930           
========================================
  Hits         24524    24524           
  Misses        4530     4530           
  Partials      1388     1388

| Flag | Coverage Δ |
|---|---|
| ubuntu-20.04_Python-3.8 | `80.55% <ø> (ø)` |


review-notebook-app · 2024-02-01T06:57:22Z

vinnamkim commented on 2024-02-01T06:57:22Z
----------------------------------------------------------------

according to the tags -> according to the tag

wonjuleee commented on 2024-02-01T07:08:40Z
----------------------------------------------------------------

fixed at 2e68f4a

review-notebook-app · 2024-02-01T06:57:23Z

vinnamkim commented on 2024-02-01T06:57:22Z
----------------------------------------------------------------

I would like to suggest a few wording changes that would have made these sentences clearer to me while reading:

by loosening a strictly given limitation -> by loosening strict rules of the original format
while some of them require to feed rule for reading an annotation file -> while some of them require user assistance to feed rules for reading an annotation file

wonjuleee commented on 2024-02-01T07:08:44Z
----------------------------------------------------------------

fixed at 2e68f4a

review-notebook-app · 2024-02-01T06:57:24Z

vinnamkim commented on 2024-02-01T06:57:23Z
----------------------------------------------------------------

Could this be added to the Datumaro download feature in the future? Currently, our entire downloadable dataset catalog comes from tensorflow-datasets. We might expand it to Kaggle datasets.

wonjuleee commented on 2024-02-01T07:10:48Z
----------------------------------------------------------------

It sounds great. Let me create a new requirement for this.

vinnamkim · 2024-02-01T06:58:01Z

docs/source/docs/data-formats/formats/kaggle.md

+All datasets are available for downloading at [here](https://www.kaggle.com/datasets?tags=13207-Computer+Vision).
+However, since Kaggle doesn't enforce community to follow specific rule for dataset uploads, it is more natural to explore a dataset directoy structure by manual.
+So it eventually requires to take some time for importing those datasets into their machine learning codes.
+Therefore, Datumaro is providing ability to import them through Datumaro Python APIs.


Suggested change

-Therefore, Datumaro is providing ability to import them through Datumaro Python APIs.
+Therefore, Datumaro is providing an ability to import them through Datumaro Python APIs.

fixed at 2e68f4a

vinnamkim · 2024-02-01T06:58:26Z

docs/source/docs/data-formats/formats/kaggle.md

+
+## Import Kaggle Image CSV dataset
+
+Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more arguments for importing.


Suggested change

-Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more arguments for importing.
+Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more user-assisted arguments for importing.

fixed at 2e68f4a

vinnamkim · 2024-02-01T07:02:19Z

docs/source/docs/data-formats/formats/kaggle.md

+
+dataset = dm.Dataset.import_from('<path_to_image_directory>', format='kaggle_image_csv', ann_file='<path_to_csv_file>', columns={"media": "column_name_of_media", "label": "column_name_of_label"})
+```
+


Suggested change

+At this time, it's essential to specify the column names for media and label, such as `dm.Dataset.import_from(..., columns={"media": "column_name_of_media", "label": "column_name_of_label"})`.

fixed at 2e68f4a
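
For readers following this thread, here is a minimal sketch of how the `columns` mapping discussed above might be prepared before calling `dm.Dataset.import_from`. The helper `kaggle_image_csv_kwargs`, the toy CSV, and the column names `image_name` and `category` are assumptions made for illustration and are not part of this PR. Only the `format`, `ann_file`, and `columns` arguments come from the documentation under review. The Datumaro call itself is left as a comment so that the sketch runs without Datumaro installed.

```python
import csv
import tempfile
from pathlib import Path


def kaggle_image_csv_kwargs(ann_file, media_column, label_column):
    """Build keyword arguments for dm.Dataset.import_from (hypothetical helper).

    Fails early if either column name is absent from the CSV header.
    """
    with open(ann_file, newline="") as f:
        header = next(csv.reader(f), [])
    missing = [c for c in (media_column, label_column) if c not in header]
    if missing:
        raise ValueError(f"columns not found in {ann_file}: {missing}")
    return {
        "format": "kaggle_image_csv",
        "ann_file": str(ann_file),
        "columns": {"media": media_column, "label": label_column},
    }


# Toy annotation file; the column names "image_name" and "category" are made up.
workdir = Path(tempfile.mkdtemp())
ann_path = workdir / "train.csv"
with open(ann_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(
        [["image_name", "category"], ["0001.jpg", "cat"], ["0002.jpg", "dog"]]
    )

kwargs = kaggle_image_csv_kwargs(ann_path, "image_name", "category")

# With Datumaro installed and the images stored under `workdir`, the import
# described in the documentation would then be:
#     import datumaro as dm
#     dataset = dm.Dataset.import_from(str(workdir), **kwargs)
```

Checking the header first is a design choice of this sketch: it surfaces a misspelled column name before the importer is invoked.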


vinnamkim

LGTM.

wonjuleee requested review from a team as code owners January 31, 2024 05:46

wonjuleee requested review from itrushkin and removed request for a team January 31, 2024 05:46

wonjuleee requested a review from vinnamkim January 31, 2024 05:47

wonjuleee added 3 commits February 1, 2024 13:49

add notebook for kaggle importer

76ed7ee

add notebook for kaggle importer

d4507f4

doc for kaggle

665607e

wonjuleee force-pushed the nb_kaggle branch from 54d8ed9 to 665607e Compare February 1, 2024 05:45

reflect comments

2e68f4a

vinnamkim reviewed Feb 1, 2024

vinnamkim approved these changes Feb 1, 2024

wonjuleee merged commit 76769b5 into open-edge-platform:develop Feb 1, 2024
