Add notebook for Kaggle importers #1254
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## develop #1254 +/- ##
========================================
Coverage 80.55% 80.55%
========================================
Files 271 271
Lines 30442 30442
Branches 5930 5930
========================================
Hits 24524 24524
Misses 4530 4530
Partials 1388 1388
|
vinnamkim commented on 2024-02-01T06:57:22Z:
according to the tags -> according to the tag
wonjuleee commented on 2024-02-01T07:08:40Z:
fixed at 2e68f4a
|
vinnamkim commented on 2024-02-01T06:57:22Z:
I would like to suggest sentence changes to bring better clarification from my point of view while reading:
wonjuleee commented on 2024-02-01T07:08:44Z:
fixed at 2e68f4a
|
vinnamkim commented on 2024-02-01T06:57:23Z:
Is this able to be added into the Datumaro download feature in the future? Currently, all of our downloadable dataset catalog is from
wonjuleee commented on 2024-02-01T07:10:48Z:
It sounds great. Let me create a new requirement for this.
| All datasets are available for downloading at [here](https://www.kaggle.com/datasets?tags=13207-Computer+Vision). | ||
| However, since Kaggle doesn't enforce community to follow specific rule for dataset uploads, it is more natural to explore a dataset directory structure by manual. | ||
| So it eventually requires to take some time for importing those datasets into their machine learning codes. | ||
| Therefore, Datumaro is providing ability to import them through Datumaro Python APIs. |
| Therefore, Datumaro is providing ability to import them through Datumaro Python APIs. | |
| Therefore, Datumaro is providing an ability to import them through Datumaro Python APIs. |
| ## Import Kaggle Image CSV dataset | ||
| Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more arguments for importing. |
| Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more arguments for importing. | |
| Indeed, Kaggle doesn't have any specific directory structure, and Datumaro hence requires more user-assisted arguments for importing. |
| dataset = dm.Dataset.import_from('<path_to_image_directory>', format='kaggle_image_csv', ann_file='<path_to_csv_file>', columns={"media": "column_name_of_media", "label": "column_name_of_label"}) | ||
| ``` | ||
| At this time, it's essential to specify the column names for media and label such as `dm.Dataset.import_from(..., columns={"media": "column_name_of_media", "label": "column_name_of_label"})` |
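A possible illustration of this suggestion, offered as a sketch rather than as the notebook's actual code: the snippet below checks that the names passed in `columns` really appear in the CSV header before handing them to `dm.Dataset.import_from`. The helpers `check_columns` and `import_kaggle_image_csv`, the file name `train.csv`, and the column names `image_name` and `target` are hypothetical and exist only for this example. Only the `import_from` call mirrors the one quoted in the diff, and it assumes Datumaro is installed.

```python
import csv
import tempfile
from pathlib import Path


def check_columns(csv_path, columns):
    """Return `columns` unchanged if every mapped name is a header in the CSV."""
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    missing = [name for name in columns.values() if name not in header]
    if missing:
        raise ValueError(f"columns not found in {csv_path}: {missing}")
    return columns


def import_kaggle_image_csv(image_dir, csv_path, columns):
    """Hypothetical wrapper around the import call quoted in this PR."""
    # Imported lazily so the header check above works without Datumaro installed.
    import datumaro as dm

    return dm.Dataset.import_from(
        str(image_dir),
        format="kaggle_image_csv",
        ann_file=str(csv_path),
        columns=check_columns(csv_path, columns),
    )


# Hypothetical CSV layout, written to a temporary directory for illustration.
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "train.csv"
csv_path.write_text("image_name,target\ncat_001.jpg,cat\ndog_001.jpg,dog\n")

# Map Datumaro's "media" and "label" roles onto this CSV's own column names.
columns = check_columns(csv_path, {"media": "image_name", "label": "target"})
print(columns)
```

Checking the header first means a misspelled column name fails with a clear message before any import work starts. Whether Datumaro reports such a mismatch on its own is not stated in this PR.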
Summary
How to test
Checklist
License