137 changes: 137 additions & 0 deletions docs/source/docs/data-formats/datumaro_format.md
@@ -0,0 +1,137 @@
# Datumaro Format

Computer vision covers many tasks, including classification, detection, segmentation, pose
estimation, and visual tracking, and public datasets are usually distributed in a format tailored
to each task. Even within a single task such as segmentation, some data formats store annotations
as polygons, while others store them as masks. To ensure compatibility across tasks and formats,
we provide a novel Datumaro format with the `.json` or `.datum` extension.

A variety of metadata can be stored in the Datumaro format. First, the `dm_format_version` field
records the format version, which supports backward compatibility and helps trace data versions.
Other metadata can be added to the `infos` field. For example, you can record the task type, such
as detection or segmentation, or the data creation time. Labels and attributes are saved in the
`categories` field, along with mask colormap information. To support hierarchical and multi-label
classification tasks, `label_groups` records whether multiple labels in a group can be selected
together, and `parent` specifies the parent label of each label. Finally, the `items` field holds
the annotation information for each media id, together with the data path and data size.
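
As a minimal sketch of how the `parent` field can express a hierarchy, the snippet below walks
from a label up to its root. The label names are hypothetical, and it assumes that an empty
`parent` string marks a root label; it is not Datumaro's own implementation.

```python
# Hypothetical label entries for illustration; only the key names come from the format.
LABELS = [
    {"name": "animal", "parent": "", "attributes": []},
    {"name": "cat", "parent": "animal", "attributes": []},
    {"name": "persian", "parent": "cat", "attributes": []},
]


def ancestors(labels, name):
    """Return the chain of parent label names, nearest parent first."""
    parent_of = {label["name"]: label["parent"] for label in labels}
    chain = []
    current = parent_of[name]
    # Assumption: an empty string means "no parent" (a root label).
    # The second condition stops the walk if the data contains a cycle.
    while current and current not in chain:
        chain.append(current)
        current = parent_of[current]
    return chain


chain = ancestors(LABELS, "persian")
print(chain)
```

A flat label set, where every `parent` is empty, yields an empty chain for every label.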

Here is an example of a `json` annotation file:

```json
{
  "dm_format_version": "1.0",
  "infos": {
    "task": "anomaly_detection",
    "creation time": "2023.4.1"
  },
  "categories": {
    "label": {
      "labels": [
        {
          "name": "Normal",
          "parent": "",
          "attributes": []
        },
        {
          "name": "Anomalous",
          "parent": "",
          "attributes": []
        }
      ],
      "label_groups": [
        {
          "name": "Label",
          "group_type": "exclusive",
          "labels": [
            "Anomalous",
            "Normal"
          ]
        }
      ],
      "attributes": []
    },
    "mask": {
      "colormap": [
        {
          "label_id": 1,
          "r": 255,
          "g": 255,
          "b": 255
        }
      ]
    }
  },
  "items": [
    {
      "id": "good_001",
      "annotations": [
        {
          "id": 0,
          "type": "label",
          "attributes": {},
          "group": 0,
          "label_id": 0
        }
      ],
      "image": {
        "path": "good_001.jpg",
        "size": [
          900,
          900
        ]
      }
    },
    {
      "id": "broken_small_001",
      "annotations": [
        {
          "id": 0,
          "type": "bbox",
          "attributes": {},
          "group": 0,
          "label_id": 1,
          "z_order": 0,
          "bbox": [
            350.8999938964844,
            151.3899993896484,
            275.1399841308594,
            126.4900054931640
          ]
        }
      ],
      "image": {
        "path": "broken_small_001.jpg",
        "size": [
          900,
          900
        ]
      }
    }
  ]
}
```
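
To illustrate how the fields fit together, here is a small sketch that reads a trimmed-down copy
of the example with Python's standard `json` module and resolves each annotation's `label_id` to
a label name. It assumes that `label_id` is an index into the `labels` list, which matches the
example but is not stated by the format description; the `summarize` helper is hypothetical.

```python
import json

# A trimmed-down annotation document mirroring the example above.
ANNOTATION_JSON = """
{
  "dm_format_version": "1.0",
  "categories": {
    "label": {
      "labels": [
        {"name": "Normal", "parent": "", "attributes": []},
        {"name": "Anomalous", "parent": "", "attributes": []}
      ]
    }
  },
  "items": [
    {
      "id": "good_001",
      "annotations": [{"id": 0, "type": "label", "label_id": 0}]
    },
    {
      "id": "broken_small_001",
      "annotations": [
        {"id": 0, "type": "bbox", "label_id": 1,
         "bbox": [350.9, 151.39, 275.14, 126.49]}
      ]
    }
  ]
}
"""


def summarize(doc):
    """Return (item id, annotation type, label name) triples."""
    # Assumption: label_id indexes into the "labels" list.
    names = [label["name"] for label in doc["categories"]["label"]["labels"]]
    rows = []
    for item in doc["items"]:
        for ann in item["annotations"]:
            rows.append((item["id"], ann["type"], names[ann["label_id"]]))
    return rows


doc = json.loads(ANNOTATION_JSON)
rows = summarize(doc)
for row in rows:
    print(row)
```

Because the file is plain JSON, it can be inspected this way without installing Datumaro.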

A Datumaro format directory has the following structure:

<!--lint disable fenced-code-flag-->
```
dataset/
├── dataset_meta.json  # a list of non-format labels (optional)
├── images/
│   ├── train/  # directory with training images
│   │   ├── img001.png
│   │   ├── img002.png
│   │   └── ...
│   ├── val/  # directory with validation images
│   │   ├── img001.png
│   │   ├── img002.png
│   │   └── ...
│   └── ...
└── annotations/
    ├── train.json  # annotation file with training data
    ├── val.json  # annotation file with validation data
    └── ...
```
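
The layout pairs each annotation file with an image directory of the same name. The sketch below
shows that naming convention with `pathlib` on a temporary directory. The `find_subsets` helper
is hypothetical and is not how Datumaro's importer is implemented.

```python
import tempfile
from pathlib import Path


def find_subsets(root):
    """Map each subset name to (annotation file, image directory or None)."""
    root = Path(root)
    subsets = {}
    for ann_file in sorted((root / "annotations").glob("*.json")):
        # The annotation file stem ("train", "val", ...) names the image directory.
        image_dir = root / "images" / ann_file.stem
        subsets[ann_file.stem] = (ann_file, image_dir if image_dir.is_dir() else None)
    return subsets


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny dataset tree that follows the structure above.
    root = Path(tmp) / "dataset"
    for subset in ("train", "val"):
        (root / "images" / subset).mkdir(parents=True)
        (root / "images" / subset / "img001.png").touch()
    (root / "annotations").mkdir()
    for subset in ("train", "val"):
        (root / "annotations" / f"{subset}.json").write_text("{}")

    found = sorted(find_subsets(root))
    print(found)
```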
9 changes: 9 additions & 0 deletions docs/source/docs/data-formats/index.rst
@@ -0,0 +1,9 @@
Data Formats
############

.. toctree::
   :maxdepth: 1

   supported_formats
   media_formats
   datumaro_format
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
# Media formats
# Supported Media Formats

Datumaro supports the following media types:
- 2D RGB(A) images
- Videos
- KITTI Point Clouds

To create an unlabelled dataset from an arbitrary directory with images use
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Dataset Formats
# Supported Dataset Formats

List of supported formats:
- ADE20k (v2017) (import-only)
Expand Down
8 changes: 8 additions & 0 deletions docs/source/docs/index.rst
@@ -13,6 +13,14 @@ Docs
   :caption: Guides

   user-manual/index
   data-formats/index

.. toctree::
   :hidden:
   :caption: Level Up

   level-up/basic_skills/index
   level-up/intermediate_skills/index

.. toctree::
   :hidden:
Expand Down
17 changes: 17 additions & 0 deletions docs/source/docs/level-up/basic_skills/index.rst
@@ -0,0 +1,17 @@
Basic Skills
#################

.. toctree::
   :maxdepth: 1

   import
   export
   validate
   visualize
   filter
   compare
   transform
   merge
   split
   search
   generate
8 changes: 8 additions & 0 deletions docs/source/docs/level-up/index.rst
@@ -0,0 +1,8 @@
Level Up
########

.. toctree::
   :maxdepth: 1

   basic_skills/index
   intermediate_skills/index
28 changes: 28 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/data_aggregation.md
@@ -0,0 +1,28 @@
# Data Aggregation

Datumaro aims to refine data

``` bash
datum create -o <project/dir>
datum import -p <project/dir> -f image_dir <directory/path/>
```

or, if you work with the Datumaro API:

- for using with a project:

```python
from datumaro.project import Project

project = Project.init()
project.import_source('source1', format='image_dir', url='directory/path/')
dataset = project.working_tree.make_dataset()
```

- for using as a dataset:

```python
from datumaro import Dataset

dataset = Dataset.import_from('directory/path/', 'image_dir')
```
28 changes: 28 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/data_comparison.md
@@ -0,0 +1,28 @@
# Data Comparison

Datumaro aims to refine data

``` bash
datum create -o <project/dir>
datum import -p <project/dir> -f image_dir <directory/path/>
```

or, if you work with the Datumaro API:

- for using with a project:

```python
from datumaro.project import Project

project = Project.init()
project.import_source('source1', format='image_dir', url='directory/path/')
dataset = project.working_tree.make_dataset()
```

- for using as a dataset:

```python
from datumaro import Dataset

dataset = Dataset.import_from('directory/path/', 'image_dir')
```
28 changes: 28 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/data_exploration.md
@@ -0,0 +1,28 @@
# Data Exploration

Datumaro aims to refine data

``` bash
datum create -o <project/dir>
datum import -p <project/dir> -f image_dir <directory/path/>
```

or, if you work with the Datumaro API:

- for using with a project:

```python
from datumaro.project import Project

project = Project.init()
project.import_source('source1', format='image_dir', url='directory/path/')
dataset = project.working_tree.make_dataset()
```

- for using as a dataset:

```python
from datumaro import Dataset

dataset = Dataset.import_from('directory/path/', 'image_dir')
```
28 changes: 28 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/data_generation.md
@@ -0,0 +1,28 @@
# Data Generation

Datumaro aims to refine data

``` bash
datum create -o <project/dir>
datum import -p <project/dir> -f image_dir <directory/path/>
```

or, if you work with the Datumaro API:

- for using with a project:

```python
from datumaro.project import Project

project = Project.init()
project.import_source('source1', format='image_dir', url='directory/path/')
dataset = project.working_tree.make_dataset()
```

- for using as a dataset:

```python
from datumaro import Dataset

dataset = Dataset.import_from('directory/path/', 'image_dir')
```
28 changes: 28 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/data_merge.md
@@ -0,0 +1,28 @@
# Data Merge

Datumaro aims to refine data

``` bash
datum create -o <project/dir>
datum import -p <project/dir> -f image_dir <directory/path/>
```

or, if you work with the Datumaro API:

- for using with a project:

```python
from datumaro.project import Project

project = Project.init()
project.import_source('source1', format='image_dir', url='directory/path/')
dataset = project.working_tree.make_dataset()
```

- for using as a dataset:

```python
from datumaro import Dataset

dataset = Dataset.import_from('directory/path/', 'image_dir')
```
28 changes: 28 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/data_refinement.md
@@ -0,0 +1,28 @@
# Data Refinement

Datumaro aims to refine data

``` bash
datum create -o <project/dir>
datum import -p <project/dir> -f image_dir <directory/path/>
```

or, if you work with the Datumaro API:

- for using with a project:

```python
from datumaro.project import Project

project = Project.init()
project.import_source('source1', format='image_dir', url='directory/path/')
dataset = project.working_tree.make_dataset()
```

- for using as a dataset:

```python
from datumaro import Dataset

dataset = Dataset.import_from('directory/path/', 'image_dir')
```
12 changes: 12 additions & 0 deletions docs/source/docs/level-up/intermediate_skills/index.rst
@@ -0,0 +1,12 @@
Intermediate Skills
###################

.. toctree::
   :maxdepth: 1

   data_refinement
   data_comparison
   data_aggregation
   data_merge
   data_exploration
   data_generation