Conversation
6d08cd7 to
e9290f5
Compare
zhiltsov-max
left a comment
There was a problem hiding this comment.
Thank you for a great PR!
I'd suggest fixing code formatting (spaces in arguments) and string formatting to keep the project code in a single style.
datumaro/plugins/splitter.py
Outdated
| def __init__(self, dataset, seed): | ||
| super().__init__(dataset) | ||
|
|
||
| np.random.seed(seed) |
There was a problem hiding this comment.
I think, this should be done closer to the actual use, because a lot can happen after constructor is called.
There was a problem hiding this comment.
Random shuffling occurs several times.
Do you think I really need to set seed whenever I use the shuffling function?
There was a problem hiding this comment.
Just make sure the results are reproducible. I suggest moving seed setting as much close to the point of use, as possible.
There was a problem hiding this comment.
Then, I'll add test codes to check the results are reproducible.
|
I need help regarding complexity. $ python -m mccabe datumaro/plugins/splitter.py --min 10
188:4: 'SplitforMatchingReID.__init__' 16
307:4: 'SplitforMatchingReID._rebalancing' 12
370:4: 'SplitforDetection.__init__' 19But Codacy gives 35 for SplitforMatchingReID.init and 20 for SplitforDetection.init |
|
Complexity from these tools is not something we look at. Don't bother with this. |
2b673b6 to
cbe6dfe
Compare
| for idx, item in enumerate(self._extractor): | ||
| for subset, split in by_groups.items(): | ||
| if idx in split: | ||
| group_id = self._group_map[subset] | ||
| item.annotations[0].group = group_id | ||
| break |
There was a problem hiding this comment.
Here, I use 'group' property to split 'test' subset into 'query' and 'gallery' set.
It seems that the default subset doesn't support a hierarchy.
In the original implementation, we add "Auxiliary" attribute to the annotation.
Do you think I have to change the current implementation w.r.t the original one?
Or do I have to implement the subset hierarchy?
Please share your opinion.
There was a problem hiding this comment.
Yes, there in no exact support for split hierarchy yet. To do what you want (as a workaround?), I'd create 2 extra subsets - test_gallery and test_query, this would look quite natural (at least, in Datumaro). Technically, subsets are extractors themselves, so it is possible to extend the implementation with some nesting. But it would require lots of changes in the importing/exporting code.
There was a problem hiding this comment.
Actually, the gallery and query sets are not subsets. They're a kind of tag.
I need more time to resolve this, so let's leave this as it is at the moment.
I'll resolve this issue while applying the command line feature.
489e5eb to
b712db3
Compare
b712db3 to
7c74aa7
Compare
|
I've rebased and squashed some commits. |
…er for classification
Summary
Added task-specific annotation splitters.
How to test
Checklist
developbranchLicense
Feel free to contact the maintainers if that's a concern.