Fix serde inconsistencies#2812
Fix serde inconsistencies#2812iamtrask merged 6 commits intoOpenMined:masterfrom vvmnnnkv:fix-serde-inconsistencies
Conversation
…stencies # Conflicts: # syft/serde/torch_serde.py
| # We operate on simplified content of Operation, hence all values should be simplified | ||
| from_workers_simplified = None | ||
| to_workers_simplified = None | ||
| if from_worker and to_worker: |
There was a problem hiding this comment.
Q: Will we always have from_worker and to_worker. Is there any case only one is != None?
There was a problem hiding this comment.
I think replacement doesn't make sense when any of them are None.
This was existing code, I just moved it out of the loop to avoid making simplification each time.
|
|
||
| from_ids_simplified = None | ||
| to_ids_simplified = None | ||
| if len(from_ids) and len(to_ids): |
There was a problem hiding this comment.
Q: The same case as above for here
| ) -> torch.jit.ScriptModule: | ||
| """"Strategy to deserialize a binary input using Torch load""" | ||
| script_module_stream = io.BytesIO(script_module_bin) | ||
| script_module_stream = io.BytesIO(script_module_bin[0]) |
There was a problem hiding this comment.
Q: We should get here with a len(script_module_bin) of size 1, right? I am thinking if we should add an assert of not.
There was a problem hiding this comment.
Yes. I haven't seen asserts in similar cases, e.g. in _detail_str, _detail_range, _detail_ndarray, etc.
Do you think they should be added?
There was a problem hiding this comment.
I think it is ok how it is now.
test/test_serde_full.py
Outdated
| pt = syft.Promise.FloatTensor(shape=torch.Size((3, 4))) | ||
| pt.tags = ["tag1"] | ||
| pt.tag("tag1") | ||
| pt.description = "I promise" |
There was a problem hiding this comment.
Q: Shouldn't we do the same for description? Maybe pt.describe("I promise").
Also, when is ok to use tag vs tags?
Should tags be used for TorchTensors (since we know we are working with tensors) and tag for AbstractObject (and we do not know the underlying object?)?
There was a problem hiding this comment.
Thanks! I've updated to .describe() everywhere.
I think I shouldn't have used .tags and .description properties directly at all. Just didn't know about them 🤷♂
LaRiffle
left a comment
There was a problem hiding this comment.
That's a really good and precise work!
|
|
||
| return (tensor.id, tensor.field, tensor.crypto_provider.id, chain) | ||
| return ( | ||
| tensor.id, |
There was a problem hiding this comment.
You add a simplifier for the object_pointer id, but not here
Any reason for this?
|
|
||
| return (ptr.id, ptr.id_at_location, ptr.location.id, ptr.garbage_collect_data) | ||
| return ( | ||
| ptr.id, |
There was a problem hiding this comment.
same remark here. Actually I'm not sure ids should be simplified as they are integers and this only slow down the process
There was a problem hiding this comment.
Also confused by this.
In some places (e.g. PointerPlan, PointerProtocol, MultiPointerTensor) id is defined as Union[str, int] in constructor, in some places (e.g. FixedPrecisionTensor, LoggingTensor) it's mentioned in doc comment that it's str or int. But in parent classes AbstractTensor, AbstractObject, and sy.ID_PROVIDER.pop() that generates those id's, it seems to be just int.
I can make serde to consistently treat ID as int or int/str everywhere, but there might be a reason why it's different for different classes?
There was a problem hiding this comment.
Just chiming in to say that I have encountered the dual nature of ids as both integers and strings, and I am also confused by this. Would be easier to handle if it were one or the other, but if it needs to be both, would be great to make that consistent (someday and probably not in this PR.)
There was a problem hiding this comment.
OK, I've made serde follow constructor (or parent constructor) arg type definition or documentation:
torch.nn.Parameter - int
torch.Tensor - int
AdditiveSharingTensor - int or str
FixedPrecisionTensor - int or str
CRTPrecisionTensor - int
LoggingTensor - int or str
MultiPointerTensor - int or str
Plan - int or str
Protocol - int *
PointerTensor - int or str
PointerPlan - int or str
PointerProtocol - int or str
ObjectWrapper - int
ObjectPointer - int or str
TrainConfig - int or str
AutogradTensor - int *
PrivateTensor - int or str
PromiseTensor - int or str
*serde supports int or str, didn't change that
| if isinstance(description, bytes): | ||
| description = description.decode("utf-8") | ||
| tensor.description = description | ||
| tensor.tags = syft.serde._detail(worker, tags) |
This PR follows discussion in issue #2654 and tries to make serde simplification consistent across all types:
(CODE, (DATA,))that you can decode ascode, data = simplified_tupleThere's a case when worker_id in Procedure's operations needs to be updated from int to str type. Procedure had to be fixed for that, given that str is now simplified.