Skip to content

Fix serde inconsistencies#2812

Merged
iamtrask merged 6 commits intoOpenMined:masterfrom
vvmnnnkv:fix-serde-inconsistencies
Dec 16, 2019
Merged

Fix serde inconsistencies#2812
iamtrask merged 6 commits intoOpenMined:masterfrom
vvmnnnkv:fix-serde-inconsistencies

Conversation

@vvmnnnkv
Copy link
Member

This PR follows discussion in issue #2654 and tries to make serde simplification consistent across all types:

  • Fix missing simplifications, e.g. all str should be simplified/detailed consistently by _simplify_str/_detail_str
  • Use same simplified data structure for all types: (CODE, (DATA,)) that you can decode as code, data = simplified_tuple
  • Simplify list that wraps operations in Procedure to avoid "magic parens"

There's a case when worker_id in Procedure's operations needs to be updated from int to str type. Procedure had to be fixed for that, given that str is now simplified.

Vova Manannikov added 2 commits December 11, 2019 16:15
# We operate on simplified content of Operation, hence all values should be simplified
from_workers_simplified = None
to_workers_simplified = None
if from_worker and to_worker:
Copy link
Member

@gmuraru gmuraru Dec 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Will we always have from_worker and to_worker. Is there any case only one is != None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think replacement doesn't make sense when any of them are None.
This was existing code, I just moved it out of the loop to avoid making simplification each time.


from_ids_simplified = None
to_ids_simplified = None
if len(from_ids) and len(to_ids):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: The same case as above for here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above :)

) -> torch.jit.ScriptModule:
""""Strategy to deserialize a binary input using Torch load"""
script_module_stream = io.BytesIO(script_module_bin)
script_module_stream = io.BytesIO(script_module_bin[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: We should get here with a len(script_module_bin) of size 1, right? I am thinking if we should add an assert of not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I haven't seen asserts in similar cases, e.g. in _detail_str, _detail_range, _detail_ndarray, etc.
Do you think they should be added?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok how it is now.

pt = syft.Promise.FloatTensor(shape=torch.Size((3, 4)))
pt.tags = ["tag1"]
pt.tag("tag1")
pt.description = "I promise"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Shouldn't we do the same for description? Maybe pt.describe("I promise").

Also, when is ok to use tag vs tags?
Should tags be used for TorchTensors (since we know we are working with tensors) and tag for AbstractObject (and we do not know the underlying object?)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've updated to .describe() everywhere.
I think I shouldn't have used .tags and .description properties directly at all. Just didn't know about them 🤷‍♂

@vvmnnnkv vvmnnnkv requested a review from gmuraru December 11, 2019 21:34
Copy link
Contributor

@LaRiffle LaRiffle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really good and precise work!


return (tensor.id, tensor.field, tensor.crypto_provider.id, chain)
return (
tensor.id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You add a simplifier for the object_pointer id, but not here
Any reason for this?


return (ptr.id, ptr.id_at_location, ptr.location.id, ptr.garbage_collect_data)
return (
ptr.id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same remark here. Actually I'm not sure ids should be simplified as they are integers and this only slow down the process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also confused by this.
In some places (e.g. PointerPlan, PointerProtocol, MultiPointerTensor) id is defined as Union[str, int] in constructor, in some places (e.g. FixedPrecisionTensor, LoggingTensor) it's mentioned in doc comment that it's str or int. But in parent classes AbstractTensor, AbstractObject, and sy.ID_PROVIDER.pop() that generates those id's, it seems to be just int.

I can make serde to consistently treat ID as int or int/str everywhere, but there might be a reason why it's different for different classes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just chiming in to say that I have encountered the dual nature of ids as both integers and strings, and I am also confused by this. Would be easier to handle if it were one or the other, but if it needs to be both, would be great to make that consistent (someday and probably not in this PR.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've made serde follow constructor (or parent constructor) arg type definition or documentation:
torch.nn.Parameter - int
torch.Tensor - int
AdditiveSharingTensor - int or str
FixedPrecisionTensor - int or str
CRTPrecisionTensor - int
LoggingTensor - int or str
MultiPointerTensor - int or str
Plan - int or str
Protocol - int *
PointerTensor - int or str
PointerPlan - int or str
PointerProtocol - int or str
ObjectWrapper - int
ObjectPointer - int or str
TrainConfig - int or str
AutogradTensor - int *
PrivateTensor - int or str
PromiseTensor - int or str

*serde supports int or str, didn't change that

if isinstance(description, bytes):
description = description.decode("utf-8")
tensor.description = description
tensor.tags = syft.serde._detail(worker, tags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@vvmnnnkv vvmnnnkv requested a review from LaRiffle December 13, 2019 23:31
@cereallarceny cereallarceny mentioned this pull request Dec 16, 2019
@iamtrask iamtrask merged commit 5b9161e into OpenMined:master Dec 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants