Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All strings should simplify without "b" and additional comma #2423

Closed
cereallarceny opened this issue Aug 1, 2019 · 7 comments
Closed

All strings should simplify without "b" and additional comma #2423

cereallarceny opened this issue Aug 1, 2019 · 7 comments
Labels
Good first issue 🎓 Perfect for beginners, welcome to OpenMined!

Comments

@cereallarceny
Copy link
Member

cereallarceny commented Aug 1, 2019

Describe the bug

MsgPack or something is prepending all strings with a b character and ending them with an unnecessary ,. These have to be manually removed via a Regex function in syft.js and makes testing quite inconsistent.

To Reproduce

sy.serde._simplify('hello') produces: (5, (b'hello',))

Expected behavior

(5, ('hello'))

Screenshots

Screen Shot 2019-08-01 at 1 43 16 PM

Desktop

  • OS: MacOS
  • Version 10.14.6

Additional context

This is a "nice to have" functionality for myself and @mccorby when developing the Javascript and Kotlin ports of Serde.

@cereallarceny cereallarceny added the Good first issue 🎓 Perfect for beginners, welcome to OpenMined! label Aug 1, 2019
@cereallarceny
Copy link
Member Author

It's also worth noting that upon creating a plan, I've noticed other places where unnecessary commas are being added, here are a few examples:

(19, (23885703668, 30300883787, 85156589176, None, (13, (2,)), False))
(6, (53361601662,))
(6, ((19, (50671613206, 53361601662, 85156589176, None, None, True)),))
(1, (30300883787,))

@Ankit-Dhankhar
Copy link
Contributor

That additional b comes from

return (obj.encode("utf-8"),)
encoding string in "utf-8" encoding that can be simply solved by not encoding it. But that will lead to memory overhead as "utf-8" encoding is flexible in size, unlike fixed size encodings.

Another option is to decode the second element of tuple if the first element is 5 i.e. second element is a string.
Which solution is more preferable? I would like to contribute to this issue.

@Ankit-Dhankhar
Copy link
Contributor

Same line

return (obj.encode("utf-8"),)

is reposponsible for unnecessary commas along with
plan.is_built,
this line.
But when you remove the trailing comma you get explected output. But PySyft implementation is based on iterative parsing thus that lead to 327 failed test cases. I can change those implementation if required :)

@cereallarceny
Copy link
Member Author

Go for it @Ankit-Dhankhar! If we need to leave the b in there, that's acceptable to me, but the commas being inserted at the end of strings, tuples, and other objects in PySyft should probably be fixed. If you think you have a fix, submit a PR and tag this issue id! 😄

@Ankit-Dhankhar
Copy link
Contributor

Sure, I am on it :)

@vvmnnnkv
Copy link
Member

vvmnnnkv commented Aug 4, 2019

For string (code = 5), it's probably fine to simplify it into (5, b"string") instead of (5, (b"string", ))
by updating _simplify_str and _detail_str to return and expect string instead of tuple (several tests will need to be fixed too).
But for types like dict (0), list (1), set (3), tuple (6), you can't get rid of extra comma because it's how python displays tuple with a single element:
tuple(list[1])
(1,)

In python syntax it's not possible to make tuple with single element without that comma, because it's indistinguishable from simple parentheses:
(1)
1

(1,)
(1,)

If you need to generate correct python representation, extra comma needs to be in tuple with single element to make it tuple :) With 2 or more elements comma is not necessary of course.
E.g., list with one element:
(1, (30300883787,))
List with 2 elements:
(1, (30300883787, 123))

@cereallarceny
Copy link
Member Author

@vvmnnnkv You're actually correct here, I didn't realize it was standard practice. In this case - I think I'm going to close this ticket. Sorry @Ankit-Dhankhar - although, I could really use your help here: #2424

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Good first issue 🎓 Perfect for beginners, welcome to OpenMined!
Projects
None yet
Development

No branches or pull requests

3 participants