Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] bits/bytes syntax #682

Open
gluax opened this issue Feb 19, 2021 · 20 comments
Open

[Proposal] bits/bytes syntax #682

gluax opened this issue Feb 19, 2021 · 20 comments
Assignees
Labels
documentation Improvements or additions to documentation feature A new feature. proposal A proposal for something new.

Comments

@gluax
Copy link
Contributor

gluax commented Feb 19, 2021

💥 Proposal bits/bytes syntax

Feature

Currently, many gadgets in Leo support a to_bit and to_bytes method on the rust side. The goal is to expose this side to Leo via some syntax. I would like to propose two different syntaxes so that we can decide between them or choose both.

Proposed Syntax 1

Coming from a rust background and something that is syntactically satisfying is the From implementations. So in Leo, we would have something as follows:

let x = 1u32;
let b1 = Bytes::from(x);
let b2 = Bytes::from(1u8);
let b3 = Bytes::from(some_method_that_returns_applicable_value());

I also believe this method would be easy to integrate into our stack. It's simply adding another statement, then at compile-time, we try actually converting the type to bits/bits with the methods provided it implements the trait. If it doesn't implement the method/trait, we can throw an error during the compiler.

Proposed Syntax 2

In contrast(or in conjunction), we could have the method approach. The goal syntactically would be to allow the call on any data type, the syntax would be as follows:

let x = 1u32;
let b1 = x.bytes();
let b2 = 1u8.bytes();
let b3 = some_method_that_returns_applicable_value().bytes();

Not sure if we currently allow method calls on data types themselves, but that shouldn't be too bad. Similarly to above, we check at compile time the type can be properly converted and in the compiler, we throw an error if the type is not convertible.

@gluax gluax added feature A new feature. Design labels Feb 19, 2021
@collinc97 collinc97 added ast documentation Improvements or additions to documentation labels Feb 19, 2021
@collinc97
Copy link
Collaborator

Thanks for writing this up @gluax! I have a few comments for discussion:

Syntax 1 requires adding a Bytes circuit to the core module. At the top of the file, developers would need to import the circuit before using the method. In addition, we would have to add support for function overloading: Bytes::from(u64), Bytes::from(field) etc. Each one of these overloaded functions would have to be loaded into the program tree as well - adding to compilation time.

Syntax 2 requires no additional imports - since every datatype where to_bits/to_bytes is supported will have the method built in. We do not have to add function overloading support there either - since multiple datatypes can have method names that are the same. Lastly, the cost of each function is only added to the program tree when it is called. One drawback of this approach is that it may require more syntax to be added to the grammar file - as opposed to syntax 1 which merely requires an additional statement rule.

For these reasons I gravitate towards Syntax 2 but would like to hear others thoughts.
@acoglio @howardwu

@collinc97 collinc97 changed the title [Proposal] to_bits/bytes syntax [Proposal] bits/bytes syntax Feb 22, 2021
@acoglio
Copy link
Collaborator

acoglio commented Feb 23, 2021

Which types are we going to use for bits, bytes, and their sequences? A natural choice for bytes is u8. The closest (isomorphic, in fact) type to bits we have now is bool, but I think we'll want bits to be numbers somehow. If we adopt proposal #627, then u1 would be a natural choice for bits. I guess we would use arrays for sequences (of bits or bytes).

I agree with @collinc97 that Syntax 2 may be preferable to Syntax 1. Among other things, overloading adds a bit of complication to type checking and inference (still doable of course, just a bit more complicated).

I was thinking whether we might use the type cast syntax (#600) for converting from/to bits and bytes, since we'll want to add it anyways for other types (as exemplified in that proposal):

let x = 1u32;
let b1 = x as [u8; 4];
let b2 = 1u8 as [u8; 1];
let b4 = something as [u8; N];

Note that we would have to specify the length of the array, but the length must be known at compile time anyhow. If we want to spare the user, we could introduce a type notation like [T; ?] for an array of an unstated size, with the intent that the size must be uniquely determined statically (i.e. at compile time), during the constant folding/propagation phase.

This type notation [T; ?] should be a separate proposal, in fact, which may be useful regardless of conversions to/from bits and bytes. Just like now one can omit a whole type, this is omitting part of a type. We could go a little crazier and allow [?; N], and, well, in fact, ? anywhere, as a sort of wildcard. For example, (u8, ?, [field; (?, 4)]). I can make an RFC just for this wildcard idea (just popped up as I was commenting this bits/bytes syntax, clearly), unless everybody dislikes it right away.

@collinc97
Copy link
Collaborator

The to_bits rust gadget returns a vector of booleans. The to_bytes rust gadget returns a vector of u8's.
The types that have rust gadgets implemented are ints, uints, fields, groups, addresses.

The first version of the bits and bytes methods should only call the underlying gadgets on those types.

Once that is finalized, we can extend syntax to aggregate types such as arrays.

@acoglio I like the idea of extending type casting to arrays of bits and bytes but I do think that should be a separate feature.

Similarly, the generic array length [T; ?] could also be a separate feature but I think is out of scope for this proposal.

@acoglio
Copy link
Collaborator

acoglio commented Feb 23, 2021

I'll make a separate RFC for type wildcards. (Not sure there's a strong use case, but it seems worth considering, maybe in the future.)

Understood about the underlying Rust gadgets. But what's not clear to me is what is the Leo type of b1, say. Presumably it consists of a sequence of 4 bytes, because it comes from a u32, but is it (u8, u8, u8, u8) or [u8; 4], or am I missing something?

@acoglio
Copy link
Collaborator

acoglio commented Feb 24, 2021

I was also wondering about endianness. Would the bits/bytes be big endian or little endian? Should both options be provided?

E.g.:

258:u32 --> [0 0 1 2] // big endian bytes
258:u32 --> [2 1 0 0] // little endian bytes

@collinc97
Copy link
Collaborator

I was also wondering about endianness. Would the bits/bytes be big endian or little endian? Should both options be provided?

E.g.:


258:u32 --> [0 0 1 2] // big endian bytes

258:u32 --> [2 1 0 0] // little endian bytes

This is a good point. The gadgets will return vectors in little endian form. Should we provide additional methods to specify target endianness?

@gluax
Copy link
Contributor Author

gluax commented Feb 24, 2021

I was also wondering about endianness. Would the bits/bytes be big endian or little endian? Should both options be provided?
E.g.:


258:u32 --> [0 0 1 2] // big endian bytes

258:u32 --> [2 1 0 0] // little endian bytes

This is a good point. The gadgets will return vectors in little endian form. Should we provide additional methods to specify target endianness?

I would say a good approach is to have method default one, i.e. .bytes -> little-endian bytes and we document that. Then we can have another method .big_endian_bytes(). At least that's my thoughts on the matter!

@acoglio
Copy link
Collaborator

acoglio commented Feb 24, 2021

My first instinct would be to have "symmetry", e.g. big_endian_bytes() and little_endian_bytes(), or just bebytes() and lebytes(), but it is more verbose, especially if there is a more common case, so I'm also happy with @gluax's proposal.

@acoglio
Copy link
Collaborator

acoglio commented Feb 25, 2021

By the way, having two possible conversions (big vs. little endian) reinforces the fact that this should not use a type cast syntax (against my proposal earlier in this message thread), because the latter implies just one conversion.

@acoglio
Copy link
Collaborator

acoglio commented Feb 25, 2021

@collinc97 @gluax I'm still unclear about what, say, the type of b1 would be (see my earlier message). (Sorry if I'm being thick.)

@gluax
Copy link
Contributor Author

gluax commented Feb 25, 2021

@acoglio On the rust side it returns a vec of u8 for the bytes gadget. So I believe on the Leo side it would an array of the appropriately sized array for the type of u8s. So for b1 I think the correct choice would be to have it typed as [u8; 4].

@collinc97
Copy link
Collaborator

@acoglio I think we should start a separate proposal for a type casting to bits/bytes syntax after #600 is implemented.

I also think that specifying endianness + reversing endianness should be a separate proposal and we can simply support little endian format as the default for now.

@collinc97
Copy link
Collaborator

Implementation

@gluax To finalize, this proposal can be closed by implementing:

  • .bits() / .bytes() syntax for unsigned integers u8, u16, u32, u64, u128
  • .bits() / .bytes() syntax for signed integers i8, i16, i32, i64, i128
  • .bits() / .bytes() syntax for fields
  • .bits() / .bytes() syntax for groups
  • .bits() / .bytes() syntax for addresses

@gluax
Copy link
Contributor Author

gluax commented Mar 2, 2021

Having chatted with both @Protryon and @collinc97 about this topic, as it currently stands there isn't a good way to implement these into Leo. The ideal way would be to have the Leo types as circuits or related to circuits, then have static/instance methods on that circuit that calls to some rust sided gadget implementation.

However, I'd love to hear if anyone else has different ideas, but I will make this a separate proposal.

@howardwu
Copy link
Member

howardwu commented Mar 5, 2021

Adding my thoughts on this thread, I think the .bits() and .bytes() proposal is better than the Bytes::from proposal.

However, from many first-hand encounters, I think the (slightly less attractive) convention of .bits_le(), .bits_be(), .bytes_le(), and .bytes_be() should be used to clearly delineate the endianness for the end user.

To add some evidence supporting this from other domains, we are adding this to our convention in snarkVM and Rust uses a similar delineation convention as well - to_le_bytes() and to_be_bytes()

@howardwu
Copy link
Member

howardwu commented Mar 5, 2021

I'll add an additional proposal:

We could stick with the .bits() and .bytes() convention, strictly (and explicitly) enforcing only 1 form of endianness (e.g. all calls will always return little-endian), and in addition, we introduce a .rev() or .reverse() method to reverse the endian representation. The methods can then be chained to form .bits().reverse() for example.

In my opinion, this approach could lead to unnecessary verbosity and hurt readability if the chaining sequence becomes too long.

@gluax
Copy link
Contributor Author

gluax commented Jul 7, 2021

Blocked by #975.

@gluax gluax added this to the Leo Developer Preview III milestone Jul 29, 2021
@gluax gluax removed the blocked Blocked. label Aug 26, 2021
@gluax
Copy link
Contributor Author

gluax commented Aug 26, 2021

Due to the essence of time, the two proposed ways of implementing this quickly are:

  1. Through typecasting.
  2. An operator that signifies to convert the type.

@acoglio
Copy link
Collaborator

acoglio commented Aug 26, 2021

Just to enhance the discussion, this would be a temporary solution for expediency. Just after DP3, we will revisit this.

I also cast my vote for 2, as it seems to provide a more natural transition towards the use of scalar type methods. Type cast would have to pick either little or big endian, and do so implicitly. So my preference is to be explicit and symmetric.

@collinc97
Copy link
Collaborator

Will revisit after testnet3 refactor

@d0cd d0cd added proposal A proposal for something new. and removed module-ast labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation feature A new feature. proposal A proposal for something new.
Projects
None yet
Development

No branches or pull requests

5 participants