Portable hashes #81
diogofriggo
started this conversation in
ARCs
Replies: 1 comment
-
A primitive implementation for this proposal. Branch: ARC #81 |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Abstract
SnarkVM's (and thus Aleo's) hashing implementations do not match off-chain implementations of the same algorithms.
Our practical interest lies mostly on having keccak256 working well but we are aware that it might be desirable to apply the same changes to all hashing algorithms for consistency.
If possible we would rather postpone the latter to a follow-up ARC. The proposal has been named "portable hashes" because the hash needs to be portable/understandable by Aleo and every other chain.
Motivation
This is a requirement for verifying data produced outside of Aleo. It is useful when communication between Aleo and other blockchains is needed.
While snarkVM's keccak hashing is implemented in-house the mismatch with external implementations (like the Rust crate tiny keccak's) seems to stem from the two points below.
Current hashing implementations
Looking at how the hashes are implemented sheds some light on possible issues:
For keccak256 we've verified that the
hash_keccak256
function matches external implementations. We suspect the same holds for the other hash types.Hashes start to differ in the subsequent wrapping call to
hash_to_group_bhp###
which modifies the hash making it incompatible with any other standard implementation of the hashing scheme just applied.Furthermore, a second hash-modification step happens when the hash is converted to one of Leo's primitive types. Using
keccak256
as an example, it needs 256 bits to be represented, however a necessarily lossy cast can easily be done to an aleo primitive type like u8 or i128 which we fail to understand the purpose of. Of particular importance is the fact that there's not a single primitive type that can fully hold the 256 bits (this can however be part of another ARC). This naturally introduces yet another layer of irreconcilable differences with respect to external implementations.It would be very convenient to have a type that can hold the full contents of each hash. For
keccak256
, the only such type would be an array[u8; 32]
. Forkeccak512
on the other hand, it's unclear how it could be represented in a lossless and idiomatic fashion given the 32-element limitation of arrays. One possible approach would be to have a dedicated type for hashes like for addresses. For the practical purpose of communicating with EVM chains,keccak256
would suffice. Should a larger design effort be required to support all hash types, then this ARC could be focused onkeccak256
as a first milestone.Data representation
The call to
to_bits_le
shown in the above code snippet introduces another modification to the input data before any hashing is done.As an example calling
to_bits_le(49u8)
produces00 10010000 0001000000000000 10001100
instead of just0011 0001
or1000 1100
.Hashing input:
The current implementation is hashing the internal representation of data in snarkVM.
a. The code snippet requires a group value returned from the hash, which in our case is not what we need.
b. When the function
to_bits_le
is called, it translates the snarkVM representation of the input to bits as the input with its metadata.For example, let's take the execution of this Aleo instruction
hash.keccak256 49u8 into r0 as field;
When the
to_bits_le
is called on the input49u8
the result is:[0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
and not0b0011 0001
or0b1000 1100
. The above result has metadata that describes the49u8
as this:The output of
to_bits_le(49u8)
carries metadata along with the actual result, the metadata and data represented by00 10010000 0001000000000000 10001100
can be broken down as:00
10010000
0001000000000000
10001100
Suggested improvements
This proposal seeks to understand whether both the above limitations can be lifted, and if so, propose implementations for them, namely:
[u8; ##]
so that full hash data can be kept untrimmed (or some other idiomatic alternative that preserves the full hash)hash_to_group_bhp###
step at least when the target type is[u8; #]
The overarching goal of this ARC is compatibility with EVM chains (and possibly others) with the goal of interchain communication.
To achieve that goal it must be possible to hash data in exactly the same way in both chains.
Dependencies
This impacts the snarkVM, & Leo repositories. Impact to the snarkOS repository is not expected.
Backwards Compatibility
Casting to a byte array fundamentally does not introduce backward-compatibility issues whereas it does for existing casts. The latter is a breaking change and would require a mechanism to deal with it, this ARC's goal would be limited to the [u8; ##] type or an equivalent alternative
Security & Compliance
There should not be any security or compliance considerations.
References
snarkVM's hashing usage
snarkVM's hashing implementation
Tiny keccak Rust crate
Beta Was this translation helpful? Give feedback.
All reactions