Standardizing V cryptographic hash API #20894

blackshirt · 2024-02-23T13:51:42Z

Motivation

V already ships several cryptographic hash modules, such as sha1, sha256, and sha512 from the SHA family, and blake2b, blake2s, and blake3 from the BLAKE family. Unfortunately, their implementations vary, and there is still no standard interface that can serve as a reference. Go has the hash.Hash interface, and I think some other languages have something similar, perhaps in a different form. This effort is a step toward standardizing the V cryptographic hash API.

The discussion about standardizing the hash API actually started before this thread, on the Discord channel; see 1, 2, or 3 for more background.
With that in mind, and at the suggestion of @joe-conigliaro, I am posting it here to reach a broader audience and to gather more feedback and ideas from V developers, maintainers, and the wider V community.

Ideas

The basic idea is to add a more formal API, in the sense of a standardized interface for V cryptographic hashes. It is mostly inspired by Go's hash.Hash interface. I am going to share my thoughts here so we can get plenty of input and feedback from the community.

First, the interface name

Looking at the structs, enums, interfaces, and other objects already defined, the first choice, and the one closest to Go, is the name Hash, but it is already used by the crypto.Hash enum.

The second choice is Digest, but it would conflict with the `Digest` struct already defined in various hash modules.

The third option is Hasher or Digester. It is closer to the topic and to the standardized approach we want to create, and it follows the existing pattern of interface names with an -er suffix: think of io.Reader, io.Writer, io.Seeker, and so on.

Second, where the interface should be defined

I think the best place for it is the vlib/crypto/crypto.v file, side by side with the Hash enum definition, but it could also live in a separate module, such as vlib/crypto/digest, or somewhere else. Please share your ideas.

Third, the fields or methods of the interface

I think the standard interface should contain at least the following methods:

An identity method, call it .id() or some other name
It acts as the identity of the implementation. I don't have an opinion on what the return value of id() should be, but I think it should be a unique value, for example a value of an already defined enum such as crypto.Hash, though a string or an int is also possible.
Apart from identifying the implementation, id() can also be used to compare implementations with one another, and as a self-check against the other methods of this interface, which are explained below.

A method that returns the size of the digest produced by the hash algorithm, call it .size()
This method gives the size, in bytes, of the digest produced by the hash algorithm. Other languages have something similar: Python's hashlib has hash.digest_size, and Java has MessageDigest.getDigestLength(). Fortunately, the standard sha Digest already implements size().

A method that returns the block size of the underlying hash algorithm, maybe call it .block_size()
This method is already implemented in several existing modules, for example sha256.Digest.block_size(), and of course in sha512 too. It specifies the size of the block the digest operates on. I don't know, however, whether .block_size() is meaningful for algorithms that do not operate on blocks (think of something like a stream cipher), even though they may also use a block-based strategy internally. Python has hash.block_size in hashlib, and Go has hash.Hash.BlockSize().

A method for updating the internal state of the hash with bytes of data
The general pattern for this is .write(b []u8) !int, although the .update(b []u8) pattern is also popular elsewhere.
The semantics of write follow those already defined in several modules, including the basic io.Writer interface: the method updates the internal state of the digest.
A similar method is generally available in other languages. In Python there is hash.update(data) from the hashlib module.
In Java there is MessageDigest.update(byte[]), with several variants.
In my opinion, write is the best choice for this purpose.

A method for obtaining or calculating a digest from this hash algorithm
There are many patterns for this. In the standard V library, the sha family already has Digest.checksum and Digest.sum(b []u8), while the blake family implements only .checksum and has no .sum.
To my mind, .sum(b []u8) is a little confusing because it carries two semantics. The first is the checksum semantic, obtained by calling sum with empty bytes, i.e. .sum([]). The second is the sum semantic, where the current digest is appended to the given bytes, i.e. b << current_digest.

I think for this purpose we could follow Python's hashlib with hash.digest(): it produces exactly the digest bytes, with no room for misinterpretation, and the name digest() also says what it does. But, as I said before, please share your ideas.

A method for resetting the digest state to its default state, call it .reset()
I think the name describes exactly what it does, but defining it explicitly is a good thing.

Summarized idea

To summarize the items described above, in minimal form the hash interface looks like this:

interface Hasher {
    // id returns the identity of this hash algorithm
    id() crypto.Hash
    // size returns the size (in bytes) of the digest produced by this hash algorithm
    size() int
    // block_size returns the block size of the underlying hash algorithm
    block_size() int
mut:
    // write updates the internal state of this hash algorithm. It fulfills the `io.Writer` interface.
    write(b []u8) !int
    // digest produces and returns a digest of `.size()` bytes
    digest() []u8 // or maybe digest(mut out)
    // reset returns the internal state to its default state
    reset()
}

That's it.
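
To get a feel for how code written against such an interface might look, here is a rough Go analogue. It is a sketch under stated assumptions, not the proposed V implementation: HashType, Hasher, sha256Hasher, newSHA256, and hashAll are all hypothetical names, and the adapter simply wraps Go's standard SHA-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
)

// HashType stands in for V's crypto.Hash enum (hypothetical name).
type HashType int

const (
	SHA256 HashType = iota
)

// Hasher mirrors the proposed interface: three read-only accessors
// followed by three state-changing methods.
type Hasher interface {
	ID() HashType
	Size() int
	BlockSize() int
	Write(b []byte) (int, error)
	Digest() []byte
	Reset()
}

// sha256Hasher adapts the standard library SHA-256 to Hasher.
type sha256Hasher struct {
	hash.Hash // provides Size, BlockSize, Write and Reset
}

func newSHA256() *sha256Hasher { return &sha256Hasher{sha256.New()} }

func (s *sha256Hasher) ID() HashType { return SHA256 }

// Digest returns exactly Size() bytes and leaves the running state untouched.
func (s *sha256Hasher) Digest() []byte { return s.Sum(nil) }

// hashAll only knows about the interface, not about SHA-256.
func hashAll(h Hasher, chunks ...string) (string, error) {
	h.Reset()
	for _, c := range chunks {
		if _, err := h.Write([]byte(c)); err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Digest()), nil
}

func main() {
	out, err := hashAll(newSHA256(), "a", "bc")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The point of the sketch is hashAll: once every hash module satisfies the same interface, callers no longer need per-algorithm code paths.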

Share your ideas

Please share your ideas, feedback, or even just comments.
All of them are much appreciated.
Thanks.

JalonSolov (Collaborator) · 2024-02-23T14:14:30Z

First random thoughts...

Rename crypto.Hash to crypto.HashType, or crypto.Hashes, since that's what it is enumerating. With a singular name of Hash, I expect it to be a single, concrete thing, not a list of things.

The interface could be named HashApi or HashBase, since it's the common base for all the hash types. Hasher implies that's what it does, not what it is for. Hasher would be more appropriate as an alternative to the current sum() routine (though I don't suggest that change).

Speaking of sum()... it would make more sense for it to be named hash(). It doesn't create a sum of anything, it creates the cryptographic hash for the data it's given. digest is what the hash() function returns - hash is the action, digest is the result.

blackshirt (Author) · 2024-02-23T19:47:46Z

> First random thoughts...
>
> Rename crypto.Hash to crypto.HashType, or crypto.Hashes, since that's what it is enumerating. With a singular name of Hash, I expect it to be a single, concrete thing, not a list of things.

Renaming crypto.Hash is a good alternative, even though it might break other code that uses the crypto.Hash enum.
I also think HashType is the more common term for an enum, so it is a good choice.

> The interface could be named HashApi or HashBase, since it's the common base for all the hash types. Hasher implies that's what it does, not what it is for. Hasher would be more appropriate as an alternative to the current sum() routine (though I don't suggest that change).

It doesn't really matter; the main reason for Hasher is to follow the naming already used in the standard io module, such as io.Writer, io.Reader, io.RandomWriter, etc.

> Speaking of sum()... it would make more sense for it to be named hash(). It doesn't create a sum of anything, it creates the cryptographic hash for the data it's given. digest is what the hash() function returns - hash is the action, digest is the result.

A good option. Here is a list of the common terms for this in other languages:

Python's hashlib has update and digest
Java's Bouncy Castle has update and digest, with some variants
Rust has update and finalize
Dart has a one-shot call in the form of hash

2 replies

JalonSolov Feb 23, 2024
Collaborator

Rename Hash -> HashType shouldn't be a big problem. All uses in vlib can be changed at the same time. I only know of one other repo that might be using it, and it was mentioned recently on Discord.

Hasher as a word doesn't make sense, even if it follows the way other names have been done. Possibly just from not seeing it... well... anywhere else.

As for digest vs hash as the name of the function - there's no reason we can't correct the mistakes of those other languages. :-) As I said, hash is the action, digest is the result. Making the name of the function the same as the result from the function is... awkward. Functions are usually named for actions they perform. Otherwise we would have io.WhatWasRead instead of io.Reader.

blackshirt Feb 23, 2024
Author

If crypto.Hash is renamed, I think we can use that name for the new interface. It also fits the context. If that happens, Hash.hash() is good as is.

blackshirt (Author) · 2024-02-25T06:43:02Z

Waiting for more feedback from the whole V community: @spytheman, @hungrybluedev, @enghitalo, @danilolekovic, @medvednikov, @joe-conigliaro

2 replies

hungrybluedev Feb 25, 2024
Collaborator

I like the interface definition suggested.

When it is added in a pull request, the rest of the implementations should also be modified to line up with this new one. Ideally, there should also be unit tests that specify the interface as the type and ensure all necessary methods work correctly across the interface implementations.

blackshirt Feb 26, 2024
Author

Thanks for the response. I think the change is not too invasive; the stock implementations in the standard vlib already have some of these methods.

blackshirt (Author) · 2024-02-29T06:37:12Z

If we can draw some provisional conclusions, there are at least a few changes that can be proposed:
1). Rename crypto.Hash to crypto.HashType.
2). Propose a new interface:

interface Hash {
    // id returns the identification of this hash algorithm
    id() crypto.HashType
    // size returns the size (in bytes) of the digest produced by this hash algorithm
    size() int
    // block_size returns the block size of the underlying hash algorithm
    block_size() int
mut:
    // write updates the internal state of this hash algorithm. This satisfies the `io.Writer` interface.
    write(b []u8) !int
    // hash generates and returns a message digest of `.size()` bytes
    hash() ![]u8 // or maybe hash(mut out)!
    // reset returns the internal state to the default state
    reset()
}

3). Adapt the existing hash modules to meet the new interface.
