Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce get_checksum_bytes method and improvements #671

Merged
merged 1 commit into from
Jul 20, 2022

Conversation

evanlinjin
Copy link
Member

@evanlinjin evanlinjin commented Jul 16, 2022

Description

get_checksum_bytes() returns a descriptor checksum as [u8; 8] instead of String, potentially improving performance and memory usage.

In addition to this, since descriptors only use characters that fit within a UTF-8 8-bit code unit (US-ASCII), there is no need to use the char type (which is 4 bytes). This can also potentially bring in some performance and memory-usage benefits.

Notes to the reviewers

This is useful because we will be using descriptor checksums for indexing operations in the near future (multi-descriptor wallets #486 ).

Refer to comments by @afilini :

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature
  • I've updated CHANGELOG.md

@notmandatory notmandatory added the new feature New feature or request label Jul 16, 2022
@@ -43,15 +41,17 @@ fn poly_mod(mut c: u64, val: u64) -> u64 {
c
}

/// Compute the checksum of a descriptor
pub fn get_checksum(desc: &str) -> Result<String, DescriptorError> {
Copy link
Member

@notmandatory notmandatory Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if anyone is using get_checksum() but since it's public our policy is to deprecate it for at least one release before completely removing it. We haven't been super strict about it, but we do need to be more consistent. I think you could make a simple function to call your new get_checksum_bytes and convert to a String.

Copy link
Member Author

@evanlinjin evanlinjin Jul 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@notmandatory I haven't removed it at all! And yes, get_checksum calls get_checksum_bytes 😅 (line 79)

Copy link
Contributor

@vladimirfomene vladimirfomene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK

Copy link
Member

@afilini afilini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, just a small nit

Comment on lines 19 to 20
const INPUT_CHARSET: &[u8] = "0123456789()[],'/*abcdefgh@:$%{}IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~ijklmnopqrstuvwxyzABCDEFGH`#\"\\ ".as_bytes();
const CHECKSUM_CHARSET: &[u8] = "qpzry9x8gf2tvdw0s3jn54khce6mua7l".as_bytes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On both those lines I think you can use byte string literals: https://doc.rust-lang.org/reference/tokens.html#byte-string-literals

It should make the code more portable as it works on older rust versions as well (as_bytes() is a const fn only since 1.39) and it will also cause a compile error if we accidentally put a non-ascii character in there: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=81fadd81957dbc1c56844a279a9d239f

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@afilini I just updated this! Hopefully the tests pass.

`get_checksum_bytes` returns a descriptor checksum as `[u8; 8]` instead
of `String`, potentially improving performance and memory usage.

In addition to this, since descriptors only use charaters that fit
within a UTF-8 8-bit code unit, there is no need to use the `char` type
(which is 4 bytes). This can also potentially bring in some performance
and memory-usage benefits.
@evanlinjin
Copy link
Member Author

@afilini @notmandatory This is ready to go imo. 😃

Copy link
Member

@afilini afilini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 6db5b4a

@afilini afilini merged commit 45a4ae5 into bitcoindevkit:master Jul 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants