Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible optimization for secp256k1 #1582

Closed
sipa opened this issue Apr 1, 2023 · 10 comments
Closed

Possible optimization for secp256k1 #1582

sipa opened this issue Apr 1, 2023 · 10 comments

Comments

@sipa
Copy link

sipa commented Apr 1, 2023

The current C code for the 5x52 field implementation in libsecp256k1 has (at least) this optimization which seems not present in the fiat-crypto output (it's also absent in libsecp256k1's asm code, which probably explains how it was missed): bitcoin-core/secp256k1#810

I'm really happy with how close the fiat-crypto generated code seems to be getting, and we're seriously contemplating incorporating it in the library, but it'd be an easier sell if it has at least all the same high-level optimizations.

@dderjoel
Copy link
Contributor

Hi,
just wanted to gauge how complex this would be to implement in fiat-crypto.

As far as I can see from the changes in C-code, it replaces an

c & M * R  // producing AND and MULX
c>>52        // producing SHR 
...
c * R          // 128*64 -> 2 MULX + ADDs (or maybe 2 ADDs)

by

R*c                  // MULX
c>>64              // free
(R<<12) * c     // SHL and MULX

This happens twice in each function.

I think this is genius!

I've dug a little through the fiat-code and only found this Dettman-related file and in it I could only identify lines 82 -- 99, which seem like they specify the multiplication, but I am too unfamiliar with what is actually happening.
I'm more than happy to help with this wherever I can, but I'd need some help from the experts @OwenConoly, @JasonGross, @andres-erbsen .

@OwenConoly
Copy link
Collaborator

it's also absent in libsecp256k1's asm code, which probably explains how it was missed

Yeah, I referenced the asm code when I wrote fiat-crypto's template, so that would explain it.

My thought is that this change would not be very difficult to implement in fiat-crypto. It looks just like the same sort of operations that we already have. I haven't yet had time to look at this in detail, though.

@dderjoel
Copy link
Contributor

Hi @OwenConoly , did you have time already?
Would it help, if I would rewrite the assembly incorporating the optimization?

@OwenConoly
Copy link
Collaborator

Hi! Sorry for the delay, this part of the semester has been a bit chaotic for me. I've actually generated the new code already - you can see it on the dettman_avoid_wide_mul branch of fiat-crypto, in the file secp256k1_dettman_64.c.

Disclaimer: this C code is not yet formally verified, and I haven't yet tested it either. I have to write a few proofs still and refactor some things, which is why I haven't created a pull request for this yet.

@OwenConoly
Copy link
Collaborator

@sipa, @dderjoel - do these changes look like what you would expect to see?

@dderjoel
Copy link
Contributor

dderjoel commented May 1, 2023

@sipa, @dderjoel - do these changes look like what you would expect to see?

looks like it to me. If you could generate the JSON, I could start some optimization runs and we could see if that gives us more performance :)

@OwenConoly
Copy link
Collaborator

Here's the JSON:
https://github.com/mit-plv/fiat-crypto/blob/69a2d4bc867f396b9002d4d57c7c87d169bfd2dd/fiat-json/src/secp256k1_dettman_64.json

I've finished proofs of the mul and square operations. The only remaining obstacle to merging is that my code is a bit of a mess.

@dderjoel
Copy link
Contributor

dderjoel commented May 2, 2023

Thank you very much, Owen, for engaging here.

I was able to compile from this commit there, and generate and prove some code locally. I'll run on our machines now and report once I've got benchmarks.

@dderjoel
Copy link
Contributor

dderjoel commented May 3, 2023

The numbers are looking very good! Have a look at bitcoin-core/secp256k1#1261 if you like. The gist is, that Fiat-C is 2.3% faster (on scmuls) than the existing C and with CryptOpt we can be 6.2% faster.

@OwenConoly
Copy link
Collaborator

Resolved by #1601

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants