Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experiment: backport ff_derive x86_64 assembly from fff crate #58

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

str4d
Copy link
Member

@str4d str4d commented May 31, 2021

This is an experimental backport of the changes from https://github.com/filecoin-project/ff (published as the fff crate). I've opened this as a draft PR for discussion. I see two main additions:

  • ASM for modular multiplication. This is specific to BLS12-381, and is probably unlikely to be beneficial here. When the fff fork was created, the main BLS12-381 implementation was in pairing::bls12_381, and it used ff_derive for its field logic. That implementation has been removed from pairing, and replaced with the bls12_381 crate which implements field logic directly. So we might want to consider potentially adding assembly optimisations there.

  • ASM to use _addcarry_u64 and _subborrow_u64 explicitly on x86_64. I'm surprised that Rust doesn't figure this out itself; it seems like we should be able to rework ff_derive to encourage LLVM to lower to this directly.

@tarcieri
Copy link
Contributor

ASM to use _addcarry_u64 and _subborrow_u64 explicitly on x86_64. I'm surprised that Rust doesn't figure this out itself; it seems like we should be able to rework ff_derive to encourage LLVM to lower to this directly.

This was definitely working at one point. Did something change?

@str4d
Copy link
Member Author

str4d commented Jun 1, 2021

IDK. @dignifiedquire implemented that in January 2020 so it's "relatively" recent; they might have more information about what they were seeing that motivated the change.

@tarcieri
Copy link
Contributor

tarcieri commented Jun 1, 2021

Here's a Godbolt example which shows adc lowering: https://godbolt.org/z/rq7xTsvx6

Taken from the crypto-bigint crate just to cross-check what it's doing, although the implementation is ultimately derived from and quite similar to what's in this crate (but amended to be const-friendly)

@dignifiedquire
Copy link
Contributor

For add I don't remember the exact assembly difference, but back then the intrinsic implementation ended up being faster than the non intrinsic version, at least all the ones we tried.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants