Add bignum syscalls #17393
Conversation
Codecov Report
```diff
@@            Coverage Diff            @@
##           master   #17393     +/-  ##
=========================================
- Coverage    82.3%    82.1%    -0.2%
=========================================
  Files         433      434       +1
  Lines      120912   122197    +1285
=========================================
+ Hits        99586   100420     +834
- Misses      21326    21777     +451
```
Force-pushed from e203cf3 to 504b55c
@jackcmay In addition to your insight on why this fails: @arthurgreef and I recently changed the execution code. However, this results in a GAS unit cost. My question is, what is the unit ratio? Does 1 EU = 1 GAS (1:1)? Or...? Thanks
@FrankC01 I left review comments in one of the syscall implementations that I think will apply to the rest; once we resolve those and the changes are applied to the rest, I can review. Also, are you expecting any failure in the bn calcs to abort the program?
Yes
Is the JIT/BPF version fast enough, though? I don't see any discussion of how much slower the BPF version would be. It seems like we jumped right into "make a syscall because it'll be fastest". The C version could be compiled into the BPF program as well, so I think we'd just be dealing with the memory-translation overhead of the JIT: something that, if we can optimize it further, will benefit all programs.
I definitely see that point of view; it can be a slippery slope to continue adding syscalls for computation-intensive operations. We've been adding syscalls for some of the general, commonly used but expensive operations (hashing, memory movement, etc.), and I think bignum and ecrecover fall into that category, especially in terms of pure performance of the chain vs. charging more to go slower.
Ya, ecrecover I'm not going to push back on, because EVM. Here, though, I don't see the full trade-off yet. The part that's missing for me is the X in "Hi, we tried to do X on Solana and failed because we don't have bignum syscalls".
I think @FrankC01 originally tried to use the OpenSSL BigNum library in-program and it exceeded the instruction count. Not sure if there were execution-time measurements done. @FrankC01, have you measured the time it took to do it in-program (given an expanded compute budget) vs. doing it as a syscall?
@mvines @jackcmay - We originally started with our program (grant application attached) to achieve constant time/space NFT Mint, Transfer, and Burn. For this we needed RSA accumulator verification on-chain, along with the underlying big-number capabilities it requires.

We first tried membership/non-membership verification in-program. At that point we decided to locally add the underlying big-number syscalls, and took the opportunity to use OpenSSL because it has much better performance. Having completed that, we also recognized the potential demand from other Solana programs that could benefit from consistent and performant BigNumber and crypto operations. Once the BigNumber syscalls were approved, we were going to quickly follow up with syscalls for crypto hashes.
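To make the operation under discussion concrete: RSA accumulator membership proofs are built on modular exponentiation over very large integers. The toy square-and-multiply sketch below (plain Rust on u64 operands, not code from this PR) shows the algorithm; a real accumulator needs multi-thousand-bit operands, which is exactly where a native backend like OpenSSL's BIGNUM comes in.

```rust
// Illustrative only: square-and-multiply modular exponentiation on u64
// operands, using u128 intermediates to avoid overflow. Real RSA-accumulator
// verification needs 2048-bit-plus operands and an arbitrary-precision
// library (e.g. OpenSSL BIGNUM), not machine words.
fn modexp(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    if modulus == 1 {
        return 0;
    }
    let mut result: u64 = 1;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = ((result as u128 * base as u128) % modulus as u128) as u64;
        }
        base = ((base as u128 * base as u128) % modulus as u128) as u64;
        exp >>= 1;
    }
    result
}

fn main() {
    // 4^13 mod 497 = 445
    println!("{}", modexp(4, 13, 497));
}
```

At machine-word size this is only O(log exp) multiplications, but with multi-limb operands each multiplication becomes a loop over limbs, which is the kind of blow-up that exceeded the BPF instruction budget in-program.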
@jackcmay @aeyakovenko @t-nelson @sakridge @mvines Team: more information on why we believe we need syscalls for crypto, where the PR under analysis is just the start. We developed a PoC for verifying GM17/BN128 proofs generated from ZoKrates. The proof was verified using arkworks-rs, a pure-Rust crypto library. This capability is another contribution we were hoping to add to the Solana platform. However, we hit a number of limits right off the bat:
```rust
// PoC entrypoint: PROOF and VKEY are JSON strings embedded in the program;
// Proof, VerificationKey, verify, etc. come from the ZoKrates/ark crates.
fn process_zokrates(_program_id: &Pubkey, _accounts: &[AccountInfo]) -> ProgramResult {
    msg!("Processing ZoKrates");
    let proof: Proof<ProofPoints<G1Affine, G2Affine>> = serde_json::from_str(PROOF).unwrap();
    let vk: VerificationKey<G1Affine, G2Affine> = serde_json::from_str(VKEY).unwrap();
    let is_good = verify::<Bn128Field, GM17, Ark>(vk, proof);
    msg!("zokrates verify = {}", is_good);
    Ok(())
}
```
We see this approach of using syscalls as enhancing Solana so that developers have more tools to create scalable applications with privacy.

[1] The elliptic curve pairing operations are implemented as Ethereum precompiles to reduce GAS costs.
Certainly, but it also adds to the ongoing maintenance burden of the runtime and appears to bring unsafe C external dependencies into the consensus path.
From a JIT perspective, syscalls are easy to compile but are the most expensive thing at runtime because of the two context switches: saving the registers, switching the stack, rebasing the instruction meter, etc. But again, what are the alternatives? Fused ops? Static analysis?
@mvines Just thought I'd add that Solana would not be the first to add an external C library into a consensus path. Cardano uses GMP as the backing C library for arbitrary-length integers. They do mention that the implementation is in assembly and that it may result in different cost functions per processor architecture.
This doesn't mean it's ok but this is an interesting data point, thanks!
This seems less than ideal. Perhaps BPF assembly is the way to go here. It seems like, with frequent use in a program, the syscall overhead might show up in the actual wall-clock execution time. I'm still generally of the opinion that it would be better to make these libraries work within the BPF environment. It certainly stresses the compute model and perhaps the VM/JIT, which is a much harder problem to solve than escaping to a syscall, but ultimately it is probably a more useful problem to solve for the platform as a whole.
Just to be sure, we are not recommending GMP (with lots of assembly code) instead of OpenSSL. We don't want to conflate the primary discussion.
Rather than maintaining so many syscalls directly in the runtime: basically this is @Lichtso's fused-ops comment taken to its conclusion, together with an extension of the idea of batching invokes (#18428). There are two approaches here:
For 2., for instance, a single fused operation and its operands (in terms of data in keyed accounts) can be serialized accordingly; the syscall can be handled by a single entrypoint, whose feature set can be extended separately from the bpf_loader and BPF stack. This separates concerns between the BPF VM stack and native optimised bignum operations. Still, a cryptographic or bignum VM is on the consensus path, so keeping it lightweight enough to ensure correctness while achieving the efficiency goals might be challenging. Extra maintenance costs include correct compute costs for the VM's primitive ops, but per-syscall compute costs are already a maintenance burden for the approach in this PR. If we are talking about long-term maintainability, in terms of supporting more cryptographic operations beyond those in this PR, this is the direction I see the most promise in. But getting there takes some work. Taking things further: an extensible ...
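To make the single-entrypoint idea slightly more concrete, here is a purely illustrative sketch (all names are hypothetical, not from any Solana API) of how a fused op plus operand offsets might be represented and dispatched:

```rust
// Hypothetical encoding for a single "fused op" syscall entrypoint: the
// instruction selects a primitive op plus operand offsets into a keyed
// account's serialized data, and one VM-side handler dispatches natively.
#[derive(Debug, PartialEq)]
enum BignumOp {
    Add,
    Mul,
    ModExp,
}

struct FusedOp {
    op: BignumOp,
    // Byte offsets of the operands within the account data (illustrative).
    operands: Vec<usize>,
}

fn dispatch(fused: &FusedOp) -> &'static str {
    // In a real runtime this would call into the native bignum backend;
    // here we only show the dispatch shape, returning the handler's name.
    match fused.op {
        BignumOp::Add => "native_add",
        BignumOp::Mul => "native_mul",
        BignumOp::ModExp => "native_modexp",
    }
}

fn main() {
    let op = FusedOp { op: BignumOp::ModExp, operands: vec![0, 32, 64] };
    println!("{}", dispatch(&op));
}
```

The point of this shape is that new ops extend the enum (and its compute-cost table) without touching the syscall surface or the BPF loader.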
@jon-chuang Would you please open a new issue for the fused-op VM idea? We should debate that on its own.
I like the direction, but it looked like earlier attempts at using these libraries with BPF resulted in ~4M instructions, well above the 200k we now allow, and probably a lot slower in clock time than native. I have my doubts that even with some heavy optimizations we could get the BPF solution down enough to work, even in the mid-term. We already have mem syscalls (more syscalls) that can help with memory movement, which BPF has been notoriously bad at. @Lichtso @dmakarov, do you have a sense of what kinds of optimizations we could do in the mid-term to help? We've also talked about expanding the BPF instruction set; maybe adding some more advanced but primitive instructions would help?
Adding more instructions is similar to adding intrinsic functions. These functions would be implemented in native code by the VM.
Edit: actually, according to #17720 (comment), BPF program wall-clock time is 73-300x slower for ecrecover compared to several native implementations. But I'm not sure if that is JITed or interpreted. Edit: it's almost certainly interpreted.

At this point I have to say I'm not convinced the issue is with the efficiency of BPF. So far, regarding cryptographic operations, which are ALU-intensive rather than memory-intensive, I have yet to see an actual benchmark showing that the wall-clock time of JITed BPF is significantly slower (although one expects it to be somewhat slower). As suggested in solana-labs/rbpf#185, it could be that the BPF uniform op-cost model is unfairly tuned towards ALU-heavy computations. The right solution might be to increase the op costs for memory ops and increase the overall compute budget proportionally. To determine if this is the right action, we need benchmarks of JITed BPF versions of cryptographic operations like bignum and elliptic curves. @FrankC01, would you be able to provide this data to help move things forward?
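For reference, the shape of the benchmark being requested is simple; the sketch below (a hypothetical harness with a placeholder workload, not the actual measurement code) times a routine in native Rust, and the same routine compiled to BPF and run under the JIT would be timed the same way for a wall-clock comparison:

```rust
use std::time::Instant;

// Minimal wall-clock benchmarking harness: run `op` many times and report
// the average time per call in microseconds. `op` stands in for a native
// bignum/EC routine; compare against the JITed-BPF run of the same routine.
fn bench<F: FnMut() -> u64>(iters: u32, mut op: F) -> f64 {
    let start = Instant::now();
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = acc.wrapping_add(op());
    }
    let elapsed = start.elapsed().as_secs_f64();
    // Keep `acc` observable so the loop isn't optimized away.
    eprintln!("checksum: {}", acc);
    elapsed / iters as f64 * 1e6 // microseconds per call
}

fn main() {
    // Placeholder workload standing in for e.g. an Edwards scalar mul.
    let per_call_us = bench(10_000, || (0..100u64).fold(1, |a, b| a.wrapping_mul(b | 1)));
    println!("{:.3} us/call", per_call_us);
}
```

A real comparison would also want warm-up iterations and multiple runs, but even this crude shape would answer the JITed-vs-native question raised above.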
Btw @jackcmay, if we're talking about the performance of bytecode, it may be instructive to consider that a WASM secp256k1 in this Bitcoin lib is shown to achieve performance comparable to the native Rust libsecp256k1. Recall:
By contrast, the BPF code ran in 0.014s: #17720 (comment).

If it turns out WASM is strictly faster than eBPF, the correct thing to do might be to support WASM in a cross-program invocation, as a separate VM, just as was proposed for MoveVM and (I think) is proposed for NeonEVM, instead of intrinsics and syscalls. Programs needing near-native performance could then leverage the WASM VM rather than compiling to eBPF. Here are some resources:

I decided to bench samkim's bpf-ristretto. Here's the time taken for Edwards-curve scalar muls:
This is pretty much insanely slow: a more than 500x slowdown even when JITed! Here are more detailed benchmarks:
Here are the results for test_edwards_mul when compiling with a reduced-efficiency mul using only 32x32 -> 64-bit muls. https://eprint.iacr.org/2019/542.pdf shows the slowdown libsodium incurs when compiled down to WASM (emscripten); there, WASM is benchmarked against the EVM.
@jon-chuang Can you send me (alexander solana com) the exact ELF files of the benchmark you ran?
@jon-chuang Thanks, I received the ELF and analyzed it statically (no profiling so far) like this: first I stripped all functions which are not directly involved in the ec_math, like deserialization, formatting, allocation, dropping, memcpy, etc. Then I ran:
Which already reveals the problem:
In other words, one third of all instructions in the arithmetic functions are memory accesses. And we know that these are extremely slow, so they are most likely responsible for the poor performance.
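The classification behind that one-third figure can be sketched as follows; the memory-op mnemonic list follows eBPF load/store naming, and the sample input is made up for illustration, not taken from the analyzed ELF:

```rust
// Sketch of the static analysis described above: classify eBPF disassembly
// mnemonics into memory vs. other (mostly ALU) ops and report the memory
// share. Mnemonic spellings vary slightly between disassemblers.
fn is_memory_op(mnemonic: &str) -> bool {
    ["ldxb", "ldxh", "ldxw", "ldxdw", "stb", "sth", "stw", "stdw",
     "stxb", "stxh", "stxw", "stxdw"]
        .iter()
        .any(|m| mnemonic == *m)
}

fn memory_share(mnemonics: &[&str]) -> f64 {
    let total = mnemonics.len() as f64;
    let mem = mnemonics.iter().filter(|m| is_memory_op(m)).count() as f64;
    mem / total
}

fn main() {
    // Made-up instruction stream: 2 of 6 ops touch memory.
    let sample = ["ldxdw", "add64", "mul64", "stxdw", "rsh64", "xor64"];
    println!("{:.2}", memory_share(&sample)); // prints "0.33"
}
```

A static count like this over the stripped arithmetic functions is what yields the one-third memory-access figure; a dynamic (as-executed) count could differ, as noted below.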
@Lichtso, one thing to note is that the compute cost for the edwards_mul operation is 3.4M CU and the execution time is 34,000us, which works out to 100 CU/us. The cybercore secp256k1 benchmark showed 200 CU/us. These are 2x and 4x better than the mainnet median already. However, they are 50x slower than an unoptimised (reference) Rust implementation. This makes sense, since cryptography is typically more ALU-intensive than memory-intensive; it suggests that the performance impact from even a relatively small number of loads/stores dwarfs the ALU ops, and that the impact on mainnet programs is even more pronounced. While the static instruction count is informative, it's not perfect: it would be better to profile the opcode-type counts as executed (for instance for test_edwards_mul), if that is possible, rather than counting their occurrence in the ELF.
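The throughput arithmetic quoted above can be sanity-checked directly (figures taken from the comment; the helper name is mine, not a Solana API):

```rust
// Compute-unit throughput: compute units consumed divided by wall-clock
// execution time in microseconds.
fn cu_per_us(compute_units: f64, exec_time_us: f64) -> f64 {
    compute_units / exec_time_us
}

fn main() {
    // 3.4M CU over 34,000us for edwards_mul:
    println!("{} CU/us", cu_per_us(3_400_000.0, 34_000.0)); // prints "100 CU/us"
}
```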
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

This stale pull request has been automatically closed. Thank you for your contributions.
Problem
Solana does not provide bignum operations for Solana programs.
Summary of Changes
Add syscalls covering fundamental bignum operations, priced fairly in execution cost.
This PR is the successor to #17082
Fixes #
Draft reviewers:
@jackcmay
@arthurgreef
@seanyoung