Add BigNumber Type and syscalls for BPF programs #17082

FrankC01 · 2021-05-06T13:34:38Z

@jackcmay - As per your comments... for review...

Problem

BigNum operations are not available in vanilla Solana for BPF programs. For big and prime numbers it is desirable to expose these capabilities to BPF programs.

In addition, hashing to a prime number is valuable as well.

Summary of Changes

Added budget costing for use of BigNumber syscalls as well as hashing to prime number. Can be reviewed and modified.
Added BigNumber operational syscalls and unit tests
Added BigNumber type and implementation. When used in BPF programs, a BigNumber is passed to and from the syscalls via Box raw pointers

Fixes #

FrankC01 · 2021-05-07T17:42:04Z

@jackcmay Second commit should address the sanity failure

jackcmay · 2021-05-07T17:52:40Z

@joncinque @jstarry I think you guys have been using big numbers recently? If so, do you see a syscall for these types of things as a good solution?

jstarry · 2021-05-08T03:39:38Z

> @joncinque @jstarry I think you guys have been using big numbers recently? If so, do you see a syscall for these types of things as a good solution?

Yes, definitely. Had to make some compromises on the amount of precision used in token-lending in order to reduce compute cost of exponentiation, multiplication, and division. This is super useful if the compute cost is reasonable.

FrankC01 · 2021-05-09T15:35:26Z

@jackcmay From a PR completeness standpoint, is there anything else I should be adding to this?

joncinque · 2021-05-10T17:42:54Z

This looks like a great set of syscalls to me. I'll echo @jstarry's caveat: as long as the compute usage is reasonable, these will definitely be useful. How were the current numbers decided?

jackcmay · 2021-05-10T21:33:15Z

programs/bpf_loader/Cargo.toml

@@ -21,6 +21,9 @@ solana-runtime = { path = "../../runtime", version = "=1.7.0" }
 solana-sdk = { path = "../../sdk", version = "=1.7.0" }
 solana_rbpf = "=0.2.8"
 thiserror = "1.0"
+hkdf = "0.11.0"


nit: nice to keep these in alphabetical order

Fixed in 03af751

jackcmay · 2021-05-10T21:33:45Z

programs/bpf_loader/src/syscalls.rs

@@ -1,5 +1,8 @@
 use crate::{alloc, BpfError};
 use alloc::Alloc;
+use blake3::traits::digest::Digest;


Are you using cargo fmt on these changes?

Fixed in 03af751

jackcmay · 2021-05-10T21:37:15Z

sdk/program/src/bignum.rs

+    BorshSerialize,
+    BorshDeserialize,
+    BorshSchema,
+    // Clone,


That is what rustfmt does, although I did find one other change with rustfmt

Why is clone commented out? Do you not want to support the default implementation of clone? If not, remove this line.

jackcmay · 2021-05-10T21:38:07Z

sdk/program/src/bignum.rs

+
+// Syscall interfaces
+#[cfg(target_arch = "bpf")]
+extern "C" {


Stick these in their representative functions below

Fixed in 03af751

jackcmay · 2021-05-10T21:40:34Z

sdk/program/src/bignum.rs

+        {
+            use openssl::bn::BigNum;
+            let braw: &BigNum = unsafe { &*(self.0 as *mut BigNum) };
+            write!(f, "{}", braw)


Does bignum not already implement default?

jackcmay · 2021-05-10T21:41:44Z

sdk/program/src/bignum.rs

+}
+
+impl AsRef<u64> for BigNumber {
+    fn as_ref(&self) -> &u64 {


This can't fail?

Will review

@jackcmay No, it will always return the u64 of the BigNumber structure.0

jackcmay · 2021-05-10T21:43:00Z

sdk/program/src/bignum.rs

+}
+
+/// Drop - removes the underlying BigNum
+impl Drop for BigNumber {


This means that if someone hacks their program they could force a leaked bignum in the runtime?

Need clarity here...

jackcmay · 2021-05-10T21:46:03Z

sdk/program/src/bignum.rs

+}
+
+#[cfg(test)]
+mod tests {


Would like to see more comprehensive testing. Also would need testing from within a bpf program, check out the sha program in this or for an example: #16498

Added BPF program testing in 2b1f3e1

jackcmay · 2021-05-10T21:49:24Z

programs/bpf_loader/src/syscalls.rs

@@ -934,901 +1150,1892 @@ impl<'a> SyscallObject<BpfError> for SyscallSha256<'a> {
    }
 }

-fn get_sysvar<T: std::fmt::Debug + Sysvar + SysvarId>(


Not sure why github is breaking the diff up like this but it's making it very hard to read

jackcmay · 2021-05-10T21:50:25Z

programs/bpf_loader/src/syscalls.rs

+        let rwptr = Box::into_raw(bbox);
+        let bignum_ptr = rwptr as u64;
+        *big_number = bignum_ptr;
+        question_mark!(


Should consume first so that the call fails before doing any work if the caller doesn't have enough units left

Moved consumes to early as possible depending on when size is available to calculate byte costs

Fixed in 03af751
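To illustrate the ordering requested above, here is a minimal sketch of charging compute units before doing any work. `ComputeMeter`, `SyscallError`, and `charged_from_bytes` are hypothetical stand-ins written for this example, not the runtime's actual types, and the "work" is only a placeholder for constructing a big number.

```rust
/// Hypothetical stand-in for the runtime's compute meter.
#[derive(Debug)]
pub struct ComputeMeter {
    remaining: u64,
}

#[derive(Debug, PartialEq)]
pub enum SyscallError {
    ComputeBudgetExceeded,
}

impl ComputeMeter {
    pub fn new(remaining: u64) -> Self {
        Self { remaining }
    }

    /// Deducts `units`, or fails without changing the balance.
    pub fn consume(&mut self, units: u64) -> Result<(), SyscallError> {
        if units > self.remaining {
            return Err(SyscallError::ComputeBudgetExceeded);
        }
        self.remaining -= units;
        Ok(())
    }

    pub fn remaining(&self) -> u64 {
        self.remaining
    }
}

/// Hypothetical syscall body: the cost is charged as soon as it is known,
/// so an under-funded caller fails before any allocation happens.
pub fn charged_from_bytes(
    meter: &mut ComputeMeter,
    cost: u64,
    bytes: &[u8],
) -> Result<Vec<u8>, SyscallError> {
    meter.consume(cost)?;
    // Placeholder for the real work: normalise by dropping leading zero bytes.
    Ok(bytes.iter().skip_while(|b| **b == 0).copied().collect())
}
```

When the cost depends on the operand size, the earliest point to charge is right after the length has been read and validated, which matches the "as early as possible" note above.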

jackcmay · 2021-05-10T21:51:20Z

programs/bpf_loader/src/syscalls.rs

+        );
+        let bytes: f64 = byte_slice.len() as f64;
+        let bbox = Box::new(BigNum::from_slice(byte_slice).unwrap());
+        let rwptr = Box::into_raw(bbox);


This screams memory leak. Is it possible to create a container in program space (above the syscall) and just have the syscall fill in the result?

Maybe pass back up a byte array and then call from_slice()?

May need clarity here.

The model here looks like it is allocating a bignum type in the syscall and then passing back a raw pointer, which in program space is converted into a bignum type. This means the allocation is occurring in the runtime and relying on the program to trigger the drop. A malicious program could skip the drop resulting in a memory leak in the runtime.

Could you use from_bytes and to_bytes to pass a byte array between the two? That does lead to local allocations on both ends, which isn't ideal but is better than a possible memory leak.

@jackcmay So, remember I am a Rust n00b:

Instead of boxing and raw-pointers, send and receive byte arrays to/from syscalls and in the syscalls instantiate the BigNum's?

Is there a way to pass in a Vec and translate it to type in a syscall? I've had trouble and can't seem to make it work.

The reason for the Vec is that BigNum are ultimately variable length and can get quite large so knowing fixed buffers a priori would be daunting.
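As a rough sketch of the byte-array approach discussed in this thread: the program owns both the operands and the output buffer, and the syscall only fills in the result. `bignum_add_bytes` and `BigNumError` are hypothetical names for this example, and plain big-endian addition stands in for the OpenSSL-backed operations. Because results are variable length, the function reports the required size when the caller's buffer is too small, so the caller can retry with a larger one.

```rust
#[derive(Debug, PartialEq)]
pub enum BigNumError {
    /// The caller's buffer is too small; carries the required length in bytes.
    BufferTooSmall(usize),
}

/// Adds two unsigned big-endian integers and writes the big-endian sum into
/// the caller-owned `out` buffer. Returns the number of bytes written.
pub fn bignum_add_bytes(lhs: &[u8], rhs: &[u8], out: &mut [u8]) -> Result<usize, BigNumError> {
    // Scratch space (little-endian) that is dropped when the call returns,
    // so nothing allocated here outlives the syscall.
    let mut sum: Vec<u8> = Vec::with_capacity(lhs.len().max(rhs.len()) + 1);
    let mut carry = 0u16;
    let mut l = lhs.iter().rev();
    let mut r = rhs.iter().rev();
    loop {
        let (a, b) = (l.next(), r.next());
        if a.is_none() && b.is_none() {
            break;
        }
        let t = u16::from(*a.unwrap_or(&0)) + u16::from(*b.unwrap_or(&0)) + carry;
        sum.push((t & 0xff) as u8);
        carry = t >> 8;
    }
    if carry > 0 {
        sum.push(carry as u8);
    }
    // Drop leading zeros (they sit at the end of the little-endian scratch).
    while sum.len() > 1 && sum.last() == Some(&0) {
        sum.pop();
    }
    if sum.len() > out.len() {
        return Err(BigNumError::BufferTooSmall(sum.len()));
    }
    for (dst, src) in out.iter_mut().zip(sum.iter().rev()) {
        *dst = *src;
    }
    Ok(sum.len())
}
```

The trade-off is the one noted above: each side converts to and from bytes on every call, but a program that never drops its value can no longer leak memory in the runtime.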

jackcmay · 2021-05-10T22:06:36Z

sdk/src/process_instruction.rs

@@ -188,6 +226,25 @@ impl BpfComputeBudget {
            max_cpi_instruction_size: 1280, // IPv6 Min MTU size
            cpi_bytes_per_unit: 250,        // ~50MB at 200,000 units
            sysvar_base_cost: 100,
+            bignum_new_base_cost: 100,


I get the sense that these costs are too small. By chance have you done any perf measurements of the big-num calls on native?

arthurgreef · 2021-05-11T01:45:21Z

The compute unit calculations are different for each of the following classes of operation.

* Big number operations – new, drop, convert from bytes, convert to bytes, log.
* Big number arithmetic operations – add, multiply, divide, exp.
* Big number modular arithmetic operations – mod_mul, mod_inv, mod_sqr, mod_exp.
* Hash operations – hash to prime, hashed generator, blake3 digest.

Big number operations. The base cost of 100 was chosen because that is what it costs to log, and we really do not know how else to select or derive this number.

Big number arithmetic operations. The compute unit calculations are based on this [Ethereum precompile for big arithmetic issue](ethereum/EIPs#101). The formula is GADDSUBBASE + GARITHWORD * ceil( / 32). The value for GARITHWORD is 6. For the base cost of an operation, GADDSUBBASE, we logged the compute units required to perform the same operation in a Solana program using u64 numbers, which is 15 or 30 units.
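The formula can be sketched as below. The numerator of the `ceil` term is missing from the comment above; this sketch assumes it is the operand length in bytes (the `operand_len` parameter), which is an assumption and not something the thread confirms. `arithmetic_cost` is a hypothetical helper name.

```rust
/// Per-word cost quoted in the comment above.
pub const GARITHWORD: u64 = 6;
/// Word size implied by the `/ 32` in the formula.
pub const WORD_BYTES: u64 = 32;

/// base_cost + GARITHWORD * ceil(operand_len / 32).
/// ASSUMPTION: `operand_len` is the operand's length in bytes.
pub fn arithmetic_cost(base_cost: u64, operand_len: u64) -> u64 {
    // Integer ceiling division without risking overflow on large lengths.
    let words = operand_len / WORD_BYTES + u64::from(operand_len % WORD_BYTES != 0);
    base_cost.saturating_add(GARITHWORD.saturating_mul(words))
}
```

Under that assumption, a 33-byte operand with a base cost of 15 would be charged 15 + 6 * 2 = 27 units.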

Big number modular arithmetic operations. The unit calculation uses the same formula from the Ethereum precompile for big arithmetic but the base cost is the cost of two operations in a Solana program on u64 numbers. For example, mod_mul base cost is the cost of a multiply operation plus the cost of a mod operation. We did not use the [Ethereum Big integer modular exponentiation precompile](EIP-198: Big integer modular exponentiation (ethereum.org)) formula for the mod_exp calculation but this is something we are looking for feedback on.

Hash operations. For hash operations we used the same formula from the Ethereum precompile for big arithmetic, but the base costs vary based on the number of sub-operations. The blake3 digest base cost is the same as the SHA256 digest base cost. The hash generator has more sub-operations than a digest operation. The hash to prime operation has a variable number of loops, which is a concern as I do not know the maximum number of loops required to find the next prime number if the digest is not a prime number. This is the most expensive hash operation and we wanted feedback on how to calculate compute units. It would not be acceptable for a hash to prime operation to fail for a 32 byte number.
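One way to reason about the variable loop count is to cap the number of candidates tested, so a worst-case cost can be charged up front. The sketch below works on `u64` with trial division purely for illustration; a real hash to prime would run a probabilistic primality test on the digest as a big number. `is_prime` and `next_prime_bounded` are hypothetical helpers, not functions from this PR.

```rust
/// Trial-division primality test; only suitable for small illustrative inputs.
pub fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    if n % 2 == 0 {
        return n == 2;
    }
    let mut d = 3u64;
    // `d <= n / d` is `d * d <= n` without the overflow risk.
    while d <= n / d {
        if n % d == 0 {
            return false;
        }
        d += 2;
    }
    true
}

/// Searches odd candidates starting at `start | 1` and returns the first
/// prime together with the number of candidates tested, or `None` once
/// `max_candidates` have been tried (or the search would overflow).
pub fn next_prime_bounded(start: u64, max_candidates: u32) -> Option<(u64, u32)> {
    let mut candidate = start | 1;
    for tested in 1..=max_candidates {
        if is_prime(candidate) {
            return Some((candidate, tested));
        }
        candidate = candidate.checked_add(2)?;
    }
    None
}
```

With a cap in place, the question raised above becomes how large the cap must be for 32 byte inputs to succeed in practice; this sketch does not answer that, it only shows where such a limit would sit.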

When we were developing our program we moved all the operations in our RSA accumulator membership and non-membership proof verification to syscalls if an instruction resulted in us exceeding the 200_000 compute unit limit. When our program was able to run within the 200_000 compute units we tweaked the cost calculations so that we could still execute our proof verifications without exceeding the unit limit. It may be the case that the current Solana compute unit limits and the cost of executing the operations required for RSA accumulator proof verification make it impossible to execute these types of programs on Solana, but I’m hoping that this is not the case.

seanyoung · 2021-05-11T09:15:00Z

The solang compiler https://github.com/hyperledger-labs/solang could possibly use bignum operations if they are efficient enough for math of integer types > 64. However there are some issues with that.

* Converting to/from BigNum is expensive, and needs to be done often. I have my doubts how well this will work out IRL.
* Solang needs BigNum operations with overflow checks for BigNum of N bits width. This PR doesn't offer that.
* The only relevant instruction not available in BPF is 64 bit mul with 128 bit result (imulq with result $rax/$rdx on x86). If BPF had that instruction then I don't know if this implementation is more efficient than a native one.
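For context on the last point, a 64-bit by 64-bit multiply with a 128-bit result can be written two ways in Rust: through `u128`, which a compiler can typically lower to a single widening multiply on x86-64, or with 64-bit operations only, as a target without that instruction would have to do. Both functions below are illustrative sketches and their names are made up for this example.

```rust
/// 64 x 64 -> 128 multiply via `u128`. Returns (high, low) halves.
pub fn widening_mul_u128(a: u64, b: u64) -> (u64, u64) {
    let p = u128::from(a) * u128::from(b);
    ((p >> 64) as u64, p as u64)
}

/// The same product using only 64-bit arithmetic: schoolbook multiplication
/// on 32-bit halves, which needs four multiplies plus carry handling.
pub fn widening_mul_portable(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xffff_ffff;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle column: at most 3 * (2^32 - 1), so it cannot overflow a u64.
    let mid = (ll >> 32) + (lh & MASK) + (hl & MASK);
    let lo = (mid << 32) | (ll & MASK);
    let hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    (hi, lo)
}
```

The portable version gives a feel for the extra work each limb multiplication costs when the widening instruction is unavailable.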

FrankC01 · 2021-05-11T10:18:09Z

@jackcmay - Checked in changes that address the review comments I marked with 👍, and commented where I was not sure or have not addressed them yet

arthurgreef · 2021-05-12T00:20:06Z

@seanyoung these big number operations are useful for cryptographic protocols such as the RSA accumulator we are using. There is very little conversion between native numbers and big numbers in this protocol. Some RSA accumulator implementations use fixed width integers, u256 and u512, usually for hashing operations but we did not take that approach in this iteration.

jackcmay · 2021-05-18T23:16:59Z

@FrankC01 Does this test pass for you locally:

thread 'test_program_bpf_finalize' panicked at 'assertion failed: `(left == right)`
left: `InstructionError(0, MissingAccount)`,
right: `InstructionError(0, ProgramFailedToComplete)`', tests/programs.rs:2427:5

FrankC01 · 2021-05-19T08:18:06Z

@jackcmay When I run either (in solana/programs/bpf) :
cargo test --features="bpf_c,bpf_rust" -- --nocapture test_program_bpf_sanity

or

cargo test --features="bpf_c,bpf_rust" -- --nocapture

I do not see that error locally. And here is what I captured from running:

[2021-05-19T08:20:39.545901000Z DEBUG solana_runtime::message_processor] Finalized account 2KSfz5hoWqr8vkBKKthsctpAP96vwdKcat7nHZ2dCXPB
[2021-05-19T08:20:39.549157000Z DEBUG solana_runtime::message_processor] Program 11111111111111111111111111111111 invoke [1]
[2021-05-19T08:20:39.549270000Z DEBUG solana_runtime::message_processor] Program 11111111111111111111111111111111 success
[2021-05-19T08:20:39.552253000Z DEBUG solana_runtime::message_processor] Program BfKnjeXBD7bQxRjpoK2VKNWQt1AGMU94mqn24EYDtGut invoke [1]
[2021-05-19T08:20:39.552498000Z DEBUG solana_runtime::message_processor] Program log: Finalize a program
[2021-05-19T08:20:39.552831000Z DEBUG solana_runtime::message_processor] Program BfKnjeXBD7bQxRjpoK2VKNWQt1AGMU94mqn24EYDtGut consumed 2131 of 200000 compute units
[2021-05-19T08:20:39.552921000Z DEBUG solana_runtime::message_processor] Program failed to complete: Program BPFLoader2111111111111111111111111111111111 not supported by inner instructions
[2021-05-19T08:20:39.553006000Z DEBUG solana_runtime::message_processor] Program BfKnjeXBD7bQxRjpoK2VKNWQt1AGMU94mqn24EYDtGut failed: Program failed to complete
test test_program_bpf_finalize ... ok

jackcmay · 2021-05-19T08:50:01Z

Hmm, I also don't see that error when running the exact test that CI is running :-(

I kicked CI, will try and resolve tomorrow

jackcmay · 2021-05-19T08:52:31Z

> The solang compiler https://github.com/hyperledger-labs/solang could possibly use bignum operations if they are efficient enough for math of integer types > 64. However there are some issues with that.
>
> * Converting to/from BigNum is expensive, and needs to be done often. I have my doubts how well this will work out IRL.
> * Solang needs BigNum operations with overflow checks for BigNum of N bits width. This PR doesn't offer that.
> * The only relevant instruction not available in BPF is 64 bit mul with 128 bit result (imulq with result $rax/$rdx on x86). If BPF had that instruction then I don't know if this implementation is more efficient than a native one.

@seanyoung are the overflow checks something that can be added later, or must they be part of these initial syscalls to meet Solang's requirements?

FrankC01 · 2021-05-19T18:33:43Z

@jackcmay It seems like a different error now (in coverage)... other PRs seem to have issues as well. Are any actually getting through?

arthurgreef · 2021-05-19T19:26:25Z

@jackcmay The syscalls in this pull request do not provide fixed width integer arithmetic. For this you would need to use something like uint provided by parity: https://crates.io/crates/uint. These syscalls rely on OpenSSL to catch underflow and overflow errors.

jackcmay · 2021-05-19T19:43:31Z

@FrankC01 Are you rebased up to the latest master?

FrankC01 · 2021-05-19T20:42:34Z

> @FrankC01 Are you rebased up to the latest master?

Before each check-in I do:
git fetch upstream
git merge upstream/master

FrankC01 · 2021-05-20T12:52:50Z

@jackcmay @arthurgreef @seanyoung - Folks, I'm closing this PR as I had completely messed up the works. I will open a new Draft PR and reference this for the background

Apologies

FrankC01 added 2 commits May 6, 2021 09:17

Add BigNumber and syscalls

8fea0cc

Removed extra whitespaces

2cd8645

jackcmay reviewed May 10, 2021

sdk/program/src/bignum.rs (outdated)

jackcmay reviewed May 10, 2021

programs/bpf_loader/src/syscalls.rs (outdated)

jackcmay reviewed May 10, 2021

Updated for review comments

03af751

Merge branch 'master' of https://github.com/solana-labs/solana

440aa5e

arthurgreef reviewed May 12, 2021

sdk/program/src/bignum.rs (outdated)

arthurgreef reviewed May 12, 2021

sdk/program/src/bignum.rs (outdated)

arthurgreef reviewed May 12, 2021

programs/bpf_loader/src/syscalls.rs (outdated)

FrankC01 added 2 commits May 18, 2021 05:38

Cleanup tests

35d5c62

Updated logging for non-bpf execution

4b7925c

Merge remote-tracking branch 'upstream/master'

176e4cc

Updating from solana master

FrankC01 and others added 8 commits May 19, 2021 05:29

Doubled BigNumber syscall execution base units

bc7eb30

Merged with upstream

4f96707

Fix deadcode errors

1f422ab

Fix assertion expression

7a00993

move Ancestors to its own module (#17316)

ed9cbd5

shink all in parallel on startup (#17308)

c20b27b

Fix typo (#17326)

f1b4a0a

add calc_stored_meta_us metric (#17318)

b5302e7

adds gossip metrics for number of staked nodes (#17330)

e7073ec

jackcmay and others added 7 commits May 19, 2021 13:43

Optimize aligned memory used by the runtime (#17324)

477898f

removes manual trait impl for contact-info (#17332)

13b032b

The current implementations use only the id and disregard other fields, in particular wallclock. This can lead to bugs where an outdated contact-info shadows or overrides a current one because they compare equal.

generate_index inserts ideal initial data (#17247)

32ec834

* improve insert into map initially * rework towards single code path * rename * update test

don't log shrink metrics on first call (#17328)

a544010

* don't log shrink metrics on first call * simplify logic

Bump bpf-tools version to 1.9

2ae57c1

- upgrade rustc to 1.52.1 and clang to 12.0

syscall corrections

abe0064

Hard reset

86c9814

FrankC01 closed this May 20, 2021

FrankC01 mentioned this pull request May 21, 2021

Add bignum syscalls #17393

Closed
