feat: ICICLE msm integration #498

alxiong · 2024-02-27T15:55:58Z

Description

closes: #490

Unit tests already test the correctness of the MSM results.
Benchmark can be run via cargo bench --bench msm --features "test-srs icicle"

Fixed cudatoolkit inside nix shell is incompatible with cuda driver problem (causing CudaErrorInsufficientDriver err)
Tweak around benchmark code and setup
Update Changelog

Benchmark

TL;DR: it's about 50x speedup compared to arkworks! 🎉

Current criterion benchmark:

MSM with arkworks/19    time:   [1.5497 s 1.5526 s 1.5554 s]
MSM with arkworks/20    time:   [2.8726 s 2.8801 s 2.8848 s]
MSM with arkworks/21    time:   [5.8268 s 5.8572 s 5.8847 s]
MSM with arkworks/22    time:   [11.618 s 11.641 s 11.665 s]

MSM with ICICLE/19      time:   [26.753 ms 26.841 ms 27.018 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
MSM with ICICLE/20      time:   [48.716 ms 48.728 ms 48.737 ms]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
MSM with ICICLE/21      time:   [87.649 ms 87.671 ms 87.697 ms]
MSM with ICICLE/22      time:   [164.33 ms 164.40 ms 164.45 ms]

Individual GPU-accelerated breakdown MSM(2^20)

Start:   Committing to polynomial of degree 1048576
··Start:   Type Conversion: ark->ICICLE: Group
··End:     Type Conversion: ark->ICICLE: Group .....................................11.500ms
··Start:   Load group elements: CPU->GPU
··End:     Load group elements: CPU->GPU ...........................................5.197ms
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................5.835ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................2.624ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................20.493ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................26.000ms
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................27.140µs
End:     Committing to polynomial of degree 1048576  ...............................84.879ms

note: the "GPU-accelerated MSM" is somewhat misleading, because we use non-blocking async MSM on GPU, the computation actually wasn't finished before our CPU moved on, thus part of "Load MSM result GPU->CPU" is "synchronizing the result on the cuda stream" which means wait for the work to finish. Since we know the loading a single projective group should be instant, the more accurate MSM computation time is 20.49 + 26 = 47 sec which aligns with the criterion output above.

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

Targeted PR against correct branch (main)
Linked to GitHub issue with discussion and accepted design OR have an explanation in the PR that describes this work.
Wrote unit tests
Updated relevant documentation in the code
Added a relevant changelog entry to the Pending section in CHANGELOG.md
Re-reviewed Files changed in the GitHub PR explorer

alxiong · 2024-02-27T16:00:27Z

Current benchmark doesn't look right, the difference is only 5x, we should expect at least >10x speedup. In initial exploration, we observed 200x differences!

Again could due to setup, or warm up (i believe criterion only warm up CPU, not GPU)

The number below is run on computing the same MSM (committing to the same polynomial) multiple times, and taking the average runtime. (instead of running multiple instances of MSM at the same time, which GPU should be better at?)

MSM with arkworks/12    time:   [20.128 ms 20.157 ms 20.189 ms]
MSM with arkworks/13    time:   [35.529 ms 35.653 ms 35.778 ms]
MSM with arkworks/14    time:   [67.961 ms 68.246 ms 68.623 ms]
MSM with arkworks/15    time:   [119.05 ms 119.43 ms 119.77 ms]
MSM with arkworks/16    time:   [216.39 ms 216.61 ms 216.91 ms]
MSM with arkworks/17    time:   [435.28 ms 435.69 ms 436.10 ms]
MSM with arkworks/18    time:   [820.43 ms 821.17 ms 822.05 ms]
MSM with arkworks/19    time:   [1.5854 s 1.5876 s 1.5902 s]
MSM with arkworks/20    time:   [2.9161 s 2.9178 s 2.9199 s]
MSM with arkworks/21    time:   [5.9400 s 5.9642 s 5.9849 s]
MSM with arkworks/22    time:   [11.604 s 11.664 s 11.716 s]

MSM with ICICLE/12      time:   [15.503 ms 15.512 ms 15.523 ms]
MSM with ICICLE/13      time:   [17.806 ms 17.834 ms 17.862 ms]
MSM with ICICLE/14      time:   [22.151 ms 22.192 ms 22.222 ms]
MSM with ICICLE/15      time:   [30.990 ms 31.176 ms 31.352 ms]
MSM with ICICLE/16      time:   [51.316 ms 51.570 ms 51.919 ms]
MSM with ICICLE/17      time:   [89.319 ms 89.740 ms 90.149 ms]
MSM with ICICLE/18      time:   [160.70 ms 161.77 ms 163.22 ms]
MSM with ICICLE/19      time:   [318.50 ms 319.18 ms 319.94 ms]
MSM with ICICLE/20      time:   [635.62 ms 637.99 ms 640.69 ms]
MSM with ICICLE/21      time:   [1.2574 s 1.2616 s 1.2680 s]
MSM with ICICLE/22      time:   [2.5118 s 2.5236 s 2.5354 s]

chancharles92 · 2024-02-27T18:24:08Z

primitives/src/pcs/univariate_kzg/mod.rs

+        end_timer!(conv_time);
+
+        // load them on host first
+        let bases = HostOrDeviceSlice::Host(bases);


Do we need to convert and load SRS bases everytime during committing? Can we just load it once and reuse it?

Another point: in VID, the polynomial degree is much lower. Thus we also won't need to upload too many powers_of_g elements to GPUs.

Do we need to convert and load SRS bases everytime during committing? Can we just load it once and reuse it?

this is exactly what I meant on being unsure about the API boundary

ultimately we probably would not use this function in a standalone way, instead, we will pick it apart, and flesh out the full steps inside vid's function, to have fine-grained control on when are data being loaded, and maximize reuse.

imo, we would modify our struct Advz and store Option<&HostOrDeviceSlice<T>> that stores cudamem ref to the srs loaded in the previous run.

I see. I feel it might be better to split it at the PCS level rather than in the VID code. Because the PCS commit function itself shouldn't upload SRS everytime. Is it possible that we add another API, load_srs_to_gpu(), and the commit_in_gpu function can take &HostOrDevicesSlice<T> and no longer need to upload srs, and it will return an error if SRS hasn't been uploaded?

In the VID code, it will call load_srs_to_gpu and commit_in_gpu when needed.

your suggestion makes more sense! I'll implement that!

to go even further, I would also separate out load_poly_coeffs_on_gpu(), and during commit_on_gpu() only takes in pointers, this is because we could be reusing the coeffs from on-gpu FFT in the future. This would account for that flexibility.

a slight annoyance is lifetime due to HostOrDeviceSlice<'a, T> as a return parameter, I've been fighting this today. I don't want to assign 'static for it, as we shouldn't expect the reference to live that long, we only want it to be as long as the cuda pointer being active.

I'll figure sth out, but just to share some engineering journey.

done in 514d479

primitives/src/pcs/univariate_kzg/mod.rs

alxiong · 2024-03-02T16:54:16Z

I'm so annoyed that I can't get my nix shell to work. outside it, i can compile and run icicle code, but inside I can only successfully compile, but any invocation of cuda FFI code would failed.

$ nix develop .#cudaShell
$ cargo test gpu --release --features icicle

runtime error trace

---- pcs::univariate_kzg::tests::gpu_end_to_end_test stdout ----
thread 'pcs::univariate_kzg::tests::gpu_end_to_end_test' panicked at primitives/src/pcs/univariate_kzg/mod.rs:1026:14:
test failed for bn254: IcicleError("IcicleError { icicle_error_code: InternalCudaError, cuda_error: Some(cudaErrorInsufficientDriver), reason: Some(\"Runtime CUDA error.\") }")
stack backtrace:
   0:     0x555555b12366 - std::backtrace_rs::backtrace::libunwind::trace::hbee8a7973eeb6c93
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x555555b12366 - std::backtrace_rs::backtrace::trace_unsynchronized::hc8ac75eea3aa6899
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x555555b12366 - std::sys_common::backtrace::_print_fmt::hc7f3e3b5298b1083
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x555555b12366 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hbb235daedd7c6190
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x555555b3b720 - core::fmt::rt::Argument::fmt::h76c38a80d925a410
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/rt.rs:142:9
   5:     0x555555b3b720 - core::fmt::write::h3ed6aeaa977c8e45
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/fmt/mod.rs:1120:17
   6:     0x555555b0fbff - std::io::Write::write_fmt::h1299aa7741865f2b
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/io/mod.rs:1810:15
   7:     0x555555b12144 - std::sys_common::backtrace::_print::h5d645a07e0fcfdbb
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x555555b12144 - std::sys_common::backtrace::print::h85035a511aafe7a8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x555555b13eb7 - std::panicking::default_hook::{{closure}}::hcce8cea212785a25
  10:     0x555555b13b9d - std::panicking::default_hook::hf5fcb0f213fe709a
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:289:9
  11:     0x555555a8d207 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h60265c2dfa87ee34
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2029:9
  12:     0x555555a8d207 - test::test_main::{{closure}}::h77865bd3127078c6
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:138:21
  13:     0x555555b144d6 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::hbc5ccf4eb663e1e5
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2029:9
  14:     0x555555b144d6 - std::panicking::rust_panic_with_hook::h095fccf1dc9379ee
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:783:13
  15:     0x555555b14222 - std::panicking::begin_panic_handler::{{closure}}::h032ba12139b353db
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:657:13
  16:     0x555555b12866 - std::sys_common::backtrace::__rust_end_short_backtrace::h9259bc2ff8fd0f76
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:171:18
  17:     0x555555b13f80 - rust_begin_unwind
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
  18:     0x5555555ace85 - core::panicking::panic_fmt::h784f20a50eaab275
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
  19:     0x5555555ad393 - core::result::unwrap_failed::h03d8a5018196e1cd
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/result.rs:1649:5
  20:     0x555555793ade - jf_primitives::pcs::univariate_kzg::tests::gpu_end_to_end_test::h4a2c05e21a017c3d
  21:     0x555555915b69 - core::ops::function::FnOnce::call_once::h34aa7a3b7c3c87aa
  22:     0x555555a92adf - core::ops::function::FnOnce::call_once::h8dc6907944022cf6
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
  23:     0x555555a92adf - test::__rust_begin_short_backtrace::haae1a87433f1efb3
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:627:18
  24:     0x555555a91861 - test::run_test_in_process::{{closure}}::h8c7decfa7c14e152
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:650:60
  25:     0x555555a91861 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h19e6ff056d9d21e9
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
  26:     0x555555a91861 - std::panicking::try::do_call::h89c848fcaa37c035
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
  27:     0x555555a91861 - std::panicking::try::h57ab3dc74e2839b8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
  28:     0x555555a91861 - std::panic::catch_unwind::hfb6a1b1abc120fb9
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
  29:     0x555555a91861 - test::run_test_in_process::h5ae2f9875edd562d
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:650:27
  30:     0x555555a91861 - test::run_test::{{closure}}::h35d7300d8928a067
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:573:43
  31:     0x555555a58b96 - test::run_test::{{closure}}::h7525ced405d23d1b
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/test/src/lib.rs:601:41
  32:     0x555555a58b96 - std::sys_common::backtrace::__rust_begin_short_backtrace::h4e7db78ce05afad8
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys_common/backtrace.rs:155:18
  33:     0x555555a5dbf7 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hcfbcb64f1a1b3482
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:529:17
  34:     0x555555a5dbf7 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h9d89c5c4108bd689
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panic/unwind_safe.rs:272:9
  35:     0x555555a5dbf7 - std::panicking::try::do_call::h8a4869bc94ec50c9
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:552:40
  36:     0x555555a5dbf7 - std::panicking::try::h9a576f20ff81ac30
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:516:19
  37:     0x555555a5dbf7 - std::panic::catch_unwind::hbcb4e3f860ef9830
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panic.rs:142:14
  38:     0x555555a5dbf7 - std::thread::Builder::spawn_unchecked_::{{closure}}::h93c79a6be1505948
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/thread/mod.rs:528:30
  39:     0x555555a5dbf7 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h426d96740c81bdaf
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/ops/function.rs:250:5
  40:     0x555555b19095 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h12de4fc57affb195
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  41:     0x555555b19095 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h3c619f45059d5cf1
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/alloc/src/boxed.rs:2015:9
  42:     0x555555b19095 - std::sys::unix::thread::Thread::new::thread_start::hbac657605e4b7389
                               at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/sys/unix/thread.rs:108:17
  43:     0x7ffff76a2333 - start_thread
  44:     0x7ffff7724efc - __clone3
  45:                0x0 - <unknown>

I first suspected that clang-sys that icicle's build.rs relies on didn't get the correct env var, but even with explicit setting it to the proper clang and llvm bin and lib, I still run into the above error.

I might need to dig deeper into what actually happens during the FFI calling phase, which object files are being used, and which dynamic library were used to built these executables etc. to pinpoint the source.

p.s. since icicle's build script explicitly use /usr/local/cuda/lib64 , I'm also setting my CUDA_PATH etc to a local installation instead of installing a cudatoolkit inside a nix shell

        baseShell = with pkgs;
          clang15Stdenv.mkDerivation {
            name = "clang15-nix-shell";
            buildInputs = [
              argbash
              openssl
              pkg-config
              git
              nixpkgs-fmt

              cargo-with-nightly
              stableToolchain
              nightlyToolchain
              cargo-sort
              clang-tools_15
              clangStdenv
              llvm_15
            ] ++ lib.optionals stdenv.isDarwin
              [ darwin.apple_sdk.frameworks.Security ];

            CARGO_TARGET_DIR = "target/nix_rustc";

            shellHook = ''
              export RUST_BACKTRACE=full
              export PATH="$PATH:$(pwd)/target/debug:$(pwd)/target/release"
              # Prevent cargo aliases from using programs in `~/.cargo` to avoid conflicts with local rustup installations.
              export CARGO_HOME=$HOME/.cargo-nix

              # Ensure `cargo fmt` uses `rustfmt` from nightly.
              export RUSTFMT="${nightlyToolchain}/bin/rustfmt"

              export C_INCLUDE_PATH="${llvmPackages_15.libclang.lib}/lib/clang/${llvmPackages_15.libclang.version}/include"
              export CC="${clang-tools_15.clang}/bin/clang"
              export CXX="${clang-tools_15.clang}/bin/clang++"
              export AR="${llvm_15}/bin/llvm-ar"
              export CFLAGS="-mcpu=generic"

              # ensure clang-sys got the correct version
              export LLVM_CONFIG_PATH="${llvmPackages_15.llvm.dev}/bin/llvm-config"
              export LIBCLANG_PATH=${llvmPackages_15.libclang.lib}/lib
              export CLANG_PATH=${clang-tools_15.clang}/bin/clang

              # by default choose u64_backend
              export RUSTFLAGS='--cfg curve25519_dalek_backend="u64"'
            ''
              # install pre-commit hooks
              + self.check.${system}.pre-commit-check.shellHook;
          };

//....
        devShells = {
          # enter with `nix develop .#cudaShell`
          cudaShell = baseShell.overrideAttrs (oldAttrs: {
            # for GPU/CUDA env (e.g. to run ICICLE code)
            name = "cuda-env-shell";
            buildInputs = oldAttrs.buildInputs ++ [ cmake util-linux gcc11 ];
            # CXX is overridden to use gcc11 as icicle-curves's build scripts need them, but gcc12 is not supported
            shellHook = oldAttrs.shellHook + ''
              export CUDA_PATH=/usr/local/cuda
              export PATH="${pkgs.gcc11}/bin:$CUDA_PATH/bin:$CUDA_PATH/nvvm/bin:$PATH"
              export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$LIBCLANG_PATH"
            '';
          });
        };

🤦 need more time on this

alxiong · 2024-03-06T18:10:17Z

The code is ready for review, interestingly I have some naive "warmup" function that makes the benchmark more accurate, as the warmup takes a constant of ~200ms, which you won't see now:

Start:   Committing to polynomial of degree 1048576
··Start:   Type Conversion: ark->ICICLE: Group
··End:     Type Conversion: ark->ICICLE: Group .....................................11.500ms
··Start:   Load group elements: CPU->GPU
··End:     Load group elements: CPU->GPU ...........................................5.197ms
··Start:   Type Conversion: ark->ICICLE: Scalar
··End:     Type Conversion: ark->ICICLE: Scalar ....................................5.835ms
··Start:   Load scalars: CPU->GPU
··End:     Load scalars: CPU->GPU ..................................................2.624ms
··Start:   GPU-accelerated MSM
··End:     GPU-accelerated MSM .....................................................20.493ms
··Start:   Load MSM result GPU->CPU
··End:     Load MSM result GPU->CPU ................................................26.000ms
··Start:   Type Conversion: ICICLE->ark: Group
··End:     Type Conversion: ICICLE->ark: Group .....................................27.140µs
End:     Committing to polynomial of degree 1048576  ...............................84.879ms

There are some remaining tasks:

Test passing for subslice (right now construction is fine, but memory dropping still panic; note without subslice things are fine)
add batch_commit() API which depends on the subslice (because I don't just want exact multiple number of bases of scalars, but I want multiple poly of any degrees <= supported_degree to be committed in batch)
update criterion benchmark code

primitives/src/pcs/univariate_kzg/mod.rs

primitives/benches/kzg_gpu.rs

chancharles92

LGTM

alxiong added 4 commits February 27, 2024 03:51

nixShell setup for icicle

02db3e7

add basic msm benchmark

32835ee

add Kzg::commit_with_gpu() and unit tests

ef61c72

update benchmark

10f2d30

alxiong requested review from mrain, akonring, ggutoski and chancharles92 February 27, 2024 15:55

alxiong self-assigned this Feb 27, 2024

philippecamacho mentioned this pull request Feb 27, 2024

Integrate with ICICLE MSM #490

Closed

6 tasks

chancharles92 reviewed Feb 27, 2024

View reviewed changes

mrain reviewed Feb 27, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

alxiong added 2 commits February 28, 2024 15:14

add par_iter and fix bench err

250e6f4

finer-grain print trace, try to fix ci

b1f3c0c

mrain mentioned this pull request Feb 29, 2024

ICICLE <-> Arkworks type converter #501

Open

alxiong added 5 commits March 1, 2024 13:26

split up commit_using_gpu() into 4 api

514d479

use PCSError instead of unwrap

7096bad

ci: explict all-features and avoid icicle feature

7ce5363

adjust benchmark name

819eaf6

Merge remote-tracking branch 'origin/main' into icicle-msm

33b4d84

mrain reviewed Mar 2, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

alxiong added 5 commits March 4, 2024 14:17

hide gpu bench behind feature flag

b9c9ac9

introduce trait GPUCommit with less mem copy during conversion

4405329

specialize conversion and mont-based msm for bn254

c43ec12

warmup for more accurate benchmark

964c63a

wip: add subslice into srs ref

4b98d1f

alxiong marked this pull request as ready for review March 6, 2024 18:06

mrain reviewed Mar 6, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

mrain reviewed Mar 6, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

mrain reviewed Mar 6, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

mrain reviewed Mar 6, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

alxiong added 4 commits March 7, 2024 07:42

use mem::forget() to avoid double-free panic

bd5dd36

improve test

ca5b494

Merge remote-tracking branch 'origin/main' into icicle-msm

4674eee

update bench code

d150d66

mrain reviewed Mar 7, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

mrain reviewed Mar 7, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

alxiong added 3 commits March 7, 2024 17:09

test: high degree behind print-trace feature

34bc815

feat: add batch_commit for gpu

24083ba

update CHANGELOG

49663ab

chancharles92 reviewed Mar 7, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

chancharles92 reviewed Mar 7, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Show resolved Hide resolved

primitives/src/pcs/univariate_kzg/mod.rs Show resolved Hide resolved

mrain reviewed Mar 7, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Show resolved Hide resolved

mrain reviewed Mar 7, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Show resolved Hide resolved

mrain reviewed Mar 7, 2024

View reviewed changes

primitives/src/pcs/univariate_kzg/mod.rs Outdated Show resolved Hide resolved

This was referenced Mar 7, 2024

GPU accelerated Plonk proof generation #506

Open

Improve load_batch_poly_to_gpu #507

Open

mrain reviewed Mar 8, 2024

View reviewed changes

primitives/benches/kzg_gpu.rs Show resolved Hide resolved

alxiong added 4 commits March 8, 2024 06:31

fix bench, add 2 handy apis, rename to gpu_commit()

ae6a9ca

nit: minor edit on profile test

9fdb299

improve scalar type conv by 2x for batch_commit, directly use par_iter

7e5862c

fix trimmed_size logic, comparable test

b371f65

chancharles92 approved these changes Mar 8, 2024

View reviewed changes

alxiong merged commit 0679d65 into main Mar 8, 2024
5 checks passed

alxiong deleted the icicle-msm branch March 8, 2024 15:57

alxiong mentioned this pull request Mar 14, 2024

GPU accelerated VID disperse #521

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: ICICLE msm integration #498

feat: ICICLE msm integration #498

alxiong commented Feb 27, 2024 •

edited

Loading

alxiong commented Feb 27, 2024 •

edited

Loading

chancharles92 Feb 27, 2024

chancharles92 Feb 27, 2024

alxiong Feb 28, 2024 •

edited

Loading

chancharles92 Feb 28, 2024

alxiong Feb 29, 2024 •

edited

Loading

alxiong Feb 29, 2024

alxiong Mar 2, 2024

alxiong commented Mar 2, 2024

alxiong commented Mar 6, 2024 •

edited

Loading

chancharles92 left a comment

feat: ICICLE msm integration #498

feat: ICICLE msm integration #498

Conversation

alxiong commented Feb 27, 2024 • edited Loading

Description

Benchmark

alxiong commented Feb 27, 2024 • edited Loading

chancharles92 Feb 27, 2024

Choose a reason for hiding this comment

chancharles92 Feb 27, 2024

Choose a reason for hiding this comment

alxiong Feb 28, 2024 • edited Loading

Choose a reason for hiding this comment

chancharles92 Feb 28, 2024

Choose a reason for hiding this comment

alxiong Feb 29, 2024 • edited Loading

Choose a reason for hiding this comment

alxiong Feb 29, 2024

Choose a reason for hiding this comment

alxiong Mar 2, 2024

Choose a reason for hiding this comment

alxiong commented Mar 2, 2024

alxiong commented Mar 6, 2024 • edited Loading

chancharles92 left a comment

Choose a reason for hiding this comment

alxiong commented Feb 27, 2024 •

edited

Loading

alxiong commented Feb 27, 2024 •

edited

Loading

alxiong Feb 28, 2024 •

edited

Loading

alxiong Feb 29, 2024 •

edited

Loading

alxiong commented Mar 6, 2024 •

edited

Loading