This repository has been archived by the owner on Oct 1, 2024. It is now read-only.

Sync upstream #3

Closed
wants to merge 18 commits into from

Conversation

@Schaeff (Collaborator) commented Sep 12, 2024

No description provided.

dependabot bot and others added 18 commits July 29, 2024 07:47
Updates the requirements on [halo2curves](https://github.com/privacy-scaling-explorations/halo2curves) to permit the latest version.
- [Release notes](https://github.com/privacy-scaling-explorations/halo2curves/releases)
- [Commits](privacy-scaling-explorations/halo2curves@v0.6.1...v0.7.0)

---
updated-dependencies:
- dependency-name: halo2curves
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: sha256 pseudo-compression 2-to-1 function

* chore: implement `CompressionFunction`
* Remove an index_out_of_bound_check.

* Moving the dereference.

* Another tiny improvement.

* tiny change

* Remove index check into submat.

* Another minor refactor to avoid more index checks.

* fmt
* better tracing info message

* make clippy happy
* Adding bench functions for mul for packed fields.
Fixing a bug which seems to be massively slowing down monty multiplication in AVX2.

* Uncommenting a couple of things.
- Added explanatory comment to `reverse_slice_index_bits` function in `util/src/lib.rs`.
- Extended unit tests coverage within `util/src/lib.rs`, focusing on the `reverse_bits_len` and `reverse_slice_index_bits` functions.
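
For readers unfamiliar with the bit-reversal permutation these two functions deal with, here is a minimal sketch of the idea. It is not the code in `util/src/lib.rs`: the `_sketch` names are hypothetical, and it assumes the slice length is a power of two.

```rust
/// Reverse the lowest `bit_len` bits of `x` (all higher bits are assumed to be zero).
/// Hypothetical helper for illustration; not the crate's `reverse_bits_len`.
pub fn reverse_bits_len_sketch(x: usize, bit_len: usize) -> usize {
    if bit_len == 0 {
        // Shifting by the full word width would overflow, so handle this case directly.
        return 0;
    }
    // Reverse the whole machine word, then shift the result back down.
    x.reverse_bits() >> (usize::BITS as usize - bit_len)
}

/// Permute `vals` in place so that the element at index `i` moves to the index
/// obtained by reversing the low `log2(len)` bits of `i`.
/// Assumption: `vals.len()` is zero, one, or a power of two.
pub fn reverse_slice_index_bits_sketch<T>(vals: &mut [T]) {
    let n = vals.len();
    if n <= 1 {
        return;
    }
    assert!(n.is_power_of_two(), "length must be a power of two");
    let log_n = n.trailing_zeros() as usize;
    for i in 0..n {
        let j = reverse_bits_len_sketch(i, log_n);
        // Swap each pair once; indices that are their own reversal stay put.
        if i < j {
            vals.swap(i, j);
        }
    }
}
```

Because bit reversal is an involution, applying the permutation twice restores the original order, which makes a convenient unit-test property.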
* Poseidon2 AIR

With support for S-boxes of higher degree than our constraint system; based on @bhgomes's valida-xyz/valida#10

* Update poseidon2-air/src/generation.rs

Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com>

* Address warnings

---------

Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com>
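
One common way to support an S-box whose degree exceeds the constraint system's degree bound is to commit to an intermediate power, so that each individual constraint stays low-degree. The sketch below illustrates this for x^7 under an assumed degree bound of 3. Whether `poseidon2-air` is laid out exactly this way is an assumption; `SboxRow`, `generate_row` and `eval_constraints` are hypothetical names, and plain `u64` arithmetic modulo the BabyBear prime stands in for real field types.

```rust
/// BabyBear prime, 2^31 - 2^27 + 1.
const MODULUS: u64 = 2013265921;

fn fmul(a: u64, b: u64) -> u64 {
    (a * b) % MODULUS
}

fn fsub(a: u64, b: u64) -> u64 {
    (a + MODULUS - b) % MODULUS
}

/// One trace row for a single S-box application (hypothetical layout).
pub struct SboxRow {
    pub x: u64,  // S-box input
    pub x3: u64, // committed intermediate value, claimed to equal x^3
    pub y: u64,  // S-box output, claimed to equal x^7
}

/// Witness generation: fill in the intermediate and output values.
/// Assumes `x < MODULUS`.
pub fn generate_row(x: u64) -> SboxRow {
    let x3 = fmul(fmul(x, x), x);
    let y = fmul(fmul(x3, x3), x);
    SboxRow { x, x3, y }
}

/// Both constraints have degree 3 in the row's columns, yet together they
/// enforce y = (x^3)^2 * x = x^7. A valid row evaluates to [0, 0].
pub fn eval_constraints(row: &SboxRow) -> [u64; 2] {
    [
        fsub(row.x3, fmul(fmul(row.x, row.x), row.x)),
        fsub(row.y, fmul(fmul(row.x3, row.x3), row.x)),
    ]
}
```

The trade-off is one extra committed column per S-box in exchange for keeping every constraint within the degree bound.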
* Remove unused deps.

* Remove two more unused deps.
…elds (Plonky3#437)

* Adding custom AVX2 implementations for x^3, x^5 and x^7 for Monty31Fields.

This is part of the work to speed up Poseidon2 which can be done immediately without needing any API changes. At some point I'll implement this for AVX512/NEON but there is no need to wait for that in order to merge this.

Looking at Godbolt (https://godbolt.org/z/Ksx1z1Whe), we see that, in raw instruction count, we get a little less than a 10% improvement for x^3, climbing to a 20% improvement for x^7.

As x^3 and x^7 are the S-boxes used by KoalaBear and BabyBear respectively for their Poseidon2 hashes, this should give a mild improvement there.

Indeed, we get close to a 10% speedup for Poseidon2 BabyBear but essentially no noticeable speedup for Poseidon2 KoalaBear. This is likely because the map x -> x^7 is a major bottleneck for Poseidon2 BabyBear, whereas Poseidon2 KoalaBear is currently slowed down more by the (poor) implementation of the internal layer, with the map x -> x^3 already being reasonably quick.

* Moving all testing of exp 3, 5, 7 into packed_field_testing.

Also allows us to delete a couple of testing functions in the NEON code.

* Update monty-31/src/x86_64_avx2/packing.rs

Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com>

* Update monty-31/src/x86_64_avx2/packing.rs

Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com>

* Update monty-31/src/x86_64_avx2/packing.rs

Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com>

* Update monty-31/src/x86_64_avx2/packing.rs

Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com>

* Minor fixes to align to review.

* Minor comment updates

---------

Co-authored-by: Hamish Ivey-Law <426294+unzvfu@users.noreply.github.com>
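
To make the x^3, x^5 and x^7 maps discussed in this commit concrete, here is a scalar sketch of the short multiplication chains involved. It is not the AVX2 code from `monty-31/src/x86_64_avx2/packing.rs`: it uses plain `u64` arithmetic modulo the BabyBear prime rather than packed Montgomery arithmetic, and the `_sketch` function names are hypothetical.

```rust
/// BabyBear prime, 2^31 - 2^27 + 1.
const BABYBEAR_P: u64 = 2013265921;

/// Modular multiplication; inputs are assumed to be already reduced (< BABYBEAR_P),
/// so the product fits comfortably in a u64.
fn mul_mod(a: u64, b: u64) -> u64 {
    (a * b) % BABYBEAR_P
}

/// x^3 using 2 multiplications.
pub fn exp3_sketch(x: u64) -> u64 {
    let x2 = mul_mod(x, x);
    mul_mod(x2, x)
}

/// x^5 using 3 multiplications.
pub fn exp5_sketch(x: u64) -> u64 {
    let x2 = mul_mod(x, x);
    let x4 = mul_mod(x2, x2);
    mul_mod(x4, x)
}

/// x^7 using 4 multiplications: x^7 = x^3 * x^4.
pub fn exp7_sketch(x: u64) -> u64 {
    let x2 = mul_mod(x, x);
    let x3 = mul_mod(x2, x);
    let x4 = mul_mod(x2, x2);
    mul_mod(x3, x4)
}
```

A dedicated vectorised implementation can share work between the individual multiplications in such a chain, which is one plausible source of the instruction-count savings quoted above.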
* First draft.

* Second draft.

* Add root tables; remove dumb reduce; prepare for Monty version.

* Use Monty rather than Barrett.

* Remove old comments.

* Do partial reduction; inline sizes 128 and 256.

* Remove Barrett reduc code.

* Refactor butterfly.

* Use u32 repr rather than i64; misc. tidying.

* Working version; initial benchmarking harness.

* Working with non-square inputs.

* Four-step FFT fiddling.

* Move BabyBear FFT to Monty 31 crate.

* Remove 'Real' typedef.

* Move implementation into MontyField31 struct.

* Implement the TwoAdicSubgroupDft trait; move tests to concrete field

* More thorough transpose benchmark.

* Move `pretty_name` to utils crate; use `pretty_name` in fft benches

* Remove unused four-step code.

* Tidy up implementation and testing; store precomputed roots.

* Tidying.

* Remove unused 'backward' transform.

* Move `split_at_mut_unchecked` to utils crate; remove unused import.

* Clippy.

* Remove unnecessary function.

* Fix name of algo.

* Refactor bitrev & transpose parts of dft

* Refactor DFT tests.

* Minor simplification.

* Fix specification of twiddle table.

* Expanded benchmarks.

* Remove unnecessary borrows.

* Add more tracing information.

* Messy but working version of `coset_lde_batch`.

* Reduce allocations by removing dependency on `RowMajorMatrix`.

* Unsafe scratch initialisation.

* Don't apply coset powers to zero elements.

* Tidying up; parallelise `scale()`.

* Update Keccak AIR examples

* Use new FFT in KoalaBear example; misc tidying.

* Fix dumb bug.

* Rename var.

* Switch DIT and DIF for DFT and IDFT; adjust bit-reversals & zeroing; scale and shift at once.

* Refactor internal functions; rename some things.

* clippy

* Remove unused function.

* Update some documentation.

* Expand first layer of DFT.

* Reduce memory consumption.

* Specialise inverse roots; unroll radix4; move fn's to utils.

* Remove unused fn; comment.

* Rename Radix2Dft -> RecursiveDft.

* Clean up examples.

* Miscellaneous documentation and tidying.

* Minor tidying.

* cargo fmt

* Address review comments.

* Fix URL.

* `split_at_mut_unchecked` is now available in stable.

* Remove comment.

* Remove `partial_monty_reduce`; add comments; cargo fmt.

* Faster alloc and padding; remove specialised first FFT layer.

* Use `transmute` instead of `Vec::set_len`.

* "Tidying"

* Review comments.
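
Several of the commits above switch the FFT arithmetic to Montgomery ("Monty") form with a `u32` representation. As background, here is a minimal scalar sketch of Montgomery reduction for the BabyBear prime with R = 2^32. The constant and function names are hypothetical, and the crate's actual implementation may differ in detail.

```rust
/// BabyBear prime, 2^31 - 2^27 + 1.
const MONTY_P: u32 = 2013265921;

/// Inverse of an odd `p` modulo 2^32, computed by Newton iteration.
/// Each step doubles the number of correct low bits (1, 2, 4, 8, 16, 32).
const fn inv_mod_2_32(p: u32) -> u32 {
    let mut inv: u32 = 1;
    let mut i = 0;
    while i < 5 {
        inv = inv.wrapping_mul(2u32.wrapping_sub(p.wrapping_mul(inv)));
        i += 1;
    }
    inv
}

/// MONTY_MU = MONTY_P^{-1} mod 2^32.
const MONTY_MU: u32 = inv_mod_2_32(MONTY_P);

/// Montgomery reduction: for x < MONTY_P * 2^32, returns x * 2^{-32} mod MONTY_P
/// as a value in [0, MONTY_P).
pub fn monty_reduce(x: u64) -> u32 {
    // t = x * MU mod 2^32, so u = t * P agrees with x in the low 32 bits.
    let t = x.wrapping_mul(MONTY_MU as u64) & 0xffff_ffff;
    let u = t * (MONTY_P as u64);
    // x - u is divisible by 2^32; the quotient lies strictly between -P and P.
    let (diff, borrowed) = x.overflowing_sub(u);
    let hi = (diff >> 32) as u32;
    // If the subtraction went negative, add P to land back in [0, P).
    hi.wrapping_add(if borrowed { MONTY_P } else { 0 })
}

/// Convert a canonical value a < MONTY_P into Montgomery form a * 2^32 mod P.
pub fn to_monty(a: u32) -> u32 {
    (((a as u64) << 32) % (MONTY_P as u64)) as u32
}

/// Convert back from Montgomery form to the canonical value.
pub fn from_monty(a: u32) -> u32 {
    monty_reduce(a as u64)
}

/// Multiply two values that are both in Montgomery form; the result is in Montgomery form.
pub fn monty_mul(a: u32, b: u32) -> u32 {
    monty_reduce((a as u64) * (b as u64))
}
```

The appeal over a generic `%` is that the reduction needs only multiplications, a subtraction and a shift, all of which vectorise well.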
@Schaeff Schaeff marked this pull request as draft September 12, 2024 19:46
@Schaeff Schaeff closed this Oct 1, 2024
7 participants