Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send and Sync Region #173

Closed
alexander-camuto opened this issue Apr 27, 2023 · 7 comments
Closed

Send and Sync Region #173

alexander-camuto opened this issue Apr 27, 2023 · 7 comments

Comments

@alexander-camuto
Copy link

alexander-camuto commented Apr 27, 2023

Hey all.

I'm working on ezkl https://github.com/zkonduit/ezkl a zk-ml library leveraging your repos :) We've recently been able to load and create circuits for models with hundreds of millions of parameters, which generates even more constraints ! Assigning values / laying out can now take quite a while and because our circuit shapes lend themselves to parallel assignment, we were hoping to parallelize this process.

However Region and its inner region field (see below) are not thread safe as they are not Send / Sync.

#[derive(Debug)]
pub struct Region<'r, F: Field> {
    region: &'r mut dyn layouter::RegionLayouter<F>, // here be dragons
}

The idea here would be to make Region Send + Sync.

  • From what I can tell Field is already Send + Sync so it would just be a matter of constraining the RegionLayouter trait to be Send and Sync as well (and any resulting changes in the rest of the codebase those constraints may induce) i.e:
pub trait RegionLayouter<F: Field>: fmt::Debug + Send + Sync { // add the new constraints here
    /// Enables a selector at the given offset.
...
  • Presumably the inner region field of Region may need to be wrapped in a Arc<Mutex<>> or Arc<RwLock<>> sort of pattern
#[derive(Debug)]
pub struct Region<'r, F: Field> {
    region: Arc<Mutex<dyn layouter::RegionLayouter<F>>>, // here be dragons
}
  • This wrapping could be conditional on a feature flag ?

    • Alternatively could make an entirely different thread safe Region struct
@alexander-camuto alexander-camuto changed the title Send and Sync Region and/or Layouter Send and Sync Region Apr 27, 2023
@alexander-camuto
Copy link
Author

alexander-camuto commented Apr 28, 2023

Quick update on this. I implemented the above using just the Send + Sync traits.

I roughly tested it in a parallelized setting and it works a-ok by mock proving. Will test out with full proving soon.

The modifications did not require wrapping the inner region field of Region in Arc<Mutex<...>>. I've crossed that out in the initial comment.

@alexander-camuto
Copy link
Author

alexander-camuto commented Apr 28, 2023

For some visibility in the changes made see: zkonduit@5eebade

In terms of impact on other repos:

  • I would also note that halo2wrong required a small update: see zkonduit/halo2wrong@bc43a20 for the updates

  • snark-verifier compiles fine with the changes

@alexander-camuto
Copy link
Author

alexander-camuto commented Apr 30, 2023

Alright yet another update. One note is that the copy constraint labels can be non-deterministic in the case of a parallelized layout. In particular, referring to step 2 of the algorithm in the Halo2 Book for permutation arguments:

  1. Otherwise, left and right belong to different cycles. Make left the larger cycle and right the smaller one, by swapping them iff sizes(aux(left))<sizes(aux(right)).

The smaller cycle in a multi-threaded setting can be non-deterministic and as such, the constructed copy constraint labels can differ between the key generation and proof stage. A way to mitigate this is to make this label deterministic. What I've done in zkonduit@5eebade is to swap:

if self.sizes[left_cycle.0][left_cycle.1] < self.sizes[right_cycle.0][right_cycle.1] {
std::mem::swap(&mut left_cycle, &mut right_cycle);
}

for a deterministic label propagation such as:

if right_cycle.0 >= left_cycle.0 && right_cycle.1 > left_cycle.1 {
            std::mem::swap(&mut left_cycle, &mut right_cycle);
}

The downside is that sometimes this means that the re-labelling of the right cycle can take more time (as its no longer necessarily the smaller cycle). Would be curious to get y'alls thoughts on a way to make this both deterministic and only require re-labelling of the smaller cycle (but may not be possible in given construction). Also given the mods made to the halo2 algorithm it may make sense for us to keep this as our own fork tbh rather than risk impacting the performance of this repo.

@CPerezz
Copy link
Member

CPerezz commented May 2, 2023

I'm working on ezkl https://github.com/zkonduit/ezkl a zk-ml library leveraging your repos :) We've recently been able to load and create circuits for models with hundreds of millions of parameters,

How do you deal with hundreds of milions of constraints? Shouldn't this force you to have a massive trusted setup??

I would also note that halo2wrong required a small update: see zkonduit/halo2wrong@bc43a20 for the updates
I think if the changes are that minimal, you should try to include a parallel_layouter feature. Which provides the API with Mutex/Arc rather than the CS directly.
Same applies to the Halo2 codebase.

How are you able to have parallel witness assignments if you don't wrap the CS inside an Arc<Mutex<T>>?? I'm not sure I understand why is this possible. You marked the thing as Send/Sync. But it does not mean that it is safe to mark it as it.

Could you elaborate a bit more on that @alexander-camuto ??

@alexander-camuto
Copy link
Author

alexander-camuto commented May 2, 2023

How do you deal with hundreds of milions of constraints? Shouldn't this force you to have a massive trusted setup??

Hey @CPerezz good question ! The way we've been able to deal with this is by having operations "wrap" around advice columns. To illustrate this consider our dot product argument:

| a0  | a1  | m     | s_dot | 
|-----|-----|-------|-------|
| x_i | y_i |m_{i-1}| s_dot |
|     |     |m_i    |       |

where constraints are such that $x_i*y_i + m_{i-1} = m_i$, and i indexes over the elements of a vector or some other iterable structure.

Lets say $i$ exceeds the number of rows available in a single column (lets call that $M$). We start allocating the $x_i, y_i$ where $i &gt; M$ to a new set of advice columns $a_2, a_3$. It requires some trickery in that we also need to map the result of the bottom of $a_0, a_1$ to the top of $a_2, a_3$ (by way of copy constraints), such that the constraint with a negative rotation of $-1$ still holds.

In this manner we can have a relatively large number of constraints by increasing the # of columns, rather than increasing the number of rows. The way we get around the region shuffling that may break this is .... we have a single region for the entire circuit and make sure to keep track of the current offset.

How are you able to have parallel witness assignments if you don't wrap the CS inside an Arc<Mutex>?? I'm not sure I understand why is this possible. You marked the thing as Send/Sync. But it does not mean that it is safe to mark it as it.

For this part what we do is that when putting the Region inside a parallel loop for assignment we wrap it itself inside an Arc<Mutex<Region>> (without it we couldn't pass the mutable reference to the region into the parallel loop). Whenever a thread needs to create an assignment it:

  1. locks the Mutex
  2. gets the underlying Region and the mutable reference to the CS for assignment
  3. performs assignment
  4. frees lock on Region (and the underlying mutable reference to the CS)
  5. other threads can now use it

@CPerezz
Copy link
Member

CPerezz commented May 3, 2023

It requires some trickery in that we also need to map the result of the bottom

Ohh I'm sure about that. You have dynamic circuits basically. Looks cool!! I like it!!

If I'm honest, I'm a bit skeptical about marking &mut RegionLayouter as Send + Sync. I agree that it works. And theoretically, if all the contained elements are Send then it should be Send too.

Let me think about this and get back to you. On the meantime, feel free to send a PR!

@CPerezz
Copy link
Member

CPerezz commented Jun 1, 2023

Resolved via #180

@CPerezz CPerezz closed this as completed Jun 1, 2023
iquerejeta pushed a commit to input-output-hk/halo2 that referenced this issue May 8, 2024
…-hk/dev-feature/transcript-chip-book

Add in-circuit random oracle book chapter
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants