
feat(mem2reg): Remove trivial stores #5865

Closed
wants to merge 27 commits into from


vezenovm
Contributor

@vezenovm vezenovm commented Aug 29, 2024

Description

Problem*

Partially resolves #4535. There is still more cleanup to do, but I will handle that in a follow-up. See the PR discussion comments for more context.

Summary*

Simply marking the result of a load as known was causing some failures, specifically for arrays. So for now I only look for trivial stores that immediately store back the same value that was just loaded.
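As a rough illustration of this "trivial store" pattern, the sketch below scans a block for a load that is immediately followed by a store of the loaded value back to the same address. The `Instruction` enum, `ValueId` alias, and `find_trivial_stores` function are hypothetical stand-ins, not the real `noirc_evaluator` SSA types, and the sketch deliberately ignores `inc_rc`/`dec_rc`.

```rust
/// Hypothetical, simplified stand-ins for SSA value ids and instructions
/// (not the real `noirc_evaluator` types).
type ValueId = u32;

enum Instruction {
    Load { result: ValueId, address: ValueId },
    Store { address: ValueId, value: ValueId },
    IncRc { value: ValueId },
}

/// Returns the indices of stores that immediately follow a load from the
/// same address and write the loaded value straight back.
fn find_trivial_stores(block: &[Instruction]) -> Vec<usize> {
    block
        .windows(2)
        .enumerate()
        .filter_map(|(i, pair)| match (&pair[0], &pair[1]) {
            (
                Instruction::Load { result, address: from },
                Instruction::Store { address: to, value },
            ) if result == value && from == to => Some(i + 1),
            _ => None,
        })
        .collect()
}
```

Because only adjacent pairs are inspected, a store separated from its load by any other instruction (such as an `inc_rc`) is never reported.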

I am also testing on CI, as I keep getting rayon errors with cargo test for some reason. This testing issue also made it difficult to extend the check to mark more values as known, so I am only pushing the more trivial case in this draft.

Additional Context

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

Contributor

github-actions bot commented Aug 29, 2024

Changes to Brillig bytecode sizes

Generated at commit: 26f990809064817b8d19adc57e83c38ee80620cd, compared to commit: 1737b656c861706c38b59bd5ef6cd095687a2898

🧾 Summary (10% most significant diffs)

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| no_predicates_numeric_generic_poseidon | -196 ✅ | -14.79% |
| fold_numeric_generic_poseidon | -196 ✅ | -14.79% |
| poseidon2 | -92 ✅ | -21.70% |

Full diff report 👇

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| nested_array_dynamic | 4,206 (-4) | -0.10% |
| eddsa | 67,414 (-104) | -0.15% |
| fold_complex_outputs | 1,020 (-2) | -0.20% |
| nested_array_in_slice | 1,676 (-4) | -0.24% |
| regression_5252 | 36,277 (-104) | -0.29% |
| slice_regex | 7,283 (-25) | -0.34% |
| regression | 728 (-9) | -1.22% |
| uhashmap | 24,307 (-2,647) | -9.82% |
| brillig_loop_size_regression | 55 (-7) | -11.29% |
| hashmap | 36,942 (-5,846) | -13.66% |
| no_predicates_numeric_generic_poseidon | 1,129 (-196) | -14.79% |
| fold_numeric_generic_poseidon | 1,129 (-196) | -14.79% |
| poseidon2 | 332 (-92) | -21.70% |

Contributor

github-actions bot commented Aug 29, 2024

Changes to circuit sizes

Generated at commit: 26f990809064817b8d19adc57e83c38ee80620cd, compared to commit: 1737b656c861706c38b59bd5ef6cd095687a2898

🧾 Summary (10% most significant diffs)

| Program | ACIR opcodes (+/-) | % | Circuit size (+/-) | % |
| --- | --- | --- | --- | --- |
| nested_array_in_slice | -36 ✅ | -3.29% | -36 ✅ | -0.64% |
| hashmap | -3,109 ✅ | -3.18% | -3,646 ✅ | -2.32% |

Full diff report 👇

| Program | ACIR opcodes (+/-) | % | Circuit size (+/-) | % |
| --- | --- | --- | --- | --- |
| nested_array_in_slice | 1,057 (-36) | -3.29% | 5,596 (-36) | -0.64% |
| hashmap | 94,642 (-3,109) | -3.18% | 153,248 (-3,646) | -2.32% |

Contributor

@jfecher jfecher left a comment


Some potential aliasing issues

compiler/noirc_evaluator/src/ssa/opt/mem2reg/block.rs (outdated)
compiler/noirc_evaluator/src/ssa/opt/mem2reg.rs (outdated)
@vezenovm
Contributor Author

vezenovm commented Aug 30, 2024

I'm kind of surprised by the brillig_cow_assign failure looking at the SSA:

brillig fn main f0 {
  b0():
    inc_rc [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0]
    v2 = allocate
    store [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] at v2
    v3 = load v2
    inc_rc v3
    v4 = load v2
    inc_rc v4
    store v4 at v2
    v5 = allocate
    store v3 at v5
    jmp b1(u32 0)
  b1(v6: u32):
    v9 = lt v6, u32 10
    jmpif v9 then: b2, else: b3
  b2():
    v21 = eq v6, u32 5
    jmpif v21 then: b4, else: b5
  b4():
    v27 = load v2
    inc_rc v27
    v28 = load v2
    inc_rc v28
    store v28 at v2
    store v27 at v5
    jmp b5()
  b5():
    v22 = load v2
    v23 = array_set v22, index v6, value Field 27
    v25 = add v6, u32 1
    store v23 at v2
    v26 = add v6, u32 1
    jmp b1(v26)
  b3():
    v10 = load v2
    v12 = array_get v10, index u32 6
    v14 = eq v12, Field 27
    constrain v12 == Field 27
    v15 = load v5
    v16 = array_get v15, index u32 6
    v17 = eq v16, Field 27
    v18 = not v17
    constrain v17 == u1 0
    return 
}
After Mem2Reg:
brillig fn main f0 {
  b0():
    inc_rc [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0]
    v30 = allocate
    inc_rc [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0]
    inc_rc [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0]
    store [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] at v30
    v33 = allocate
    store [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] at v33
    jmp b1(u32 0)
  b1(v6: u32):
    v34 = lt v6, u32 10
    jmpif v34 then: b2, else: b3
  b2():
    v42 = eq v6, u32 5
    jmpif v42 then: b4, else: b5
  b4():
    v43 = load v30
    inc_rc v43
    v44 = load v30
    inc_rc v44
    store v43 at v33
    jmp b5()
  b5():
    v45 = load v30
    v46 = array_set v45, index v6, value Field 27
    v47 = add v6, u32 1
    store v46 at v30
    v48 = add v6, u32 1
    jmp b1(v48)
  b3():
    v35 = load v30
    v36 = array_get v35, index u32 6
    v37 = eq v36, Field 27
    constrain v36 == Field 27
    v38 = load v33
    v39 = array_get v38, index u32 6
    v40 = eq v39, Field 27
    v41 = not v40
    constrain v40 == u1 0
    return 
}

Compared to the SSA without this optimization, the only difference is an additional store v44 at v30 before store v43 at v33 in b4. I assume this is due to Brillig's CoW optimization and how inc_rc is processed, although the SSA looks like it should execute correctly.

Edit: Found this hack, marked with the comment ``// We can't re-use `value` in case the original address was stored``. This PR is removing that re-store.

@vezenovm
Contributor Author

vezenovm commented Aug 30, 2024

Accounting for the extra re-load and re-store emitted for inc_rc instructions fixes the brillig_cow_assign bug.

Interestingly, uhashmap is then the last failure, and it only occurs with the additional mem2reg and DIE passes. Those passes were added to clean up leftover unused stores after the loads referencing them are removed during DIE.
For example, without those two passes, brillig_loop_size_regression gives the following SSA:

After Array Set Optimizations:
brillig fn main f0 {
  b0():
    v50 = allocate
    store u1 1 at v50, from_rc false
    v51 = allocate
    store Field 1 at v51, from_rc false
    v52 = allocate
    store u1 0 at v52, from_rc false
    v53 = allocate
    store Field 0 at v53, from_rc false
    v54 = allocate
    store u1 0 at v54, from_rc false
    v55 = allocate
    store Field 0 at v55, from_rc false
    jmp b1(u32 0)
  b1(v10: u32):
    v56 = eq v10, u32 0
    jmpif v56 then: b2, else: b3
  b2():
    v63 = load v50
    v64 = load v51
    constrain v63 == u1 1
    constrain v64 == Field 1
    v70 = add v10, u32 1
    jmp b1(v70)
  b3():
    store u1 1 at v50, from_rc false
    store Field 2 at v51, from_rc false
    return Field 2
}

Then with those passes we get the following:

After Array Set Optimizations:
brillig fn main f0 {
  b0():
    v71 = allocate
    store u1 1 at v71, from_rc false
    v72 = allocate
    store Field 1 at v72, from_rc false
    jmp b1(u32 0)
  b1(v10: u32):
    v77 = eq v10, u32 0
    jmpif v77 then: b2, else: b3
  b2():
    v78 = load v71
    v79 = load v72
    constrain v78 == u1 1
    constrain v79 == Field 1
    v80 = add v10, u32 1
    jmp b1(v80)
  b3():
    store u1 1 at v71, from_rc false
    store Field 2 at v72, from_rc false
    return Field 2
}

| Program | Brillig opcodes (+/-) | % |
| --- | --- | --- |
| brillig_loop_size_regression | 65 (-8) | -10.96% |

We still get a reduction without these extra mem2reg and DIE passes, but we should try to remove these instructions, since we know it is safe to. We just have to figure out why the extra passes cause issues for uhashmap. We could merge these changes as they are, since they still provide a nice reduction in poseidon2, hashmap, and uhashmap, and then address getting rid of these leftover stores in a follow-up.

@jfecher
Contributor

jfecher commented Aug 30, 2024

> Interestingly, uhashmap is the last failure then and it only occurs with the additional mem2reg and DIE passes. Those passes were added to cleanup leftover unused stores after the loads referencing that store are removed during DEI.

Ideally we get rid of those extra passes anyway. It's a fairly unsatisfactory tradeoff with compilation time IMO to add two extra passes to get rid of a couple extra instructions in some brillig functions. More concerning is why running these again (although the culprit is presumably mem2reg) produces an issue in the first place.

@vezenovm
Contributor Author

vezenovm commented Sep 3, 2024

> Ideally we get rid of those extra passes anyway. It's a fairly unsatisfactory tradeoff with compilation time IMO to add two extra passes to get rid of a couple extra instructions in some brillig functions.

Yeah, agreed. I'm going to mark this PR ready for review without those extra passes, since it still shows a good improvement.

> More concerning is why running these again (although the culprit is presumably mem2reg) produces an issue in the first place.

We can investigate the cause of the uhashmap failure in a follow-up.

@vezenovm vezenovm marked this pull request as ready for review September 3, 2024 14:57
@vezenovm vezenovm requested a review from a team September 3, 2024 16:29
Comment on lines 176 to 177
Instruction::Store { address, value, from_rc } => {
    writeln!(f, "store {} at {}, from_rc {}", show(*value), show(*address), *from_rc)
Contributor


I'm not a big fan of including from_rc on every store instruction for a mem2reg-specific issue.

Can we add tracking to mem2reg specifically instead? E.g. if we see a load -> dec-rc -> store we mark that we can't remove the store?

Copy link
Contributor Author


I'll look at switching to that

Contributor Author


I switched to tracking the current rc reload per instruction with an Option<(ValueId, bool)>.

Contributor Author


Ah something looks to be failing. I had it passing before but I guess I made a bad change while cleaning up.

Contributor Author


I had moved when I was calling my method to track the rc reload state, but this led to inadvertently calling the method on the wrong instruction id. This is now fixed and the PR is ready for review again.

Contributor Author

@vezenovm vezenovm Sep 3, 2024


It actually still looks to be failing in the debugger, because the debugger inserts foreign calls that break up the expected sequence of this form:

    v72 = load v57
    inc_rc v72
    v73 = load v57
    inc_rc v73
    store v73 at v57
    store v72 at v60

In the debugger, we see the following after the inlining pass, before mem2reg:

    v76 = load v14
    inc_rc v76
    v77 = load v14
    inc_rc v77
    store v77 at v14
    call v80(Field 1, v76)
    inc_rc v76
    v81 = load v14
    inc_rc v81
    store v81 at v14
    store v76 at v26

@jfecher
Contributor

jfecher commented Sep 4, 2024

@vezenovm if it helps for this PR I was talking with @sirasistant who mentioned changing the design of how inc/dec-rc works in brillig. A side-effect of that work would be that we no longer would have to store after we load and dec-rc arrays at the end of each function. Should make things easier here I imagine?

@vezenovm
Contributor Author

vezenovm commented Sep 4, 2024

> A side-effect of that work would be that we no longer would have to store after we load and dec-rc arrays at the end of each function. Should make things easier here I imagine?

Yeah, it should. The main issue here is that I have to differentiate which stores I can actually remove when the known value of a store equals the address it is storing into. Without inc_rc/dec_rc, I could safely remove any of these stores. It does appear to work with the solution on this branch, although it is a bit of a hack, and I have uhashmap failures on #5905 that I believe are due to these stores being inadvertently deleted.

@jfecher
Contributor

jfecher commented Sep 4, 2024

We can pause this work for a week or so then while the brillig changes are being worked on. Hopefully with the removal of the special tracking uhashmap won't fail anymore.

@vezenovm
Contributor Author

vezenovm commented Sep 4, 2024

> We can pause this work for a week or so then while the brillig changes are being worked on. Hopefully with the removal of the special tracking uhashmap won't fail anymore.

I haven't pinned down the exact cause yet, but I think uhashmap might be failing for a separate reason. I want to find out exactly why it is failing. If it is due to inc_rc/dec_rc I'll wait for the brillig changes; otherwise I will try to resolve it.

@vezenovm
Contributor Author

vezenovm commented Sep 4, 2024

> If it is due to inc_rc/dec_rc I'll wait for the brillig changes, otherwise I will try to resolve it.

Ok I have nailed down the cause of the failure and it is in fact unrelated to inc_rc/dec_rc.

When cleaning up stores, we check a couple of things:

  1. That there is not a load of that store.
  2. That we do not have a reference param.

Now that I am also cleaning up loads (#5905), we can remove some stores once we know that all loads from that address have been removed. However, I was not checking whether the address of the store we want to remove might be used directly as a reference, for example as an argument to a call. With a check that the address of the last store we want to remove is not used in a call, uhashmap now passes for me locally on #5905.
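These conditions can be sketched as a single predicate over a function's instructions. This is a hypothetical simplification, not the code in #5905: `Instruction`, `ValueId`, and `stores_are_dead` are illustrative names, and rejecting a store whose value is the address itself is an extra conservative assumption (a reference stored elsewhere could be loaded and used later).

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-ins for SSA value ids and instructions
/// (not the real `noirc_evaluator` types).
type ValueId = u32;

enum Instruction {
    Load { result: ValueId, address: ValueId },
    Store { address: ValueId, value: ValueId },
    Call { args: Vec<ValueId> },
}

/// Returns true if the stores to `address` can be removed because nothing in
/// this model (a later load, the caller, or a callee) can observe them.
fn stores_are_dead(
    address: ValueId,
    instructions: &[Instruction],
    reference_params: &HashSet<ValueId>,
) -> bool {
    // A reference parameter is still visible to the caller after we return.
    if reference_params.contains(&address) {
        return false;
    }
    instructions.iter().all(|inst| match inst {
        // A remaining load from `address` still observes the stored value.
        Instruction::Load { address: loaded, .. } => *loaded != address,
        // Passing the reference to a call lets the callee load from it.
        Instruction::Call { args } => !args.contains(&address),
        // Extra assumption: storing the reference itself elsewhere lets it escape.
        Instruction::Store { value, .. } => *value != address,
    })
}
```

The call check is the one that was missing in the uhashmap failure described above: the address had no remaining loads, but it was still handed to a callee.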

@vezenovm
Contributor Author

vezenovm commented Sep 4, 2024

@jfecher I am going to pull out the changes from #5905 as I had built it off of this PR and the uhashmap failure is unrelated to the inc_rc/dec_rc edge case. I will pause working on this PR (but leave it open) until the design of inc_rc/dec_rc is updated in brillig.

@vezenovm
Contributor Author

vezenovm commented Sep 5, 2024

@jfecher I was thinking we could replace this PR with #5935, which handles inc_rc/dec_rc in the safest manner: it always assumes that when an inc_rc/dec_rc precedes a store, that store cannot be removed. In this PR I was attempting to detect specifically this case, but that was leading to issues. PR #5935 accepts a smaller improvement for safety (21% on this PR vs. 7% on the new one). Once inc_rc/dec_rc are redesigned, we can remove that small check.
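A minimal sketch of that safer rule, assuming a simplified instruction set: remember which address each load result came from, drop a store that writes such a value back to the same address, and keep the store whenever the previous instruction was an `inc_rc` or `dec_rc`. `Instruction`, `ValueId`, and `remove_trivial_stores` are hypothetical names, and instead of real alias analysis the sketch conservatively forgets all known values after any kept store or call.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-ins for SSA value ids and instructions
/// (not the real `noirc_evaluator` types).
type ValueId = u32;

#[derive(Debug, Clone, PartialEq)]
enum Instruction {
    Load { result: ValueId, address: ValueId },
    Store { address: ValueId, value: ValueId },
    IncRc { value: ValueId },
    DecRc { value: ValueId },
    Call { args: Vec<ValueId> },
}

/// Drops stores that write a loaded value back to the address it was loaded
/// from, unless the previous instruction was an `inc_rc` or `dec_rc`.
fn remove_trivial_stores(block: &[Instruction]) -> Vec<Instruction> {
    // Known values: load result -> address it was loaded from.
    let mut loaded_from: HashMap<ValueId, ValueId> = HashMap::new();
    let mut kept = Vec::with_capacity(block.len());
    let mut prev_was_rc = false;

    for inst in block {
        let mut remove = false;
        match inst {
            Instruction::Load { result, address } => {
                loaded_from.insert(*result, *address);
            }
            Instruction::Store { address, value } => {
                if loaded_from.get(value) == Some(address) && !prev_was_rc {
                    // Memory at `address` already holds `value`: the store is a no-op.
                    remove = true;
                } else {
                    // Conservative: without alias analysis, any kept store may
                    // overwrite memory that an earlier load read from.
                    loaded_from.clear();
                }
            }
            // A callee may write through any reference it can reach.
            Instruction::Call { .. } => loaded_from.clear(),
            Instruction::IncRc { .. } | Instruction::DecRc { .. } => {}
        }
        prev_was_rc = matches!(inst, Instruction::IncRc { .. } | Instruction::DecRc { .. });
        if !remove {
            kept.push(inst.clone());
        }
    }
    kept
}
```

With the brillig_cow_assign sequence from earlier in this thread (`load`, `inc_rc`, `load`, `inc_rc`, `store v4 at v2`), the store-back is kept because it directly follows an `inc_rc`, while a plain `load` followed by a store of the same value to the same address is dropped.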

@TomAFrench TomAFrench marked this pull request as draft September 5, 2024 15:15
@TomAFrench
Member

Changing to draft for clarity.

@vezenovm
Contributor Author

vezenovm commented Sep 5, 2024

Closing in favor of #5935

@vezenovm vezenovm closed this Sep 5, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 5, 2024
… mem2reg (#5935)

# Description

## Problem\*

Partially resolves #4535 

Replaces #5865

## Summary\*

When we see a load we mark the address of that load as being a known
value of the load result. When we reach a store instruction, if that
store value has a known value which is equal to the address of the store
we can remove that store.

We also check whether the last instruction was an `inc_rc` or a
`dec_rc`. If it was we do not remove the store.

## Additional Context


## Documentation\*

Check one:
- [X] No documentation needed.
- [ ] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist\*

- [X] I have tested the changes locally.
- [X] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
Development

Successfully merging this pull request may close these issues.

SSA optimization is ineffective when an unconstrained function has loops
3 participants