Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Moore] [Canonicalizer] Lower struct-related assignOp #7341

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

mingzheTerapines
Copy link
Contributor

@mingzheTerapines mingzheTerapines commented Jul 18, 2024

Using pass canonicalizers:
lower structExtractRefOp to structInjectOp
Refresh referring to struct SSA when new struct is created (structInjectOp) with pass simplifyprocedure and mem2reg
update union verify function
Register func dialect for func.func Op

@mingzheTerapines
Copy link
Contributor Author

mingzheTerapines commented Jul 19, 2024

@fabianschuiki which is better?

  %0 = moore.constant 0 : i64
  %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
  %f = moore.variable %1 : <struct<{a: i32, b: i32}>>

a.rewrite variableOp

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
 %2 = moore.struct_extract %1 , "a"
 %3 = moore.struct_extract %1 , "b"
 %4 = moore.struct_create %2, %3 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

The problem is structExtractOp is a readOp like Op, the input should be refType value. But conversionOp returns unpackedType value. We may need a Op like addressOp opposite readOp
So Ir will be like this

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
%2 = moore.address %1
 %3 = moore.struct_extract %2 , "a"
%4 = moore.address %1
 %5 = moore.struct_extract %4 , "b"
 %6 = moore.struct_create %3, %5 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

b. rewrite conversionOp and variableOp

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %5  = moore.concat %2, %4
  %f = moore.struct_create %5 : <struct<{a: i32, b: i32}>>

Then

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %f = moore.struct_create %2, %4 : <struct<{a: i32, b: i32}>>

@fabianschuiki
Copy link
Contributor

Hmmm I don't think struct_create can return a ref type. It simply creates a struct<{...}> SSA value, but it does not allocate any storage like variable. Same with struct_extract: it operates directly on struct<{...}> values, not ref types. The only operation that goes from ref<T> to T is moore.read.

@mingzheTerapines
Copy link
Contributor Author

mingzheTerapines commented Jul 22, 2024

Hmmm I don't think struct_create can return a ref type. It simply creates a struct<{...}> SSA value, but it does not allocate any storage like variable. Same with struct_extract: it operates directly on struct<{...}> values, not ref types. The only operation that goes from ref<T> to T is moore.read.

In Moore, as I understand, RefType means pointer that does not equally allocate memory.

// CHECK-LABEL: moore.module @structCreateFold
moore.module @structCreateFold(in %a : !moore.i1, out b : !moore.i1) {
    %0 = moore.struct_create %a : !moore.i1 -> struct<{a: i1}>
    %1 = moore.struct_extract %0, "a" : <struct<{a: i1}>> -> i1
    // CHECK:  moore.output %a : !moore.i1
    moore.output %1 : !moore.i1
  }

For Ops like these, I can not read field %a if %0 is not a refType. Unless, we have Op like addressOp I mentioned.

@fabianschuiki
Copy link
Contributor

// CHECK-LABEL: moore.module @structCreateFold
moore.module @structCreateFold(in %a : !moore.i1, out b : !moore.i1) {
    %0 = moore.struct_create %a : !moore.i1 -> struct<{a: i1}>
    %1 = moore.struct_extract %0, "a" : <struct<{a: i1}>> -> i1
    // CHECK:  moore.output %a : !moore.i1
    moore.output %1 : !moore.i1
  }

For Ops like these, I can not read field %a if %0 is not a refType. Unless, we have Op like addressOp I mentioned.

You should be able to get field "a" as follows:

%0 = moore.struct_create %a : !moore.i1 -> !moore.struct<{a: i1}>
%1 = moore.struct_extract %0, "a" : !moore.struct<{a: i1}> -> i1  // this operates on struct, not ref<struct>
moore.output %1 : !moore.i1

The struct_extract op should operate on !moore.struct type. There should be a separate struct_extract_ref op if you need to go from a ref<struct<{a: T}>> to a ref<T>.

@mingzheTerapines
Copy link
Contributor Author

@fabianschuiki structExtractOp reads sub memory slot relates to as readOp in Moore dialect and LoadOp in memrefOp. So it should operate on ref.

def LoadOp : MemRef_Op
let arguments = (ins Arg<AnyMemRef, "the reference to load from",
                           [MemRead]>:$memref,
                       Variadic<Index>:$indices,
                       DefaultValuedOptionalAttr<BoolAttr, "false">:$nontemporal);

if (auto width = memberType.getBitSize()) {
auto lowBit = rewriter.create<ConstantOp>(op.getLoc(), i32, bit);
bit += width.value();
auto field = rewriter.create<ExtractOp>(op->getLoc(), member.type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think UnpackedStructType doesn't allow bit extraction. Is it expected way to get a field from UnpackedStructType?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it should use structExtractOp to extract a struct type.

@mingzheTerapines
Copy link
Contributor Author

mingzheTerapines commented Jul 24, 2024

@fabianschuiki which is better?

  %0 = moore.constant 0 : i64
  %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
  %f = moore.variable %1 : <struct<{a: i32, b: i32}>>

a.rewrite variableOp

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
 %2 = moore.struct_extract %1 , "a"
 %3 = moore.struct_extract %1 , "b"
 %4 = moore.struct_create %2, %3 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

The problem is structExtractOp is a readOp like Op, the input should be refType value. But conversionOp returns unpackedType value. We may need a Op like addressOp opposite readOp So Ir will be like this

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
%2 = moore.address %1
 %3 = moore.struct_extract %2 , "a"
%4 = moore.address %1
 %5 = moore.struct_extract %4 , "b"
 %6 = moore.struct_create %3, %5 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

b. rewrite conversionOp and variableOp

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %5  = moore.concat %2, %4
  %f = moore.struct_create %5 : <struct<{a: i32, b: i32}>>

Then

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %f = moore.struct_create %2, %4 : <struct<{a: i32, b: i32}>>

@uenoku You are right, it was outdated. Please see these demo.
Do you think we need addressOp opposite to readOp in Moore dialect?

@mingzheTerapines mingzheTerapines changed the title [canonicalizer] Lower variableOp to structCreateOp [canonicalizer] Lower variableOp to structCreateOp & structExtractRefOp to structInjectOp Jul 25, 2024
@mingzheTerapines mingzheTerapines changed the title [canonicalizer] Lower variableOp to structCreateOp & structExtractRefOp to structInjectOp [Moore] [canonicalizer] Lower struct-related assignOp Jul 25, 2024
@mingzheTerapines mingzheTerapines changed the title [Moore] [canonicalizer] Lower struct-related assignOp [Moore] [Canonicalizer] Lower struct-related assignOp Jul 25, 2024
@hailongSun2000
Copy link
Member

Hmmm I don't think struct_create can return a ref type. It simply creates a struct<{...}> SSA value, but it does not allocate any storage like variable. Same with struct_extract: it operates directly on struct<{...}> values, not ref types. The only operation that goes from ref<T> to T is moore.read.

In Moore, as I understand, RefType means pointer that does not equally allocate memory.
...
For Ops like these, I can not read field %a if %0 is not a refType. Unless, we have Op like addressOp I mentioned.

Hey, @mingzheTerapines. The ref<T> represents a reference to a type. We discussed this at the PR(#7082). The memref.load as you mentioned, it's the reference to load from...
I agree with @fabianschuiki's viewpoint, we should tweak the result type of struct_create to non-refType. The struct_extract is also.

@mingzheTerapines
Copy link
Contributor Author

@hailongSun2000 I partly agree with you. structCreateOp can return unpacked type.

@mingzheTerapines
Copy link
Contributor Author

For the test, it actually has two steps:
original

moore.module @structExtractRefLower2Inject(out a : !moore.struct<{a: i32, b: i32}>) {
  %ii = moore.variable : <struct<{a: i32, b: i32}>>
  %0 = moore.struct_extract_ref %ii, "a" : <struct<{a: i32, b: i32}>> -> <i32>
  %1 = moore.constant 1 : i32
  moore.blocking_assign %0, %1 : i32
  %2 = moore.read %ii : <struct<{a: i32, b: i32}>>
  moore.output %2 : !moore.struct<{a: i32, b: i32}>
}

Firstly
rewrite patern:
variableOp -> constantOp for initial field value + structCreateOp + addressOp for variableOp requiring returning RefType
blockingAssignOp + structExtrcatRefOp -> structInjectOp

moore.module @structExtractRefLower2Inject(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 0 : i32
    %1 = moore.constant 0 : i32
    %2 = moore.struct_create %0, %1 : !moore.i32, !moore.i32 -> struct<{a: i32, b: i32}>
    %3 = moore.address %2 : <struct<{a: i32, b: i32}>>
    %4 = moore.constant 1 : i32
    %5 = moore.read %3 : <struct<{a: i32, b: i32}>>
    %6 = moore.struct_inject %5, "a", %4 : struct<{a: i32, b: i32}>, i32
    %7 = moore.address %6 : <struct<{a: i32, b: i32}>>
    %8 = moore.read %7 : <struct<{a: i32, b: i32}>>
    moore.output %8 : !moore.struct<{a: i32, b: i32}>
  }

Secondly
rewrite patern:
readOp ->If read addressOp, directly use addressOp's input value replacing the use of readOp
structInjectOp -> Chase a chain of struct_inject ops, with an optional final struct_create, and take note of the values assigned to each field.

moore.module @structExtractRefLower2Inject(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 0 : i32
    %1 = moore.constant 1 : i32
    %2 = moore.struct_create %1, %0 : !moore.i32, !moore.i32 -> struct<{a: i32, b: i32}>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }

@fabianschuiki
Copy link
Contributor

I think you can just use moore.variable instead of a new moore.address op. It looks like these would do the same thing.

The steps you are describing sound exactly like a mem2reg pass. Does MLIR's existing mem2reg pass potentially already do what you are describing? That would probably be easier than doing it by hand. Especially since you can't just directly replace variable ops with struct_create, and if there are multiple assigns to the variable in between, the pass becomes much more complicated. I'm pretty sure you can't do this with rewriting patterns alone -- there's probably quite a bit of analysis involved (like what mem2reg does).

@mingzheTerapines
Copy link
Contributor Author

mingzheTerapines commented Aug 2, 2024

@Moxinilian @fabianschuiki It seems SROA Pass + Mem2Reg Pass look very similar to Canonicalize Pass. They all rewrite Ops to original value and remove unused value. What responsibility should they do?
I was considering SROA+Mem2Reg for local variable, Canonicalize for global variable.

@mingzheTerapines
Copy link
Contributor Author

mingzheTerapines commented Aug 2, 2024

I think you can just use moore.variable instead of a new moore.address op. It looks like these would do the same thing.

The steps you are describing sound exactly like a mem2reg pass. Does MLIR's existing mem2reg pass potentially already do what you are describing? That would probably be easier than doing it by hand. Especially since you can't just directly replace variable ops with struct_create, and if there are multiple assigns to the variable in between, the pass becomes much more complicated. I'm pretty sure you can't do this with rewriting patterns alone -- there's probably quite a bit of analysis involved (like what mem2reg does).

But we want to lower variableOp of struct to structCreateOp, the result type is different which needs to be the same.

berfore
 %ii = moore.variable : <struct<{a: i32, b: i32}>>
//after
    %0 = moore.constant 0 : i32
    %1 = moore.constant 0 : i32
    %2 = moore.struct_create %0, %1 : !moore.i32, !moore.i32 -> struct<{a: i32, b: i32}>
    %3 =  moore.variable %2 : <struct<{a: i32, b: i32}>>

It seems not elegent

@Moxinilian
Copy link
Member

Moxinilian commented Aug 2, 2024

Mem2Reg is most useful when you have branching control flow or interaction between dialects, hence why its effect is conceptually simple here. I don't think it makes sense to use canonicalization to do what it does as it's not very local, it requires tracking many effects (notably, memory side-effects).

I'm still not sure what you are trying to achieve with global variables. How would you want to eliminate a global variable?

@fabianschuiki
Copy link
Contributor

I agree with @Moxinilian: I don't think you can use a canonicalization to replicate a mem2reg pass. Variables can be read and written from many different procedures and module-level operations at the same time, and it will not be possible to find a simple value substitution for all of those situations. It's important to not treat a moore.module body as a program that runs from top to bottom: the module has no control flow, so you cannot reason about moore.assign as taking a current value of a variable and updating it to a different value.

I think what you are trying to do is eliminate as many variables in the module and replace them with an SSA value directly. That would be a great thing to do! The moore.assigned_variable op is a great starting point for that: if a variable has only a single moore.assign, you can merge the variable and assigned value into moore.assigned_variable (note that this is different from the initial value of a moore.variable). I think this is already implemented as a canonicalization.

Another canonicalization you could do is to try and pull assigns to struct_extract_ref up through the extract:

// before:
%3 = moore.struct_extract_ref %0, "a"
moore.assign %3, %1
%4 = moore.struct_extract_ref %0, "b"
moore.assign %4, %2

// after:
%3 = moore.struct_create %1, %2
moore.assign %1, %3

@mingzheTerapines
Copy link
Contributor Author

@fabianschuiki I just tested new version of circt-verilog, I found I may not need to lower to structCreateOp for variable of struct.

module Foo(output [63:0] cout);
  struct packed {int a; int b;} ST = 'b0;
  assign cout = ST;
endmodule
// -----// IR Dump Before CSE (cse) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %cout = moore.net wire : <l64>
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.assign %cout, %3 : l64
  %4 = moore.read %cout : <l64>
  moore.output %4 : !moore.l64
}

// -----// IR Dump Before Canonicalizer (canonicalize) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %cout = moore.net wire : <l64>
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.assign %cout, %3 : l64
  %4 = moore.read %cout : <l64>
  moore.output %4 : !moore.l64
}

// -----// IR Dump Before LowerConcatRef (moore-lower-concatref) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before SROA (sroa) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before Mem2Reg (mem2reg) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before CSE (cse) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before Canonicalizer (canonicalize) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before ConvertMooreToCore (convert-moore-to-core) //----- //
module {
  moore.module @Foo(out cout : !moore.l64) {
    %0 = moore.constant 0 : i32
    %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
    %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
    %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
    %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
    moore.output %3 : !moore.l64
  }
}


// -----// IR Dump Before CSE (cse) //----- //
hw.module @Foo(out cout : i64) {
  %c0_i32 = hw.constant 0 : i32
  %c0_i32_0 = hw.constant 0 : i32
  %0 = comb.concat %c0_i32_0, %c0_i32 : i32, i32
  %1 = hw.bitcast %0 : (i64) -> !hw.struct<a: i32, b: i32>
  %ST = llhd.sig %1 : !hw.struct<a: i32, b: i32>
  %2 = llhd.prb %ST : !hw.inout<struct<a: i32, b: i32>>
  %3 = hw.bitcast %2 : (!hw.struct<a: i32, b: i32>) -> i64
  hw.output %3 : i64
}

@fabianschuiki
Copy link
Contributor

@mingzheTerapines I agree, this looks good! We could still improve the canonicalizer of moore.variable a bit: currently variables only get replaced with a moore.assigned_variable if there is a single assignment to the variable. In your example however we have a moore.variable that has an initial value which is never changed by an assign. This combination of moore.variable with initial assignment and no continuous assignment could also be lowered to a moore.assigned_variable 😃

@mingzheTerapines
Copy link
Contributor Author

mingzheTerapines commented Aug 23, 2024

@fabianschuiki canonicalize Pass seems not as reliable as mem2reg Pass.
Because it can only be used for folding structinject or structextract when struct varible itself not used between these two folding Ops. But I am not sure this kind of using check has been done. What's more some procedure region should be handled very carefully.
For example

moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
  %a = moore.variable : <struct<{a: i32, b: i32}>>
  %x = moore.variable : <i32>
  %y = moore.variable : <i32>
  %z = moore.variable : <i32>
  %ii = moore.variable : <struct<{a: i32, b: i32}>>
  moore.procedure always_comb {
    %1 = moore.struct_extract_ref %ii, "a" : <struct<{a: i32, b: i32}>> -> <i32>
    %2 = moore.constant 1 : i32
    moore.blocking_assign %1, %2 : i32
    %3 = moore.struct_extract_ref %ii, "b" : <struct<{a: i32, b: i32}>> -> <i32>
    %4 = moore.constant 4 : i32
    moore.blocking_assign %3, %4 : i32
    moore.return
  }
  %0 = moore.read %a : <struct<{a: i32, b: i32}>>
  moore.output %0 : !moore.struct<{a: i32, b: i32}>
}

We can not canonicalize to

moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
  %a = moore.variable : <struct<{a: i32, b: i32}>>
  %x = moore.variable : <i32>
  %y = moore.variable : <i32>
  %z = moore.variable : <i32>
  moore.procedure always_comb {
  %ii = moore.struct_create %2, %3
    moore.return
  }
  %0 = moore.read %a : <struct<{a: i32, b: i32}>>
  moore.output %0 : !moore.struct<{a: i32, b: i32}>
}

@mingzheTerapines
Copy link
Contributor Author

mingzheTerapines commented Aug 23, 2024

@fabianschuiki As you known, there are some problems with --moore-simplify-procedures, so it can not be enabled. But it works making sturctinjectOp be chain use like. I tested local successfully. Here is the flow
importverilog

moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
  %a = moore.variable : <struct<{a: i32, b: i32}>>
  %x = moore.variable : <i32>
  %y = moore.variable : <i32>
  %z = moore.variable : <i32>
  %ii = moore.variable : <struct<{a: i32, b: i32}>>
   moore.procedure always_comb {
    %1 = moore.struct_extract_ref %ii, "a" : <struct<{a: i32, b: i32}>> -> <i32>
    %2 = moore.constant 1 : i32
    moore.blocking_assign %1, %2 : i32
    %3 = moore.struct_extract_ref %ii, "b" : <struct<{a: i32, b: i32}>> -> <i32>
    %4 = moore.constant 4 : i32
    moore.blocking_assign %3, %4 : i32
     moore.return
  }
  %0 = moore.read %a : <struct<{a: i32, b: i32}>>
  moore.output %0 : !moore.struct<{a: i32, b: i32}>
}

canonicalizer

module {
  moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 4 : i32
    %1 = moore.constant 1 : i32
    %a = moore.variable : <struct<{a: i32, b: i32}>>
    %ii = moore.variable : <struct<{a: i32, b: i32}>>
    moore.procedure always_comb {
      %3 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %4 = moore.struct_inject %3, "a", %1 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %4 : struct<{a: i32, b: i32}>
      %5 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %6 = moore.struct_inject %5, "b", %0 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %6 : struct<{a: i32, b: i32}>
      moore.return
    }
    %2 = moore.read %a : <struct<{a: i32, b: i32}>>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }
}

--moore-simplify-procedures

module {
  moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 4 : i32
    %1 = moore.constant 1 : i32
    %a = moore.variable : <struct<{a: i32, b: i32}>>
    %ii = moore.variable : <struct<{a: i32, b: i32}>>
    moore.procedure always_comb {
      %3 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %4 = moore.variable : <struct<{a: i32, b: i32}>>
      moore.blocking_assign %4, %3 : struct<{a: i32, b: i32}>
      %5 = moore.read %4 : <struct<{a: i32, b: i32}>>
      %6 = moore.struct_inject %5, "a", %1 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %4, %6 : struct<{a: i32, b: i32}>
      %7 = moore.read %4 : <struct<{a: i32, b: i32}>>
      moore.blocking_assign %ii, %7 : struct<{a: i32, b: i32}>
      %8 = moore.read %4 : <struct<{a: i32, b: i32}>>
      %9 = moore.struct_inject %8, "b", %0 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %4, %9 : struct<{a: i32, b: i32}>
      %10 = moore.read %4 : <struct<{a: i32, b: i32}>>
      moore.blocking_assign %ii, %10 : struct<{a: i32, b: i32}>
      moore.return
    }
    %2 = moore.read %a : <struct<{a: i32, b: i32}>>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }
}

mem2reg

module {
  moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 4 : i32
    %1 = moore.constant 1 : i32
    %a = moore.variable : <struct<{a: i32, b: i32}>>
    %ii = moore.variable : <struct<{a: i32, b: i32}>>
    moore.procedure always_comb {
      %3 = moore.constant 0 : i64
      %4 = moore.conversion %3 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
      %5 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %6 = moore.struct_inject %5, "a", %1 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %6 : struct<{a: i32, b: i32}>
      %7 = moore.struct_inject %6, "b", %0 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %7 : struct<{a: i32, b: i32}>
      moore.return
    }
    %2 = moore.read %a : <struct<{a: i32, b: i32}>>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }
}

Comment on lines +202 to +204
%0 = moore.constant 42 : i32
%1 = moore.struct_extract_ref %arg0, "a" : <struct<{a: i32, b: i32}>> -> <i32>
moore.blocking_assign %1, %0 : i32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's perfect if you can add some other cases. Such as:

%0 = moore.constant 42 : i32
%1 = moore.struct_extract_ref %arg0, "a" : <struct<{a: i32, b: i32}>> -> <i32>
moore.blocking_assign %1, %0 : i32

%2 = moore.constant 43 : i32 
moore.blocking_assign %1, %2 : i32

And

%0 = moore.constant 42 : i32
%1 = moore.struct_extract_ref %arg0, "a" : <struct<{a: i32, b: i32}>> -> <i32>
moore.blocking_assign %1, %0 : i32

%2 = moore.constant 43 : i32 
%3 = moore.struct_extract_ref %arg0, "b" : <struct<{a: i32, b: i32}>> -> <i32>
moore.blocking_assign %2, %3 : i32

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests may be complicated I can print the result of your case here

%0 = moore.constant 43 : i32
    %1 = moore.constant 42 : i32
    %2 = moore.read %arg0 : <struct<{a: i32, b: i32}>>
    %3 = moore.struct_inject %2, "a", %1 : struct<{a: i32, b: i32}>, i32
    moore.blocking_assign %arg0, %3 : struct<{a: i32, b: i32}>
    %4 = moore.read %arg0 : <struct<{a: i32, b: i32}>>
    %5 = moore.struct_inject %4, "a", %0 : struct<{a: i32, b: i32}>, i32
    moore.blocking_assign %arg0, %5 : struct<{a: i32, b: i32}>
    %6 = moore.read %arg0 : <struct<{a: i32, b: i32}>>
    %7 = moore.struct_inject %6, "a", %0 : struct<{a: i32, b: i32}>, i32
    moore.blocking_assign %arg0, %7 : struct<{a: i32, b: i32}>
    return %arg0 : !moore.ref<struct<{a: i32, b: i32}>>

and

%0 = moore.constant 43 : i32
    %1 = moore.constant 42 : i32
    %2 = moore.read %arg0 : <struct<{a: i32, b: i32}>>
    %3 = moore.struct_inject %2, "a", %1 : struct<{a: i32, b: i32}>, i32
    moore.blocking_assign %arg0, %3 : struct<{a: i32, b: i32}>
    %4 = moore.read %arg0 : <struct<{a: i32, b: i32}>>
    %5 = moore.struct_inject %4, "b", %0 : struct<{a: i32, b: i32}>, i32
    moore.blocking_assign %arg0, %5 : struct<{a: i32, b: i32}>
    return %arg0 : !moore.ref<struct<{a: i32, b: i32}>>

@mingzheTerapines
Copy link
Contributor Author

@fabianschuiki Could you please give some advice for this commit, thanks a lot.

@fabianschuiki
Copy link
Contributor

@mingzheTerapines I'm wondering if the SROA you have added to the Moore dialect doesn't more or less do a similar thing than this PR 🤔. If your goal is to get rid of subfield assignments to structs and arrays, SROA feels like a pretty good approach because it eliminates the struct altogether.

Converting assignments to struct fields into assignments to the entire struct is a lot more complicated. The difficult part is that you have to make sure that assigning to the entire struct doesn't overwrite other assignments to individual fields. For example, you could have the following:

struct packed {
  int a;
  int b;
} x;
initial x.a <= #1ns 42;
initial x.b <= #2ns 9001;

This would show up like the following in the IR:

%x = moore.variable
moore.procedure initial {
  %x.a = moore.struct_extract_ref %x, "a"
  moore.nonblocking_assign %x.a, 42 after 1ns
}
moore.procedure initial {
  %x.b = moore.struct_extract_ref %x, "b"
  moore.nonblocking_assign %x.b, 9001 after 2ns
}

Here you'd be tempted to try and convert the assignment to the struct fields into assignments of the entire struct, like the following:

moore.procedure initial {
  %0 = moore.read %x
  %1 = moore.struct_inject %0, "a", 42
  moore.nonblocking_assign %x, %1 after 1ns
}
moore.procedure initial {
  %0 = moore.read %x
  %1 = moore.struct_inject %0, "b", 9001
  moore.nonblocking_assign %x, %1 after 2ns
}

Unfortunately this wouldn't work properly, since the two procedures would read the initial value of x, {0, 0}, compute an updated value, {42, 0} and {0, 9001}, and then schedule an assignment at 1ns and 2ns. This means that x would have the following values:

0ns  x = {0, 0}
1ns  x = {42, 0}
2ns  x = {0, 9001}  // correct would be {42, 9001}

I suspect that there will be a lot of other examples where you would get a similar kind of misbehavior by simply overwriting an entire struct's value where you originally only accessed an individual field.

There are a few ways to make this work though:

  1. You could rely on the subtle semantics of packed structs which, if I recall correctly, guarantee that even when you write to just a field, the write will update the entire struct in the background. You could only do your conversion if the struct or array is packed.
  2. You could ensure that the struct or array you are trying to optimize is only accessed by the current procedure, and that there is absolutely no way for other references to the same struct to exists (no aliasing). This would be the case for example if the struct/array ref you're dealing with comes directly from a moore.variable (not a block argument or some other op), and that all uses of the variable are in the current procedure, and there is no time/event control that could suspend execution (or alternatively that all uses are in the same block and there are no side-effecting ops in between them). If this is the case, you are sure that you see every assignment to the struct, and then you can perform any kind of merging that you want 😃.
  3. You could treat module-level moore.assigns to variables differently. These should be simpler to deal with: if there is no aliasing (again, the struct/array ref does not escape through an output port, or is not assigned in any procedure) and every field of the struct has a single unique assign to it, then you can convert everything into an assign for the entire struct. This should actually be handled more or less by the mem2reg pass already. This would be a very nice optimization to have! 🥳

In any case, since this is a pretty complicated optimization that requires analysis across an entire module, this would have to be a separate transformation pass instead of a canonicalizer.

@mingzheTerapines
Copy link
Contributor Author

@fabianschuiki Yes I found you mentioned in #7158 (comment)

 Pretty sure moore.struct_inject doesn't really work for module-level variables, since the assignments in a module all occur in parallel, so there is no order in the sense of old struct before the assignment and new struct after the assignment.

So I let moore.struct_inject works in moore.blocking_assign canonicalize pass, it make sense.
Also I remembered you mentioned reading as a struct and writing would be faster than reading as several parts and writing. So personally I think moore.struct_inject is still efficient for moore.blocking_assign.
SROA could work for parallel case.

@fabianschuiki
Copy link
Contributor

I like the idea of using struct_inject for blocking assigns in procedures, for variables declared in the procedure! 😃 That way you don't have to reason about module-level variables that may have other reads/assigns that you cannot easily control.

Mem2reg should offer a way to generate struct injects for assignments through subfield accesses. I'm not entirely sure how that works; you might have to declare a memory slot for every field in a struct/array?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants