[Arc] Remove obsolete arc.clock_tree and arc.passthrough ops #7704
Conversation
LGTM (once it builds 😉)!
// Emit a call to the corresponding "<name>_passthrough" function, forwarding
// the surrounding function's first block argument (the model storage).
funcName.append("_passthrough");
builder.create<func::CallOp>(clockOp->getLoc(), funcName, TypeRange{},
                             ValueRange{funcOp.getBody().getArgument(0)});
}
If the initial does affect an output port value directly, do we not need to guarantee that this change is propagated to the output port?
Yeah that's a good question. Since passthrough is now gone, this boils down to whether the initial function should also call the eval function to propagate the effects of the initial assignments to outputs and side-effecting ops. I'm trying to think whether there is a use case where the user wants to run `initial`, but then do some other tweaking (or maybe dump a first timestep to VCD?) before calling the first `eval` and potentially triggering output or state changes (sketched below).
@uenoku has observed that some simulators have fairly intricate bring-up sequences. Generally it seems like there's an initial value assignment to variables, often X, then an execution of all variable initializers, then of all `initial` blocks, and then the triggering of any `posedge` and the like based on those variables. We might need to do something similar at some point? Maybe some form of tiered `arc.initial` to have containers for the different rounds of initialization?
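To make the use case mentioned above concrete, here is a minimal driver sketch. All names in it (`Top_initial`, `Top_eval`, `dumpVcdTimestep`) are hypothetical placeholders, not the actual arcilator-generated API:

```cpp
// Hypothetical driver sketch; all names below are illustrative only.
extern "C" void Top_initial(void *state); // run the model's initializers once
extern "C" void Top_eval(void *state);    // advance one evaluation step

// Placeholder for whatever waveform dumping the user wants to do.
void dumpVcdTimestep(void *state);

void runSimulation(void *state, unsigned steps) {
  Top_initial(state);     // apply initial values to the simulation state
  dumpVcdTimestep(state); // inspect/dump time 0 *before* any eval runs
  for (unsigned i = 0; i < steps; ++i) {
    Top_eval(state);      // propagate values, trigger register updates
    dumpVcdTimestep(state);
  }
}
```

Whether the initial function should internally run the eval logic once, instead of leaving that first step to the driver, is exactly the question being discussed here.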
That is what I've been desperately trying to avoid so far. But as the SV LRM so nonchalantly states on initialization at declaration:

> This may require a special pre-initial pass at run time.

Considering that commercial simulators also tend to take their freedoms here [citation needed], I was hoping we could avoid that madness. I feel that when we have to debate whether a `posedge` has occurred when a `bit` is default-initialized to 0, then initialized to 1 at declaration, and then initialized back to 0 in an `initial` process, we have already lost.
Yeah I totally agree. We should probably imitate the basic fixing of Verilog semantics that simulators do with their observable X-to-initial-value transitions, because otherwise Verilog's `always @(...)` combinational processes and async resets are virtually unusable. But we should do as little as possible/necessary.
For the triggered processes over in #7676 I currently require a tieoff constant for all processes, which also serves as the pre-initial value. I've thought about changing this to take `!seq.immutable` values. Considering that neither @uenoku nor I intended it that way, I think this would mesh surprisingly well: it would de facto push `seq.initial` into the pre-initial phase, and we wouldn't be able to use non-`!seq.immutable` values in that phase. But I guess this is what we already arrived at in #7638?!

But to go back to Martin's original question here: I think it is a fair choice to not force propagation after initialization. In the current implementation I've ended up introducing an inconsistency between models without an initializer (which won't call `passthrough` at startup) vs. the same model with an empty initializer (which will call `passthrough`). It's a good thing to avoid that.
This is a complete rewrite of the `LowerState` pass that makes the `LegalizeStateUpdate` pass obsolete.

The old implementation of `LowerState` produces `arc.model`s that still contain read-after-write conflicts. This primarily happens because the pass simply emits `arc.state_write` operations that write updated values to simulation memory for each `arc.state`, and any user of an `arc.state` would use an `arc.state_read` operation to retrieve the original value of the state before any writes occurred. Memories are similar. The Arc dialect considers `arc.state_write` and `arc.memory_write` operations to be _deferred_ writes until the `LegalizeStateUpdate` pass runs, at which point they become _immediate_, since the legalization inserts the necessary temporaries to resolve the read-after-write conflicts.

The previous implementation would also not handle state-to-output and state-to-side-effecting-op propagation paths correctly. When a model's eval function is called, registers are updated to their new value, and any outputs that combinatorially depend on those new values should also immediately update. Similarly, operations such as asserts or debug trackers should observe new values for states immediately after they have been written. However, since all writes are considered deferred, there is no way for `LowerState` to produce a mixture of operations that depend on a register's _old_ state (because they are used to compute a register's new state) and on its _new_ state (because they are combinatorially derived values).

This new implementation of `LowerState` completely avoids read-after-write conflicts. It does this by changing the way modules are lowered in two ways:

**Phases:** The pass tracks in which _phase_ of the simulation lifecycle a value is needed and allows operations to have different lowerings in different phases. An `arc.state` operation, for example, requires its inputs, enable, and reset to be computed based on the _old_ value they had, i.e., the value at the end of the previous call to the model's eval function. The clock, however, has to be computed based on the _new_ value it has in the current call to eval. Therefore, the ops defining the inputs, enable, and reset are lowered in the _old_ phase, while the ops defining the clock are lowered in the _new_ phase. The `arc.state` op lowering will then write its _new_ value to simulation storage. This phase tracking allows registers to be used as the clock for other registers: since clocks require _new_ values, registers serving as clock to others are lowered first, such that the dependent registers can immediately react to the updated clock. It also allows module outputs and side-effecting ops based on `arc.state`s to be scheduled after the states have been updated, since they depend on the states' _new_ values. The pass also covers the situation where an operation depends on a module input and a state, and feeds into a module output as well as another state. In this situation that operation has to be lowered twice: once in the _old_ phase to serve as input to the subsequent state, and once in the _new_ phase to compute the new module output.

In addition to the _old_ and _new_ phases representing the previous and current call to eval, the pass also models an _initial_ and a _final_ phase. These are used for `seq.initial` and `llhd.final` ops, and to compute the initial values for states. If an `arc.state` op has an initial value operand, it is lowered in the _initial_ phase. Similarly for the ops in `llhd.final`. The pass places all ops lowered in the initial and final phases into corresponding `arc.initial` and `arc.final` ops. At a later point we may want to generate the `*_initial`, `*_eval`, and `*_final` functions directly.

**No more clock trees:** The new implementation also no longer generates `arc.clock_tree` and `arc.passthrough` operations. These were a holdover from the early days of the Arc dialect, when no eval function would be generated. Instead, the user was required to directly call clock functions. This was never able to model clocks changing at the exact same moment, or clocks derived from registers and other combinatorial operations. Since Arc has since switched to generating an eval function that can accurately interleave the effects of different clocks, grouping ops by clock tree is no longer needed. In fact, removing the clock tree ops allows the pass to more efficiently interleave the operations from different clock domains.

The Rocket core in the circt/arc-tests repository still works with this new implementation of LowerState. In combination with the MergeIfs pass the performance stays the same.

I have renamed the implementation and test files to make the git diffs easier to read. The names will be changed back in a follow-up commit.
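As a purely conceptual sketch of the phase tracking described above (the types and names below are invented for illustration and are not the actual pass code), the core idea is to key each lowered value by (op, phase), so the same op can be materialized once per phase it is needed in:

```cpp
// Conceptual sketch only; not the actual LowerState implementation.
#include <map>
#include <utility>

enum class Phase { Initial, Old, New, Final };

struct Op;     // stand-in for an operation in the module being lowered
struct Value;  // stand-in for a value materialized in the model

struct PhaseLowering {
  // The same source op may get a separate lowering per phase it is needed in.
  std::map<std::pair<Op *, Phase>, Value *> lowered;

  Value *getLoweredValue(Op *op, Phase phase) {
    auto it = lowered.find({op, phase});
    if (it != lowered.end())
      return it->second;
    Value *value = lowerOp(op, phase); // recursively lowers operands in `phase`
    lowered[{op, phase}] = value;
    return value;
  }

  // A register-like op (think `arc.state`) requests its operands in different
  // phases: data and enable use *old* values, while the clock uses the *new*
  // value; the result written back becomes the register's *new* value.
  Value *lowerRegister(Op *data, Op *enable, Op *clock) {
    Value *oldData = getLoweredValue(data, Phase::Old);
    Value *oldEnable = getLoweredValue(enable, Phase::Old);
    Value *newClock = getLoweredValue(clock, Phase::New);
    return emitStateWrite(oldData, oldEnable, newClock); // stand-in for a write
  }

  Value *lowerOp(Op *op, Phase phase);                    // not shown
  Value *emitStateWrite(Value *d, Value *en, Value *clk); // not shown
};
```

Under this scheme an op feeding both another register and a module output would simply be lowered once in the old phase and once in the new phase, which is what removes the read-after-write conflicts without a separate legalization step.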
The new LowerState pass does not produce `arc.clock_tree` and `arc.passthrough` ops anymore. Remove them from the dialect entirely.