Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smart seq split #2217

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from
Draft

Smart seq split #2217

wants to merge 17 commits into from

Conversation

parthsarkar17
Copy link
Contributor

Similar to duplication, the goal with splitting a seq block is ultimately to reduce fanout from the one register that normally controls a sequential schedule. Initially, we tried to transform:

seq { A; B; C; D; E; F; }

into:

seq { 
    @new_fsm seq {A; B; C;}
    @new_fsm seq {D; E; F;}
}

The children schedules generated control registers like so:

@generated fsm0 = std_reg(2);
@generated fsm1 = std_reg(2);

which was the goal, except, with this, the following groups and assignments would be generated:

group tdcc0 {
    A[go] = ...
    ...
    fsm0.in = fsm0.out == 2'd0 & A[done] ? 2'd1; // line 1
    ...
}
group tdcc1 {
    D[go] = ...
    ...
    D[go] = fsm1.in = fsm0.out == 2'd0 & D[done] ? 2'd1; // line 2
    ...
}

Lines 1 and 2 are pretty similar; they each check the current state of the FSM register and ensure some other group has finished, and then they each update their register's value with a new value (the same value for each register!). Once we got synthesis results from this "new-fsm-insertion" method, we saw that WS decreased and LUT usage increased; we suspected it had something to do with the fact that we were duplicating the logic to transition FSM states, since both fsm0 and fsm1 have identical transition conditions (up to the names of the groups they are controlling) and new values.

So, we decided to open up an option in TDCC that lets a seq block be controlled by a parent register, a child register, and duplicated versions of each register (that agree with their respective original at each cycle). We can hopefully, therefore, get the benefits of reducing fan-out, while also ensuring that we reuse logic wherever we can. Here's what the tdcc group will look like for the above control block:

group tdcc {
    A[go] = !A[done] & parent0 == 1'd0 & fsm0 == 2'd0;

    ...  // notice how the enable queries are split among the two sets of registers. that's for fan-out reduction

    F[g] = !F[done] & parent1 = 1'd1 & fsm1 = 2'd2;
    
    ... // notice how transition logic is shared and not duplicated

    fsm0.in = A[done] & parent0 == 1'd0 & fsm0 == 2'd0 ? 2'd1;
    fsm1.in = A[done] & parent0 == 1'd0 & fsm0 == 2'd0 ? 2'd1;
   
   ... // transition logic for parents isn't shown
}

In short, the idea is duplication, but with an emphasis on making sure the registers reuse logic to update themselves. Benchmarking + synthesis results are in progress.

parthsarkar17 and others added 14 commits July 16, 2024 15:53
- need to add guards for parent fsm
- need to update  with correct logic in case of split, instead of just single/duplicate
- added up till + including parent fsm transitions
- next, make sure done / resets are implememented
- need to implement done/reset assignments
- need to test enable and transition assignments
- need to factor out & of all guards in a vec into ir::guard
- need to add correctness tests to runt.toml file
@parthsarkar17 parthsarkar17 requested a review from calebmkim July 22, 2024 20:59
@rachitnigam
Copy link
Contributor

@parthsarkar17 what is the status of this PR and what is blocking it from being merged?

@parthsarkar17
Copy link
Contributor Author

parthsarkar17 commented Aug 6, 2024

@rachitnigam, @calebmkim and I were deciding whether this splitting technique was "better" than duplication or not, given that they were pretty similar. We were also considering separating the idea of offloading onto a child register from the idea of creating two, identical children and parent registers (which is what this branch's code does).

Originally, the idea would be that for seq schedules, "smart splitting" would always be better than duplication. But, that benefit isn't clear right now (but there are benefits). For now, I think what could be good is to introduce another tdcc option for "offloading with duplication" and then merge this branch. I'll get on that

@rachitnigam
Copy link
Contributor

@parthsarkar17, that makes sense! I think we should perhaps consider making a page in docs.calyxir.org that describes the various options and links to the experiments? My long-term worry is that all of these options will be impossible to understand without the context of specific experiments and rationale around them.

@parthsarkar17
Copy link
Contributor Author

Yeah, agreed, it seems like it can quickly get out of control. That's a good plan! I'll get on it

@parthsarkar17 parthsarkar17 marked this pull request as ready for review August 7, 2024 19:53
@calebmkim
Copy link
Contributor

calebmkim commented Aug 7, 2024

Thanks for the PR + detailed explanation. One high level question: it seems like this PR is doing two things to splitting:

  1. It is duplicating the registers (both the parent and children)
  2. it is using the same register(s) for both children, AFAICT

I'm wondering where the benefit is coming from? Is it (1), (2), or a combination of both?

To explain what I mean, I've attached a drawing to show how I understand of each thing (please correct if I'm misunderstanding): each color is a register (if a color appears twice, then that means the same register is being used): this is building off your example, lmk if it makes sense.

Screenshot 2024-08-07 at 7 23 18 PM

Before merging this, I think having a comparison of performance for (1) vs. (2) vs. (3) vs. (4) would be helpful. Sorry for being so picky; I just want to avoid the problem you guys have already identified: these options are becoming too difficult to understand without specific context; it's also getting difficult to understand where the benefits in performance are coming from.

Lmk if my drawings make sense, or if you want to have a synchronous discussion.

@rachitnigam
Copy link
Contributor

@parthsarkar17 are we still planning to merge this PR?

Copy link
Contributor

github-actions bot commented Dec 6, 2024

This pull request has not seen activity in 14 days and is being marked as stale. If you're continuing work on this, please reply and state how to get this PR in a mergeable state or what issues it is blocked on. If the PR is not ready for review, please mark it as a draft. The PR will be closed in 7 days if there is no further activity.

Copy link
Contributor

This stale PR is being closed because it has not seen any activity in 7 days. If you're planning to continue work on it, please reopen it and mention how to make progress on it.

@github-actions github-actions bot closed this Dec 14, 2024
@rachitnigam rachitnigam reopened this Dec 14, 2024
@rachitnigam
Copy link
Contributor

@parthsarkar17 marking this as a draft so StaleBot doesn't close it for now

@rachitnigam rachitnigam removed the request for review from calebmkim December 14, 2024 12:09
@rachitnigam rachitnigam marked this pull request as draft December 14, 2024 12:09
@github-actions github-actions bot removed the S: Stale label Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants