General Refactoring Planning #47

kerbowa · 2020-02-28T00:33:39Z

Recent discussion in #45 and elsewhere has made it clear that we should make refactoring a priority. We have limited resources, however I believe that a strong argument can be made that these changes will help the project grow and have a positive net effect.

It seems like most active contributors feel similarly, so lets prioritize some of the idea's, and then decide how much time should be invested into this.

Why Refactor

This is an open source project. We would like to increase the number of collaborators. Companies have contacted us wanting to try our these scheduling techniques only to be discouraged by the lack of usability. The open source community respects well maintained code, and if the goal is to invite more collaboration we need to have things like easy-of-use, documentation, CI, ect in place.

The lack of any real testing infrastructure is a big problem. If the end goal is to have this be a tool recognized by the LLVM community, like say Poly for example, then this type of refactoring work is how we should start along that path.

What work needs to be done

Testing infrastructure. We should decide on lit, filecheck, gtest or some combination. In my opinion this is the most critical missing piece right now. It's impossible for new people to navigate the project without unit tests which can show them how the pieces of the scheduler work together.
Refactor scheduling pass manager. Each scheduling pass should be independent. There should be classes which represent a scheduling pass and the code should not rely on flags to change behavior for different passes.
The older code needs to be modernized. That means smart pointers, and variable names that are descriptive. It should be written with C++14 and LLVM programming style in mind. We should use iterators/vectors/maps ect and leverage the LLVM infrastructure whenever possible.
Use tablegen to generate the sched.ini options. The way we handle scheduler settings is horrible. It should be possible for all flags to be added on the CL as well as read from sched.ini. We also need to have sane defaults so that things work without messing with ini files.

I hope we can create some github issues with tasks that will make contributing to this project much easier for new people.

Quincunx271 · 2020-03-11T00:27:19Z

I've also noticed that we have a lot of protected member variables. It would be much easier to follow the code if these were private variables; that way, I can know what can and can't be changed at a given point of time.

Quincunx271 · 2020-03-11T00:35:37Z

I would also like to suggest renaming of functions and variables. UpdtRdyLst_ is not much shorter than UpdateRdyList_ or UpdateReadyList_, but those are much easier to read. There's a pervasive removal of vowels: Dynmc, PrirtyLst, RcrsvScsr.

Some shortenings are reasonable (Reg), but there should be very few of these. The pattern I've noticed is that I'm generally fine if it's a single syllable of the larger word, rather than the larger word with vowels removed: Reg vs Rgstr

kerbowa · 2020-03-11T18:50:40Z

I would also like to suggest renaming of functions and variables.

Agreed. It's frustrating if you don't remember exactly how a variable was spelled. Using English spelling fixes that problem :)

Quincunx271 · 2020-04-02T18:17:51Z

I don't know if this is a good idea: inline base classes for bases with only one derived type.

For example, GraphNode is only ever a SchedInstruction. It may make the code simpler to get rid of GraphNode entirely in favor of just having SchedInstruction, perhaps leaving GraphNode as a pure interface (no member variables).

Quincunx271 · 2020-04-09T17:09:10Z

Renaming variables/functions: how do we want to do this? One massive PR seems like a bad idea. That would be impossible to review. But if we split it into tons of PRs, we could easily end up with 10s of PRs. It's not entirely possible to restrict it to just a few files at a time, because we need to fix references to the functions/variables as well.

The options I can think of:

One massive PR (bad option IMO)
Do roughly one file at a time, e.g. alphabetically.
Do one kind of thing at a time (Cnt, Elmnt, Nxt, Frst). This would have a lot of overlapping lines.
As we do other things in the code, rename variables that are nearby what we are working on.

I think that either the 2nd or 4th option are the most realistic. The 4th would take a long time, but I think it may be the most realistic.

kerbowa · 2020-04-10T04:40:41Z

I think option 3, could be one or two "things" at a time. Maybe the particularly annoying names could be fixed first. For most it's probably just sed and replace.

Quincunx271 · 2020-04-10T15:25:11Z

Through my quick experiments, some of those already change 1000+ lines of code. That's an unreviewable size. But yeah, multiple things at once is possible. I agree that we should probably focus on the most annoying names first.

vangthao95 · 2020-04-10T15:31:44Z

Option 3 and 4 sounds good to me. I also agree that annoying names could be fixed first like UpdtRdyLst_. We could also focus on files that are worked on often like bb_spill, sched_region, sched_basic_data, etc.

Quincunx271 · 2020-05-18T17:48:37Z

I can't find examples inside the LLVM codebase of using TableGen for commandline arguments, which is what I'm understanding the proposal number 4 for replacing sched.ini is talking about. Instead, it seems that https://llvm.org/docs/CommandLine.html is used. Instead of using a sched.ini file, it's possible to load commandline flags from a file (clang argument: --config).

kerbowa · 2020-05-20T20:04:56Z

I was thinking of building our own tablegen generator. Although having everything be a LLVM CL option is simpler. We would still need some code to pass the options around. I was hoping to have something like SubtargetFeatures https://github.com/llvm/llvm-project/blob/master/llvm/utils/TableGen/SubtargetFeatureInfo.cpp.

Also, clang options are defined via talbegen: https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Driver/Options.td

I think the main issue with the current setup is that retrieving options can be unnecessarily expensive because of the way our config class is implemented.

I'd not heard of reading in CL options from a file before. Apparently that was added in LLVM 6.0. This is only for clang though right? One advantage of the current setup is that it doesn't matter how you are using the scheduler. We are not always doing it via clang. For example we can use llc or AMDGPU and comgr. With --config I think each setup would have a different process for changing options.

Ideally what we implement would have these features:

Config can be changed via CL options or an ini file.
Looking up options is cheap, and can be done using a single class.
We read in options once, instead of each time the MachineScheduler is instantiated.

Quincunx271 mentioned this issue Mar 11, 2020

Replace raw pointers with unique_ptr and vector #49

Closed

Quincunx271 mentioned this issue Mar 14, 2020

Add unittest infrastructure #51

Merged

Quincunx271 mentioned this issue Mar 28, 2020

Privatize / reduce scope of member variables #56

Merged

kerbowa mentioned this issue Apr 13, 2020

GT: Implement live-range covering (lengthened covered by shortened), plus relaxations #65

Merged

Quincunx271 mentioned this issue Oct 30, 2021

Fix variable and function names #176

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

General Refactoring Planning #47

General Refactoring Planning #47

kerbowa commented Feb 28, 2020 •

edited

Loading

Quincunx271 commented Mar 11, 2020

Quincunx271 commented Mar 11, 2020

kerbowa commented Mar 11, 2020

Quincunx271 commented Apr 2, 2020

Quincunx271 commented Apr 9, 2020 •

edited

Loading

kerbowa commented Apr 10, 2020

Quincunx271 commented Apr 10, 2020

vangthao95 commented Apr 10, 2020

Quincunx271 commented May 18, 2020

kerbowa commented May 20, 2020

General Refactoring Planning #47

General Refactoring Planning #47

Comments

kerbowa commented Feb 28, 2020 • edited Loading

Why Refactor

What work needs to be done

Quincunx271 commented Mar 11, 2020

Quincunx271 commented Mar 11, 2020

kerbowa commented Mar 11, 2020

Quincunx271 commented Apr 2, 2020

Quincunx271 commented Apr 9, 2020 • edited Loading

kerbowa commented Apr 10, 2020

Quincunx271 commented Apr 10, 2020

vangthao95 commented Apr 10, 2020

Quincunx271 commented May 18, 2020

kerbowa commented May 20, 2020

kerbowa commented Feb 28, 2020 •

edited

Loading

Quincunx271 commented Apr 9, 2020 •

edited

Loading