-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented Two pass ACO #45
Conversation
@@ -52,6 +52,8 @@ ACOScheduler::ACOScheduler(DataDepGraph *dataDepGraph, | |||
decay_factor = schedIni.GetFloat("ACO_DECAY_FACTOR"); | |||
ants_per_iteration = schedIni.GetInt("ACO_ANT_PER_ITERATION"); | |||
print_aco_trace = schedIni.GetBool("ACO_TRACE"); | |||
aco_use_two_pass = schedIni.GetBool("ACO_USE_TWO_PASS"); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra whitespace
@@ -37,6 +43,9 @@ class ACOScheduler : public ConstrainedScheduler { | |||
pheremone_t &Pheremone(SchedInstruction *from, SchedInstruction *to); | |||
pheremone_t &Pheremone(InstCount from, InstCount to); | |||
double Score(SchedInstruction *from, Choice choice); | |||
FUNC_RESULT performSchedPass(InstSchedule *schedule_out); | |||
bool shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched, InstSchedule* newSched); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to match the LLVM style without using clang-format. I will run it through clang-format and will do so in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't follow LLVM coding conventions consider using clang-forrmat.
@@ -251,14 +253,59 @@ FUNC_RESULT ACOScheduler::FindSchedule(InstSchedule *schedule_out, | |||
SchedRegion *region) { | |||
rgn_ = region; | |||
|
|||
if(!aco_use_two_pass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not LLVM style
//the first pass result is the initial schedule the second pass | ||
pass = ACOPass::FIRST; | ||
FUNC_RESULT pass1Res = performSchedPass(schedule_out); | ||
if(pass1Res!=RES_SUCCESS) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space between !=
pass = ACOPass::SECOND; | ||
setInitialSched(schedule_out); | ||
return performSchedPass(schedule_out); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra newline
bool ACOScheduler::shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched, | ||
InstSchedule* newSched) { | ||
//Logger::Info("old_sched_addr:%p, new_sched_addr:%p", oldSched, newSched); | ||
return newSched!=NULL && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be broken up into steps. Maybe with some static functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make it static and look into breaking it up into steps/functions. I didn't before, because I didn't want to add extra branches to fairly hot code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make it static and look into breaking it up into steps/functions. I didn't before, because I didn't want to add extra branches to fairly hot code
Let compiler do the optimization it will inline it. LLVM programming guide recommends this as well.
&& newSched->GetSpillCost() <= oldSched->GetSpillCost()) || | ||
(pass==ACOPass::ALL_IN_ONE && | ||
newSched->GetCost() < oldSched->GetCost())); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra whitespace
} | ||
|
||
FUNC_RESULT ACOScheduler::performSchedPass(InstSchedule *schedule_out) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra newline
InstCount heuristicCost = | ||
FindOneSchedule()->GetCost() + 1; // prevent divide by zero | ||
(pass==ACOPass::FIRST ? heuSched->GetSpillCost():heuSched->GetCost()) + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Space around :
// initialize pheremone | ||
// for this, we need the cost of the pure heuristic schedule | ||
int pheremone_size = (count_ + 1) * count_; | ||
for (int i = 0; i < pheremone_size; i++) | ||
pheremone_[i] = 1; | ||
initialValue_ = 1; | ||
InstSchedule* heuSched = FindOneSchedule(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unique ptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACOScheduler::FindOneSchedule() returns a raw pointer to a new allocated InstScheule. Previously this was leaked. Code before this commit would call GetCost() on the schedule returned without deleting it. I agree that we should use unique_ptr s in the future, but the scope of this commit was adding the code for two pass ACO. I intend to move ACOScheduler over to using unique_ptr s in the future.
//compares to schedules taking into account which pass | ||
//we are on and returns whether the old schedule | ||
//is inferior to the new one and ought to be replaced | ||
bool ACOScheduler::shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this function? I don't see a problem with the way we did it before. If we're using two-pass with the settings controlled by the wrapper, the first pass will only have instructions of latency 1. All schedules for the same scheduling region will have the same schedule length but differing RP costs so we only compare RP costs anyways in the first pass. In the second pass, the DDG is re-built to account for latencies so we will compare both schedule length (with stalls) and RP costs. If the RP weight is set high enough, the RP shouldn't ever decrease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ACO Scheduler is currently called from SchedRegion::FindOptimalSchedule which passes through the DDG that was created in the wrapper to ACO. If we were to create a new DDG for each pass of the ACO scheduler we would have to move the code that calls ACO up to the wrapper as opposed to SchedRegion::FindOptimalSchedule. This could be done but it would add further bloat to ScheduleDAGOptSched::schedule and would necessitate further rewrites of SchedRegion(e.g.: sched region calls the heuristic scheduler so it would also have to be moved up to the Wrapper if we wanted to have heuristic pass -> ACO pass). I think this also answers your other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally looked into constructing a new DDG for each ACO pass, but upon discovering the considerations listed in the previous comment I decided to use the GetCost()/GetSpillCost() functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way the current two-pass works should already accomplish this. A new DDG is created for each pass. Using the correct DDG (all instructions 1 latency) for the first pass should make the problem much simpler for ACO. Additionally, one of the reasons why we wanted to separate the heuristic and the ACO scheduler was to use the current two-pass with ACO without much modification to ACO. ACO only has to be aware of the best minimal RP schedule found in the first pass (with stalls inserted to account for latency) and that is accomplished by setInitialSchedule(). A two-pass of ACO should already work with the current two-pass if both are enabled and the heuristic and B&B schedulers are disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have previously heard that we ultimately want to run both ACO and B&B when the scheduler executes. I thought that if we are using two pass ACO we would want to run both ACO passes first and then go on to the B&B passes, whereas the current implementation would run the passes in the order RP ACO, RP B&B, ACO, B&B. We should discuss the status of this commit at the meeting tomorrow especially given that two pass ACO did not give performance improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't rebuild the DDG you can't do things like graph transformations.
I think we need to do some refactoring to sched_region and the wrapper. There should be target hooks for all these things instead of custom modules like GCNOptSched.
We should define a scheduling pass interface that returns some information about the result of that scheduling pass. The current layout is getting untenable.
We should not have things like 1 pass scheduling, 2 pass scheduling, ACO 2 pass scheduling. There needs to be scheduling passes which can be applied independently, and in any order. They should return structs with information that has the state after scheduling.
The scheduler code itself should use inheritance to contain the state about what kind of a scheduling pass it is. I.e. there should be two versions of performSchedPass, updatePheromone based on the type of ACO scheduler is currently being run. The same should be true for 2-pass B&B. We need multiple version of sched_region.
I don't have time to work on this right now there are many things I'd like to do with the project but simply don't have the time. Since what matters here is the research and not necessarily the best software engineer principals if this is cleaned up a bit in terms of formatting it's OK to me.
What do you think Vang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Austin, I agree we need to spend some time and decide on how to refactor our current project. I believe we also have plans to parallelize our scheduler in the future and to do that we will definitely need to refactor our code. I will discuss this with Dr. Shobaki tomorrow on what he thinks and what the priority should be on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think he cares at all about refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only issue is that there may be other tasks that take priority over refactoring the current code. Additionally, what should the scope of the refactoring be? Should we also take parallelization into account and refactor for that, or do we do that later? For example, right now I believe the cost calculation in bb_spill can be refactored and we will need to do so for parallelization. We can't calculate the cost of two schedules in parallel because bb_spill stores the states of the schedule and we only have one copy of bb_spill. Although, If he gives us the go-ahead, then I believe we should start focusing on refactoring right away and with parallelization in mind.
} | ||
else | ||
{ | ||
//in the first pass we schedule for register pressure only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why perform individual passes in the aco class? The Data-Dependency Graphs (DDG) need to be re-constructed for each pass to account for latency and this is done in the wrapper.
//compares to schedules taking into account which pass | ||
//we are on and returns whether the old schedule | ||
//is inferior to the new one and ought to be replaced | ||
bool ACOScheduler::shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this function? I don't see a problem with the way we did it before. If we're using two-pass with the settings controlled by the wrapper, the first pass will only have instructions of latency 1. All schedules for the same scheduling region will have the same schedule length but differing RP costs so we only compare RP costs anyways in the first pass. In the second pass, the DDG is re-built to account for latencies so we will compare both schedule length (with stalls) and RP costs. If the RP weight is set high enough, the RP shouldn't ever decrease.
Based off the comments on the thread and the conversation Vang and I had with Dr. Shobaki in which we decided that this two ACO implementation is redundant with Ciprian's work I am closing this pull request. I will make a new pull request which will replace the raw pointers with unique pointers/vectors, fix the memory leak I identified, and add the logging changes. |
Code for the ACO two pass scheduler. Prints extra information about ACO improvements to stdout.
Also fixes a small memory leak in the ACO scheduler. To to best of my knowledge this is llvm format compliant.