Implemented Two pass ACO #45

paul-mchugh · 2020-02-25T21:42:14Z

Code for the ACO two pass scheduler. Prints extra information about ACO improvements to stdout.
Also fixes a small memory leak in the ACO scheduler. To to best of my knowledge this is llvm format compliant.

kerbowa · 2020-02-25T22:39:47Z

lib/Scheduler/aco.cpp

@@ -52,6 +52,8 @@ ACOScheduler::ACOScheduler(DataDepGraph *dataDepGraph,
  decay_factor = schedIni.GetFloat("ACO_DECAY_FACTOR");
  ants_per_iteration = schedIni.GetInt("ACO_ANT_PER_ITERATION");
  print_aco_trace = schedIni.GetBool("ACO_TRACE");
+  aco_use_two_pass = schedIni.GetBool("ACO_USE_TWO_PASS");
+


Extra whitespace

kerbowa · 2020-02-25T22:40:40Z

include/opt-sched/Scheduler/aco.h

@@ -37,6 +43,9 @@ class ACOScheduler : public ConstrainedScheduler {
  pheremone_t &Pheremone(SchedInstruction *from, SchedInstruction *to);
  pheremone_t &Pheremone(InstCount from, InstCount to);
  double Score(SchedInstruction *from, Choice choice);
+  FUNC_RESULT performSchedPass(InstSchedule *schedule_out);
+  bool shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched, InstSchedule* newSched);
+


Extra whitespace

I tried to match the LLVM style without using clang-format. I will run it through clang-format and will do so in the future.

kerbowa

This doesn't follow LLVM coding conventions consider using clang-forrmat.

kerbowa · 2020-02-25T22:41:10Z

lib/Scheduler/aco.cpp

@@ -251,14 +253,59 @@ FUNC_RESULT ACOScheduler::FindSchedule(InstSchedule *schedule_out,
                                       SchedRegion *region) {
  rgn_ = region;

+  if(!aco_use_two_pass)


Not LLVM style

kerbowa · 2020-02-25T22:42:04Z

lib/Scheduler/aco.cpp

+    //the first pass result is the initial schedule the second pass
+    pass = ACOPass::FIRST;
+    FUNC_RESULT pass1Res = performSchedPass(schedule_out);
+	if(pass1Res!=RES_SUCCESS) {


Space between !=

kerbowa · 2020-02-25T22:42:33Z

lib/Scheduler/aco.cpp

+    pass = ACOPass::SECOND;
+    setInitialSched(schedule_out);
+    return performSchedPass(schedule_out);
+


Extra newline

kerbowa · 2020-02-25T22:43:51Z

lib/Scheduler/aco.cpp

+bool ACOScheduler::shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched,
+                                         InstSchedule* newSched) {
+  //Logger::Info("old_sched_addr:%p, new_sched_addr:%p", oldSched, newSched);
+  return newSched!=NULL &&


This needs to be broken up into steps. Maybe with some static functions.

I will make it static and look into breaking it up into steps/functions. I didn't before, because I didn't want to add extra branches to fairly hot code

I will make it static and look into breaking it up into steps/functions. I didn't before, because I didn't want to add extra branches to fairly hot code

Let compiler do the optimization it will inline it. LLVM programming guide recommends this as well.

kerbowa · 2020-02-25T22:44:06Z

lib/Scheduler/aco.cpp

+         && newSched->GetSpillCost() <= oldSched->GetSpillCost()) ||
+         (pass==ACOPass::ALL_IN_ONE &&
+         newSched->GetCost() < oldSched->GetCost()));
+


Extra whitespace

kerbowa · 2020-02-25T22:44:26Z

lib/Scheduler/aco.cpp

+}
+
+FUNC_RESULT ACOScheduler::performSchedPass(InstSchedule *schedule_out) {
+


Extra newline

kerbowa · 2020-02-25T22:45:02Z

lib/Scheduler/aco.cpp

  InstCount heuristicCost =
-      FindOneSchedule()->GetCost() + 1; // prevent divide by zero
+    (pass==ACOPass::FIRST ? heuSched->GetSpillCost():heuSched->GetCost()) + 1;


Space around :

kerbowa · 2020-02-25T22:45:30Z

lib/Scheduler/aco.cpp

  // initialize pheremone
  // for this, we need the cost of the pure heuristic schedule
  int pheremone_size = (count_ + 1) * count_;
  for (int i = 0; i < pheremone_size; i++)
    pheremone_[i] = 1;
  initialValue_ = 1;
+  InstSchedule* heuSched = FindOneSchedule();


ACOScheduler::FindOneSchedule() returns a raw pointer to a new allocated InstScheule. Previously this was leaked. Code before this commit would call GetCost() on the schedule returned without deleting it. I agree that we should use unique_ptr s in the future, but the scope of this commit was adding the code for two pass ACO. I intend to move ACOScheduler over to using unique_ptr s in the future.

kerbowa · 2020-02-25T22:45:56Z

lib/Scheduler/aco.cpp

+//compares to schedules taking into account which pass
+//we are on and returns whether the old schedule
+//is inferior to the new one and ought to be replaced
+bool ACOScheduler::shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched,


This can be static

Do we need this function? I don't see a problem with the way we did it before. If we're using two-pass with the settings controlled by the wrapper, the first pass will only have instructions of latency 1. All schedules for the same scheduling region will have the same schedule length but differing RP costs so we only compare RP costs anyways in the first pass. In the second pass, the DDG is re-built to account for latencies so we will compare both schedule length (with stalls) and RP costs. If the RP weight is set high enough, the RP shouldn't ever decrease.

The ACO Scheduler is currently called from SchedRegion::FindOptimalSchedule which passes through the DDG that was created in the wrapper to ACO. If we were to create a new DDG for each pass of the ACO scheduler we would have to move the code that calls ACO up to the wrapper as opposed to SchedRegion::FindOptimalSchedule. This could be done but it would add further bloat to ScheduleDAGOptSched::schedule and would necessitate further rewrites of SchedRegion(e.g.: sched region calls the heuristic scheduler so it would also have to be moved up to the Wrapper if we wanted to have heuristic pass -> ACO pass). I think this also answers your other comment.

I originally looked into constructing a new DDG for each ACO pass, but upon discovering the considerations listed in the previous comment I decided to use the GetCost()/GetSpillCost() functions.

The way the current two-pass works should already accomplish this. A new DDG is created for each pass. Using the correct DDG (all instructions 1 latency) for the first pass should make the problem much simpler for ACO. Additionally, one of the reasons why we wanted to separate the heuristic and the ACO scheduler was to use the current two-pass with ACO without much modification to ACO. ACO only has to be aware of the best minimal RP schedule found in the first pass (with stalls inserted to account for latency) and that is accomplished by setInitialSchedule(). A two-pass of ACO should already work with the current two-pass if both are enabled and the heuristic and B&B schedulers are disabled.

I have previously heard that we ultimately want to run both ACO and B&B when the scheduler executes. I thought that if we are using two pass ACO we would want to run both ACO passes first and then go on to the B&B passes, whereas the current implementation would run the passes in the order RP ACO, RP B&B, ACO, B&B. We should discuss the status of this commit at the meeting tomorrow especially given that two pass ACO did not give performance improvements

If you don't rebuild the DDG you can't do things like graph transformations.

I think we need to do some refactoring to sched_region and the wrapper. There should be target hooks for all these things instead of custom modules like GCNOptSched.

We should define a scheduling pass interface that returns some information about the result of that scheduling pass. The current layout is getting untenable.

We should not have things like 1 pass scheduling, 2 pass scheduling, ACO 2 pass scheduling. There needs to be scheduling passes which can be applied independently, and in any order. They should return structs with information that has the state after scheduling.

The scheduler code itself should use inheritance to contain the state about what kind of a scheduling pass it is. I.e. there should be two versions of performSchedPass, updatePheromone based on the type of ACO scheduler is currently being run. The same should be true for 2-pass B&B. We need multiple version of sched_region.

I don't have time to work on this right now there are many things I'd like to do with the project but simply don't have the time. Since what matters here is the research and not necessarily the best software engineer principals if this is cleaned up a bit in terms of formatting it's OK to me.

What do you think Vang?

Austin, I agree we need to spend some time and decide on how to refactor our current project. I believe we also have plans to parallelize our scheduler in the future and to do that we will definitely need to refactor our code. I will discuss this with Dr. Shobaki tomorrow on what he thinks and what the priority should be on this.

I don't think he cares at all about refactoring.

The only issue is that there may be other tasks that take priority over refactoring the current code. Additionally, what should the scope of the refactoring be? Should we also take parallelization into account and refactor for that, or do we do that later? For example, right now I believe the cost calculation in bb_spill can be refactored and we will need to do so for parallelization. We can't calculate the cost of two schedules in parallel because bb_spill stores the states of the schedule and we only have one copy of bb_spill. Although, If he gives us the go-ahead, then I believe we should start focusing on refactoring right away and with parallelization in mind.

vangthao95 · 2020-02-25T21:59:44Z

lib/Scheduler/aco.cpp

+  }
+  else
+  {
+    //in the first pass we schedule for register pressure only


Why perform individual passes in the aco class? The Data-Dependency Graphs (DDG) need to be re-constructed for each pass to account for latency and this is done in the wrapper.

vangthao95 · 2020-02-25T23:35:33Z

lib/Scheduler/aco.cpp

+//compares to schedules taking into account which pass
+//we are on and returns whether the old schedule
+//is inferior to the new one and ought to be replaced
+bool ACOScheduler::shouldReplaceSchedule(ACOPass pass, InstSchedule* oldSched,


Do we need this function? I don't see a problem with the way we did it before. If we're using two-pass with the settings controlled by the wrapper, the first pass will only have instructions of latency 1. All schedules for the same scheduling region will have the same schedule length but differing RP costs so we only compare RP costs anyways in the first pass. In the second pass, the DDG is re-built to account for latencies so we will compare both schedule length (with stalls) and RP costs. If the RP weight is set high enough, the RP shouldn't ever decrease.

paul-mchugh · 2020-03-16T01:43:16Z

Based off the comments on the thread and the conversation Vang and I had with Dr. Shobaki in which we decided that this two ACO implementation is redundant with Ciprian's work I am closing this pull request. I will make a new pull request which will replace the raw pointers with unique pointers/vectors, fix the memory leak I identified, and add the logging changes.

Implemented ACO 2 pass

72ac075

kerbowa reviewed Feb 25, 2020

View reviewed changes

kerbowa requested changes Feb 25, 2020

View reviewed changes

vangthao95 reviewed Feb 25, 2020

View reviewed changes

kerbowa mentioned this pull request Feb 28, 2020

General Refactoring Planning #47

Open

paul-mchugh closed this Mar 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented Two pass ACO #45

Implemented Two pass ACO #45

paul-mchugh commented Feb 25, 2020

kerbowa Feb 25, 2020

kerbowa Feb 25, 2020

paul-mchugh Feb 26, 2020

kerbowa left a comment

kerbowa Feb 25, 2020

kerbowa Feb 25, 2020

kerbowa Feb 25, 2020

kerbowa Feb 25, 2020

paul-mchugh Feb 26, 2020

kerbowa Feb 27, 2020

kerbowa Feb 25, 2020

kerbowa Feb 25, 2020

kerbowa Feb 25, 2020

kerbowa Feb 25, 2020

paul-mchugh Feb 26, 2020

kerbowa Feb 25, 2020

vangthao95 Feb 25, 2020

paul-mchugh Feb 26, 2020

paul-mchugh Feb 26, 2020

vangthao95 Feb 26, 2020

paul-mchugh Feb 27, 2020 •

edited

Loading

kerbowa Feb 27, 2020

vangthao95 Feb 27, 2020

kerbowa Feb 27, 2020

vangthao95 Feb 27, 2020

vangthao95 Feb 25, 2020

vangthao95 Feb 25, 2020

paul-mchugh commented Mar 16, 2020

		}

		FUNC_RESULT ACOScheduler::performSchedPass(InstSchedule *schedule_out) {

Implemented Two pass ACO #45

Implemented Two pass ACO #45

Conversation

paul-mchugh commented Feb 25, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kerbowa left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paul-mchugh Feb 27, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paul-mchugh commented Mar 16, 2020

paul-mchugh Feb 27, 2020 •

edited

Loading