[WIP][flang][OpenMP] Experimental pass to map `do concurrent` to OMP #77285

ergawy · 2024-01-08T08:43:07Z

Adds a pass to map do concurrent to OpenMP worksharing consturcts. For now, only maps basic loops to omp parallel do. This is still a WIP with more work needed for testing and mapping more advanced loops.

So far, the pass is not added to any flang pipelines; it only lives as a standalone pass tested by fir-opt. Once we hopefully agree on the approach and the pass matures, I will add it to flang.

llvmbot · 2024-01-08T08:43:39Z

@llvm/pr-subscribers-flang-fir-hlfir

Author: Kareem Ergawy (ergawy)

Changes

Adds a pass to map do concurrent to OpenMP worksharing consturcts. For now, only maps basic loops to omp parallel do. This is still a WIP with more work needed for testing and mapping more advanced loops.

Full diff: https://github.com/llvm/llvm-project/pull/77285.diff

6 Files Affected:

(modified) flang/include/flang/Optimizer/HLFIR/HLFIROps.td (+1-1)
(modified) flang/include/flang/Optimizer/Transforms/Passes.h (+2)
(modified) flang/include/flang/Optimizer/Transforms/Passes.td (+20)
(modified) flang/lib/Optimizer/Transforms/CMakeLists.txt (+1)
(added) flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp (+162)
(added) flang/test/Transforms/DoConcurrent/basic.mlir (+60)

diff --git a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
index 254771ca1780cf..718026e4fdfbe7 100644
--- a/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
+++ b/flang/include/flang/Optimizer/HLFIR/HLFIROps.td
@@ -75,7 +75,7 @@ def hlfir_DeclareOp : hlfir_Op<"declare", [AttrSizedOperandSegments,
     func.func @foo(%arg0: !fir.ref<!fir.array<?x?x!fir.char<1,?>>>, %arg1: !fir.ref<i64>) {
       %c10 = arith.constant 10 : index
       %c20 = arith.constant 20 : index
-      %1 = fir.load %ag1 : fir.ref<i64>
+      %1 = fir.load %arg1 : fir.ref<i64>
       %2 = fir.shape_shift %c10, %1, %c20, %1 : (index, index, index, index) -> !fir.shapeshift<2>
       %3 = hfir.declare %arg0(%2) typeparams %1 {uniq_name = "c"} (fir.ref<!fir.array<?x?x!fir.char<1,?>>>, fir.shapeshift<2>, index) -> (fir.box<!fir.array<?x?x!fir.char<1,?>>>, fir.ref<!fir.array<?x?x!fir.char<1,?>>>)
       // ... uses %3#0 as "c"
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.h b/flang/include/flang/Optimizer/Transforms/Passes.h
index 6970da8698ae84..1f389a1f571b75 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.h
+++ b/flang/include/flang/Optimizer/Transforms/Passes.h
@@ -93,6 +93,8 @@ std::unique_ptr<mlir::Pass> createFunctionAttrPass();
 std::unique_ptr<mlir::Pass>
 createFunctionAttrPass(FunctionAttrTypes &functionAttr);
 
+std::unique_ptr<mlir::Pass> createDoConcurrentConversionPass();
+
 // declarative passes
 #define GEN_PASS_REGISTRATION
 #include "flang/Optimizer/Transforms/Passes.h.inc"
diff --git a/flang/include/flang/Optimizer/Transforms/Passes.td b/flang/include/flang/Optimizer/Transforms/Passes.td
index e3c45d41f04cc7..41d9be4249b5e2 100644
--- a/flang/include/flang/Optimizer/Transforms/Passes.td
+++ b/flang/include/flang/Optimizer/Transforms/Passes.td
@@ -370,4 +370,24 @@ def FunctionAttr : Pass<"function-attr", "mlir::func::FuncOp"> {
   let constructor = "::fir::createFunctionAttrPass()";
 }
 
+def DoConcurrentConversionPass : Pass<"fopenmp-do-concurrent-conversion", "mlir::func::FuncOp"> {
+  let summary = "Map `DO CONCURRENT` loops to OpenMP worksharing loops.";
+
+  let description = [{ This is an experimental pass to map `DO CONCURRENR` loops
+     to their correspnding equivalent OpenMP worksharing constructs.
+
+     For now the following is supported:
+       - Mapping simple loops to `parallel do`.
+
+     Still to TODO:
+       - More extensive testing.
+       - Mapping to `target teams distribute parallel do`.
+       - Allowing the user to control mapping behavior: either to the host or
+         target.
+  }];
+
+  let constructor = "::fir::createDoConcurrentConversionPass()";
+  let dependentDialects = ["mlir::omp::OpenMPDialect"];
+}
+
 #endif // FLANG_OPTIMIZER_TRANSFORMS_PASSES
diff --git a/flang/lib/Optimizer/Transforms/CMakeLists.txt b/flang/lib/Optimizer/Transforms/CMakeLists.txt
index fc067ad3585395..13a46ef7422530 100644
--- a/flang/lib/Optimizer/Transforms/CMakeLists.txt
+++ b/flang/lib/Optimizer/Transforms/CMakeLists.txt
@@ -21,6 +21,7 @@ add_flang_library(FIRTransforms
   OMPMarkDeclareTarget.cpp
   VScaleAttr.cpp
   FunctionAttr.cpp
+  DoConcurrentConversion.cpp
 
   DEPENDS
   FIRDialect
diff --git a/flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp b/flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp
new file mode 100644
index 00000000000000..180c0bdf672af9
--- /dev/null
+++ b/flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp
@@ -0,0 +1,162 @@
+//===- DoConcurrentConversion.cpp -- map `DO CONCURRENT` to OpenMP loops --===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "flang/Optimizer/Dialect/FIRDialect.h"
+#include "flang/Optimizer/Dialect/FIROps.h"
+#include "flang/Optimizer/Dialect/FIRType.h"
+#include "flang/Optimizer/Dialect/Support/FIRContext.h"
+#include "flang/Optimizer/HLFIR/HLFIRDialect.h"
+#include "flang/Optimizer/Transforms/Passes.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
+#include "mlir/IR/Diagnostics.h"
+#include "mlir/IR/IRMapping.h"
+#include "mlir/Pass/Pass.h"
+#include "mlir/Transforms/DialectConversion.h"
+
+#include <memory>
+
+namespace fir {
+#define GEN_PASS_DEF_DOCONCURRENTCONVERSIONPASS
+#include "flang/Optimizer/Transforms/Passes.h.inc"
+} // namespace fir
+
+#define DEBUG_TYPE "fopenmp-do-concurrent-conversion"
+
+namespace {
+class DoConcurrentConversion : public mlir::OpConversionPattern<fir::DoLoopOp> {
+public:
+  using mlir::OpConversionPattern<fir::DoLoopOp>::OpConversionPattern;
+
+  mlir::LogicalResult
+  matchAndRewrite(fir::DoLoopOp doLoop, OpAdaptor adaptor,
+                  mlir::ConversionPatternRewriter &rewriter) const override {
+    mlir::OpPrintingFlags flags;
+    flags.printGenericOpForm();
+
+    mlir::omp::ParallelOp parallelOp =
+        rewriter.create<mlir::omp::ParallelOp>(doLoop.getLoc());
+
+    rewriter.createBlock(&parallelOp.getRegion());
+    mlir::Block &block = parallelOp.getRegion().back();
+
+    rewriter.setInsertionPointToEnd(&block);
+    rewriter.create<mlir::omp::TerminatorOp>(doLoop.getLoc());
+
+    rewriter.setInsertionPointToStart(&block);
+
+    // Clone the LB, UB, step defining ops inside the parallel region.
+    llvm::SmallVector<mlir::Value> lowerBound, upperBound, step;
+    lowerBound.push_back(
+        rewriter.clone(*doLoop.getLowerBound().getDefiningOp())->getResult(0));
+    upperBound.push_back(
+        rewriter.clone(*doLoop.getUpperBound().getDefiningOp())->getResult(0));
+    step.push_back(
+        rewriter.clone(*doLoop.getStep().getDefiningOp())->getResult(0));
+
+    auto wsLoopOp = rewriter.create<mlir::omp::WsLoopOp>(
+        doLoop.getLoc(), lowerBound, upperBound, step);
+    wsLoopOp.setInclusive(true);
+
+    auto outlineableOp =
+        mlir::dyn_cast<mlir::omp::OutlineableOpenMPOpInterface>(*parallelOp);
+    assert(outlineableOp);
+    rewriter.setInsertionPointToStart(outlineableOp.getAllocaBlock());
+
+    // For the induction variable, we need to privative its allocation and
+    // binding inside the parallel region.
+    llvm::SmallSetVector<mlir::Operation *, 2> workList;
+    // Therefore, we first discover the induction variable by discovering
+    // `fir.store`s where the source is the loop's block argument.
+    workList.insert(doLoop.getInductionVar().getUsers().begin(),
+                    doLoop.getInductionVar().getUsers().end());
+    llvm::SmallSetVector<fir::StoreOp, 2> inductionVarTargetStores;
+
+    // Walk the def-chain of the loop's block argument until we hit `fir.store`.
+    while (!workList.empty()) {
+      mlir::Operation *item = workList.front();
+
+      if (auto storeOp = mlir::dyn_cast<fir::StoreOp>(item)) {
+        inductionVarTargetStores.insert(storeOp);
+      } else {
+        workList.insert(item->getUsers().begin(), item->getUsers().end());
+      }
+
+      workList.remove(item);
+    }
+
+    // For each collected `fir.sotre`, find the target memref's alloca's and
+    // declare ops.
+    llvm::SmallSetVector<mlir::Operation *, 4> declareAndAllocasToClone;
+    for (auto storeOp : inductionVarTargetStores) {
+      mlir::Operation *storeTarget = storeOp.getMemref().getDefiningOp();
+
+      for (auto operand : storeTarget->getOperands()) {
+        declareAndAllocasToClone.insert(operand.getDefiningOp());
+      }
+      declareAndAllocasToClone.insert(storeTarget);
+    }
+
+    mlir::IRMapping mapper;
+
+    // Collect the memref defining ops in the parallel region.
+    for (mlir::Operation *opToClone : declareAndAllocasToClone) {
+      rewriter.clone(*opToClone, mapper);
+    }
+
+    // Clone the loop's body inside the worksharing construct using the mapped
+    // memref values.
+    rewriter.cloneRegionBefore(doLoop.getRegion(), wsLoopOp.getRegion(),
+                               wsLoopOp.getRegion().begin(), mapper);
+
+    mlir::Operation *terminator = wsLoopOp.getRegion().back().getTerminator();
+    rewriter.setInsertionPointToEnd(&wsLoopOp.getRegion().back());
+    rewriter.create<mlir::omp::YieldOp>(terminator->getLoc());
+    rewriter.eraseOp(terminator);
+
+    rewriter.eraseOp(doLoop);
+
+    return mlir::success();
+  }
+};
+
+class DoConcurrentConversionPass
+    : public fir::impl::DoConcurrentConversionPassBase<
+          DoConcurrentConversionPass> {
+public:
+  void runOnOperation() override {
+    mlir::func::FuncOp func = getOperation();
+
+    if (func.isDeclaration()) {
+      return;
+    }
+
+    auto *context = &getContext();
+    mlir::RewritePatternSet patterns(context);
+    patterns.insert<DoConcurrentConversion>(context);
+    mlir::ConversionTarget target(*context);
+    target.addLegalDialect<fir::FIROpsDialect, hlfir::hlfirDialect,
+                           mlir::arith::ArithDialect, mlir::func::FuncDialect,
+                           mlir::omp::OpenMPDialect>();
+
+    target.addDynamicallyLegalOp<fir::DoLoopOp>(
+        [](fir::DoLoopOp op) { return !op.getUnordered(); });
+
+    if (mlir::failed(mlir::applyFullConversion(getOperation(), target,
+                                               std::move(patterns)))) {
+      mlir::emitError(mlir::UnknownLoc::get(context),
+                      "error in converting do-concurrent op");
+      signalPassFailure();
+    }
+  }
+};
+} // namespace
+
+std::unique_ptr<mlir::Pass> fir::createDoConcurrentConversionPass() {
+  return std::make_unique<DoConcurrentConversionPass>();
+}
diff --git a/flang/test/Transforms/DoConcurrent/basic.mlir b/flang/test/Transforms/DoConcurrent/basic.mlir
new file mode 100644
index 00000000000000..b4a2ced45950e5
--- /dev/null
+++ b/flang/test/Transforms/DoConcurrent/basic.mlir
@@ -0,0 +1,60 @@
+// Tests mapping of a basic `do concurrent` loop to `!$omp parallel do`.
+
+// RUN: fir-opt --fopenmp-do-concurrent-conversion %s | FileCheck %s
+
+// CHECK-LABEL: func.func @do_concurrent_basic
+func.func @do_concurrent_basic() attributes {fir.bindc_name = "do_concurrent_basic"} {
+    // CHECK: %[[ARR:.*]]:2 = hlfir.declare %{{.*}}(%{{.*}}) {uniq_name = "_QFEa"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+    // CHECK: %[[C1:.*]] = arith.constant 1 : i32
+    // CHECK: %[[C10:.*]] = arith.constant 10 : i32
+
+    %0 = fir.alloca i32 {bindc_name = "i"}
+    %1:2 = hlfir.declare %0 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+    %2 = fir.address_of(@_QFEa) : !fir.ref<!fir.array<10xi32>>
+    %c10 = arith.constant 10 : index
+    %3 = fir.shape %c10 : (index) -> !fir.shape<1>
+    %4:2 = hlfir.declare %2(%3) {uniq_name = "_QFEa"} : (!fir.ref<!fir.array<10xi32>>, !fir.shape<1>) -> (!fir.ref<!fir.array<10xi32>>, !fir.ref<!fir.array<10xi32>>)
+    %c1_i32 = arith.constant 1 : i32
+    %7 = fir.convert %c1_i32 : (i32) -> index
+    %c10_i32 = arith.constant 10 : i32
+    %8 = fir.convert %c10_i32 : (i32) -> index
+    %c1 = arith.constant 1 : index
+
+    // CHECK-NOT: fir.do_loop
+
+    // CHECK: omp.parallel {
+
+    // CHECK-NEXT: %[[ITER_VAR:.*]] = fir.alloca i32 {bindc_name = "i"}
+    // CHECK-NEXT: %[[BINDING:.*]]:2 = hlfir.declare %[[ITER_VAR]] {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
+
+    // CHECK: %[[LB:.*]] = fir.convert %[[C1]] : (i32) -> index
+    // CHECK: %[[UB:.*]] = fir.convert %[[C10]] : (i32) -> index
+    // CHECK: %[[STEP:.*]] = arith.constant 1 : index
+
+     // CHECK: omp.wsloop for (%[[ARG0:.*]]) : index = (%[[LB]]) to (%[[UB]]) inclusive step (%[[STEP]]) {
+     // CHECK-NEXT: %[[IV_IDX:.*]] = fir.convert %[[ARG0]] : (index) -> i32
+     // CHECK-NEXT: fir.store %[[IV_IDX]] to %[[BINDING]]#1 : !fir.ref<i32>
+     // CHECK-NEXT: %[[IV_VAL1:.*]] = fir.load %[[BINDING]]#0 : !fir.ref<i32>
+     // CHECK-NEXT: %[[IV_VAL2:.*]] = fir.load %[[BINDING]]#0 : !fir.ref<i32>
+     // CHECK-NEXT: %[[IV_VAL_I64:.*]] = fir.convert %[[IV_VAL2]] : (i32) -> i64
+     // CHECK-NEXT: %[[ARR_ACCESS:.*]] = hlfir.designate %[[ARR]]#0 (%[[IV_VAL_I64]])  : (!fir.ref<!fir.array<10xi32>>, i64) -> !fir.ref<i32>
+     // CHECK-NEXT: hlfir.assign %[[IV_VAL1]] to %[[ARR_ACCESS]] : i32, !fir.ref<i32>
+     // CHECK-NEXT: omp.yield
+     // CHECK-NEXT: }
+
+     // CHECK-NEXT: omp.terminator
+     // CHECK-NEXT: }
+    fir.do_loop %arg0 = %7 to %8 step %c1 unordered {
+      %13 = fir.convert %arg0 : (index) -> i32
+      fir.store %13 to %1#1 : !fir.ref<i32>
+      %14 = fir.load %1#0 : !fir.ref<i32>
+      %15 = fir.load %1#0 : !fir.ref<i32>
+      %16 = fir.convert %15 : (i32) -> i64
+      %17 = hlfir.designate %4#0 (%16)  : (!fir.ref<!fir.array<10xi32>>, i64) -> !fir.ref<i32>
+      hlfir.assign %14 to %17 : i32, !fir.ref<i32>
+    }
+
+    // CHECK-NOT: fir.do_loop
+
+    return
+  }

kiranchandramohan · 2024-01-08T09:58:23Z

Nice start.

In case you have missed it, there are some recommendations for do concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md.

I guess the things to consider are:

Is any additional analysis necessary before this conversion?
How to map locality constraints, i believe they are currently handled in lowering. This will need to be delayed.
Ensure this is only enabled with driver flags.

ergawy · 2024-01-09T09:23:10Z

Thanks @kiranchandramohan for looking at this!

Nice start.

In case you have missed it, there are some recommendations for do concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md.

I guess the things to consider are:
1. Is any additional analysis necessary before this conversion?

I also have the feeling that some analyses will be needed but I am not sure what exactly yet. Maybe a starting point would be to verify that input loops respect the constraints listed by the spec, for example: C1121-C1137. I know some of these constraints are enforced by the front-end already before lowering so we will have to consider them one by one and check the best place to check for these constraints. There is also section 11.1.7.5 Additional semantics for DO CONCURRENT constructs in the spec which I think can investigated for further analyses before we apply the mapping to OMP. Do you have any particular analyses in mind outside of this?

2. How to map locality constraints, i believe they are currently handled in lowering. This will need to be delayed.

Indeed this is currently handled in lowering. Below is an example without local_init and then the same loop with local_init:

Without `local_init` this is the loop directly after lowering:

    // do concurrent (integer :: j=i:10)
    // end do
    fir.do_loop %arg0 = %8 to %9 step %c1 unordered {
      %10 = fir.convert %arg0 : (index) -> i32
      fir.store %10 to %1#1 : !fir.ref<i32>
    }

With `local_init` this is the loop directly after lowering:

    //  do concurrent (integer :: j=i:10) local_init(i)
    //  end do
    fir.do_loop %arg0 = %8 to %9 step %c1 unordered {
      %10 = fir.convert %arg0 : (index) -> i32
      fir.store %10 to %1#1 : !fir.ref<i32>
      %11 = fir.alloca i32 {bindc_name = "i", pinned, uniq_name = "_QFEi"}
      %12:2 = hlfir.declare %11 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      %13 = fir.load %6#0 : !fir.ref<i32>
      hlfir.assign %13 to %12#0 : i32, !fir.ref<i32>
    }

So, one possibility I think would be:

Extract the omp.map_info op to to some shared location between fir and omp dialects.
Use map_info to model locality constraints for do concurrent as well.

Do you see any blockers for such approach?

3. Ensure this is only enabled with driver flags.

Will do! For now, I suggest to just be a standalone pass until we flesh it out a bit more. Please let me know if you disagree.

In addition to what you mentioned, I think point number 4 would be how to model reductions. That's for the future though after the above points are addressed.

With that in-mind, do you mind if we review and merge the current PR as it is and defer further development to later PRs? I think doing all this in one PR won't be ideal.

kiranchandramohan · 2024-01-09T12:34:15Z

Thanks @ergawy for the reply.

Since there are some strong opinions expressed about Do Concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md and also that there are several possible lowerings (sequential, CPU-OpenMP, GPU-OpenMP, CPU-SIMD, GPU-OpenACC), it might be good to agree on the design first and then proceed. The way to do this would be to write a design doc and submit a PR to the flang/docs directive and advertise in Flang Slack and discourse. In particular, we should be clear about the analysis required. I have not read the relevant sections in the standard in detail to reply to your question above, doesn't the document cover this?

ergawy · 2024-01-09T12:43:17Z

Thanks @ergawy for the reply.

Since there are some strong opinions expressed about Do Concurrent in https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md and also that there are several possible lowerings (sequential, CPU-OpenMP, GPU-OpenMP, CPU-SIMD, GPU-OpenACC), it might be good to agree on the design first and then proceed. The way to do this would be to write a design doc and submit a PR to the flang/docs directive and advertise in Flang Slack and discourse. In particular, we should be clear about the analysis required. I have not read the relevant sections in the standard in detail to reply to your question above, doesn't the document cover this?

Sure thing! I will start working on a design doc and open a separate (parent) PR for it.

klausler · 2024-01-10T17:01:38Z

Do not run DO CONCURRENT in parallel by default without ensuring first that the loop is free of non-parallelizable data usage.

ergawy · 2024-01-11T06:39:08Z

Do not run DO CONCURRENT in parallel by default without ensuring first that the loop is free of non-parallelizable data usage.

Thanks for your reply. Yup, I am looking into understanding the inherent issues in do concurrent at the moment. I spent sometime going over your discussions on Github and J3 forums (not directly you but relevant to the issues you raised) ([1], [2], and the linked papers and documents) and trying to understand the issues in deeper detail. Once I form some more background, I will reach out to you if you don't mind since you thought about this problem a lot longer than I did.

klausler · 2024-01-11T16:45:29Z

Do not run DO CONCURRENT in parallel by default without ensuring first that the loop is free of non-parallelizable data usage.

Thanks for your reply. Yup, I am looking into understanding the inherent issues in do concurrent at the moment. I spent sometime going over your discussions on Github and J3 forums (not directly you but relevant to the issues you raised) ([1], [2], and the linked papers and documents) and trying to understand the issues in deeper detail. Once I form some more background, I will reach out to you if you don't mind since you thought about this problem a lot longer than I did.

It is not complicated.

There are two common assumptions about DO CONCURRENT that are not true. First, due to the name, many people assume that DO CONCURRENT is an assertion by the programmer that the loop is safe to parallelize and/or a request to parallelize the loop. Second, that the restrictions imposed by the standard on DO CONCURRENT loops make any conformant program safe to parallelize.

The actual definition of DO CONCURRENT is such that it is safe to execute the iterations of the loop in any serial order. They are not sufficient to execute the iterations of the loop in parallel. You can write a DO CONCURRENT loop that is perfectly conformant with the standard but still unsafe to parallelize.

The standard could be fixed with a small change, but the committee won't do it.

If you have permission from the user via the command line to parallelize, fine. But by default, you still need to analyze the data flow in the loop to guarantee safe parallelization.

Adds a pass to map `do concurrent` to OpenMP worksharing consturcts. For now, only maps basic loops to `omp parallel do`. This is still a WIP with more work needed for testing and mapping more advanced loops.

This does not detail everything yet, still working through this.

jeanPerier

Hi @ergawy, very late review on my side. Thanks for working on this, it is a great feature. On my side, as long as this pass is run behind a flag (given the points made by Peter), the direction looks good to me (I appreciate the fact that this is done in a pass so that do concurrent can also easily be mapped to other concepts).

jeanPerier · 2024-01-08T09:47:03Z

flang/include/flang/Optimizer/Transforms/Passes.td

+  let summary = "Map `DO CONCURRENT` loops to OpenMP worksharing loops.";
+
+  let description = [{ This is an experimental pass to map `DO CONCURRENR` loops
+     to their correspnding equivalent OpenMP worksharing constructs.


Suggested change

to their correspnding equivalent OpenMP worksharing constructs.

to their corresponding equivalent OpenMP worksharing constructs.

ergawy self-assigned this Jan 8, 2024

ergawy requested review from jeanPerier and kiranchandramohan January 8, 2024 08:43

llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Jan 8, 2024

ergawy requested a review from tblah January 8, 2024 08:43

ergawy requested review from DominikAdamski, raghavendhra, Meinersbur, agozillon, skatrak and mjklemm January 8, 2024 08:47

ergawy requested a review from klausler January 10, 2024 04:33

ergawy marked this pull request as draft January 11, 2024 06:45

ergawy added 2 commits February 16, 2024 02:24

[WIP][flang][OpenMP] Experimental pass to map do concurrent to OMP

4419514

Adds a pass to map `do concurrent` to OpenMP worksharing consturcts. For now, only maps basic loops to `omp parallel do`. This is still a WIP with more work needed for testing and mapping more advanced loops.

Details on some of the analyses we might need.

07d6c48

This does not detail everything yet, still working through this.

ergawy force-pushed the do_concurrent_to_omp branch from 4be76ec to 07d6c48 Compare February 16, 2024 09:32

jeanPerier reviewed Dec 3, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP][flang][OpenMP] Experimental pass to map `do concurrent` to OMP #77285

[WIP][flang][OpenMP] Experimental pass to map `do concurrent` to OMP #77285

ergawy commented Jan 8, 2024 •

edited

Loading

llvmbot commented Jan 8, 2024

kiranchandramohan commented Jan 8, 2024

ergawy commented Jan 9, 2024

kiranchandramohan commented Jan 9, 2024

ergawy commented Jan 9, 2024

klausler commented Jan 10, 2024

ergawy commented Jan 11, 2024 •

edited

Loading

klausler commented Jan 11, 2024

jeanPerier left a comment

jeanPerier Jan 8, 2024

	to their correspnding equivalent OpenMP worksharing constructs.
	to their corresponding equivalent OpenMP worksharing constructs.

[WIP][flang][OpenMP] Experimental pass to map do concurrent to OMP #77285

Are you sure you want to change the base?

[WIP][flang][OpenMP] Experimental pass to map do concurrent to OMP #77285

Conversation

ergawy commented Jan 8, 2024 • edited Loading

llvmbot commented Jan 8, 2024

kiranchandramohan commented Jan 8, 2024

ergawy commented Jan 9, 2024

Without local_init this is the loop directly after lowering:

With local_init this is the loop directly after lowering:

So, one possibility I think would be:

kiranchandramohan commented Jan 9, 2024

ergawy commented Jan 9, 2024

klausler commented Jan 10, 2024

ergawy commented Jan 11, 2024 • edited Loading

klausler commented Jan 11, 2024

jeanPerier left a comment

Choose a reason for hiding this comment

jeanPerier Jan 8, 2024

Choose a reason for hiding this comment

[WIP][flang][OpenMP] Experimental pass to map `do concurrent` to OMP #77285

[WIP][flang][OpenMP] Experimental pass to map `do concurrent` to OMP #77285

ergawy commented Jan 8, 2024 •

edited

Loading

Without `local_init` this is the loop directly after lowering:

With `local_init` this is the loop directly after lowering:

ergawy commented Jan 11, 2024 •

edited

Loading