[mlir][linalg] unfold projected permutation. #114704

Merged
merged 7 commits into llvm:main on Nov 9, 2024

Conversation

@javedabsar1 (Contributor) commented Nov 3, 2024

A projected permutation is effectively the folding of a mixture of transpose and broadcast into the affine map of an operand.
While folding transpose and broadcast into the affine map of a linalg.generic operand is an effective optimization, sometimes we may want to unfold that, for instance when recognizing named ops.

Example

#projection = affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d1)>
#identity   = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
...
%res = linalg.generic
       { indexing_maps = [#projection, #identity, #identity],
         iterator_types = ["parallel", "parallel", "parallel",
                           "parallel", "parallel"]}
       ins(%x, %y : tensor<7x8x9xf32>, tensor<5x9x7x8x10xf32>)
       outs(%z : tensor<5x9x7x8x10xf32>) {
         ^bb0(%in: f32, %in_1: f32, %out: f32):
           %div = arith.divf %in, %in_1 : f32
           linalg.yield %div : f32
       } -> tensor<5x9x7x8x10xf32>

In the above IR, the map of operand %x is a projected permutation. This can be unfolded as:

...
%transposed = linalg.transpose
              ins(%x : tensor<7x8x9xf32>)
              outs(%e1 : tensor<9x7x8xf32>) permutation = [2, 0, 1]
...
%broadcasted = linalg.broadcast
               ins(%transposed : tensor<9x7x8xf32>)
               outs(%e2 : tensor<5x9x7x8x10xf32>) dimensions = [0, 4]
%2 = linalg.div
       ins(%broadcasted, %y : tensor<5x9x7x8x10xf32>, tensor<5x9x7x8x10xf32>)
       outs(%arg2 : tensor<5x9x7x8x10xf32>) -> tensor<5x9x7x8x10xf32>

Note that linalg.generic has been 'specialized' to linalg.div. When unfolding, it is more effective to transpose first and then broadcast. However, if the transpose is done first, the permutation map needs to be expressed in terms of the reduced dimensions (as the broadcast has not happened yet). Also, the broadcast dimensions in a linalg.generic come from the other operands (those not broadcast along that particular dimension). We work this out by computing the polytope shape of the linalg.generic from the shapes of all the operands (inputs and outputs).
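For illustration, the re-basing step for operand %x can be sketched in a few lines of standalone C++ (this is only a sketch of the idea, not the helper added by this PR; the variable names are invented for the example):

```cpp
// Derive the re-based linalg.transpose permutation for %x in the example:
// the map result (d2, d3, d1) is re-based so the referenced dims become
// contiguous from 0, then inverted into the permutation.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

int main() {
  // Dim positions appearing in the result of %x's map: (d2, d3, d1).
  std::vector<int64_t> result = {2, 3, 1};

  // Re-base: referenced dims {1, 2, 3} become {0, 1, 2},
  // so (d2, d3, d1) becomes (1, 2, 0).
  std::vector<int64_t> sorted(result);
  std::sort(sorted.begin(), sorted.end());
  std::map<int64_t, int64_t> rebase;
  for (size_t i = 0; i < sorted.size(); ++i)
    rebase[sorted[i]] = static_cast<int64_t>(i);

  // Invert into the linalg.transpose permutation, where
  // dim(result, i) == dim(input, permutation[i]).
  std::vector<int64_t> permutation(result.size());
  for (size_t i = 0; i < result.size(); ++i)
    permutation[rebase[result[i]]] = static_cast<int64_t>(i);

  for (int64_t p : permutation)
    std::printf("%lld ", static_cast<long long>(p)); // prints: 2 0 1
  std::printf("\n");
  return 0;
}
```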

Special thanks to Mahesh for discussions on this topic.

Identify folded 'projected permutations' (i.e. mixture of
transpose and/or broadcast) in linalg generic operands.
The 'projected permutations' are unfolded as separate
linalg.transpose and linalg.broadcast so that the generic
operates on simple identity map which is necessary to
replace generic with named op.
@llvmbot (Member) commented Nov 3, 2024

@llvm/pr-subscribers-mlir-linalg

@llvm/pr-subscribers-mlir

Author: Javed Absar (javedabsar1)

Changes

Identify folded 'projected permutations' (i.e. mixture of transpose and/or broadcast) in linalg generic operands.
The 'projected permutations' are unfolded as separate linalg.transpose and linalg.broadcast so that the generic
operates on simple identity map which is necessary to replace generic with named op.


Full diff: https://github.com/llvm/llvm-project/pull/114704.diff

6 Files Affected:

  • (modified) mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td (+1-1)
  • (modified) mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h (+5)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt (+1)
  • (modified) mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp (+1)
  • (added) mlir/lib/Dialect/Linalg/Transforms/UnfoldProjectedPermutation.cpp (+270)
  • (added) mlir/test/Dialect/Linalg/unfold_projected_permutation.mlir (+71)
diff --git a/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td b/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
index 0a404194569c22..b81a4c9c8760cf 100644
--- a/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
+++ b/mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td
@@ -252,7 +252,7 @@ def LinalgStructuredInterface
       /*args=*/(ins),
       /*methodBody=*/"",
       /*defaultImplementation=*/[{
-        return getNumParallelLoops() ==  getNumParallelLoops();
+        return getNumParallelLoops() ==  getNumLoops();
       }]
     >,
     InterfaceMethod<
diff --git a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
index 0693e31b4f70af..a110eb88e9f699 100644
--- a/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
+++ b/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
@@ -1786,6 +1786,11 @@ void populateBubbleUpExtractSliceOpPatterns(RewritePatternSet &patterns);
 /// linalg.fill(%cst, tensor.extract_slice(%init)).
 void populateSwapExtractSliceWithFillPatterns(RewritePatternSet &patterns);
 
+
+/// Add patterns to make explicit broadcasts and transposes in the
+/// input operands of a genericOp.
+void populateUnfoldProjectedPermutationPatterns(RewritePatternSet &patterns);
+
 /// Patterns to apply `splitReduction` below.
 void populateSplitReductionPattern(
     RewritePatternSet &patterns,
diff --git a/mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt b/mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt
index d7c63cdd8198d7..dfe6d7a54c8f14 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt
+++ b/mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt
@@ -38,6 +38,7 @@ add_mlir_dialect_library(MLIRLinalgTransforms
   TilingInterfaceImpl.cpp
   Transforms.cpp
   TransposeConv2D.cpp
+  UnfoldProjectedPermutation.cpp
   Vectorization.cpp
   WinogradConv2D.cpp
 
diff --git a/mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp b/mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
index dfafffce9d9b60..a911286d5d44b2 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Specialize.cpp
@@ -347,6 +347,7 @@ struct LinalgSpecializeGenericOpsPass
 void LinalgSpecializeGenericOpsPass::runOnOperation() {
   RewritePatternSet patterns(&getContext());
   populateLinalgGenericOpsSpecializationPatterns(patterns);
+  populateUnfoldProjectedPermutationPatterns(patterns);
 
   if (failed(applyPatternsAndFoldGreedily(getOperation(), std::move(patterns))))
     signalPassFailure();
diff --git a/mlir/lib/Dialect/Linalg/Transforms/UnfoldProjectedPermutation.cpp b/mlir/lib/Dialect/Linalg/Transforms/UnfoldProjectedPermutation.cpp
new file mode 100644
index 00000000000000..56d6bd23b2343a
--- /dev/null
+++ b/mlir/lib/Dialect/Linalg/Transforms/UnfoldProjectedPermutation.cpp
@@ -0,0 +1,270 @@
+//===- UnfoldProjectedPermutation.cpp - unfold projected permutations ----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements a pattern to decompose the operand of a GenericOp that
+// has `transpose+broadcast` juxtaposed via its affine map into separate
+// transpose and broadcast ops.
+//
+//===----------------------------------------------------------------------===//
+//
+#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
+#include <utility>
+
+#include "mlir/Dialect/Affine/IR/AffineOps.h"
+#include "mlir/Dialect/Linalg/IR/Linalg.h"
+#include <map>
+#include <optional>
+#include <vector>
+
+using namespace mlir;
+using namespace mlir::linalg;
+
+namespace {
+
+/// A projected permutation is effectively the folding of a mixture of
+/// transpose and broadcast into the affine map of an operand.
+/// While folding transpose and broadcast into the affine map of a
+/// linalg.generic operand is an effective optimization, sometimes we may
+/// want to unfold that, for instance when recognizing named ops.
+///
+///  Example
+///
+/// ```mlir
+///
+/// #projection = affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d1)>
+/// #identity   = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
+/// ...
+///    %res = linalg.generic
+///       { indexing_maps = [#projection, #identity, #identity],
+///       iterator_types = ["parallel", "parallel", "parallel",
+///                         "parallel", "parallel"]}
+///       ins(%x, %y : tensor<7x8x9xf32>, tensor<5x9x7x8x10xf32>)
+///       outs(%z : tensor<5x9x7x8x10xf32>) {
+///         ^bb0(%in: f32, %in_1: f32, %out: f32):
+///              %div = arith.divf %in, %in_1 : f32
+///              linalg.yield %div : f32
+///    } -> tensor<5x9x7x8x10xf32>
+/// ```
+///
+/// In the above IR operand `%x` map is a projected-permutation. This can be
+/// unfolded as:
+///
+/// ```mlir
+///   ...
+///   %transposed = linalg.transpose ins(%x : tensor<7x8x9xf32>)
+///                    outs(%e1 : tensor<9x7x8xf32>) permutation = [2, 0, 1]
+///   ...
+///   %broadcasted = linalg.broadcast ins(%transposed : tensor<9x7x8xf32>)
+///                    outs(%e2 : tensor<5x9x7x8x10xf32>) dimensions = [0, 4]
+///   %2 = linalg.div
+///           ins(%broadcasted, %y :
+///                  tensor<5x9x7x8x10xf32>, tensor<5x9x7x8x10xf32>)
+///           outs(%arg2 : tensor<5x9x7x8x10xf32>) -> tensor<5x9x7x8x10xf32>
+/// ```
+///
+/// Note that linalg.generic has been 'specialized' to linalg.div.
+/// When unfolding, it is more effective to transpose first and then broadcast.
+/// However, if the transpose is done first, the permutation map needs to be
+/// expressed in terms of the reduced dimensions (as the broadcast hasn't
+/// happened yet). Also, the broadcast dimensions in a linalg.generic come
+/// from the other operands (those not broadcast along that particular
+/// dimension). We work this out by computing the polytope shape of the
+/// linalg.generic from the shapes of all the operands (inputs and outputs).
+
+struct UnfoldProjectedPermutation : public OpRewritePattern<GenericOp> {
+  using OpRewritePattern<GenericOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(GenericOp genericOp,
+                                PatternRewriter &rewriter) const override;
+};
+
+/// Calculate shape (dimensions) of the iteration space polytope.
+/// This is calculated by concatenating the indexing maps of all operands
+/// of the generic; inverting the concatenation; concatenating all the
+/// shapes of the operands; and then doing `apply map` to those two.
+SmallVector<int64_t> getPolytopeDims(GenericOp op) {
+  assert(op.hasPureTensorSemantics() && "works only on tensors");
+
+  /// Concat indexing maps of all operands and invert the mapping.
+  auto maps = op.getIndexingMapsArray();
+  auto concat = concatAffineMaps(maps);
+  auto inverse = inversePermutation(concat);
+
+  /// Concat the size of each dims of all operands.
+  SmallVector<int64_t> dims;
+  for (auto &operand : op->getOpOperands()) {
+    auto rankedType = cast<RankedTensorType>(operand.get().getType());
+    for (auto size : rankedType.getShape())
+      dims.push_back(size);
+  }
+
+  /// Match the inverse map with dims to get polytope dimensions.
+  /// Note that some may be 'kDynamic'.
+  return applyPermutationMap<int64_t>(inverse, dims);
+}
+
+/// For the given `map` determine what dimensions are transposed
+/// and what dimensions are broadcasted.
+/// Returns :
+///  `isTransposed, isBroadcast,
+///   transpose-permutation, broadcast-dimensions`
+///
+std::tuple<bool, bool, SmallVector<int64_t>, SmallVector<int64_t>>
+computeTransposeBroadcast(AffineMap &map) {
+  assert(map.isProjectedPermutation(false) && "not a projection");
+
+  // Dimensions that don't appear on result are broadcast.
+  int64_t minorSize = map.getNumResults();
+
+  // Convert affine expr to int64_t.
+  SmallVector<int64_t> minorResult;
+  for (int64_t i = 0; i < minorSize; ++i) {
+    auto expr = cast<AffineDimExpr>(map.getResults()[i]);
+    minorResult.push_back(expr.getPosition());
+  }
+
+  // If dims are not monotonically increasing then transpose is present.
+  SmallVector<int64_t> sorted(minorResult);
+  std::sort(sorted.begin(), sorted.end());
+  bool hasTranspose = !std::equal(minorResult.begin(), minorResult.end(),
+                                  sorted.begin(), sorted.end());
+
+  // Walk the sorted map result to determine which dimensions are broadcasted.
+  SmallVector<int64_t> broadcast;
+  for (int64_t i = 0, j = 0; i < map.getNumInputs(); ++i) {
+    if (j < minorSize && sorted[j] == i) {
+      j++;
+      continue;
+    }
+    broadcast.push_back(i);
+  }
+  bool hasBroadcast = broadcast.size();
+
+  /// Consider an operand `x : tensor<7x8x9>` of a genericOp that has
+  /// affine map `affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d1)>`.
+  /// `x`'s access is both transposed and broadcast. But when specifying
+  /// the `linalg.transpose(x : tensor<7x8x9>)` the dimensions need to be
+  /// specified as `affine_map<(d0, d1, d2) -> (d1, d2, d0)>` instead of
+  /// referring to d3, d4. Therefore, re-base the transpose dimensions so
+  /// that they start from d0.
+  std::map<int64_t, int64_t> minorMap;
+  for (int64_t i = 0; i < minorSize; ++i)
+    minorMap.insert({sorted[i], i});
+
+  // Re-map the dimensions.
+  SmallVector<int64_t> remappedResult(minorSize);
+  for (int64_t i = 0; i < minorSize; ++i)
+    remappedResult[i] = minorMap[minorResult[i]];
+
+  /// Calculate the permutation for the transpose.
+  SmallVector<int64_t> permutation(minorSize);
+  for (unsigned i = 0; i < minorSize; ++i) {
+    permutation[remappedResult[i]] = i;
+  }
+
+  return {hasTranspose, hasBroadcast, permutation, broadcast};
+}
+
+LogicalResult
+UnfoldProjectedPermutation::matchAndRewrite(GenericOp op,
+                                            PatternRewriter &rewriter) const {
+  if (!op.hasPureTensorSemantics() || op.isSingleInputOutput() ||
+      op.isSingleYieldOp() || !op.isAllParallelLoops())
+    return failure();
+
+  // All maps need to be projected permutations.
+  for (auto &opOperand : op->getOpOperands()) {
+    auto map = op.getMatchingIndexingMap(&opOperand);
+    if (!map.isProjectedPermutation(false))
+      return failure();
+  }
+
+  // Currently we handle only static shapes.
+  for (auto &operand : op->getOpOperands()) {
+    auto rankedType = cast<RankedTensorType>(operand.get().getType());
+    for (auto size : rankedType.getShape())
+      if (size == ShapedType::kDynamic)
+        return failure();
+  }
+
+  // Calculate polytope bounds from affine maps and operand(s) shapes.
+  auto polytope = getPolytopeDims(op);
+
+  auto loc = op.getLoc();
+  bool isChanged = false;
+  SmallVector<Value> newInitValues = op.getDpsInputs();
+  SmallVector<AffineMap> newMap = op.getIndexingMapsArray();
+
+  // Walk over each input operand and unfold if it is transposed, broadcast
+  // or mix of two via operand's affine-map.
+  for (int64_t i = 0; i < op.getNumDpsInputs(); ++i) {
+    auto &map = newMap[i];
+    auto inputRTType = cast<RankedTensorType>(newInitValues[i].getType());
+    auto elType = inputRTType.getElementType();
+
+    /// Nothing to do if map is already an identity.
+    if (map.isIdentity())
+      continue;
+
+    auto [hasTranspose, hasBroadcast, permutation, broadcastedDims] =
+        computeTransposeBroadcast(map);
+
+    if (hasTranspose) {
+      /// linalg.transpose permutes the dimensions of input using
+      /// rule: dim(result, i) = dim(input, permutation[i])
+      SmallVector<int64_t> transposedShape(map.getNumResults());
+      for (int64_t i = 0; i < map.getNumResults(); ++i)
+        transposedShape[i] = inputRTType.getShape()[permutation[i]];
+
+      Value emptyTensor =
+          rewriter.create<tensor::EmptyOp>(loc, transposedShape, elType);
+
+      auto transposeOp = rewriter.create<TransposeOp>(loc, newInitValues[i],
+                                                      emptyTensor, permutation);
+      newInitValues[i] = transposeOp->getResult(0);
+      isChanged = true;
+    }
+
+    // Does it require broadcast
+    if (hasBroadcast) {
+      assert(broadcastedDims.size() && "should have non size broadcast");
+      Value emptyTensor = rewriter.create<tensor::EmptyOp>(
+          loc, polytope, inputRTType.getElementType());
+
+      auto broadcastOp = rewriter.create<linalg::BroadcastOp>(
+          loc, newInitValues[i], emptyTensor, broadcastedDims);
+
+      newInitValues[i] = broadcastOp->getResult(0);
+      isChanged = true;
+    }
+    newMap[i] = rewriter.getMultiDimIdentityMap(map.getNumDims());
+  }
+
+  if (isChanged) {
+    SmallVector<Value> operands = op->getOperands();
+    ValueRange operandsRef(operands);
+
+    auto newOp = rewriter.create<linalg::GenericOp>(
+        /*location=*/op.getLoc(),
+        /*resultTensorTypes=*/op->getResultTypes(),
+        /*inputs=*/newInitValues,
+        /*outputs=*/operandsRef.drop_front(op.getNumDpsInputs()),
+        /*indexingMaps=*/newMap,
+        /*iteratorTypes=*/op.getIteratorTypesArray());
+
+    newOp.getRegion().takeBody(op->getRegion(0));
+    rewriter.replaceOp(op, newOp->getResults());
+  }
+  return success();
+}
+
+} // namespace
+
+void mlir::linalg::populateUnfoldProjectedPermutationPatterns(
+    RewritePatternSet &patterns) {
+  patterns.insert<UnfoldProjectedPermutation>(patterns.getContext());
+}
diff --git a/mlir/test/Dialect/Linalg/unfold_projected_permutation.mlir b/mlir/test/Dialect/Linalg/unfold_projected_permutation.mlir
new file mode 100644
index 00000000000000..4efa07b2de12e3
--- /dev/null
+++ b/mlir/test/Dialect/Linalg/unfold_projected_permutation.mlir
@@ -0,0 +1,71 @@
+// RUN: mlir-opt %s -split-input-file --linalg-specialize-generic-ops | FileCheck %s
+
+#projection = affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d1)>
+#identity   = affine_map<(d0, d1, d2, d3, d4) -> (d0, d1, d2, d3, d4)>
+
+func.func @test_mixed(%x : tensor<7x8x9xf32>, %y:  tensor<5x9x7x8x10xf32>, %z :  tensor<5x9x7x8x10xf32>) ->  tensor<5x9x7x8x10xf32> {
+  %res = linalg.generic
+     { indexing_maps = [#projection, #identity, #identity], iterator_types = ["parallel", "parallel", "parallel", "parallel", "parallel"]} 
+     ins(%x, %y : tensor<7x8x9xf32>, tensor<5x9x7x8x10xf32>) outs(%z : tensor<5x9x7x8x10xf32>) {
+     ^bb0(%in: f32, %in_1: f32, %out: f32):
+       %div = arith.divf %in, %in_1 : f32
+       linalg.yield %div : f32
+  } -> tensor<5x9x7x8x10xf32>
+  return %res : tensor<5x9x7x8x10xf32>
+}
+
+// CHECK-LABEL: test_mixed
+// CHECK-SAME: %[[X:.+]]: tensor<7x8x9xf32>, %[[Y:.+]]: tensor<5x9x7x8x10xf32>, %[[Z:.+]]: tensor<5x9x7x8x10xf32>) -> tensor<5x9x7x8x10xf32> {
+// CHECK: %[[E0:.+]] = tensor.empty() : tensor<9x7x8xf32>
+// CHECK: %[[Transposed:.+]] = linalg.transpose ins(%[[X]] : tensor<7x8x9xf32>) outs(%[[E0]] : tensor<9x7x8xf32>) permutation = [2, 0, 1]
+// CHECK: %[[E1:.+]] = tensor.empty() : tensor<5x9x7x8x10xf32>
+// CHECK: %[[Broadcasted:.+]] = linalg.broadcast ins(%[[Transposed]] : tensor<9x7x8xf32>) outs(%[[E1]] : tensor<5x9x7x8x10xf32>) dimensions = [0, 4]
+// CHECK: {{.*}} = linalg.div ins(%[[Broadcasted]], %[[Y]] : tensor<5x9x7x8x10xf32>, tensor<5x9x7x8x10xf32>) outs(%[[Z]] : tensor<5x9x7x8x10xf32>) -> tensor<5x9x7x8x10xf32>
+// CHECK-NOT: linalg.generic
+
+// -----
+
+#identity = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
+#transposed = affine_map<(d0, d1, d2) -> (d2, d0, d1)>
+
+func.func @test_transposed(%x : tensor<32x2x16xf32>, %y:  tensor<2x16x32xf32>, %z :  tensor<2x16x32xf32>) ->  tensor<2x16x32xf32> {
+  %res = linalg.generic
+     { indexing_maps = [#transposed, #identity, #identity], iterator_types = ["parallel", "parallel", "parallel"]}
+     ins(%x, %y : tensor<32x2x16xf32>, tensor<2x16x32xf32>)
+     outs(%z : tensor<2x16x32xf32>) {
+     ^bb0(%in: f32, %in_1: f32, %out: f32):
+       %div = arith.divf %in, %in_1 : f32
+       linalg.yield %div : f32
+  } -> tensor<2x16x32xf32>
+  return %res : tensor<2x16x32xf32>
+}
+
+// CHECK-LABEL: test_transposed
+// CHECK-SAME: %[[X:.+]]: tensor<32x2x16xf32>, %[[Y:.+]]: tensor<2x16x32xf32>, %[[Z:.+]]: tensor<2x16x32xf32>) -> tensor<2x16x32xf32> {
+// CHECK: %[[E0:.+]] = tensor.empty() : tensor<2x16x32xf32>
+// CHECK: %[[Transposed:.+]] = linalg.transpose ins(%[[X]] : tensor<32x2x16xf32>) outs(%[[E0]] : tensor<2x16x32xf32>) permutation = [1, 2, 0]
+// CHECK: {{.*}} = linalg.div ins(%[[Transposed]], %[[Y]] : tensor<2x16x32xf32>, tensor<2x16x32xf32>) outs(%[[Z]] : tensor<2x16x32xf32>) -> tensor<2x16x32xf32>
+// CHECK-NOT: linalg.generic
+
+// -----
+
+#identity = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
+#broadcast = affine_map<(d0, d1, d2) -> (d0, d2)>
+func.func @test_broadcast(%x : tensor<2x16x32xf32>, %y:  tensor<2x32xf32>, %z :  tensor<2x16x32xf32>) ->  tensor<2x16x32xf32> {
+  %res = linalg.generic
+     { indexing_maps = [#identity, #broadcast, #identity], iterator_types = ["parallel", "parallel", "parallel"]}
+     ins(%x, %y : tensor<2x16x32xf32>, tensor<2x32xf32>)
+     outs(%z : tensor<2x16x32xf32>) {
+     ^bb0(%in: f32, %in_1: f32, %out: f32):
+       %div = arith.divf %in, %in_1 : f32
+       linalg.yield %div : f32
+  } -> tensor<2x16x32xf32>
+  return %res : tensor<2x16x32xf32>
+}
+
+// CHECK-LABEL: test_broadcast
+// CHECK-SAME: %[[X:.+]]: tensor<2x16x32xf32>, %[[Y:.+]]: tensor<2x32xf32>, %[[Z:.+]]: tensor<2x16x32xf32>) -> tensor<2x16x32xf32> {
+// CHECK: %[[E0:.+]] = tensor.empty() : tensor<2x16x32xf32>
+// CHECK: %[[Broadcasted:.+]] = linalg.broadcast ins(%[[Y]] : tensor<2x32xf32>) outs(%[[E0]] : tensor<2x16x32xf32>) dimensions = [1]
+// CHECK: {{.*}} = linalg.div ins(%[[X]], %[[Broadcasted]] : tensor<2x16x32xf32>, tensor<2x16x32xf32>) outs(%arg2 : tensor<2x16x32xf32>) -> tensor<2x16x32xf32>
+// CHECK-NOT: linalg.generic

github-actions bot commented Nov 3, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@banach-space (Contributor) left a comment

I've only had a chance to skim through the comments and tests and left a few nits. Will do a proper review later.

There's a lot of threads on Discourse atm and it's tricky to co-ordinate everything 😅

@banach-space (Contributor) left a comment

Nice, thanks!

I've skimmed through the tests to gauge what's going on and mostly looks fine. But I find UnfoldProjectedPermutation to be a very misleading name. I don't think that's the main thing here. Instead, this is expanding the "specialisation" logic, right?

There's a few tricky bits here, so I'd like to take another look.

}
broadcast.push_back(i);
}
bool hasBroadcast = broadcast.size();
Contributor:
Suggested change
bool hasBroadcast = broadcast.size();
bool hasBroadcast = !broadcast.empty();

Comment on lines 85 to 88
/// Calculate shape (dimensions) of the iteration space polytope.
/// This is calculated by concatenating the indexing maps of all operands
/// of the generic; inverting the concatenation; concatenating all the
/// shapes of the operands; and then doing `apply map` to those two.
Contributor:
Is "polytope" a standard term for these things? To me this is just combining all the dims and inverting them using the corresponding maps. I don't see "polytope" being used anywhere in the context of Linalg?

Also, just based on the test, this simply returns the type of the output tensor. Why not use that instead? In what cases grabbing the output tensor would not be sufficient?

Contributor Author:
I should have called it 'convex polyhedron' or nDimRectangleDims - that would have been better. But now, as @MaheshRavishankar suggested, getStaticLoopRanges does the same job.


namespace {

/// Projected permutation are effectively folding in of a mixture of
Contributor:
This comment is very insightful, but doesn't really say what the pattern does. Also, I don't believe UnfoldProjectedPermutation is accurate. The pattern (based on this comment and the tests) specialises linalg.generic {op} into linalg.transpose + linalg.broadcast + op?

Unfolding the permutation map is just an implementation detail. A very important one, but not the ultimate goal.

Contributor Author:
changed.

/// This is calculated by concatenating the indexing maps of all operands
/// of the generic; inverting the concatenation; concatenating all the
/// shapes of the operands; and then doing `apply map` to those two.
SmallVector<int64_t> getPolytopeDims(GenericOp op) {
Contributor:
Actually, this whole thing should be the same as op.getStaticLoopRanges().

Contributor Author:
Awesome. Yes, it does. Thanks for pointing that out. Wish I had seen it earlier, but then I wouldn't have learnt the inversePermutation thingie :)
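For the running example, what getStaticLoopRanges() computes can be illustrated with a standalone sketch (no MLIR APIs; the (dim, extent) pairs below are read off the shapes and indexing maps shown in the PR description):

```cpp
// Sketch of the loop-range computation: pair each loop dim referenced by an
// operand's indexing map with that operand's extent, then gather the extents
// per loop dim.
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

int main() {
  // (loop dim, extent) pairs gathered from all operands:
  //   %x : tensor<7x8x9>          with map (d2, d3, d1)
  //   %y, %z : tensor<5x9x7x8x10> with the 5-d identity map
  std::vector<std::pair<int64_t, int64_t>> dimExtents = {
      {2, 7}, {3, 8}, {1, 9},                   // %x
      {0, 5}, {1, 9}, {2, 7}, {3, 8}, {4, 10},  // %y
      {0, 5}, {1, 9}, {2, 7}, {3, 8}, {4, 10}}; // %z

  std::vector<int64_t> loopRanges(5, -1);
  for (auto [dim, extent] : dimExtents)
    loopRanges[dim] = extent;

  for (int64_t r : loopRanges)
    std::printf("%lld ", static_cast<long long>(r)); // prints: 5 9 7 8 10
  std::printf("\n");
  return 0;
}
```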

@javedabsar1 (Contributor Author):
> Nice, thanks!
>
> I've skimmed through the tests to gauge what's going on and mostly looks fine. But I find UnfoldProjectedPermutation to be a very misleading name. I don't think that's the main thing here. Instead, this is expanding the "specialisation" logic, right?
>
> There's a few tricky bits here, so I'd like to take another look.

I really think UnfoldProjectedPermutation names what this pattern is doing - the transpose and broadcast can be seen as folded in via the affine map. To identify them we use the existing `isProjectedPermutation` function, and what this rewrite is doing is unfolding what (at least conceptually) got folded in.

I have re-written the intro in the top of the .cpp file to make it more explicit / helpful.

But yeah, if you have a better name in mind I would be glad to change this.

@llvm llvm deleted a comment from banach-space Nov 6, 2024
@banach-space (Contributor):
> I really think UnfoldProjectedPermutation names what this pattern is doing - the transpose and broadcast can be seen as folded in via the affine map.

I think that this is skewed towards the key implementation detail rather than the transformation itself. To me, it's more like specializeGenericByUnfoldingPermutation or decomposeGenericByUnfoldingPermutation. I need to take another look at the pattern in search for inspiration, but today it's already too late 😅 Hopefully you see what I meant :)

@javedabsar1 (Contributor Author):
> > I really think UnfoldProjectedPermutation names what this pattern is doing - the transpose and broadcast can be seen as folded in via the affine map.
>
> I think that this is skewed towards the key implementation detail rather than the transformation itself. To me, it's more like specializeGenericByUnfoldingPermutation or decomposeGenericByUnfoldingPermutation. I need to take another look at the pattern in search for inspiration, but today it's already too late 😅 Hopefully you see what I meant :)

decomposeGenericByUnfoldingPermutation works for me. Thanks for suggesting it :) Will change it everywhere.

#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include <map>
#include <optional>
#include <vector>
Contributor:
Is vector needed?

You should also move <utility> to this section. I think if you remove the blank line (11), clang-format would sort the includes in the right order for you :)

Contributor Author:
done

// CHECK-LABEL: transpose_and_broadcast
// CHECK-SAME: %[[X:.+]]: tensor<7x8x9xf32>, %[[Y:.+]]: tensor<5x9x7x8x10xf32>, %[[Z:.+]]: tensor<5x9x7x8x10xf32>) -> tensor<5x9x7x8x10xf32> {
// CHECK: %[[E0:.+]] = tensor.empty() : tensor<9x7x8xf32>
// CHECK: %[[X_trans:.+]] = linalg.transpose ins(%[[X]] : tensor<7x8x9xf32>) outs(%[[E0]] : tensor<9x7x8xf32>) permutation = [2, 0, 1]
Contributor (@MaheshRavishankar):
I would have expected this to be [1, 2, 0] but I am also assuming transpose keeps the input as the basis and describes how to permute the inputs.

Contributor (@banach-space):
Hm, (7, 8, 9) -> (9, 7, 8), which corresponds to (d2, d3, d1) -> (d1, d2, d3), which gives permutation = [2, 0, 1].

Now, I managed to convince myself that this is correct, but please double check for yourself 😅

@MaheshRavishankar , you might be skewed by:

#projection = affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d1)>

I think this is the trick (IIUC, this is the actual mapping here):

  • 7 -> d2,
  • 8 -> d3,
  • 9 -> d1.

Whereas you assume that:

  • 7 -> d1,
  • 8 -> d2,
  • 9 -> d3.

Does it make sense?

@javedabsar1 (Contributor Author) commented Nov 7, 2024
Yes, of course you are right! Two parts to this:

  1. Transpose semantics:
     dim(result, i) == dim(input, permutation[i])

     Therefore, for input 7x8x9 and permutation [2,0,1] the output works out to 9x7x8.

  2. Working out the permutation from the affine map, e.g. (d2, d3, d1) -> (d1, d2, d3), and vice-versa (see the sketch below).
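A tiny standalone sketch of that rule, applied to the example (names invented for illustration):

```cpp
// Applies dim(result, i) == dim(input, permutation[i]) to the example:
// input shape 7x8x9 with permutation [2, 0, 1] gives result shape 9x7x8.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int64_t> inputShape = {7, 8, 9};
  std::vector<int64_t> permutation = {2, 0, 1};

  std::vector<int64_t> resultShape(inputShape.size());
  for (size_t i = 0; i < permutation.size(); ++i)
    resultShape[i] = inputShape[permutation[i]];

  for (int64_t d : resultShape)
    std::printf("%lld ", static_cast<long long>(d)); // prints: 9 7 8
  std::printf("\n");
  return 0;
}
```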

@banach-space (Contributor) left a comment

A few comments re implementation and then mostly typo suggestions. Sorry for being a bit lazy and going after the easy part 😅 I will do better in the next round!

All in all, much appreciated and neat effort! Don't let my pedantic comments make you think otherwise :)

return failure();
}

// Currently we handle only static shapes.
Contributor:
Could this work at all for dynamic shapes?

Contributor Author:
For a start, this will assert when trying to create tensor.empty with a dynamic shape. https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp#L874

Contributor:
OK, rather than documenting what the code does, could you add a comment saying "why"? Or what's missing? From what you are saying, we'd need to add logic to compute dynamic sizes of the input tensors for ops like EmptyOp? And probably sth else as well?

Comment on lines 26 to 27
/// linalg.generic is a good optimization but sometimes we may want to unwrap
/// i.e. `unfold` them as explicit transpose and broadcast. This rewrite
Contributor:
Suggested change
/// linalg.generic is a good optimization but sometimes we may want to unwrap
/// i.e. `unfold` them as explicit transpose and broadcast. This rewrite
/// linalg.generic is a good optimization but sometimes we may want to unwrap,
/// i.e., `unfold` them as explicit transpose and broadcast. This rewrite

@javedabsar1 (Contributor Author) commented Nov 8, 2024

Hi folks,
with this latest commit I think I have addressed all your review comments. Thanks a lot for your comments, which helped improve the implementation.

@MaheshRavishankar (Contributor):
My bad on the wrong comment on the transpose permutation. The representation for transpose is somehow inverse of how my brain models it.

@javedabsar1 (Contributor Author):
Thanks @MaheshRavishankar for the review.

@banach-space - are you also OK with the changes? No rush, just asking. Thanks.

@banach-space (Contributor) left a comment

A few additional comments, but nothing major. LGTM otherwise.

op.isSingleYieldOp() || !op.isAllParallelLoops())
return failure();

// All maps need to be projected permutations.
Contributor:
Ping

Comment on lines 161 to 166
for (auto &operand : op->getOpOperands()) {
auto opType = cast<RankedTensorType>(operand.get().getType());
for (auto size : opType.getShape())
if (size == ShapedType::kDynamic)
return failure();
}
Contributor:
Ping

assert(map.isProjectedPermutation(false) && "not a projection");
int64_t minorSize = map.getNumResults();

SmallVector<int64_t> minorResult;
Contributor:
What does minor mean in this context? Apologies if this is obvious.

Contributor Author:
added comments

if (hasTranspose) {
/// Consider an operand `x : tensor<7x8x9>` of a genericOp that has
/// affine map `affine_map<(d0, d1, d2, d3, d4) -> (d2, d3, d1)>`
/// `x`s access is both transposed and brodcast. But when specifying
Contributor:
[nit]

Suggested change
/// `x`s access is both transposed and brodcast. But when specifying
/// `x`s access is both transposed and broadcast. But when specifying


SmallVector<int64_t> permutation;
if (hasTranspose) {
/// Consider an operand `x : tensor<7x8x9>` of a genericOp that has
Contributor:
[nit] Doxygen style comments are normally used for API documentation. Also, below you use //

Suggested change
/// Consider an operand `x : tensor<7x8x9>` of a genericOp that has
// Consider an operand `x : tensor<7x8x9>` of a genericOp that has

Comment on lines 164 to 165
// Decomposing linalg.generic involves creating `tensor.empty`
// which cannot have dynamic shapes.
Contributor:
tensor.empty with dynamic shapes are allowed, see e.g.:

%0 = tensor.empty(%sz) : tensor<5x?x6xf32, "foo">

What's a bit "tricky" is computing the required sizes. You could leave that as a TODO.

@javedabsar1 (Contributor Author) commented Nov 9, 2024
The tensor.empty needs to be passed a runtime value for '?'. For our case I don't see how to derive it easily from the linalg.generic. Guess it could be done by getting the tensor.dim of some operand (not all)... quite a rabbit hole.

Comment on lines 166 to 171
for (auto &operand : op->getOpOperands()) {
auto opType = cast<RankedTensorType>(operand.get().getType());
for (auto size : opType.getShape())
if (size == ShapedType::kDynamic)
return failure();
}
Contributor:
Suggested change
for (auto &operand : op->getOpOperands()) {
auto opType = cast<RankedTensorType>(operand.get().getType());
for (auto size : opType.getShape())
if (size == ShapedType::kDynamic)
return failure();
}
if (!llvm::all_of(op->getOpOperands(), [](OpOperand &oper) {
      auto opType = cast<RankedTensorType>(oper.get().getType());
      return !ShapedType::isDynamicShape(opType.getShape());
    }))
  return failure();

Originally posted here:

GitHub is really good at hiding these 😓

@javedabsar1 javedabsar1 merged commit 0ac4821 into llvm:main Nov 9, 2024
8 checks passed
Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
Patterns to decompose the input operand(s) of a linalg.generic that has
a projected permutation affine-map -- i.e. effectively a folded `transpose`,
`broadcast`, or a mixture of the two -- into explicit transpose and broadcast.
This is useful, for instance, when trying to recognize named ops.
email: quic_mabsar@quicinc.com