[webgpu native] Add transpose shared #22098

axinging · 2024-09-14T08:17:13Z

Description

Motivation and Context

fs-eire · 2024-09-24T23:34:48Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

-  for (size_t i = 0; i < perm.size(); ++i) {
+  ss << "fn perm(i: output_indices_t)->a_indices_t {\n"
+        "  var a: a_indices_t;\n";
+  for (auto i = 0; i < perm.size(); ++i) {


Please use size_t for i instead of auto. For basic types, auto here breaks the Linux build: i is deduced as int, and comparing int with size_t raises a sign-compare warning that is treated as an error.
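As a minimal sketch of the point above, the loop below uses a size_t counter while building the perm helper. The function name BuildPermFunction and the body lines it emits are assumptions for illustration, not the code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: emits a WGSL function that maps output indices to
// input indices for a permutation. The emitted body lines are illustrative.
std::string BuildPermFunction(const std::vector<size_t>& perm) {
  std::ostringstream ss;
  ss << "fn perm(i: output_indices_t)->a_indices_t {\n"
        "  var a: a_indices_t;\n";
  // size_t matches the type returned by perm.size(), so both sides of the
  // comparison are unsigned. With `auto i = 0`, i would be an int and
  // -Wsign-compare would fire, which fails builds that use -Werror.
  for (size_t i = 0; i < perm.size(); ++i) {
    assert(perm[i] < perm.size());  // every entry must name a valid axis
    ss << "  a[" << perm[i] << "] = i[" << i << "];\n";
  }
  ss << "  return a;\n}\n";
  return ss.str();
}
```

Calling BuildPermFunction({1, 0}) produces a function body containing one assignment per axis.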

fs-eire · 2024-09-24T23:44:54Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  InlinedVector<int64_t> new_shape{};
+  InlinedVector<int64_t> new_perm{};
+  SqueezeShape(input_shape.GetDims(), *p_perm, new_shape, new_perm);
+  const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});


Use an explicit bool here instead of auto.
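The check in this hunk could be sketched as follows, with explicit bool for the flags. SqueezeShapeSketch is an assumption about what SqueezeShape does (drop size-1 dimensions and the matching permutation entries, without renumbering), and std::vector stands in for InlinedVector. Neither is taken from the PR's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed behavior: remove size-1 dimensions from the shape and drop the
// permutation entries that refer to them. Entries are NOT renumbered.
void SqueezeShapeSketch(const std::vector<int64_t>& shape,
                        const std::vector<int64_t>& perm,
                        std::vector<int64_t>& new_shape,
                        std::vector<int64_t>& new_perm) {
  assert(shape.size() == perm.size());
  for (size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] != 1) new_shape.push_back(shape[i]);
    if (shape[static_cast<size_t>(perm[i])] != 1) new_perm.push_back(perm[i]);
  }
}

// Hypothetical wrapper around the decision quoted in the diff.
bool UseSharedSketch(const std::vector<int64_t>& shape,
                     const std::vector<int64_t>& perm) {
  std::vector<int64_t> new_shape;
  std::vector<int64_t> new_perm;
  SqueezeShapeSketch(shape, perm, new_shape, new_perm);
  // Explicit bool documents intent; auto would hide the type of the result.
  const bool channels_last = new_perm == std::vector<int64_t>({2, 3, 1});
  const bool channels_first = new_perm == std::vector<int64_t>({3, 1, 2});
  const bool use_shared =
      (new_shape.size() == 2 && new_perm[0] > new_perm[1]) ||
      channels_last || channels_first;
  return use_shared;
}
```

Under these assumptions, a shape of {1, 3, 4, 5} with perm {0, 2, 3, 1} squeezes to perm {2, 3, 1}, which is the channels_last pattern.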

fs-eire · 2024-09-24T23:52:32Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});
+  const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
+  const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
+  auto new_input_shape = use_shared ? new_shape : input_shape;


Try to avoid unnecessary assignments: when use_shared == true, new_input_shape is assigned again below.

fs-eire · 2024-09-24T23:55:33Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
+  const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
+  auto new_input_shape = use_shared ? new_shape : input_shape;
+  auto new_output_shape = output_dims;


new_input_shape is a TensorShape while new_output_shape is an InlinedVector<int64_t>. Is that expected?

Use TensorShape for all now.
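One way to address both of the last two comments is to compute the input and output shapes once, with the same container type. The sketch below is an assumption, not the PR's code: std::vector stands in for TensorShape, the names are hypothetical, and the 2D collapse is derived from the two permutations (for perm {2, 3, 1} on a (C, H, W) shape, the result is the transpose of a C x (H*W) matrix; for perm {3, 1, 2} on (H, W, C), it is the transpose of an (H*W) x C matrix).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pair of 2D shapes used by the shared-memory path.
struct Shapes2D {
  std::vector<int64_t> input;
  std::vector<int64_t> output;
};

// new_shape is the squeezed shape: rank 3 for the channel cases, else rank 2.
Shapes2D SharedPathShapes(const std::vector<int64_t>& new_shape,
                          bool channels_last, bool channels_first) {
  assert(!(channels_last && channels_first));
  assert((channels_last || channels_first) ? new_shape.size() == 3
                                           : new_shape.size() == 2);
  // Single initialization: no value is assigned and then overwritten.
  const std::vector<int64_t> input =
      channels_last
          ? std::vector<int64_t>{new_shape[0], new_shape[1] * new_shape[2]}
      : channels_first
          ? std::vector<int64_t>{new_shape[0] * new_shape[1], new_shape[2]}
          : new_shape;
  // A 2D transpose swaps the two extents.
  return Shapes2D{input, {input[1], input[0]}};
}
```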

fs-eire · 2024-09-24T23:58:32Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias);
+
+  if (use_shared_) {
+    const auto tile_size = std::to_string(tile_size_);


Use a constant or an overridable constant for the tile size.
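A sketch of the constant-based option: the tile size is a compile-time C++ constant and is emitted into the shader as a WGSL const declaration instead of being spliced in as a std::to_string result at each use. The names kTileSize and TileDeclarations, and the padded workgroup array, are assumptions for illustration. The other option the comment mentions would be a WGSL override declaration, which this sketch does not use.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

namespace {
constexpr uint32_t kTileSize = 16;  // hypothetical name for the tile size
}  // namespace

// Emits the WGSL declarations for a square workgroup tile. The extra column
// (tile_size + 1u) is an illustrative padding choice, not taken from the PR.
std::string TileDeclarations(const std::string& value_type) {
  assert(!value_type.empty());
  std::ostringstream ss;
  ss << "const tile_size = " << kTileSize << "u;\n"
     << "var<workgroup> tile : array<array<" << value_type
     << ", tile_size + 1u>, tile_size>;\n";
  return ss.str();
}
```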

onnxruntime/core/providers/webgpu/tensor/transpose.cc

axinging

Updated, @fs-eire

axinging · 2024-09-25T00:38:07Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

-  for (size_t i = 0; i < perm.size(); ++i) {
+  ss << "fn perm(i: output_indices_t)->a_indices_t {\n"
+        "  var a: a_indices_t;\n";
+  for (auto i = 0; i < perm.size(); ++i) {


axinging · 2024-09-25T00:40:22Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  InlinedVector<int64_t> new_shape{};
+  InlinedVector<int64_t> new_perm{};
+  SqueezeShape(input_shape.GetDims(), *p_perm, new_shape, new_perm);
+  const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});


axinging · 2024-09-25T00:46:13Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});
+  const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
+  const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
+  auto new_input_shape = use_shared ? new_shape : input_shape;


onnxruntime/core/providers/webgpu/tensor/transpose.cc

axinging · 2024-09-25T06:39:08Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
+  const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
+  auto new_input_shape = use_shared ? new_shape : input_shape;
+  auto new_output_shape = output_dims;


Use TensorShape for all now.

axinging · 2024-09-25T06:39:16Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias);
+
+  if (use_shared_) {
+    const auto tile_size = std::to_string(tile_size_);


fs-eire · 2024-09-25T20:46:01Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

  uint32_t output_size = gsl::narrow_cast<int32_t>(input_tensor->Shape().Size());
-  TransposeProgram program{*p_perm};
+  TransposeProgram program{*p_perm, use_shared};
+  const auto tile_size = TransposeProgram::TILE_SIZE;


Since TILE_SIZE is also used here, outside the program class, you can just declare it as a global const. It can be either static or inside an anonymous namespace, as long as it is only used in this file.
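To illustrate why one file-level constant is convenient, here is a CPU-side sketch in which the same TILE_SIZE drives both the tile buffer and the number of tiles to process (the analogue of the dispatch group count). This is a reference model of a tiled 2D transpose under stated assumptions, not the WebGPU implementation from the PR. TileCount and TiledTranspose are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {
// File-local constant shared by every function in this translation unit.
constexpr uint32_t TILE_SIZE = 16;
}  // namespace

// Number of tiles needed to cover `extent` elements (rounds up).
uint32_t TileCount(uint32_t extent) {
  return (extent + TILE_SIZE - 1) / TILE_SIZE;
}

// Transposes a row-major rows x cols matrix one tile at a time. The local
// `tile` array plays the role of workgroup (shared) memory.
std::vector<float> TiledTranspose(const std::vector<float>& in,
                                  uint32_t rows, uint32_t cols) {
  assert(in.size() == static_cast<size_t>(rows) * cols);
  std::vector<float> out(in.size());
  for (uint32_t tr = 0; tr < TileCount(rows); ++tr) {
    for (uint32_t tc = 0; tc < TileCount(cols); ++tc) {
      const uint32_t r0 = tr * TILE_SIZE;
      const uint32_t c0 = tc * TILE_SIZE;
      const uint32_t r1 = std::min(rows, r0 + TILE_SIZE);
      const uint32_t c1 = std::min(cols, c0 + TILE_SIZE);
      float tile[TILE_SIZE][TILE_SIZE] = {};
      // Load the tile with row-contiguous reads from the input.
      for (uint32_t r = r0; r < r1; ++r)
        for (uint32_t c = c0; c < c1; ++c)
          tile[r - r0][c - c0] = in[static_cast<size_t>(r) * cols + c];
      // Store it with row-contiguous writes to the transposed output.
      for (uint32_t c = c0; c < c1; ++c)
        for (uint32_t r = r0; r < r1; ++r)
          out[static_cast<size_t>(c) * rows + r] = tile[r - r0][c - c0];
    }
  }
  return out;
}
```

Because both functions read the same constant, changing the tile size in one place keeps the buffer size and the tile count consistent.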

fs-eire · 2024-09-27T05:54:14Z

onnxruntime/core/providers/webgpu/tensor/transpose.cc

+  const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias);
+
+  if (use_shared_) {
+    const auto tile_size = std::to_string(TILE_SIZE);


Should we remove this?

fs-eire · 2024-09-27T05:54:32Z

onnxruntime/core/providers/webgpu/tensor/transpose.h

@@ -11,18 +11,21 @@
 namespace onnxruntime {
 namespace webgpu {

+constexpr static const uint32_t TILE_SIZE = 16;


This would be better placed in transpose.cc.

Moved to onnxruntime::webgpu::Transpose::TILE_SIZE.

axinging force-pushed the transposeshared_webgpunative branch 3 times, most recently from 894aa04 to d7d21d1, on September 18, 2024 08:24

axinging marked this pull request as ready for review September 18, 2024 08:24

axinging force-pushed the transposeshared_webgpunative branch from d7d21d1 to f408d83 on September 23, 2024 08:29

fs-eire reviewed Sep 24, 2024

fs-eire reviewed Sep 25, 2024

axinging added 3 commits September 25, 2024 08:35

[webgpu native] Add transpose shared

ff34dd9

Fix comments

03b8861

Use CONSTANTS

0649cd2

axinging force-pushed the transposeshared_webgpunative branch from f408d83 to 0649cd2 on September 25, 2024 06:38

axinging commented Sep 25, 2024

fs-eire reviewed Sep 25, 2024

Use const TILE_SIZE

dab1a33

fs-eire reviewed Sep 27, 2024

axinging added 2 commits September 27, 2024 14:08

Refine TILE_SIZE

4201346

Nit

cab564e

fs-eire approved these changes Sep 27, 2024

fs-eire merged commit 41f6ff3 into microsoft:fs-eire/webgpu-ep Sep 27, 2024
1 check passed
