-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[webgpu native] Add transpose shared #22098
[webgpu native] Add transpose shared #22098
Conversation
894aa04
to
d7d21d1
Compare
d7d21d1
to
f408d83
Compare
for (size_t i = 0; i < perm.size(); ++i) { | ||
ss << "fn perm(i: output_indices_t)->a_indices_t {\n" | ||
" var a: a_indices_t;\n"; | ||
for (auto i = 0; i < perm.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use type size_t
for i
instead of auto
for basic types. using auto
here will cause build break in linux. (auto is inferred as int and comparison between int and size_t is a warning as error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
InlinedVector<int64_t> new_shape{}; | ||
InlinedVector<int64_t> new_perm{}; | ||
SqueezeShape(input_shape.GetDims(), *p_perm, new_shape, new_perm); | ||
const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use explicit bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1}); | ||
const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2}); | ||
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first; | ||
auto new_input_shape = use_shared ? new_shape : input_shape; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to avoid unnecessary assignments (when use_shared == true, new_input_shape
will be assigned again below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2}); | ||
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first; | ||
auto new_input_shape = use_shared ? new_shape : input_shape; | ||
auto new_output_shape = output_dims; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new_input_shape is TensorShape
while new_output_shape is InlinedVector<int64_t>
. is it expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TensorShape for all now.
const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias); | ||
|
||
if (use_shared_) { | ||
const auto tile_size = std::to_string(tile_size_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use constants or overridable constants for tile size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
f408d83
to
0649cd2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, @fs-eire
for (size_t i = 0; i < perm.size(); ++i) { | ||
ss << "fn perm(i: output_indices_t)->a_indices_t {\n" | ||
" var a: a_indices_t;\n"; | ||
for (auto i = 0; i < perm.size(); ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
InlinedVector<int64_t> new_shape{}; | ||
InlinedVector<int64_t> new_perm{}; | ||
SqueezeShape(input_shape.GetDims(), *p_perm, new_shape, new_perm); | ||
const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1}); | ||
const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2}); | ||
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first; | ||
auto new_input_shape = use_shared ? new_shape : input_shape; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2}); | ||
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first; | ||
auto new_input_shape = use_shared ? new_shape : input_shape; | ||
auto new_output_shape = output_dims; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TensorShape for all now.
const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias); | ||
|
||
if (use_shared_) { | ||
const auto tile_size = std::to_string(tile_size_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
uint32_t output_size = gsl::narrow_cast<int32_t>(input_tensor->Shape().Size()); | ||
TransposeProgram program{*p_perm}; | ||
TransposeProgram program{*p_perm, use_shared}; | ||
const auto tile_size = TransposeProgram::TILE_SIZE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you are also using TILE_SIZE here in different classes, you can just declare it as a global const. it can be either static or inside an anonymous namespace as long as it's only used in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias); | ||
|
||
if (use_shared_) { | ||
const auto tile_size = std::to_string(TILE_SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
@@ -11,18 +11,21 @@ | |||
namespace onnxruntime { | |||
namespace webgpu { | |||
|
|||
constexpr static const uint32_t TILE_SIZE = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be better put in transpose.cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to onnxruntime::webgpu::Transpose::TILE_SIZE
Description
Motivation and Context