Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[webgpu native] Add transpose shared #22098

Merged

Conversation

axinging
Copy link
Contributor

Description

Motivation and Context

@axinging axinging force-pushed the transposeshared_webgpunative branch 3 times, most recently from 894aa04 to d7d21d1 Compare September 18, 2024 08:24
@axinging axinging marked this pull request as ready for review September 18, 2024 08:24
for (size_t i = 0; i < perm.size(); ++i) {
ss << "fn perm(i: output_indices_t)->a_indices_t {\n"
" var a: a_indices_t;\n";
for (auto i = 0; i < perm.size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use type size_t for i instead of auto for basic types. using auto here will cause build break in linux. (auto is inferred as int and comparison between int and size_t is a warning as error)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

InlinedVector<int64_t> new_shape{};
InlinedVector<int64_t> new_perm{};
SqueezeShape(input_shape.GetDims(), *p_perm, new_shape, new_perm);
const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use explicit bool

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});
const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
auto new_input_shape = use_shared ? new_shape : input_shape;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to avoid unnecessary assignments (when use_shared == true, new_input_shape will be assigned again below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
auto new_input_shape = use_shared ? new_shape : input_shape;
auto new_output_shape = output_dims;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_input_shape is TensorShape while new_output_shape is InlinedVector<int64_t>. is it expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TensorShape for all now.

const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias);

if (use_shared_) {
const auto tile_size = std::to_string(tile_size_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use constants or overridable constants for tile size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

@axinging axinging left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, @fs-eire

for (size_t i = 0; i < perm.size(); ++i) {
ss << "fn perm(i: output_indices_t)->a_indices_t {\n"
" var a: a_indices_t;\n";
for (auto i = 0; i < perm.size(); ++i) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

InlinedVector<int64_t> new_shape{};
InlinedVector<int64_t> new_perm{};
SqueezeShape(input_shape.GetDims(), *p_perm, new_shape, new_perm);
const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const auto channels_last = new_perm == InlinedVector<int64_t>({2, 3, 1});
const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
auto new_input_shape = use_shared ? new_shape : input_shape;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

const auto channels_first = new_perm == InlinedVector<int64_t>({3, 1, 2});
const auto use_shared = (new_shape.size() == 2 && new_perm[0] > new_perm[1]) || channels_last || channels_first;
auto new_input_shape = use_shared ? new_shape : input_shape;
auto new_output_shape = output_dims;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TensorShape for all now.

const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias);

if (use_shared_) {
const auto tile_size = std::to_string(tile_size_);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

uint32_t output_size = gsl::narrow_cast<int32_t>(input_tensor->Shape().Size());
TransposeProgram program{*p_perm};
TransposeProgram program{*p_perm, use_shared};
const auto tile_size = TransposeProgram::TILE_SIZE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you are also using TILE_SIZE here in different classes, you can just declare it as a global const. it can be either static or inside an anonymous namespace as long as it's only used in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

const auto& output = shader.AddOutput("output", ShaderUsage::UseUniform | ShaderUsage::UseIndicesTypeAlias | ShaderUsage::UseValueTypeAlias);

if (use_shared_) {
const auto tile_size = std::to_string(TILE_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -11,18 +11,21 @@
namespace onnxruntime {
namespace webgpu {

constexpr static const uint32_t TILE_SIZE = 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be better put in transpose.cc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to onnxruntime::webgpu::Transpose::TILE_SIZE

@fs-eire fs-eire merged commit 41f6ff3 into microsoft:fs-eire/webgpu-ep Sep 27, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants