#12207: Port moreh_dot to ttnn #12249

Closed

Conversation

@thanhnguyen-moreh (Contributor) commented Sep 5, 2024

Ticket

#12207 https://github.com/tenstorrent/tt-metal/issues/12207

Problem description

moreh_dot was deprecated along with tt_dnn. This PR ports it to ttnn, rewriting it in the ttnn operation format.

What's changed

  • Move device code to ttnn
  • Create new wrapper code for ttnn with new modules

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

py::arg("input_tensor_b"),
py::kw_only(),
py::arg("output_dtype") = ttnn::bfloat16,
py::arg("output_mem_config") = operation::DEFAULT_OUTPUT_MEMORY_CONFIG});

Suggested change
py::arg("output_mem_config") = operation::DEFAULT_OUTPUT_MEMORY_CONFIG});
py::arg("output_memory_config") = std::nullopt;

DEFAULT_OUTPUT_MEMORY_CONFIG is an old style that refers to DRAM, interleaved memory. In the latest version the default value is std::nullopt, meaning that the output will use the same memory_config as the input.
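To illustrate the semantics, here is a minimal sketch (not code from this PR; resolve_memory_config is a hypothetical helper, and Tensor/MemoryConfig stand in for the ttnn types):

#include <optional>

// Hedged sketch: when the caller passes std::nullopt, fall back to the
// input tensor's own memory_config, so output placement matches the input.
MemoryConfig resolve_memory_config(
    const std::optional<MemoryConfig>& requested, const Tensor& input) {
    return requested.value_or(input.memory_config());
}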


Most occurrences of output_memory_config still show up as DEFAULT_OUTPUT_MEMORY_CONFIG, but that's mainly because they haven't been refactored yet.


namespace ttnn {
constexpr auto moreh_dot =
ttnn::register_operation<"ttnn::moreh_dot", ttnn::operations::moreh::moreh_dot::MorehDot>();

Suggested change
ttnn::register_operation<"ttnn::moreh_dot", ttnn::operations::moreh::moreh_dot::MorehDot>();
ttnn::register_operation_with_auto_launch_op<"ttnn::moreh_dot", ttnn::operations::moreh::moreh_dot::MorehDot>();

There seems to be an issue with the queue in register_operation at the moment. Please use register_operation_with_auto_launch_op instead.
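For context, a hedged sketch of how the registered symbol is then called; the call shape is assumed from other ttnn ops, not taken verbatim from this PR:

// ttnn::moreh_dot becomes a callable object after registration; invoking it
// launches the underlying MorehDot device operation with default kwargs.
ttnn::Tensor out = ttnn::moreh_dot(input_tensor_a, input_tensor_b);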

auto dst_buffer = output_tensor.buffer();
float scaler = 1.0f;

tt::tt_metal::Program program{};

Suggested change
tt::tt_metal::Program program{};
Program program{};

Since using namespace tt::tt_metal is declared at the top, I think it should be fine to make this change.
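A small sketch of the pattern being described (file layout assumed, not verbatim from this PR):

using namespace tt::tt_metal;  // assumed to appear near the top of the .cpp

// ... later, inside the program factory, the qualifier is redundant:
Program program{};  // equivalent to tt::tt_metal::Program program{};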

Comment on lines +15 to +17
namespace tt {
namespace operations {
namespace primary {

Suggested change
namespace tt {
namespace operations {
namespace primary {
namespace ttnn {
namespace operations {

The reason the namespace of moreh_helper was primary is that, in the past, all moreh ops used the primary namespace.
Since the moreh ops are no longer in the primary namespace, moving them to ttnn::operations would allow us to reduce unnecessary namespace code, as shown below

(std::uint32_t)tt::operations::primary::is_dram(src0_buffer) -> (std::uint32_t)(is_dram(src0_buffer))

However, if changing the namespace causes a lot of unexpected additional work, it's not necessary to make this change in this PR.
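A hedged sketch of the effect on a call site, assuming the helpers move into ttnn::operations (build_reader_args is a hypothetical function used only for illustration):

namespace ttnn::operations {

// is_dram is assumed to be the helper from moreh_helper_functions; with it
// living in this namespace, the call site needs no qualification.
void build_reader_args(const Buffer* src0_buffer, std::vector<uint32_t>& args) {
    args.push_back(static_cast<uint32_t>(is_dram(src0_buffer)));
}

}  // namespace ttnn::operations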

const auto& input_tensor_a = tensor_args.input_tensor_a;
const auto& input_tensor_b = tensor_args.input_tensor_b;

TT_ASSERT(tt::operations::primary::is_1d_tensor(input_tensor_a));

Suggested change
TT_ASSERT(tt::operations::primary::is_1d_tensor(input_tensor_a));
TT_FATAL(tt::operations::primary::is_1d_tensor(input_tensor_a));

I prefer TT_FATAL because TT_ASSERT is only active in Debug builds.
Admittedly, since validation is not currently executed in Release mode anyway, it doesn't matter right now, but we don't know how things might change in the future.
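To make the distinction concrete (the comments reflect the description above; the message overload of TT_FATAL is the commonly used form, though treat the exact per-build behavior as an assumption):

// TT_ASSERT: compiled out in Release builds, so the check runs only in Debug.
TT_ASSERT(tt::operations::primary::is_1d_tensor(input_tensor_a));

// TT_FATAL: evaluated in every build type, aborting with a message on failure.
TT_FATAL(tt::operations::primary::is_1d_tensor(input_tensor_a),
         "moreh_dot: input_tensor_a must be 1D");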

output_grad = tt_output_grad = torch_output_grad = tt_input_grad = tt_other_grad = None
if require_input_grad or require_other_grad:
output_grad = torch.randint(-2, 3, output_shape, dtype=cpu_dtype)
# tt_output_grad = ttnn.Tensor(output_grad, npu_dtype).pad_to_tile(float("nan")).to(cpu_layout).to(device)

This code seems to have been committed by mistake during debugging. Please delete it.


const std::vector<Tensor> input_tensors = {input_tensor_a, input_tensor_b};

auto override_runtime_arguments_callback = [reader_kernel_id, writer_kernel_id, compute_kernel_id](

Please move this code into MorehDotOperation::SingleCore::override_runtime_arguments and delete it here.
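For reference, a hedged skeleton of that hook; the signature and argument indices are approximated from other ttnn device operations and may differ in detail:

void MorehDotOperation::SingleCore::override_runtime_arguments(
    cached_program_t& cached_program,
    const operation_attributes_t& operation_attributes,
    const tensor_args_t& tensor_args,
    tensor_return_value_t& output) {
    auto& shared = cached_program.shared_variables;
    auto& program = cached_program.program;

    // Refresh the buffer addresses captured when the program was first built.
    auto& reader_args = GetRuntimeArgs(program, shared.reader_kernel_id, CoreCoord{0, 0});
    reader_args[0] = tensor_args.input_tensor_a.buffer()->address();
    reader_args[1] = tensor_args.input_tensor_b.buffer()->address();

    auto& writer_args = GetRuntimeArgs(program, shared.writer_kernel_id, CoreCoord{0, 0});
    writer_args[0] = output.buffer()->address();
}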

@thanhnguyen-moreh (Contributor Author)

I have created a new PR here: #12265. I will make the changes there.
