
#12355: Support vector of optional tensor and example #12356

Merged
merged 2 commits on Sep 22, 2024

Conversation

hschoi4448
Contributor

Ticket

#12355

Problem description

There is currently an issue in ttnn: if you create an op that returns a vector of optional tensors and register it with register_operation_with_auto_launch_op, the build fails.

The problematic part is in decorator.hpp, where only the Tensor, Tensors, and tuple cases are handled, so the static_assert fails for a vector of optional tensors.

What's changed

Add handling for a vector of optional tensors and create an example that returns a vector of optional tensors.
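
To make the fix concrete, here is a simplified, self-contained sketch of the kind of branch added to the return-type dispatch. It is not the actual decorator.hpp code; the names are approximate and Tensor is a stand-in type. A vector of optional tensors is flattened into plain tensors, with missing outputs replaced by default-constructed tensors.

```cpp
// Simplified sketch (not the actual decorator.hpp code) of the kind of branch
// this change adds to the return-type dispatch. The real dispatch also
// handles tuples and other cases.
#include <optional>
#include <type_traits>
#include <vector>

struct Tensor {};  // stand-in for ttnn's Tensor

template <typename T>
inline constexpr bool always_false_v = false;

template <typename ReturnType>
std::vector<Tensor> map_return_value_to_output_tensors(const ReturnType& value) {
    if constexpr (std::is_same_v<ReturnType, Tensor>) {
        return {value};
    } else if constexpr (std::is_same_v<ReturnType, std::vector<Tensor>>) {
        return value;
    } else if constexpr (std::is_same_v<ReturnType, std::vector<std::optional<Tensor>>>) {
        // New case: flatten optional outputs; a missing output becomes a
        // default-constructed (empty) Tensor rather than std::nullopt.
        std::vector<Tensor> output_tensors;
        output_tensors.reserve(value.size());
        auto dummy_tensor = Tensor();
        for (const auto& maybe_tensor : value) {
            output_tensors.push_back(maybe_tensor.value_or(dummy_tensor));
        }
        return output_tensors;
    } else {
        static_assert(always_false_v<ReturnType>, "Unsupported return type");
    }
}
```

As the review comments below point out, the substitution of an empty Tensor for std::nullopt is where the contract concern comes from.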

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

@hschoi4448 added the moreh moreh contribution label Sep 7, 2024
@hschoi4448 marked this pull request as ready for review September 7, 2024 04:33
@tenstorrent locked and limited conversation to collaborators Sep 7, 2024
@tenstorrent unlocked this conversation Sep 7, 2024
auto size = value.size();
output_tensors.reserve(size);

auto dummy_tensor = Tensor();  // default-constructed tensor used where the op produced no output
Member


Yeah, this part does not look good to me.

@dmakoviichuk-tt I remember you added support for returning a vector of optional tensors ~2 months ago. Can you please take a look? This blocks Moreh and I'd appreciate your feedback here.

Contributor


I added support for launch_op and for a few more cases.
It seems the decorators have changed a lot since then.

@ayerofieiev-tt
Member

This change basically means you return a vector, not a vector<optional>, right?
It kind of violates the API contract: the API says that if an output is missing, there will be a nullopt.
But this version returns a vector in which every element is at least an empty tensor, which I believe breaks that contract.

I think for vector<optional> to work properly with register_operation_with_auto_launch_op we need to do the following:

  1. Handle vector in create_async_output_tensors
  2. Update map_execute_on_worker_thread_return_to_launch_op_return
  3. Handle the outputs processing in invoke_composite

The infra needs to know which outputs have to be created, which requires those methods to accept more arguments.
Also, it is unlikely that a single method universal across all ops can know how many elements the vector has and which of them should be nullopt. That means the implementation of create_async_output_tensors will have to live in the operation itself; a hypothetical sketch follows.
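
A minimal sketch of what such a per-op hook could look like, assuming the infra passes the op extra information about which outputs are requested. The struct name and the are_required_outputs parameter are assumptions, not an existing ttnn signature, and Tensor is again a stand-in type.

```cpp
// Hypothetical per-op hook along the lines described above: the op itself
// decides how many output slots exist and which of them stay std::nullopt.
#include <optional>
#include <vector>

struct Tensor {};  // stand-in for ttnn's Tensor

struct ExampleMultipleReturnOp {
    static std::vector<std::optional<Tensor>> create_async_output_tensors(
        const std::vector<bool>& are_required_outputs) {  // assumed extra argument
        std::vector<std::optional<Tensor>> outputs;
        outputs.reserve(are_required_outputs.size());
        for (bool required : are_required_outputs) {
            // Only allocate a placeholder for outputs the caller actually asked for;
            // the rest stay nullopt, preserving the vector<optional<Tensor>> contract.
            outputs.push_back(required ? std::optional<Tensor>(Tensor{}) : std::nullopt);
        }
        return outputs;
    }
};
```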

@ayerofieiev-tt
Member

In other words:

If the op returns vector<Tensor>:

The infra does not know how many items to create.
You need to implement create_async_output_tensors in the op.

If the op returns vector<optional<Tensor>>:

The infra not only does not know how many tensors to create, it also has no idea which ones to create and which to leave as nullopt.
You'd need to implement create_async_output_tensors in the op, but with the current arguments you won't be able to determine which tensors to create and which to skip.

@ayerofieiev-tt
Member

In other words:

Your solution makes things compile, but it changes the op type from the expected vector<optional<Tensor>> op(args) to vector<Tensor> op(args), where some tensors may be empty instead of being nullopt. That, in my opinion, breaks the op contract.
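
For illustration only, two hypothetical declarations (argument lists elided, Tensor a stand-in) contrasting the contract the API declares with what the workaround effectively returns:

```cpp
#include <optional>
#include <vector>

struct Tensor {};  // stand-in for ttnn's Tensor

// Contract declared by the API: a missing output comes back as std::nullopt.
std::vector<std::optional<Tensor>> op_as_declared(/*args*/);

// What the workaround effectively produces: a missing output is an empty Tensor.
std::vector<Tensor> op_as_returned(/*args*/);
```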

@ayerofieiev-tt
Member

Conclusion

We can't provide a better alternative at the moment.

In most of the migrated PRs you use register_operation, which requires a manual call to launch_op; otherwise the operation's host code and dispatch will run in the main thread. That would likely be a regression in behavior and performance for you.

Unless you have time to implement the more proper solution I outlined above, I am OK with this being merged.

This change should enable you to use register_operation_with_auto_launch_op in all cases.
Leaving an approval so as not to block the migration.

@hschoi4448
Contributor Author

Thank you for the detailed explanation.
I also recognize that there is an issue with this PR, but I don't have time to properly address it right now.

I plan to merge this PR as it is for now, and I would appreciate it if you could make the proper corrections when you have time.

@ayerofieiev-tt @razorback3

Labels
moreh moreh contribution P1
4 participants