-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Adreno] Add markup pass of relay tensors for static texture planning #11878
Conversation
aad99a4
to
18a6065
Compare
855ca67
to
a803b72
Compare
696424b
to
2e4b814
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Several minor comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
@masahi @csullivan @junrushao1994, please take out sometime to review this code and offer some feedback to @elvin-n. Thanks. |
72fc432
to
ec78740
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for your hard work on this @elvin-n, great to see this working with the virtual device planning.
"global", | ||
"global.texture", | ||
"global.texture-weight", | ||
"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to have unit tests explicitly checking the scopes!
That said, this list is a bit confusing. For example, this list has global.texture-weight
scope as the final entry, which is a scope used for a weight, but that certainly is not the final expr in topo order. Even nicer would be a map from expr to scope, or since its checking against the json maybe function/kernel name for the key? This would make the test more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The network output memory scope is "" / (empty string) that is ideologically synonym for "global" but attempt to mark the tail by "global" memory scope causes PlanDevice pass behaves unexpectedly and fails the transformation. Sometimes PlanDevice try to put device_copy from global to empty scope, sometime just aborts.
The idea of having mapping of op->scope
instead of just array was dictated by the nature how memory scopes are stored in json. To compare the mapped values, we have to build such mapping basing on json. That is doable but requires more efforts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, please don't be confused by the name of the memory scopes. It is historical naming. Now the layout of texture is defined by some algorithm defined in annotate_texture_storage.cc Scope()
functions. and it is rather refer to
texture -> 123|4|5
texture-weight -> 1|234|5
texture-nchw ->12|34|5
these scopes are applied to any type of the tensor - data/weights/bias. dividing by buckets are defined only based by values in shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
these scopes are applied to any type of the tensor - data/weights/bias. dividing by buckets are defined only based by values in shape
This is something I hope we can change down the line with the use of Buffer.axis_separators and sch.transform_layout
To compare the mapped values, we have to build such mapping basing on json. That is doable but requires more efforts.
If you have extra cycles for adding these ergonomics in a follow up PR, this would make the unit testing more accessible for other developers to author.
// TODO(csullivan): Add support for mixed output storage scope. | ||
// In current adreno storage planner all outputs of a | ||
// primitive function are assumed to be of the same storage | ||
// type. This should be easy to extend in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we relax this requirement now? I don't recall the outcome of our discussion previously on this. I thought I recall you having mentioned some change on the topic of heterogeneous scopes in a tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to have the use case with layer producing tuple having several scopes. E.g. producing of data 5d and 1d tensors. Currently we do not have such test example and we cannot move forward with design and implementation. Until we do not have it, I propose to leave TODO in this place
Co-authored-by: Chris Sullivan <csullivan@octoml.ai>
ec78740
to
1c8abb5
Compare
std::string Scope(Array<PrimExpr> shape, const VirtualDevice& vd) { | ||
if (vd != VirtualDevice::FullyUnconstrained()) { | ||
// currently we support only textures been made from 5d tensors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO(@csullivan, @elvin-n): Support more layouts with Buffer.axis_separators lowering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…apache#11878) * [Adreno] Add static texture markup relay pass Co-authored-by: Chris Sullivan <csullivan@octoml.ai> * lint check * Remove hardcoded texture limit, check through target options * fix cpplint * Add winograd into annotation pass * fix clang * Remove extra call of PlanDevice in OptimizeImpl * Remove one more extra call of PlanDevice in OptimizeImpl * Fix/add scopes for static texture planning tests * Remove test_2conv2d as duplication of test_plan_device_issue * remove comments in test_residual_block * address review comments * fix black hits * Add textures test descriptions * Address PR comments Co-authored-by: Chris Sullivan <csullivan@octoml.ai>
This PR is a split part of origin PR11357, should be merged after PR11874, PR11875, PR11876
@csullivan @mbs-octoml