[RFC] Add NNEF frontend #108
Thanks for the proposal. As a community, we recently moved towards the Relax IR for the latest genAI workloads. Additionally, it is unclear how much adoption NNEF has as of now versus ONNX and other formats.
Hi,
Thanks for directing us towards Relax. I guess that means that new frontends should convert their representations into Relax IR instead of Relay? The documentation on tvm.apache.org refers to Relay, but not Relax. Is that documentation obsolete in this area? Is Relay going to be superseded by Relax? We only see frontend examples in tvm.relax that we can use as reference. Is there further documentation on tvm.relax? It is interesting to hear that there's more focus on dynamic graphs / shape inference, as one of the key goals of the next version of NNEF, under development, is support for dynamic graphs and shape inference.
One of the goals of integration into compiler stacks like TVM would be exactly to drive more adoption, as adoption requires public tooling that can demonstrate the capabilities and usage of NNEF in end-to-end workflows. As the next version of NNEF will focus on dynamic graphs, custom operations and lowering to the tensor IR level, TVM seems like a good option to demonstrate its potential in compilation-based inference engines. But first we would like to start by integrating the currently publicly available version of NNEF. Also, TVM has backends for multiple Khronos formats, such as SPIR-V (Vulkan) and OpenCL, which is why TVM could provide us with an end-to-end workflow starting from a Khronos-defined input format and resulting in Khronos-defined outputs. Furthermore, some Khronos members may be interested in implementing their own (proprietary) hardware backends for TVM, for which an NNEF frontend could also provide an end-to-end workflow.
Thanks for the note. We are in the process of revamping the docs. The latest emerging model optimizations, such as those for LLMs, will be based on Relax, and most community development now centers around it. Relay is mostly in maintenance mode in terms of development activity. https://github.com/apache/tvm/tree/main/python/tvm/relax/frontend/onnx is likely a good reference there.
We have updated the PR with a Relax frontend, but we have also kept the Relay frontend as an option, thinking it could be useful to have both, because we noticed performance differences during testing. We observed that Relax with the default build pipeline is significantly slower than Relay. (On CPU we observed runs two orders of magnitude slower, while on GPU they were 3-5x slower; the models we tested were MobileNet and ResNet variants, all static models.) We observed the same with the ONNX Relax frontend, so we suspect the issue lies in the compilation, not in the frontends. Is this normal for the current state of Relax development? In any case, this does not affect the frontend code in the PR, so we could move forward with it, as the frontend is ready to be submitted to TVM. We are just curious for debugging/measurement reasons, as we were surprised by the results.
I think the main reason here is that Relay incorporates autotuning by default, while Relax does not. The main rationale for now is that we chose to decouple MetaSchedule tuning from the default flow (as tuning is usually slow). That does not mean MetaSchedule cannot be applied; we do encourage users to apply MetaSchedule for traditional applications. In the build flow, MetaSchedule can be applied by composing it with the default flow. The zero pipeline currently focuses mainly on some extra out-of-the-box improvements for the latest LLM models and can expand to cover more in the future.
Thanks for the info about the schedules and differences, it makes sense. As for moving on, what would be the next step now? Do you need any other info from us for reviewing?
Leaving it open for another week in case others want to chime in, otherwise LGTM.
Great, thank you!
An RFC to add a Neural Network Exchange Format (NNEF) frontend to TVM Relay.
Link to discuss