[RFC] Add NNEF frontend #108

agoston-mc · 2024-04-11T17:15:33Z

An RFC to add a Neural Network Exchange Format (NNEF) frontend to TVM Relay.
Link to discuss

tqchen · 2024-04-13T16:57:49Z

Thanks for the proposal. As a community we recently moved towards the Relax IR for the latest genAI workloads. Additionally, it is unclear how much adoption NNEF has as of now versus ONNX and other formats.

gyenesvi · 2024-04-16T08:53:53Z

Hi,

> as a community we recently moved towards the Relax IR for the latest genAI workloads

Thanks for directing us towards Relax. I guess that means that new frontends should convert their representations into Relax IR instead of Relay? The documentation on tvm.apache.org refers to Relay, but not Relax. Is that documentation obsolete in this area? Is Relay going to be superseded by Relax?

We only see frontend examples in tvm.relax that we can use as reference. Is there further documentation on tvm.relax?

It is interesting to hear that there's more focus on dynamic graphs / shape inference, as one of the key goals of the next version of NNEF, under development, is support for dynamic graphs and shape inference.

> it is unclear how much adoption NNEF has as of now versus ONNX and other formats

One of the goals of integration into compiler stacks like TVM would be exactly to drive more adoption, as adoption requires public tooling that can demonstrate the capabilities and usage of NNEF in end-to-end workflows. As the next version of NNEF will focus on dynamic graphs, custom operations and lowering to tensor IR level, TVM seems like a good option to demonstrate its potential in compilation-based inference engines. But first we would like to start with integrating the currently publicly available version of NNEF.

Also, TVM has backends for multiple Khronos formats, such as SPIR-V (Vulkan) and OpenCL, which is why TVM could provide us with an end-to-end workflow starting from a Khronos-defined input format and resulting in Khronos-defined outputs. Furthermore, some Khronos members may be interested in implementing their own (proprietary) hardware backends for TVM, for which an NNEF frontend could also provide an end-to-end workflow.

tqchen · 2024-04-16T12:17:34Z

Thanks for the note. We are in the process of revamping the docs. The latest set of emerging model optimizations, such as those for LLMs, will be based on Relax, and most community development now also centers around it. Relay is mostly in maintenance mode, judging by dev activity. https://github.com/apache/tvm/tree/main/python/tvm/relax/frontend/onnx is likely a good reference there.

agoston-mc · 2024-05-08T16:37:43Z

We have updated the PR with a Relax frontend, but we have also kept the Relay one as an option. We think it could be useful to have both, because we noticed performance differences during testing.

We observed that Relax with the default build pipeline is significantly slower than Relay: on CPU, runs were about 2 orders of magnitude slower, and on GPU 3-5x slower. The models we tested were MobileNet and ResNet variants, all static. We observed the same with the ONNX Relax frontend, so we suspect the issue is with the compilation, not with the frontends. Is this normal at the current state of Relax development?
By using MetaSchedule with a custom pipeline (with only a ValidateOps transformation), we managed to match or surpass the speed of Relay, but in many cases using the 'zero' or 'default_build' pipelines did not improve the performance.
What is the recommended workflow to reliably reach the performance of Relay (at least on static models)?

Anyway, this does not affect the frontend code in the PR, so we could move on with that, as the frontend is ready to be submitted to TVM. We are just curious for debugging and measurement reasons, as we were surprised by the results.
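
For readers who want to reproduce this kind of Relay-versus-Relax comparison, a minimal timing sketch could look like the following. It is not the harness used for the numbers above; `median_latency`, `slowdown`, and the zero-argument callables passed to them are hypothetical names introduced for illustration, and only the Python standard library is assumed.

```python
import statistics
import time
from typing import Callable


def median_latency(
    run: Callable[[], object],
    repeats: int = 20,
    warmup: int = 3,
    timer: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the median duration of one call to `run`, in timer units."""
    for _ in range(warmup):
        run()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = timer()
        run()
        samples.append(timer() - start)
    return statistics.median(samples)


def slowdown(
    candidate: Callable[[], object],
    baseline: Callable[[], object],
    **kwargs,
) -> float:
    """Return how many times slower `candidate` is than `baseline` (>1 means slower)."""
    base = median_latency(baseline, **kwargs)
    if base == 0:
        raise ValueError("baseline latency measured as zero; increase the workload")
    return median_latency(candidate, **kwargs) / base
```

Here `candidate` and `baseline` would each wrap one inference call of the compiled model, for example `slowdown(run_relax, run_relay)` with two hypothetical wrappers. On GPU targets the wrapper presumably also needs to wait for the device to finish, otherwise only the launch overhead is measured.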

tqchen · 2024-05-09T12:41:26Z

I think the main reason here is that Relay incorporates autotuning by default, while Relax does not. The main rationale as of now is that we would like to decouple MetaSchedule tuning from the flow (as tuning is usually slower).

That does not mean MetaSchedule cannot be applied; we do encourage users to apply MetaSchedule for traditional applications. In the build flow, MetaSchedule can then be applied by composing it with the default flow.

The zero pipeline as of now mainly focuses on some extra out-of-the-box improvements for the latest LLM models and can expand to more in the future.

gyenesvi · 2024-05-09T13:30:24Z

Thanks for the info about the schedules and differences, it makes sense.

As for moving on, what would be the next step now? Do you need any other info from us for reviewing?

tqchen · 2024-05-09T13:52:09Z

Leaving it open for another week in case others want to chime in, otherwise LGTM

gyenesvi · 2024-05-09T14:43:16Z

Great, thank you!

[RFC] Add NNEF frontend (apache#108)

bb3b390

agoston-mc changed the title ~~[RFC] Add NNEF frontend #108~~ [RFC] Add NNEF frontend Apr 11, 2024

agoston-mc added 2 commits April 17, 2024 08:39

update md

49c2509

Add Relax to RFC

7da62d0

tqchen approved these changes May 9, 2024

tqchen merged commit f0f982f into apache:main May 31, 2024

agoston-mc mentioned this pull request May 31, 2024

[Frontend] Add NNEF frontend apache/tvm#17053

Open

ysh329 mentioned this pull request Jul 20, 2024

[Release] v0.17.0 Release Candidate Notes apache/tvm#17178

Closed

agoston-mc mentioned this pull request Oct 1, 2024

[Docker][CI] Add NNEF dependency to CI images apache/tvm#17433

Merged
