
[ENHANCEMENT-osdi2020_artifact branch] Usage of the kernel_db for new networks #208

Closed
troore opened this issue Jan 21, 2021 · 6 comments
Labels: enhancement (New feature or request)

troore commented Jan 21, 2021

Hi nnfusion authors,

I can reproduce the results in the OSDI 2020 Rammer paper. Thanks for your excellent work!
Now I am trying some new networks. Everything goes well except for the TVM part.
My question is: how should the kernel_db be used?
In the kernel_db directory, there are scripts that, based on the logs in the kernel_db/autotvm_logs directory, generate some json files and insert them into the NNFusion op database.
However, in the Figure11/tvm directory, for example, it seems that only the logs in kernel_db/autotvm_logs are used for this part.
So it is a bit confusing to me how to use the op database correctly: for a new network, such as ResNet50, should I repeat the "autotvm logs -> jsons -> op database" steps and then still use the autotvm_logs to run TVM, or can I just use the autotvm_logs?
Another question: how do I obtain the autotvm_logs? Should I run TVM's auto-tuning to generate them, or is there a script to help with this part? I didn't find one.

Thanks,
troore

troore added the enhancement (New feature or request) label on Jan 21, 2021
nnfbot commented Jan 21, 2021

Thanks for the report @troore! I will look into it ASAP! (I'm a bot).

xysmlx (Contributor) commented Jan 21, 2021

Hi @troore, thank you for your attention.

For a new model, you need to inject external kernels (from a kernel tuner such as TVM, or from a manual implementation) into the kernel DB via the scripts if you want Rammer to optimize the whole model. Otherwise, NNFusion will fall back to its default kernel emitters (e.g., kernels from cuDNN and cuBLAS), and the BlockFusion pass cannot work on those emitters because their kernel source code is not available.

  • For the artifact branch, you could follow the existing json format and use the tool to inject kernels. One possible workflow is: autotvm_log (generated by TVM tuning) -> json (codegen with the scripts and tvm-0.7-codegen) -> injection into the kernel DB.
  • For the main branch, you could also follow the new kernel DB schema, the example tool, and the example kernel to inject kernels (e.g., python convert_external.py sample_fused_conv_add_relu_alexnet.json) into the kernel DB. BTW, NNFusion has supported automatic kernel tuning since the v0.2 release by integrating Antares; here is a tutorial. NNFusion can automatically convert a subset of the generated kernels (i.e., kernels without shared memory or __syncthreads) into BlockCudaEmitter (i.e., rOperator). We are still working on some items listed in the release plan; once these are finished, all kernels generated by the kernel tuner can be automatically converted to BlockCudaEmitter.
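The injection step at the end of both workflows above can be sketched roughly as follows. This is an illustrative sketch only: it assumes the kernel DB is a SQLite file, and the table name, columns, and sample entry are all hypothetical, not NNFusion's actual schema (check the artifact's kernel_db scripts for the real one).

```python
# Hypothetical sketch of the "json -> kernel DB" injection step, assuming a
# SQLite-backed kernel DB. Table name and columns are illustrative only.
import json
import sqlite3

def inject_kernel(db_path: str, kernel_json: dict) -> None:
    """Insert one external-kernel record (parsed from a json file) into the DB."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kernels "
        "(identifier TEXT PRIMARY KEY, op_type TEXT, source TEXT, config TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO kernels VALUES (?, ?, ?, ?)",
        (
            kernel_json["identifier"],
            kernel_json["op_type"],
            kernel_json["source"],
            json.dumps(kernel_json.get("config", {})),
        ),
    )
    conn.commit()
    conn.close()

# Hypothetical kernel entry, e.g. produced by the autotvm_log -> json step.
entry = {
    "identifier": "Convolution_float_nchw_1_64_56_56",   # made-up identifier
    "op_type": "Convolution",
    "source": "__global__ void conv_kernel(...) { /* CUDA source */ }",
    "config": {"grid": [14, 1, 1], "block": [256, 1, 1]},
}
inject_kernel("kernel_cache.db", entry)
```

The real tools also record launch configuration and resource usage so that the BlockFusion pass can schedule the kernel; the sketch only shows the overall shape of the step.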

Figure11/tvm is the TVM+AutoTVM baseline. In the artifact, TVM and Rammer use the same pre-tuned AutoTVM logs (1000 tuning steps) for a fair comparison. TVM can load the AutoTVM logs directly, while Rammer needs to go through the kernel DB; that is why TVM uses the AutoTVM logs directly in the artifact. For new models, the pre-tuned AutoTVM logs do not cover the new kernel configs, so you need to tune the kernels yourself. The *_tuning.py scripts in the autotvm_scripts folder can help tune kernels and generate AutoTVM logs. Once you have these logs, you can follow the artifact to inject them into the kernel DB. Note that the AutoTVM log format changed in one of the commits of TVM-0.7, so it is recommended to use the TVM shipped with the artifact if you want to re-use these scripts. You could also organize other external kernels into the same json format.
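For readers unfamiliar with AutoTVM's json log protocol, a minimal parsing sketch is below, selecting the best-measured config per task. The record layout (top-level "input", "config", "result" fields) reflects my recollection of the TVM-0.7-era format; verify it against your own log files, especially given the format change noted above.

```python
# Sketch: read AutoTVM json-protocol log lines and keep the lowest-cost record
# per task. Field layout is assumed from the TVM-0.7-era format.
import json

def best_records(log_lines):
    """Return {task_key: (mean_cost, record)} keeping the best record per task."""
    best = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        target, task_name, args, _kwargs = rec["input"]
        costs, error_no = rec["result"][0], rec["result"][1]
        if error_no != 0:                 # skip failed measurements
            continue
        cost = sum(costs) / len(costs)    # mean measured time
        key = (target, task_name, json.dumps(args))
        if key not in best or cost < best[key][0]:
            best[key] = (cost, rec)
    return best

# Synthetic two-line log for illustration (not real tuning output)
log = [
    json.dumps({"input": ["cuda", "conv2d", [[1, 64, 56, 56]], {}],
                "config": {"index": 1}, "result": [[0.002], 0, 1.0, 0]}),
    json.dumps({"input": ["cuda", "conv2d", [[1, 64, 56, 56]], {}],
                "config": {"index": 2}, "result": [[0.001], 0, 1.0, 0]}),
]
best = best_records(log)
```

In practice TVM's own `autotvm.apply_history_best` does this selection when loading logs; the sketch just shows what the json -> kernel DB scripts have to extract.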

Best regards,
Lingxiao

troore (Author) commented Jan 22, 2021

Hi @xysmlx,

Thanks for the clear and detailed explanation. I will give it a try and report back soon.

Thanks,
troore

lilpenguin commented

Hi, I was following the steps (quoted below) to run Rammer on a network that is not in the artifact, but I ran into a problem. I got the AutoTVM log, but based on this script, which does the codegen, it seems that I also need an op_config file like this one in order to get the json file. This is probably a naive question, but do you know how I can get the config file?

For the artifact branch, you could follow the existing json format and use the tool to inject kernels. One possible workflow is autotvm_log (generated by TVM tuning) -> json (codegen with scripts and tvm-0.7-codegen) -> injecting to kernel DB.

Thank you very much for your help!

xysmlx (Contributor) commented Jan 28, 2021

Hi @lilpenguin, you can manually figure out the needed op configs and organize them like the op_config files in the artifact branch. You can also modify NNFusion to collect this information; I modified NNFusion like this to collect op_config files for the Convolution, Dot, and AvgPool operators.
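Assembling an op_config entry by hand might look something like the sketch below. The field names are hypothetical, chosen only for illustration; compare them against the actual op_config files in the artifact branch before relying on them.

```python
# Hypothetical op_config builder. Field names are illustrative, not the
# artifact's exact op_config schema -- check the real files in the branch.
import json

def conv_config(in_shape, filter_shape, strides, padding):
    """Describe one Convolution workload so a tuning script can build its task."""
    return {
        "op_type": "Convolution",
        "input_shape": list(in_shape),     # NCHW
        "filter_shape": list(filter_shape),  # OIHW
        "strides": list(strides),
        "padding": list(padding),
    }

# Two example workloads, e.g. collected by logging shapes from a model run
configs = [
    conv_config((1, 64, 56, 56), (64, 64, 3, 3), (1, 1), (1, 1)),
    conv_config((1, 64, 56, 56), (128, 64, 1, 1), (2, 2), (0, 0)),
]
serialized = json.dumps(configs, indent=2)
```

The point of the modification described above is that NNFusion already knows every operator's shapes during compilation, so dumping them from there is less error-prone than reading them off the model by hand.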

xysmlx (Contributor) commented Jan 29, 2021


And note that the SM information in this line needs to be modified according to your GPU environment if you want to use the convert_external.py tool.
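The SM count differs per GPU, so a hard-coded value only matches the machine it was written on. A small lookup sketch with values from NVIDIA's published specifications (at runtime, CUDA can also report it via the `cudaDevAttrMultiProcessorCount` device attribute):

```python
# SM (streaming multiprocessor) counts for a few common GPUs, taken from
# NVIDIA's published specifications; extend the table for your own device.
SM_COUNT = {
    "Tesla V100": 80,
    "Tesla P100": 56,
    "GeForce RTX 2080 Ti": 68,
}

def sm_count(gpu_name: str) -> int:
    try:
        return SM_COUNT[gpu_name]
    except KeyError:
        raise KeyError(f"add the SM count for {gpu_name!r} to SM_COUNT")
```

Getting this number wrong does not usually break codegen outright, but it skews how Rammer packs rOperators onto the device, so it is worth double-checking.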
