[Cherry-pick 2.3] Autotune the workspace and kernel choosing of conv #41833

Merged: 3 commits, Apr 19, 2022

Commits on Apr 15, 2022

  1. Autotune the workspace_size_limit in conv. (PaddlePaddle#40338)

    * Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.
    
    * Use the system cudaMalloc and cudaFree to allocate workspace during searching.
    
    * Enable switching between the two kinds of workspace setting methods.
    
    Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
    JamesLim-sy and Xreki committed Apr 15, 2022
    32d0fc3
  2. e2faadc
  3. Change cuDNN Conv kernel for auto tune feature (PaddlePaddle#41313)

    * Change the cuDNN helper for autotune.
    
    * Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing algorithm.
    
    * Fix the bug in calculating and printing the cache hit rate of the current step.
    
    * Improve the autotune cache and fix unittest.
    
    * Change the key from AlgorithmType to int64_t.
    
    * Fix unittest for cpu-only env.
    
    * Change ChooseAlgoByWorkspace for heuristic mode.
    
    Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
    JamesLim-sy and Xreki committed Apr 15, 2022
    8bf90c2
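
For readers who want a concrete picture of two ideas named in the commit messages above, here is a minimal C++ sketch. It is not the code from this PR. `AlgoCandidate`, `MaxWorkspaceBytes`, and `AutoTuneCache` are hypothetical names invented for illustration, and the sketch assumes that (a) each candidate algorithm reports a workspace requirement in bytes and the search limit is simply the largest of them, and (b) the autotune cache maps an `int64_t` key to a chosen algorithm id and counts hits and misses per training step. How the real implementation derives its keys, and what exactly it stores, is not shown on this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical record for one candidate algorithm (not a Paddle or cuDNN type).
struct AlgoCandidate {
  int64_t algo_id;              // algorithm identifier stored as a plain integer
  std::size_t workspace_bytes;  // workspace this algorithm reports it needs
};

// Workspace limit for exhaustive search: the largest requirement among all
// candidates, or 0 when there are no candidates.
std::size_t MaxWorkspaceBytes(const std::vector<AlgoCandidate>& candidates) {
  std::size_t limit = 0;
  for (const auto& c : candidates) {
    limit = std::max(limit, c.workspace_bytes);
  }
  return limit;
}

// Hypothetical cache from an int64_t key to a chosen algorithm id, with
// per-step hit/miss counters.
class AutoTuneCache {
 public:
  // Looks up `key`; on a hit, writes the cached id to `*algo` and returns true.
  bool Find(int64_t key, int64_t* algo) {
    auto it = table_.find(key);
    if (it == table_.end()) {
      ++misses_;
      return false;
    }
    ++hits_;
    *algo = it->second;
    return true;
  }

  void Set(int64_t key, int64_t algo) { table_[key] = algo; }

  // Hit rate since the last ResetStep(); 0 when no lookups were made,
  // which avoids dividing by zero.
  double StepHitRate() const {
    const int64_t total = hits_ + misses_;
    return total == 0 ? 0.0 : static_cast<double>(hits_) / total;
  }

  // Clears the counters at a step boundary; cached entries are kept.
  void ResetStep() {
    hits_ = 0;
    misses_ = 0;
  }

 private:
  std::unordered_map<int64_t, int64_t> table_;
  int64_t hits_ = 0;
  int64_t misses_ = 0;
};
```

Resetting only the counters at each step boundary, and not the table, keeps the reported rate specific to the current step while earlier tuning results stay reusable.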