[Cherry-pick 2.3] Autotune the workspace and kernel choosing of conv #41833

* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.
* Use the system cudaMalloc and cudaFree to allocate the workspace during searching.
* Enable switching between the two kinds of workspace-setting methods.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
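The workspace-limit idea in this commit can be illustrated with a minimal Python sketch. This is not the Paddle implementation: `AlgoPerf`, `workspace_limit`, and `choose_algo` are hypothetical names, and capping the limit by the currently available device memory is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlgoPerf:
    """Hypothetical record for one candidate convolution algorithm."""
    algo_id: int
    workspace_bytes: int  # workspace the algorithm asks for
    time_ms: float        # measured run time during the search


def workspace_limit(candidates: List[AlgoPerf], available_bytes: int) -> int:
    """Largest workspace any candidate requests, capped by available memory.

    The cap by available_bytes is an assumption of this sketch.
    """
    largest = max((c.workspace_bytes for c in candidates), default=0)
    return min(largest, available_bytes)


def choose_algo(candidates: List[AlgoPerf], limit_bytes: int) -> AlgoPerf:
    """Pick the fastest candidate whose workspace fits within the limit."""
    feasible = [c for c in candidates if c.workspace_bytes <= limit_bytes]
    if not feasible:
        raise ValueError("no algorithm fits within the workspace limit")
    return min(feasible, key=lambda c: c.time_ms)


MB = 1 << 20
candidates = [
    AlgoPerf(algo_id=0, workspace_bytes=0, time_ms=3.0),
    AlgoPerf(algo_id=1, workspace_bytes=64 * MB, time_ms=1.0),
    AlgoPerf(algo_id=2, workspace_bytes=512 * MB, time_ms=0.5),
]
limit = workspace_limit(candidates, available_bytes=256 * MB)
best = choose_algo(candidates, limit)
```

With only 256 MB available in this example, the 512 MB candidate is excluded and the search falls back to the fastest algorithm that fits.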

* Change the cuDNN helper for autotune.
* Add FLAGS_use_autotune to set the global status of autotune and change the order of choosing the algorithm.
* Fix the bug in calculating and printing the current step's cache hit rate.
* Improve the autotune cache and fix the unit test.
* Change the key from AlgorithmType to int64_t.
* Fix the unit test for CPU-only environments.
* Change ChooseAlgoByWorkspace for heuristic mode.

Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
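As a rough illustration of an autotune cache keyed by integers that reports both an overall and a per-step hit rate, consider the Python sketch below. The class and method names (`AutotuneCache`, `get`, `set`, `end_step`) are hypothetical and do not mirror Paddle's C++ classes; how the key is derived from the convolution configuration is left out.

```python
from typing import Dict, Optional


class AutotuneCache:
    """Hypothetical cache mapping an integer key to a chosen algorithm id."""

    def __init__(self) -> None:
        self._store: Dict[int, int] = {}
        self.hits = 0
        self.misses = 0
        self._step_hits = 0
        self._step_misses = 0

    def get(self, key: int) -> Optional[int]:
        """Return the cached algorithm id, or None on a miss."""
        if key in self._store:
            self.hits += 1
            self._step_hits += 1
            return self._store[key]
        self.misses += 1
        self._step_misses += 1
        return None

    def set(self, key: int, algo_id: int) -> None:
        self._store[key] = algo_id

    def hit_rate(self) -> float:
        """Hit rate over all lookups since the cache was created."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def step_hit_rate(self) -> float:
        """Hit rate over lookups in the current step only."""
        total = self._step_hits + self._step_misses
        return self._step_hits / total if total else 0.0

    def end_step(self) -> None:
        """Reset the per-step counters; overall counters are kept."""
        self._step_hits = 0
        self._step_misses = 0


cache = AutotuneCache()
first = cache.get(1)        # miss: nothing cached yet
cache.set(1, 7)
second = cache.get(1)       # hit
step_one_rate = cache.step_hit_rate()
cache.end_step()
third = cache.get(1)        # hit in the new step
```

Keeping separate per-step counters means the current step's hit rate is computed from that step's lookups alone, rather than from the cumulative totals.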

Commits on Apr 15, 2022