https://developer.nvidia.com/blog/constructing-cuda-graphs-with-dynamic-parameters/
The new approach is shown in detail in the hummingtree/cuda-graph-with-dynamic-parameters standalone code example. cudaStreamGetCaptureInfo_v2 and cudaStreamUpdateCaptureDependencies are new CUDA runtime APIs introduced in CUDA 11.3.
https://zhuanlan.zhihu.com/p/661451140
Set the dynamic dimensions of an input binding.
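Dynamic-shape inference requires the input dimensions to be bound at run time, before calling enqueue or execute (binding the dynamic inputs is sufficient).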
// Set the input size for the preprocessor
CHECK_RETURN_W_MSG(mPreprocessorContext->setBindingDimensions(0, inputDims), false, "Invalid binding dimensions.");
// We can only run inference once all dynamic input shapes have been specified.
if (!mPreprocessorContext->allInputDimensionsSpecified())
{
    return false;
}
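As a rough illustration of what the check above amounts to, here is a minimal sketch in plain C++ with no TensorRT dependency. It assumes a shape is a list of extents in which -1 marks a dimension that has not been bound yet. `Shape`, `isFullySpecified` and `allInputsSpecified` are hypothetical names used only for illustration; they are not part of the TensorRT API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor shape (not a TensorRT type).
// Assumption: -1 marks a dynamic dimension that has not been bound yet.
using Shape = std::vector<std::int64_t>;

// True when the shape contains no remaining -1 wildcard.
bool isFullySpecified(const Shape& shape)
{
    return std::none_of(shape.begin(), shape.end(),
                        [](std::int64_t d) { return d < 0; });
}

// True when every input shape is concrete, i.e. inference could be enqueued.
bool allInputsSpecified(const std::vector<Shape>& inputs)
{
    return std::all_of(inputs.begin(), inputs.end(), isFullySpecified);
}
```

With this sketch, an input list holding `Shape{-1, 3, 224, 224}` is reported as not ready, while the same list with the batch dimension bound, for example `Shape{8, 3, 224, 224}`, is reported as ready.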
https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Core/ExecutionContext.html
For example, with a dynamic batch dimension:
Input | Output |
---|---|
-1 x C x H x W | -1 x M x N |
-1 x C x H x W, 1 x P x Q | -1 x M x N |
-1 x C x H x W, 1 x P x Q | -1 x M x N, 1 x R |
-1 x C x H x W, 1 x P x Q, -1 x K x K | -1 x M x N, 1 x R, -1 x K |
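The dynamic-batch cases in the table can be sketched in plain C++ as follows. This is only an illustration of the shape bookkeeping, assuming that -1 appears solely as the leading (batch) dimension, as it does in every row of the table. `Shape`, `resolveBatch` and `resolveAll` are hypothetical helpers, not TensorRT calls, and the concrete extents in the usage note are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor shape (not a TensorRT type).
using Shape = std::vector<std::int64_t>;

// Replace a leading -1 (dynamic batch) with the runtime batch size.
// Shapes with a fixed leading dimension, such as 1 x P x Q, are returned unchanged.
Shape resolveBatch(const Shape& declared, std::int64_t batch)
{
    assert(batch > 0 && "batch size must be positive");
    Shape resolved = declared;
    if (!resolved.empty() && resolved[0] == -1)
    {
        resolved[0] = batch;
    }
    return resolved;
}

// Apply the same batch size to every binding of a network.
std::vector<Shape> resolveAll(const std::vector<Shape>& declared, std::int64_t batch)
{
    std::vector<Shape> resolved;
    resolved.reserve(declared.size());
    for (const Shape& shape : declared)
    {
        resolved.push_back(resolveBatch(shape, batch));
    }
    return resolved;
}
```

For the last row of the table with a batch size of 8, the inputs `{-1, 3, 224, 224}`, `{1, 5, 5}` and `{-1, 4, 4}` would resolve to `{8, 3, 224, 224}`, `{1, 5, 5}` and `{8, 4, 4}`: only the bindings with a dynamic batch dimension change.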
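If another thread performs a synchronization while a stream is being captured, the graph capture is interrupted.
During stream capture, it is best to make sure that no other thread or process issues any CUDA API calls.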
NVIDIA/TensorRT#862
Use enqueueV3.