[Random op] remove FLAGS_use_curand of all Random OP's CUDA implementation #41308
PR types
Performance optimization
PR changes
OPs
Describe
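The random-number OPs were upgraded as a whole to use `curandStatePhilox4_32_10_t`, a higher-performance parallel CUDA random-number algorithm, and the switch `FLAGS_use_curand` was added to guard against the resulting incompatible change. The work spans the following PRs:
This PR is the final upgrade PR that follows all of the PRs above. It removes the guard switch `FLAGS_use_curand` and improves performance by 2x to 145x. Detailed speedup data is as follows:
Note: this PR only changes how the APIs/OPs above sample random numbers. Only the sampled values change; the probability density functions are unchanged, so in theory model convergence is not affected.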