Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
1. 新特性
2. 性能优化
OpenCL优化Mali-GPU计算量大的卷积运算(image/buffer存储混用)。性能提升10%-20%。
CPU浮点模型优化Winograd卷积的准入条件、1x1Strassen算法。性能提高3%~18%。
CPU量化模型优化WinogradInt8、DepthwiseInt8。性能提高4%~22%。
CUDA优化广播Binary算子性能、Blit算子性能。
CUDA支持编译CodeGen功能,针对Unary/Raster/Binary算子进行算子在线融合,整体性能提升5%-10%。
3. Bugfix
3.1 关联 Github Issue 解决
3.2 其他 Bugfix 或优化