arm/convolution_3x3_pack1to8_fp16s: prefer ldr/str over ld1/st1
Depending on the microarchitecture, ldr/str can be faster than ld1/st1, especially for the single-lane load form. For example (execution latency in cycles):

On Cortex A75,
1. execution latency of 'ldr q0' and 'ldr h0' is 5
2. execution latency of 'ld1 {v0.16b}' is 6
3. execution latency of 'ld1 {v0.h}[0]' is 8

On Cortex X3,
1. execution latency of 'ldr q0' and 'ldr h0' is 6
2. execution latency of 'ld1 {v0.16b}' is 6
3. execution latency of 'ld1 {v0.h}[0]' is 8

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
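The two load forms compared above can be sketched side by side. This is a minimal illustration, not code from the changed file: the helper names `load_lane0_ldr` and `load_lane0_ld1` are hypothetical, and the non-AArch64 fallback exists only so the snippet builds elsewhere. One behavioral difference to keep in mind when swapping the instructions: `ldr h0` zeroes the rest of `v0`, while `ld1 {v0.h}[0]` leaves the other lanes untouched, so the sketch clears the register first to make the results comparable.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load one 16-bit value into lane 0 with a scalar ldr,
 * then store the whole 128-bit register to out[0..7]. */
static void load_lane0_ldr(const uint16_t* p, uint16_t out[8])
{
#if defined(__aarch64__)
    __asm__ volatile(
        "ldr h0, [%0]        \n" /* scalar load; upper bits of v0 are zeroed */
        "str q0, [%1]        \n" /* full-register store */
        :
        : "r"(p), "r"(out)
        : "memory", "v0");
#else
    /* Portable fallback with the same observable result. */
    memset(out, 0, 8 * sizeof(uint16_t));
    out[0] = p[0];
#endif
}

/* Hypothetical helper: same result using the single-lane ld1 form. */
static void load_lane0_ld1(const uint16_t* p, uint16_t out[8])
{
#if defined(__aarch64__)
    __asm__ volatile(
        "movi v0.16b, #0     \n" /* lane load preserves other lanes, so clear first */
        "ld1 {v0.h}[0], [%0] \n" /* single-lane load into lane 0 */
        "st1 {v0.16b}, [%1]  \n" /* full-register store */
        :
        : "r"(p), "r"(out)
        : "memory", "v0");
#else
    /* Portable fallback with the same observable result. */
    memset(out, 0, 8 * sizeof(uint16_t));
    out[0] = p[0];
#endif
}
```

Both helpers leave the loaded value in element 0 and zeros elsewhere; whether the `ldr` form is actually faster depends on the core, as the latencies quoted above suggest.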
1 parent 051b04f, commit 2ace409
Showing 1 changed file with 34 additions and 34 deletions.